Retrieval & Ranking

This section covers models for embeddings, reranking, and classification.

  • Embedding Models
  • Rerank Models
  • Classification API

By SGLang Team

© Copyright 2023-2026, SGLang.

Last updated on Mar 30, 2026.