Getting Started
Install SGLang
Basic Usage
Sending Requests
OpenAI-Compatible API
Ollama-Compatible API
Offline Engine API
SGLang Native API
Sampling Parameters
Popular Model Usage (DeepSeek, GPT-OSS, GLM, Llama, MiniMax, Qwen, and more)
Advanced Features
Server Arguments
Hyperparameter Tuning
Attention Backend
Speculative Decoding
Structured Outputs
Structured Outputs For Reasoning Models
Tool Parser
Reasoning Parser
Quantization
Quantized KV Cache
Expert Parallelism
DP, DPA and SGLang DP Router
LoRA Serving
PD Disaggregation
EPD Disaggregation
Pipeline Parallelism for Long Context
Hierarchical KV Caching (HiCache)
Query VLM with Offline Engine
DP for Multi-Modal Encoder in SGLang
CUDA Graph for Multi-Modal Encoder in SGLang
SGLang Model Gateway
Deterministic Inference
Observability
Checkpoint Engine Integration
SGLang for RL Systems
Supported Models
Text Generation
Retrieval & Ranking
Specialized Models
Extending SGLang
SGLang Diffusion
SGLang Diffusion
Install SGLang-Diffusion
Compatibility Matrix
SGLang Diffusion CLI Inference
SGLang Diffusion OpenAI API
Performance Optimization
Attention Backends
Profiling Multimodal Generation
Caching Acceleration for Diffusion Models
Cache-DiT Acceleration
TeaCache Acceleration
How to Support New Diffusion Models
Contributing to SGLang Diffusion
Perf Baseline Generation Script
Caching Acceleration
Hardware Platforms
AMD GPUs
CPU Servers
TPU
NVIDIA Jetson Orin
Ascend NPUs
XPU
Developer Guide
Contribution Guide
Development Guide Using Docker
Development Guide for JIT Kernels
Benchmark and Profiling
Bench Serving Guide
Evaluating New Models with SGLang
References
Troubleshooting and FAQ
Environment Variables
Production Metrics
Production Request Tracing
Multi-Node Deployment
Custom Chat Template
Frontend Language
Post-Training Integration
Learn More and Join the Community
Repository
Index