Get Started
Install SGLang
Basic Usage
Sending Requests
OpenAI-Compatible API
Ollama-Compatible API
Offline Engine API
SGLang Native API
Sampling Parameters
Popular Model Usage (DeepSeek, GPT-OSS, GLM, Llama, MiniMax, Qwen, and more)
Advanced Features
Server Arguments
Hyperparameter Tuning
Attention Backend
Speculative Decoding
Structured Outputs
Structured Outputs For Reasoning Models
Tool Parser
Reasoning Parser
Quantization
Quantized KV Cache
Expert Parallelism
DP, DPA and SGLang DP Router
LoRA Serving
PD Disaggregation
EPD Disaggregation
Pipeline Parallelism for Long Context
Hierarchical KV Caching (HiCache)
Query VLM with Offline Engine
DP for Multi-Modal Encoder in SGLang
CUDA Graph for Multi-Modal Encoder in SGLang
Piecewise CUDA Graph
SGLang Model Gateway
Deterministic Inference
Observability
Checkpoint Engine Integration
SGLang for RL Systems
Supported Models
Text Generation
Retrieval & Ranking
Specialized Models
Extending SGLang
SGLang Diffusion
Install SGLang-Diffusion
Compatibility Matrix
SGLang Diffusion CLI
SGLang Diffusion OpenAI API
Attention Backends
Caching Acceleration
Quantization
Contributing to SGLang Diffusion
Hardware Platforms
AMD GPUs
CPU Servers
TPU
NVIDIA Jetson Orin
Ascend NPUs
XPU
Developer Guide
Contribution Guide
Development Guide Using Docker
Development Guide for JIT Kernels
Benchmark and Profiling
Bench Serving Guide
Evaluating New Models with SGLang
References
Troubleshooting and Frequently Asked Questions
Environment Variables
Production Metrics
Production Request Tracing
Multi-Node Deployment
Custom Chat Template
Frontend Language
Post-Training Integration
Release Lookup
Learn More and Join the Community
Index