Getting Started
Install SGLang
Basic Usage
Sending Requests
OpenAI-Compatible API
Ollama-Compatible API
Offline Engine API
SGLang Native API
Sampling Parameters
Popular Model Usage (DeepSeek, GPT-OSS, GLM, Llama, MiniMax, Qwen, and more)
Advanced Features
Server Arguments
Hyperparameter Tuning
Attention Backend
Speculative Decoding
Structured Outputs
Structured Outputs For Reasoning Models
Tool Parser
Reasoning Parser
Quantization
Quantized KV Cache
Expert Parallelism
DP, DPA and SGLang DP Router
LoRA Serving
PD Disaggregation
EPD Disaggregation
Pipeline Parallelism for Long Context
Hierarchical KV Caching (HiCache)
Query VLM with Offline Engine
DP for Multi-Modal Encoder in SGLang
CUDA Graph for Multi-Modal Encoder in SGLang
SGLang Model Gateway
Deterministic Inference
Observability
Checkpoint Engine Integration
SGLang for RL Systems
Supported Models
Text Generation
Retrieval & Ranking
Specialized Models
Extending SGLang
SGLang Diffusion
SGLang Diffusion
Install SGLang-Diffusion
Compatibility Matrix
SGLang Diffusion CLI Inference
SGLang Diffusion OpenAI API
Performance Optimization
Attention Backends
Profiling Multimodal Generation
Caching Acceleration for Diffusion Models
Cache-DiT Acceleration
TeaCache Acceleration
How to Support New Diffusion Models
Contributing to SGLang Diffusion
Perf Baseline Generation Script
Caching Acceleration
Hardware Platforms
AMD GPUs
CPU Servers
TPU
NVIDIA Jetson Orin
Ascend NPUs
XPU
Developer Guide
Contribution Guide
Development Guide Using Docker
Development Guide for JIT Kernels
Benchmark and Profiling
Bench Serving Guide
Evaluating New Models with SGLang
References
Troubleshooting and FAQ
Environment Variables
Production Metrics
Production Request Tracing
Multi-Node Deployment
Custom Chat Template
Frontend Language
Post-Training Integration
Learn More and Join the Community
Repository
Index