Multi-Node Deployment

  • Multi-Node Deployment
  • Deploy On Kubernetes
  • LWS Based PD Deploy
  • DeepSeekV32-Exp RBG Based PD Deploy
  • Deploying DeepSeek with PD Disaggregation and Large-Scale Expert Parallelism on 96 H100 GPUs
  • Deploying Kimi K2 with PD Disaggregation and Large-Scale Expert Parallelism on 128 H200 GPUs

By SGLang Team

© Copyright 2023-2026, SGLang.

Last updated on March 30, 2026.