Supported Features on Ascend NPU

This section lists the basic functions and features supported on the Ascend NPU. In the tables below, √ in the A2 or A3 column marks an argument as supported on that hardware, and × marks it as unsupported. If you encounter issues or have any questions, please open an issue.
For the meaning and usage of each argument, see Service Arguments.

Model and tokenizer

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: str | √ | √ |
| | | | √ | √ |
| | | Type: int | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | | √ | √ |
| | {} | Type: str | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | Type: int | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | Type: str | × | × |
| | | | √ | √ |

HTTP server

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: int | √ | √ |
| | | bool flag | √ | √ |
| | | Type: str | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | FALSE | bool flag | × | × |

Quantization and data type

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | √ | √ |
| | | | √ | √ |
| | | Type: str | × | × |
| | | | √ | √ |
| | FALSE | bool flag | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |

Memory and scheduling

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: float | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | Type: int | √ | √ |
| | | Type: float | √ | √ |
| | | Type: int | √ | √ |
| | | Optional[float] | × | × |
| | | Type: float | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | | × | × |

Runtime options

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | Type: int | √ | √ |
| | | Type: str | × | × |
| | | bool flag (set to enable) | × | × |
| | | Type: float | √ | √ |
| | | Type: float | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | | × | × |

Logging

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: str | √ | √ |
| | | bool flag | √ | √ |
| | | | √ | √ |
| | text | text, json | × | × |
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | List[str] | × | × |
| | | List[float] | × | × |
| | | List[float] | × | × |
| | | List[float] | × | × |
| | | bool flag | × | × |
| | | List[str] | × | × |
| | | List[str] | × | × |
| | | Type: float | × | × |
| | | Type: int | √ | √ |
| | | bool flag | √ | √ |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | Type: str | × | × |

RequestMetricsExporter configuration

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | FALSE | | × | × |
| | | | × | × |

Data parallelism

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | bool flag | √ | √ |

Multi-node distributed serving

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |

Model override args

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: str | √ | √ |

LoRA

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | √ | √ |
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | Type: List[str] / | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | × | × |

Kernel Backends (Attention, Sampling, Grammar, GEMM)

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | bool flag | × | × |

Speculative decoding

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | √ | √ |
| | | Type: str | √ | √ |
| | | Type: str | × | × |
| | | | × | × |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: float | √ | √ |
| | | Type: float | √ | √ |
| | | Type: str | × | × |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |

Ngram speculative decoding

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |

Expert parallelism

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | × | √ |
| | | | × | √ |
| | | | × | √ |
| | | | × | × |
| | | bool flag | × | × |
| | | | × | √ |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | Type: float | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | bool flag | × | × |
| | | Type: int | √ | √ |
| | None | N/A | × | × |
| | None | N/A | × | × |

Mamba Cache

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | × | × |
| | | | × | × |
| | | Type: float | × | × |
| | | | × | × |
| | | Type: int | × | × |

Hierarchical cache

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | √ | √ |
| | | Type: float | √ | √ |
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | × | × |
| | | Type: str | × | × |

LMCache

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |

Ktransformer server args

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |

Double Sparsity

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |

Offloading

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | √ | √ |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: str | × | × |

Args for multi-item scoring

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | × | × |

Optimization/debug options

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | √ | √ |
| | | Type: int | × | √ |
| | | List[int] | × | √ |
| | | bool flag | √ | √ |
| | | bool flag | × | √ |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: float | × | × |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | Type: JSON | × | × |
| | | ["eager", "inductor"] | × | × |
| | | Type: int | × | × |
| | | Type: int | × | √ |
| | `` | Type: str | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | Type: int | × | × |
| | | List[int] | × | × |
| | | | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | | × | × |

Dynamic batch tokenizer

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | √ | √ |
| | | Type: int | √ | √ |
| | | Type: float | √ | √ |

Debug tensor dumps

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | × | × |
| | | | × | × |
| | | Type: str | × | × |
| | | Type: str | × | × |

PD disaggregation

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | √ | √ |
| | | | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |

Encode prefill disaggregation

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |

Custom weight loader

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | List[str] | × | × |
| | | bool flag | √ | √ |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | Type: JSON | × | × |
| | | | × | × |
| | | | × | × |

For PD-Multiplexing

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |

For Multi-Modal

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | Type: JSON / Dict | √ | √ |

For checkpoint decryption

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | × | × |
| | | | × | × |
| | | | × | × |

For deterministic inference

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |

For registering hooks

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: JSON list | × | × |

Configuration file support

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | yaml | | × | × |