mlx.core.dequantize
- dequantize(w: array, /, scales: array, biases: array | None = None, group_size: int | None = None, bits: int | None = None, mode: str = 'affine', global_scale: array | None = None, dtype: Dtype | None = None, *, stream: None | Stream | Device = None) → array
Dequantize the matrix `w` using the provided quantization parameters.

- Parameters:
  - w (array) -- Matrix to be dequantized.
  - scales (array) -- The scales to use per `group_size` elements of `w`.
  - biases (array, optional) -- The biases to use per `group_size` elements of `w`. Default: `None`.
  - group_size (int, optional) -- The size of the group in `w` that shares a scale and bias. See supported values and defaults in the table of quantization modes. Default: `None`.
  - bits (int, optional) -- The number of bits occupied by each element of `w` in the quantized array. See supported values and defaults in the table of quantization modes. Default: `None`.
  - mode (str, optional) -- The quantization mode. Default: `"affine"`.
  - global_scale (array, optional) -- The per-input float32 scale used for `"nvfp4"` quantization, if provided. Default: `None`.
  - dtype (Dtype, optional) -- The data type of the dequantized output. If `None`, the return type is inferred from the scales and biases when possible and otherwise defaults to `bfloat16`. Default: `None`.
- Returns:
  The dequantized version of `w`.
- Return type:
  array
Note

The currently supported quantization modes are `"affine"`, `"mxfp4"`, `"mxfp8"`, and `"nvfp4"`.
For affine quantization, given the notation in quantize(), we compute \(w_i\) from \(\hat{w_i}\) and the corresponding \(s\) and \(\beta\) as follows:

\[w_i = s \hat{w_i} + \beta\]
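The affine rule above can be sketched in NumPy (a minimal illustration of the formula, not the MLX implementation; `dequantize_group` and the example scale and bias are assumptions for this sketch):

```python
import numpy as np

def dequantize_group(w_hat, s, beta):
    """Apply w_i = s * w_hat_i + beta to one group of quantized
    integers that share a single scale s and bias beta."""
    return s * np.asarray(w_hat, dtype=np.float32) + beta

# A group of 4-bit quantized values (integers in 0..15) with an
# assumed scale and bias.
w_hat = np.array([0, 3, 7, 15])
s, beta = 0.5, -2.0
w = dequantize_group(w_hat, s, beta)
print(w)  # each element is s * w_hat_i + beta
```

In the actual API the scales and biases arrays hold one such \(s\) and \(\beta\) per `group_size` elements of `w`, so the same affine map is applied groupwise.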