mlx.core.dequantize
- dequantize(w: array, /, scales: array, biases: array | None = None, group_size: int | None = None, bits: int | None = None, mode: str = 'affine', global_scale: array | None = None, dtype: Dtype | None = None, *, stream: None | Stream | Device = None) → array
Dequantize the matrix `w` using the provided quantization parameters.

- Parameters:
  - w (array) -- Matrix to be dequantized.
  - scales (array) -- The scales to use per `group_size` elements of `w`.
  - biases (array, optional) -- The biases to use per `group_size` elements of `w`. Default: `None`.
  - group_size (int, optional) -- The size of the group in `w` that shares a scale and bias. See supported values and defaults in the table of quantization modes. Default: `None`.
  - bits (int, optional) -- The number of bits occupied by each element of `w` in the quantized array. See supported values and defaults in the table of quantization modes. Default: `None`.
  - mode (str, optional) -- The quantization mode. Default: `"affine"`.
  - global_scale (array, optional) -- The per-input float32 scale used for `"nvfp4"` quantization, if provided. Default: `None`.
  - dtype (Dtype, optional) -- The data type of the dequantized output. If `None`, the return type is inferred from the scales and biases when possible and otherwise defaults to `bfloat16`. Default: `None`.
- Returns:
  The dequantized version of `w`.
- Return type:
  array
Note

The currently supported quantization modes are `"affine"`, `"mxfp4"`, `"mxfp8"`, and `"nvfp4"`.
For affine quantization, given the notation in quantize(), we compute \(w_i\) from \(\hat{w_i}\) and the corresponding \(s\) and \(\beta\) as follows:

\[w_i = s \hat{w_i} + \beta\]
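The affine rule above can be sketched in NumPy (a minimal illustration of the formula, not the MLX implementation; `dequantize_group` and the example scale and bias are assumptions for this sketch):

```python
import numpy as np

def dequantize_group(w_hat, s, beta):
    """Apply w_i = s * w_hat_i + beta to one group of quantized
    integers that share a single scale s and bias beta."""
    return s * np.asarray(w_hat, dtype=np.float32) + beta

# A group of 4-bit quantized values (integers in 0..15) with an
# assumed scale and bias.
w_hat = np.array([0, 3, 7, 15])
s, beta = 0.5, -2.0
w = dequantize_group(w_hat, s, beta)
print(w)  # each element is s * w_hat_i + beta
```

In the actual API the scales and biases arrays hold one such \(s\) and \(\beta\) per `group_size` elements of `w`, so the same affine map is applied groupwise.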