mlx.nn.QQLinear

class QQLinear(input_dims: int, output_dims: int, group_size: int = None, bits: int = None, mode: str = 'nvfp4')

Quantizes the input and applies an affine transformation using quantized weights.

Two use cases are supported:

Eval: The weights are frozen and stored in quantized form together with their scales (self.weight is quantized and self.scales is provided).
Train: The weights are stored in higher precision and are quantized on the fly during computation, so that gradients with respect to the weights can be computed.

To switch between the two cases, use layer.eval() and layer.train() respectively.
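The toy model below sketches how the two modes differ in what they store. It is a hypothetical class in plain Python, not the MLX implementation: ToyQQLinear, quantize_row and dequantize_row are illustrative names, a simple symmetric integer quantizer stands in for nvfp4/mxfp8, and gradients are not modeled.

```python
# Hypothetical toy model of the train/eval modes (not MLX code).
# Quantizer assumption: one symmetric scale per row, integer codes in [-7, 7].

def quantize_row(row, levels=7):
    """Return (integer codes, scale) for a 1-D list of floats."""
    amax = max(abs(v) for v in row)
    scale = amax / levels if amax > 0 else 1.0
    return [round(v / scale) for v in row], scale

def dequantize_row(codes, scale):
    return [c * scale for c in codes]

class ToyQQLinear:
    def __init__(self, weight):
        self.weight = [list(r) for r in weight]  # float weights (train mode)
        self.scales = None
        self.training = True

    def eval(self):
        # Freeze: replace float weights by integer codes plus per-row scales.
        if self.training:
            pairs = [quantize_row(r) for r in self.weight]
            self.weight = [codes for codes, _ in pairs]
            self.scales = [scale for _, scale in pairs]
            self.training = False
        return self

    def train(self):
        # Restore float weights from codes; lost precision is not recovered.
        if not self.training:
            self.weight = [dequantize_row(c, s) for c, s in zip(self.weight, self.scales)]
            self.scales = None
            self.training = True
        return self

    def _effective_weight(self):
        if self.training:
            # Quantize on the fly; the float weights remain the stored state.
            return [dequantize_row(*quantize_row(r)) for r in self.weight]
        return [dequantize_row(c, s) for c, s in zip(self.weight, self.scales)]

    def __call__(self, x):
        xq = dequantize_row(*quantize_row(x))  # the input is quantized as well
        return [sum(a * b for a, b in zip(xq, w)) for w in self._effective_weight()]

layer = ToyQQLinear([[0.7, -0.3, 0.1]])
x = [1.0, 2.0, -0.5]
y_train = layer(x)        # float weights, quantized during the call
y_eval = layer.eval()(x)  # frozen integer codes plus scales
print(y_train == y_eval)  # True
```

In this toy the two modes agree because eval() stores exactly the codes and scales that training mode computes on every call; the difference is only in what is kept as the layer's state.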

Compared to the mlx.nn.QuantizedLinear layer, this layer also quantizes the input, and in training mode it includes the weights in gradient computations.
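As a rough illustration of what quantizing the input adds, the sketch below simulates both behaviors in plain Python by quantizing and immediately dequantizing ("fake quantization"). It assumes NVFP4-style blocks: each group of values shares a scale of max|x| / 6, and each value is rounded to the nearest FP4 (E2M1) magnitude. The real format also stores the block scale in FP8 and uses a per-tensor scale, and MLX computes the product with quantized kernels rather than this loop; fake_quantize, qq_linear and weight_only_linear are hypothetical names.

```python
# Conceptual simulation only; not MLX's kernels or storage layout.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable magnitudes

def fake_quantize(row, group_size=16):
    """Quantize then dequantize a 1-D list, one scale per block of values."""
    out = []
    for start in range(0, len(row), group_size):
        block = row[start:start + group_size]
        amax = max(abs(v) for v in block)
        scale = amax / 6.0 if amax > 0 else 1.0
        for v in block:
            # Nearest representable magnitude (ties go to the smaller one).
            mag = min(FP4_E2M1, key=lambda q: abs(abs(v) / scale - q))
            out.append((mag if v >= 0 else -mag) * scale)
    return out

def weight_only_linear(x, weight, group_size=16):
    """QuantizedLinear-like: only W is quantized, x stays in full precision."""
    return [sum(a * b for a, b in zip(x, fake_quantize(w, group_size))) for w in weight]

def qq_linear(x, weight, group_size=16):
    """QQLinear-like: both x and W are quantized before the product (no bias)."""
    xq = fake_quantize(x, group_size)
    return [sum(a * b for a, b in zip(xq, fake_quantize(w, group_size))) for w in weight]

x = [6.0, 3.0, 1.0, 0.2]
w = [[6.0, -3.0, 0.5, 1.0]]
print(weight_only_linear(x, w, group_size=4))  # roughly 27.7
print(qq_linear(x, w, group_size=4))           # [27.5]: 0.2 rounds to 0 in FP4
```

The small input entry 0.2 survives when only the weights are quantized but is flushed to zero once the input is quantized too, which is the kind of extra error the input quantization trades for faster low-precision matmuls.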

QQLinear also provides the class method from_linear() to convert mlx.nn.Linear layers to QQLinear layers.

Note: This layer does not support a bias term yet.

Parameters:

input_dims (int) – The dimensionality of the input features.
output_dims (int) – The dimensionality of the output features.
group_size (Optional[int]) – The group size to use for the quantized weight. See mlx.core.quantize(). Default: None.
bits (Optional[int]) – The bit width to use for the quantized weight. See mlx.core.quantize(). Default: None.
mode (Optional[str]) – The quantization method to use (see mlx.core.quantize()). Currently, only "nvfp4" and "mxfp8" are supported. Default: "nvfp4".
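The bits and group_size values set the storage cost of the quantized weight: each element takes bits bits, and each group of group_size elements adds one scale. The helper below is a back-of-the-envelope sketch, not an MLX function. It assumes 8-bit scales and the block sizes from the NVFP4 (16) and OCP MXFP8 (32) format descriptions, and it ignores NVFP4's per-tensor scale; whether these match what MLX picks when group_size and bits are None is an assumption.

```python
def quantized_weight_bytes(output_dims, input_dims, bits, group_size, scale_bits=8):
    """Approximate bytes for a block-quantized (output_dims, input_dims) weight."""
    if input_dims % group_size:
        raise ValueError("input_dims must be a multiple of group_size")
    n = output_dims * input_dims
    element_bytes = n * bits // 8                  # packed quantized values
    scale_bytes = (n // group_size) * scale_bits // 8  # one scale per group
    return element_bytes + scale_bytes

# A 4096 x 4096 weight, compared with float16 (2 bytes per element).
fp16 = 4096 * 4096 * 2
nvfp4 = quantized_weight_bytes(4096, 4096, bits=4, group_size=16)
mxfp8 = quantized_weight_bytes(4096, 4096, bits=8, group_size=32)
print(fp16, nvfp4, mxfp8)  # 33554432 9437184 17301504
```

Under these assumptions the cost works out to 4 + 8/16 = 4.5 bits per weight for nvfp4 and 8 + 8/32 = 8.25 bits for mxfp8, versus 16 for float16.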

Methods

dequantize()
from_linear(linear_layer[, group_size, ...])	Create a QQLinear layer from a Linear layer.
quantize()
