mlx.nn.QQLinear
- class QQLinear(input_dims: int, output_dims: int, group_size: int = None, bits: int = None, mode: str = 'nvfp4')
Quantizes the input and applies an affine transformation using quantized weights.
Two use cases are supported:
- Eval: The weights are frozen and stored in quantized form together with their scales (self.weight is quantized and self.scales is provided).
- Train: The weights are stored in higher precision and are quantized on the fly during computation so that gradients with respect to the weights can be computed.
To switch between the two cases, use layer.eval() and layer.train() respectively.
Compared to the mlx.nn.QuantizedLinear layer, this layer quantizes the input as well and includes weights in gradient computations.
QQLinear also provides the class method from_linear() to convert mlx.nn.Linear layers to QQLinear layers.
Note: This layer does not support a bias term yet.
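The difference between the two use cases can be illustrated with a small plain-Python sketch. This is a toy model, not the MLX implementation: the class ToyQQLinear and the helpers quantize and dequantize are hypothetical names, simple symmetric integer rounding stands in for the real "nvfp4" and "mxfp8" formats, and the way train() restores higher-precision weights is an assumption of this sketch.

```python
# Toy illustration only: plain Python, no MLX. Symmetric integer rounding
# stands in for the real "nvfp4"/"mxfp8" formats, which work differently.

def quantize(values, group_size=4, levels=7):
    """Quantize a flat list group by group; return (codes, scales)."""
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max(abs(v) for v in group) / levels or 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in group)
    return codes, scales


def dequantize(codes, scales, group_size=4):
    """Map integer codes back to floats using each group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


class ToyQQLinear:
    """Hypothetical stand-in for the two modes described above (no bias)."""

    def __init__(self, weight_rows, group_size=4):
        self.group_size = group_size
        self.weight = [list(row) for row in weight_rows]  # higher precision
        self.scales = None
        self.training = True

    def eval(self):
        # Freeze: keep the quantized codes together with their scales.
        if self.training:
            pairs = [quantize(row, self.group_size) for row in self.weight]
            self.weight = [codes for codes, _ in pairs]
            self.scales = [scales for _, scales in pairs]
            self.training = False
        return self

    def train(self):
        # Assumption of this sketch: recover float weights by dequantizing.
        if not self.training:
            self.weight = [dequantize(c, s, self.group_size)
                           for c, s in zip(self.weight, self.scales)]
            self.scales = None
            self.training = True
        return self

    def __call__(self, x):
        g = self.group_size
        # The input is quantized in both modes.
        xq = dequantize(*quantize(x, g), g)
        if self.training:
            # Weights are quantized on the fly from the stored floats.
            rows = [dequantize(*quantize(r, g), g) for r in self.weight]
        else:
            # Weights are already stored as codes plus scales.
            rows = [dequantize(c, s, g)
                    for c, s in zip(self.weight, self.scales)]
        return [sum(w * v for w, v in zip(row, xq)) for row in rows]


layer = ToyQQLinear([[0.5, -1.0, 0.25, 0.75], [1.0, 1.0, -0.5, 0.0]])
x = [1.0, 2.0, -1.0, 0.5]
y_train = layer(x)         # weights quantized during the call
y_eval = layer.eval()(x)   # weights stored quantized, with scales
```

In this toy, both modes produce the same output for the same weights. What changes is what the layer stores: floats that can still be updated in train mode, or frozen codes plus scales in eval mode.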
- Parameters:
input_dims (int) – The dimensionality of the input features.
output_dims (int) – The dimensionality of the output features.
group_size (Optional[int]) – The group size to use for the quantized weight. See quantize(). Default: None.
bits (Optional[int]) – The bit width to use for the quantized weight. See quantize(). Default: None.
mode (Optional[str]) – The quantization method to use (see mlx.core.quantize()). Currently, only "nvfp4" and "mxfp8" are supported. Default: "nvfp4".
Methods