tilelang.language.math_intrinsics
=================================

.. py:module:: tilelang.language.math_intrinsics

.. autoapi-nested-parse::

   Common math intrinsics exposed on the TileLang language surface.


Functions
---------

.. autoapisummary::

   tilelang.language.math_intrinsics.ieee_add
   tilelang.language.math_intrinsics.ieee_sub
   tilelang.language.math_intrinsics.ieee_mul
   tilelang.language.math_intrinsics.ieee_fmaf
   tilelang.language.math_intrinsics.ieee_frcp
   tilelang.language.math_intrinsics.ieee_fsqrt
   tilelang.language.math_intrinsics.ieee_frsqrt
   tilelang.language.math_intrinsics.ieee_fdiv
   tilelang.language.math_intrinsics.fadd2
   tilelang.language.math_intrinsics.fmul2
   tilelang.language.math_intrinsics.fma2


Module Contents
---------------

.. py:function:: ieee_add(x, y, rounding_mode='rn')

   IEEE-compliant addition with the specified rounding mode.

   :param x: First operand.
   :type x: PrimExpr
   :param y: Second operand.
   :type y: PrimExpr
   :param rounding_mode: Rounding mode: 'rn' (round to nearest), 'rz' (round toward zero), 'ru' (round toward positive infinity), 'rd' (round toward negative infinity). Default is 'rn'.
   :type rounding_mode: str, optional

   :returns: **result** -- The result.
   :rtype: PrimExpr


.. py:function:: ieee_sub(x, y, rounding_mode='rn')

   IEEE-compliant subtraction with the specified rounding mode.

   :param x: First operand.
   :type x: PrimExpr
   :param y: Second operand.
   :type y: PrimExpr
   :param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
   :type rounding_mode: str, optional

   :returns: **result** -- The result.
   :rtype: PrimExpr


.. py:function:: ieee_mul(x, y, rounding_mode='rn')

   IEEE-compliant multiplication with the specified rounding mode.

   :param x: First operand.
   :type x: PrimExpr
   :param y: Second operand.
   :type y: PrimExpr
   :param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
   :type rounding_mode: str, optional

   :returns: **result** -- The result.
   :rtype: PrimExpr


.. py:function:: ieee_fmaf(x, y, z, rounding_mode='rn')

   IEEE-compliant fused multiply-add with the specified rounding mode.

   :param x: First operand.
   :type x: PrimExpr
   :param y: Second operand.
   :type y: PrimExpr
   :param z: Third operand (addend).
   :type z: PrimExpr
   :param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
   :type rounding_mode: str, optional

   :returns: **result** -- The result of x * y + z.
   :rtype: PrimExpr


.. py:function:: ieee_frcp(x, rounding_mode='rn')

   IEEE-compliant reciprocal with the specified rounding mode.

   :param x: Input operand.
   :type x: PrimExpr
   :param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
   :type rounding_mode: str, optional

   :returns: **result** -- The result of 1/x.
   :rtype: PrimExpr


.. py:function:: ieee_fsqrt(x, rounding_mode='rn')

   IEEE-compliant square root with the specified rounding mode.

   :param x: Input operand.
   :type x: PrimExpr
   :param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
   :type rounding_mode: str, optional

   :returns: **result** -- The result of sqrt(x).
   :rtype: PrimExpr


.. py:function:: ieee_frsqrt(x)

   IEEE-compliant reciprocal square root (round to nearest only).

   :param x: Input operand.
   :type x: PrimExpr

   :returns: **result** -- The result of 1/sqrt(x).
   :rtype: PrimExpr


.. py:function:: ieee_fdiv(x, y, rounding_mode='rn')

   IEEE-compliant division with the specified rounding mode.

   :param x: Dividend.
   :type x: PrimExpr
   :param y: Divisor.
   :type y: PrimExpr
   :param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
   :type rounding_mode: str, optional

   :returns: **result** -- The result of x/y.
   :rtype: PrimExpr


.. py:function:: fadd2(x, y)

   Packed FP32x2 add.

   Lowers to PTX ``add.rn.f32x2`` on supported NVIDIA architectures/toolchains,
   and falls back to per-lane scalar operations otherwise.

   :param x: First operand. Must be dtype ``float32x2``.
   :type x: PrimExpr
   :param y: Second operand. Must be dtype ``float32x2``.
   :type y: PrimExpr

   :returns: **result** -- A ``float32x2`` result.
   :rtype: PrimExpr


.. py:function:: fmul2(x, y)

   Packed FP32x2 multiply.

   Lowers to PTX ``mul.rn.f32x2`` on supported NVIDIA architectures/toolchains,
   and falls back to per-lane scalar operations otherwise.


.. py:function:: fma2(x, y, z)

   Packed FP32x2 fused multiply-add (x * y + z).

   Lowers to PTX ``fma.rn.f32x2`` on supported NVIDIA architectures/toolchains,
   and falls back to per-lane scalar operations otherwise.
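
The four ``rounding_mode`` values differ only when the exact result is not representable in binary32. The sketch below is a pure-Python reference model of that behavior, written for illustration only: it does not call TileLang, and the names ``round_to_f32``, ``ref_add``, ``ref_fmaf`` and ``ref_fma2`` are hypothetical helpers, not part of this module. It assumes finite, in-range values and does not model NaN, infinities, overflow, or the sign of a zero result.

```python
import struct
from fractions import Fraction


def _bits(value: float) -> int:
    """Binary32 bit pattern of value (rounded to nearest if it is not exact)."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def _from_bits(bits: int) -> float:
    """Binary32 value with the given bit pattern, widened to a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def _to_f32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return _from_bits(_bits(value))


def _neighbor(value: float, up: bool) -> float:
    """Adjacent binary32 value above (up=True) or below (up=False) value."""
    if value == 0.0:
        return _from_bits(0x00000001 if up else 0x80000001)
    away_from_zero = (value > 0) == up
    return _from_bits(_bits(value) + (1 if away_from_zero else -1))


def round_to_f32(exact: Fraction, mode: str = "rn") -> float:
    """Round an exact rational to binary32 under 'rn', 'rz', 'ru' or 'rd'."""
    if mode not in ("rn", "rz", "ru", "rd"):
        raise ValueError(f"unknown rounding mode: {mode!r}")
    cand = _to_f32(float(exact))
    err = Fraction(cand) - exact
    if err == 0:
        return cand  # exactly representable: every mode agrees
    # Bracket the exact value between two adjacent binary32 values.
    if err < 0:
        lo, hi = cand, _neighbor(cand, up=True)
    else:
        lo, hi = _neighbor(cand, up=False), cand
    if mode == "ru":
        return hi
    if mode == "rd":
        return lo
    if mode == "rz":
        return lo if exact > 0 else hi
    # 'rn': pick the closer neighbor, ties go to the even significand.
    below, above = exact - Fraction(lo), Fraction(hi) - exact
    if below != above:
        return lo if below < above else hi
    return lo if _bits(lo) % 2 == 0 else hi


def ref_add(x: float, y: float, mode: str = "rn") -> float:
    """Model of a correctly rounded binary32 addition."""
    return round_to_f32(Fraction(_to_f32(x)) + Fraction(_to_f32(y)), mode)


def ref_fmaf(x: float, y: float, z: float, mode: str = "rn") -> float:
    """Model of a fused multiply-add: x * y + z with a single rounding."""
    exact = Fraction(_to_f32(x)) * Fraction(_to_f32(y)) + Fraction(_to_f32(z))
    return round_to_f32(exact, mode)


def ref_fma2(x, y, z):
    """Per-lane model of a packed two-lane FMA, each lane rounded to nearest."""
    return tuple(ref_fmaf(a, b, c, "rn") for a, b, c in zip(x, y, z))


if __name__ == "__main__":
    tiny = 2.0 ** -25  # a quarter of the binary32 spacing just above 1.0
    for mode in ("rn", "rz", "ru", "rd"):
        print(mode, repr(ref_add(1.0, tiny, mode)))
    print(ref_fma2((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)))
```

In this model, adding ``2**-25`` to ``1.0`` returns ``1.0`` under 'rn', 'rz' and 'rd', and the next binary32 value above ``1.0`` under 'ru'. Exact rational arithmetic keeps the model simple to check by hand; it says nothing about how the intrinsics are lowered on a device.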