tilelang.intrinsics.wgmma_macro_generator¶

Attributes¶

lift

Classes¶

`SwizzleMode`	Enum where members are also (and must be) ints
`TensorCoreIntrinEmitter`	To eliminate Python syntax within TIR Macro.

Module Contents¶

tilelang.intrinsics.wgmma_macro_generator.lift¶

class tilelang.intrinsics.wgmma_macro_generator.SwizzleMode¶

Bases: enum.IntEnum

Enum where members are also (and must be) ints

NONE = 0¶

SWIZZLE_128B = 1¶

SWIZZLE_64B = 2¶

SWIZZLE_32B = 3¶

is_none()¶

返回类型:: bool

is_swizzle_32b()¶

返回类型:: bool

is_swizzle_64b()¶

返回类型:: bool

is_swizzle_128b()¶

返回类型:: bool

swizzle_byte_size()¶

返回类型:: int

swizzle_atom_size()¶

返回类型:: int

class tilelang.intrinsics.wgmma_macro_generator.TensorCoreIntrinEmitter(a_dtype=T.float16, b_dtype=T.float16, accum_dtype=T.float16, a_transposed=False, b_transposed=False, block_row_warps=2, block_col_warps=2, warp_row_tiles=8, warp_col_tiles=8, chunk=16, reduce_k=1, num_elems_per_byte=1, is_m_first=False, thread_var=None)¶

Bases: tilelang.intrinsics.mma_macro_generator.TensorCoreIntrinEmitter

To eliminate Python syntax within TIR Macro.

参数:

a_dtype (str)
b_dtype (str)
accum_dtype (str)
a_transposed (bool)
b_transposed (bool)
block_row_warps (int)
block_col_warps (int)
warp_row_tiles (int)
warp_col_tiles (int)
chunk (int)
reduce_k (int)
num_elems_per_byte (int)
is_m_first (bool | None)
thread_var (tvm.tir.Var | None)

wgmma_prefix: str¶

wgmma_inst_m: int¶

wgmma_inst_n: int¶

a_shared_layout: tilelang.layout.Layout = None¶

b_shared_layout: tilelang.layout.Layout = None¶

wgmma(A_region, B_region, C_region, clear_accum=False, wg_wait=0)¶

参数:

A_region (tvm.tir.BufferRegion)
B_region (tvm.tir.BufferRegion)
C_region (tvm.tir.BufferRegion)
clear_accum (tvm.tir.PrimExpr)
wg_wait (int)

wgmma_rs(A_region, B_region, C_region, clear_accum=False, wg_wait=0)¶

参数:

A_region (tvm.tir.BufferRegion)
B_region (tvm.tir.BufferRegion)
C_region (tvm.tir.BufferRegion)
clear_accum (tvm.tir.PrimExpr)
wg_wait (int)

make_mma_load_layout(local_buf, matrix='A')¶

Create a layout function for storing MMA results into a fragment buffer. This layout is used in conjunction with inverse_mma_store_layout to map fragment indices to threads and local indices.

参数:

local_buf (tir.Buffer) -- The local buffer representing a fragment of a matrix.
matrix (str)

返回:

A fragment object that describes how threads and indices in local_buf are laid out.

返回类型:

T.Fragment

抛出:

AssertionError -- If local_buf is not detected to be a fragment buffer.

make_mma_store_layout(local_buf)¶

Create a layout function for storing MMA results into a fragment buffer. This layout is used in conjunction with inverse_mma_store_layout to map fragment indices to threads and local indices.

参数:: local_buf (tir.Buffer) -- The local buffer representing a fragment of a matrix.
返回:: A fragment object that describes how threads and indices in local_buf are laid out.
返回类型:: T.Fragment
抛出:: AssertionError -- If local_buf is not detected to be a fragment buffer.