tilelang.intrinsics.tcgen05_macro_generator

Attributes

類別

SwizzleMode

Enum where members are also (and must be) ints

TensorCoreIntrinEmitter

Intrinsic emitter for Blackwell (SM100) TCGEN5MMA instructions.

Module Contents

tilelang.intrinsics.tcgen05_macro_generator.lift
class tilelang.intrinsics.tcgen05_macro_generator.SwizzleMode

Bases: enum.IntEnum

Enum where members are also (and must be) ints

NONE = 0
SWIZZLE_128B = 2
SWIZZLE_64B = 4
SWIZZLE_32B = 6
is_none()
回傳型別:

bool

is_swizzle_32b()
回傳型別:

bool

is_swizzle_64b()
回傳型別:

bool

is_swizzle_128b()
回傳型別:

bool

swizzle_byte_size()
回傳型別:

int

swizzle_atom_size()
回傳型別:

int

class tilelang.intrinsics.tcgen05_macro_generator.TensorCoreIntrinEmitter(a_dtype=T.float16, b_dtype=T.float16, accum_dtype=T.float16, a_transposed=False, b_transposed=False, block_row_warps=2, block_col_warps=2, warp_row_tiles=8, warp_col_tiles=8, chunk=16, reduce_k=1, num_elems_per_byte=1, is_m_first=False, thread_var=None)

Bases: tilelang.intrinsics.mma_macro_generator.TensorCoreIntrinEmitter

Intrinsic emitter for Blackwell (SM100) TCGEN5MMA instructions.

Generates TIR macros that lower to tcgen05.mma PTX instructions for both the SS (Shared-Shared) and TS (TensorMemory-Shared) GEMM variants. Also provides layout helpers for tensor-memory (TMEM) buffers.

參數:
  • a_dtype (str)

  • b_dtype (str)

  • accum_dtype (str)

  • a_transposed (bool)

  • b_transposed (bool)

  • block_row_warps (int)

  • block_col_warps (int)

  • warp_row_tiles (int)

  • warp_col_tiles (int)

  • chunk (int)

  • reduce_k (int)

  • num_elems_per_byte (int)

  • is_m_first (bool)

  • thread_var (tvm.tir.Var | None)

tcgen05_prefix: str
a_shared_layout: tilelang.layout.Layout = None
b_shared_layout: tilelang.layout.Layout = None
tcgen05mma(A_buf, B_buf, C_local_buf, mbar, clear_accum=False)

Emit a TCGEN5MMA operation, dispatching to SS or TS variant based on A's memory scope.

If A_buf resides in tensor memory (shared.tmem), the TS variant is emitted; otherwise the SS variant is used (both A and B from shared memory).

參數:
  • A_buf (Buffer) -- Operand A — either in shared memory (SS) or tensor memory (TS).

  • B_buf (Buffer) -- Operand B in shared memory.

  • C_local_buf (Buffer) -- Accumulator buffer in tensor memory.

  • mbar (PrimExpr) -- Memory barrier used for MMA completion signalling.

  • clear_accum (PrimExpr) -- Whether to zero the accumulator before the first MMA.

tcgen05mma_ts(A_buf, B_buf, C_local_buf, mbar, clear_accum=False)

Emit the TS (TensorMemory-Shared) variant of TCGEN5MMA.

Reads operand A directly from tensor memory (TMEM) and operand B from shared memory via a descriptor. The TMEM column offset for A is computed assuming packed storage (e.g. two bfloat16 values per uint32 column) to match the output of tcgen05.st.

參數:
  • A_buf (Buffer) -- Operand A residing in tensor memory (shared.tmem).

  • B_buf (Buffer) -- Operand B in shared memory.

  • C_local_buf (Buffer) -- Accumulator buffer in tensor memory.

  • mbar (PrimExpr) -- Memory barrier for MMA completion signalling.

  • clear_accum (PrimExpr) -- Whether to zero the accumulator before the first MMA.

abstractmethod make_mma_load_layout(local_buf, matrix='A')
參數:
  • local_buf (tvm.tir.Buffer)

  • matrix (str)

回傳型別:

tilelang.language.Fragment

make_mma_store_layout(tmem_buf)

Create the TCGEN5 tensor-memory layout used to store MMA accumulators.

參數:

tmem_buf (tir.Buffer) -- The local buffer representing tensormemory of a mma's output

回傳:

Layout object describing how logical (i, j) coordinates map to the swizzled tensor-memory offsets required by TCGEN5MMA.

回傳型別:

Layout

引發:

AssertionError -- If tmem_buf is not detected to be a tensor-memory buffer.

get_tcgen5_mma_meta(m, n, k)

Query the FFI for TCGEN5MMA atom metadata (atom_m, atom_n, atom_k, enable_ws, enable_2cta).

參數:
  • m (int)

  • n (int)

  • k (int)

get_tcgen5_instr_desc(atom_m, atom_n, atom_k, a_is_k_major, b_is_k_major, scale_in_a, scale_in_b)

Build the 64-bit instruction descriptor for a tcgen05.mma PTX call.

參數:
  • atom_m (int)

  • atom_n (int)

  • atom_k (int)

  • a_is_k_major (bool)

  • b_is_k_major (bool)

  • scale_in_a (int)

  • scale_in_b (int)

回傳型別:

tvm.tir.PrimExpr