tilelang.contrib.cutedsl.ldsm

LDMATRIX and STMATRIX operations for the CuTeDSL backend, based on tl_templates/cuda/ldsm.h.

These functions wrap the PTX ldmatrix and stmatrix instructions, which load and store 8x8 matrix fragments between shared memory and registers.
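To make "8x8 matrix fragment" concrete, the sketch below models in plain Python how a single non-transposed ldmatrix load distributes one 8x8 tile across the 32 lanes of a warp. It assumes 16-bit elements (the PTX `.b16` form), so each lane ends up with one 32-bit register holding two adjacent elements of one row. The function name `ldmatrix_x1_model` is hypothetical and is not part of tilelang; this is only a host-side reference model, not the wrapper's implementation.

```python
# Host-side model of the ldmatrix fragment layout (no GPU required).
# Assumption: 16-bit elements, one 8x8 tile, non-transposed load.
# `ldmatrix_x1_model` is an illustrative name, not a tilelang function.

def ldmatrix_x1_model(tile):
    """Return, for each of the 32 lanes, the pair of elements it would hold."""
    assert len(tile) == 8 and all(len(row) == 8 for row in tile)
    fragments = []
    for lane in range(32):
        row = lane // 4        # four consecutive lanes share one row
        col = 2 * (lane % 4)   # each lane holds two adjacent columns
        fragments.append((tile[row][col], tile[row][col + 1]))
    return fragments


# Fill the tile so that element (r, c) has the value 8 * r + c.
tile = [[8 * r + c for c in range(8)] for r in range(8)]
frags = ldmatrix_x1_model(tile)
print(frags[0], frags[5], frags[31])  # (0, 1) (10, 11) (62, 63)
```

A model like this can serve as a reference when checking the register contents produced by a kernel that uses the load wrappers.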

Functions

ptx_ldmatrix_x1(smem_ptr, local_ptr, *[, loc, ip])

Load 1 matrix (8x8) from shared memory

ptx_ldmatrix_x2(smem_ptr, local_ptr, *[, loc, ip])

Load 2 matrices (8x8 each) from shared memory

ptx_ldmatrix_x4(smem_ptr, local_ptr, *[, loc, ip])

Load 4 matrices (8x8 each) from shared memory

ptx_ldmatrix_x1_trans(smem_ptr, local_ptr, *[, loc, ip])

Load 1 matrix (8x8) with transpose from shared memory

ptx_ldmatrix_x2_trans(smem_ptr, local_ptr, *[, loc, ip])

Load 2 matrices (8x8 each) with transpose from shared memory

ptx_ldmatrix_x4_trans(smem_ptr, local_ptr, *[, loc, ip])

Load 4 matrices (8x8 each) with transpose from shared memory

ptx_stmatrix_x1(smem_ptr, value0, *[, loc, ip])

Store 1 matrix (8x8) to shared memory

ptx_stmatrix_x2(smem_ptr, value0, value1, *[, loc, ip])

Store 2 matrices (8x8 each) to shared memory

ptx_stmatrix_x4(smem_ptr, value0, value1, value2, ...)

Store 4 matrices (8x8 each) to shared memory

ptx_stmatrix_x1_trans(smem_ptr, value0, *[, loc, ip])

Store 1 matrix (8x8) with transpose to shared memory

ptx_stmatrix_x2_trans(smem_ptr, value0, value1, *[, ...])

Store 2 matrices (8x8 each) with transpose to shared memory

ptx_stmatrix_x4_trans(smem_ptr, value0, value1, ...[, ...])

Store 4 matrices (8x8 each) with transpose to shared memory

Module Contents

tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x1(smem_ptr, local_ptr, *, loc=None, ip=None)

Load 1 matrix (8x8) from shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

  • local_ptr (cutlass.cute.typing.Pointer)

Return type:

None

tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x2(smem_ptr, local_ptr, *, loc=None, ip=None)

Load 2 matrices (8x8 each) from shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

  • local_ptr (cutlass.cute.typing.Pointer)

Return type:

None

tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x4(smem_ptr, local_ptr, *, loc=None, ip=None)

Load 4 matrices (8x8 each) from shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

  • local_ptr (cutlass.cute.typing.Pointer)

Return type:

None
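For the multi-matrix loads, the PTX ldmatrix instruction takes one row address per lane: lanes 0-7 supply the eight row addresses of the first matrix, lanes 8-15 those of the second, and so on. The sketch below models that mapping in plain Python. The helper name `ldmatrix_row_source` is hypothetical, and how this wrapper derives the per-lane `smem_ptr` is not stated here, so treat it as an illustration of the instruction's addressing rule only.

```python
# Host-side model of which (matrix, row) address each lane supplies.
# `ldmatrix_row_source` is an illustrative name, not a tilelang function.

def ldmatrix_row_source(lane, num_matrices):
    """Return (matrix_index, row_index) for `lane`, or None if its address is unused."""
    if not 0 <= lane < 32:
        raise ValueError("lane must be in [0, 32)")
    if num_matrices not in (1, 2, 4):
        raise ValueError("num_matrices must be 1, 2, or 4")
    if lane >= 8 * num_matrices:
        return None  # x1 uses lanes 0-7, x2 uses lanes 0-15, x4 uses all 32
    return (lane // 8, lane % 8)


print(ldmatrix_row_source(9, 2))   # (1, 1)
print(ldmatrix_row_source(31, 4))  # (3, 7)
print(ldmatrix_row_source(9, 1))   # None
```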

tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x1_trans(smem_ptr, local_ptr, *, loc=None, ip=None)

Load 1 matrix (8x8) with transpose from shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

  • local_ptr (cutlass.cute.typing.Pointer)

Return type:

None
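The effect of the transpose can be sketched with a small host-side model. Assuming 16-bit elements, a transposed load hands each lane the same fragment position as a plain load, but taken from the transpose of the tile in shared memory, so the two elements a lane receives come from one column of the stored tile rather than one row. The function name `ldmatrix_x1_trans_model` is hypothetical and not part of tilelang.

```python
# Host-side model of the transposed ldmatrix fragment layout.
# Assumption: 16-bit elements, one 8x8 tile.
# `ldmatrix_x1_trans_model` is an illustrative name, not a tilelang function.

def ldmatrix_x1_trans_model(tile):
    """Return, for each of the 32 lanes, the pair it holds after a transposed load."""
    assert len(tile) == 8 and all(len(row) == 8 for row in tile)
    fragments = []
    for lane in range(32):
        row = lane // 4        # row index in the transposed view
        col = 2 * (lane % 4)   # first of two adjacent columns in that view
        # Element (row, col) of the transposed view is tile[col][row].
        fragments.append((tile[col][row], tile[col + 1][row]))
    return fragments


tile = [[8 * r + c for c in range(8)] for r in range(8)]
frags_t = ldmatrix_x1_trans_model(tile)
print(frags_t[0], frags_t[5], frags_t[31])  # (0, 8) (17, 25) (55, 63)
```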

tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x2_trans(smem_ptr, local_ptr, *, loc=None, ip=None)

Load 2 matrices (8x8 each) with transpose from shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

  • local_ptr (cutlass.cute.typing.Pointer)

Return type:

None

tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x4_trans(smem_ptr, local_ptr, *, loc=None, ip=None)

Load 4 matrices (8x8 each) with transpose from shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

  • local_ptr (cutlass.cute.typing.Pointer)

Return type:

None

tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x1(smem_ptr, value0, *, loc=None, ip=None)

Store 1 matrix (8x8) to shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

Return type:

None
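The store direction can be modelled as the inverse of the load. The sketch below assumes that `value0` corresponds to one 32-bit register per lane holding two packed 16-bit elements, as in the PTX stmatrix `.b16` form; the type of `value0` is not documented here, so that is an assumption. The names `pack_b16x2` and `stmatrix_x1_model` are hypothetical and not part of tilelang.

```python
# Host-side model of a non-transposed stmatrix store of one 8x8 tile.
# Assumption: each lane contributes one 32-bit value packing two 16-bit elements,
# with the lower-addressed element in the low 16 bits.
# `pack_b16x2` and `stmatrix_x1_model` are illustrative names only.

def pack_b16x2(lo, hi):
    """Pack two 16-bit values into one 32-bit integer, `lo` in the low half."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


def stmatrix_x1_model(values):
    """Scatter 32 packed per-lane values into an 8x8 tile."""
    assert len(values) == 32
    tile = [[0] * 8 for _ in range(8)]
    for lane, packed in enumerate(values):
        row = lane // 4        # four consecutive lanes share one row
        col = 2 * (lane % 4)   # each lane writes two adjacent columns
        tile[row][col] = packed & 0xFFFF
        tile[row][col + 1] = (packed >> 16) & 0xFFFF
    return tile


# Lane n contributes the elements 2n and 2n + 1.
values = [pack_b16x2(2 * lane, 2 * lane + 1) for lane in range(32)]
stored = stmatrix_x1_model(values)
print(stored[0])  # [0, 1, 2, 3, 4, 5, 6, 7]
print(stored[7])  # [56, 57, 58, 59, 60, 61, 62, 63]
```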

tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x2(smem_ptr, value0, value1, *, loc=None, ip=None)

Store 2 matrices (8x8 each) to shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

Return type:

None

tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x4(smem_ptr, value0, value1, value2, value3, *, loc=None, ip=None)

Store 4 matrices (8x8 each) to shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

Return type:

None

tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x1_trans(smem_ptr, value0, *, loc=None, ip=None)

Store 1 matrix (8x8) with transpose to shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

Return type:

None

tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x2_trans(smem_ptr, value0, value1, *, loc=None, ip=None)

Store 2 matrices (8x8 each) with transpose to shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

Return type:

None

tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x4_trans(smem_ptr, value0, value1, value2, value3, *, loc=None, ip=None)

Store 4 matrices (8x8 each) with transpose to shared memory

Parameters:
  • smem_ptr (cutlass.cute.typing.Pointer)

Return type:

None