tilelang.contrib.cutedsl.utils

Utility functions for the CuTeDSL backend.

Provides common helpers used across the CuTeDSL codegen: bitcast, tensor construction, warp election, barrier sync, and FP16 packing.

Attributes

Functions

bitcast(value, target_dtype)

Reinterpret the bits of a value as a different type.

make_filled_tensor(shape, value)

make_tensor_at_offset(ptr, offset, shape[, div_by])

shuffle_elect(thread_extent)

sync_thread_partial([barrier_id, thread_count])

pack_half2(x, y)

Pack two half-precision (fp16) values into a single 32-bit value.

Module Contents

tilelang.contrib.cutedsl.utils.BYTES_PER_TENSORMAP = 128
tilelang.contrib.cutedsl.utils.BYTES_PER_POINTER = 8
tilelang.contrib.cutedsl.utils.type_map
tilelang.contrib.cutedsl.utils.bitcast(value, target_dtype)

Reinterpret the bits of a value as a different type. Equivalent to C's (*(target_type *)(&value)).

Parameters:
  • value -- Source value (Numeric type from CuTeDSL)

  • target_dtype -- Target type (CuTeDSL type like Int8, Float16, etc.)

Returns:

Value reinterpreted as target type
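
The bit-reinterpretation semantics described above can be illustrated on the host with a minimal sketch in plain Python. This is not the CuTeDSL implementation: the helper names `bitcast_f32_to_i32` and `bitcast_i32_to_f32` are hypothetical, and the standard `struct` module stands in for the CuTeDSL numeric types.

```python
import struct


def bitcast_f32_to_i32(value: float) -> int:
    """Reinterpret the 32 bits of an IEEE-754 float32 as a signed int32.

    Hypothetical host-side stand-in for bitcast(value, Int32); the bits are
    copied unchanged, no numeric conversion takes place.
    """
    return struct.unpack("<i", struct.pack("<f", value))[0]


def bitcast_i32_to_f32(value: int) -> float:
    """Inverse direction: reinterpret a signed int32 bit pattern as float32."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


bits = bitcast_f32_to_i32(1.0)
print(hex(bits))                 # bit pattern of float32 1.0
print(bitcast_i32_to_f32(bits))  # round-trips back to 1.0
```

Note the contrast with a numeric cast: `int(1.0)` yields `1`, whereas the bitcast yields the raw encoding `0x3f800000`.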

tilelang.contrib.cutedsl.utils.make_filled_tensor(shape, value)
tilelang.contrib.cutedsl.utils.make_tensor_at_offset(ptr, offset, shape, div_by=1)
Parameters:

ptr (cutlass.cute.Pointer)

tilelang.contrib.cutedsl.utils.shuffle_elect(thread_extent)
tilelang.contrib.cutedsl.utils.sync_thread_partial(barrier_id=None, thread_count=None)
tilelang.contrib.cutedsl.utils.pack_half2(x, y)

Pack two half-precision (fp16) values into a single 32-bit value. Corresponds to CUDA's __pack_half2 intrinsic.

The two fp16 values are packed into a single int32 by treating their bits as raw 16-bit data and concatenating them.
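
The packing can be sketched on the host in plain Python. This is a reference model only, not the CuTeDSL implementation: the name `pack_half2_reference` is hypothetical, and the placement of `x` in the low 16 bits and `y` in the high 16 bits is an assumption, since the text above does not state the order.

```python
import struct


def pack_half2_reference(x: float, y: float) -> int:
    """Concatenate the raw fp16 encodings of x and y into one 32-bit word.

    Assumption (not confirmed by the documentation above): x occupies the
    low 16 bits and y the high 16 bits. The result is returned as an
    unsigned Python int holding the 32-bit pattern.
    """
    lo = struct.unpack("<H", struct.pack("<e", x))[0]  # raw fp16 bits of x
    hi = struct.unpack("<H", struct.pack("<e", y))[0]  # raw fp16 bits of y
    return (hi << 16) | lo


packed = pack_half2_reference(1.0, 2.0)
print(hex(packed))  # fp16 1.0 is 0x3c00, fp16 2.0 is 0x4000
```

Under the assumed order, packing `1.0` and `2.0` gives the pattern `0x40003c00`; swapping the arguments swaps the two 16-bit halves.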