tilelang.language.allocate ========================== .. py:module:: tilelang.language.allocate .. autoapi-nested-parse:: Memory allocation utilities for Tile-AI programs. This module provides a set of functions for allocating different types of memory buffers in Tile-AI programs. It wraps TVM's buffer allocation functionality with convenient interfaces for different memory scopes. Available allocation functions: - alloc_shared: Allocates shared memory buffers for inter-thread communication - alloc_local: Allocates local memory buffers for thread-private storage - alloc_fragment: Allocates fragment memory buffers for specialized operations - alloc_var: Allocates single-element variable buffers Each function takes shape and dtype parameters and returns a TVM buffer object with the appropriate memory scope. Attributes ---------- .. autoapisummary:: tilelang.language.allocate.DescKind Functions --------- .. autoapisummary:: tilelang.language.allocate.alloc_shared tilelang.language.allocate.alloc_local tilelang.language.allocate.alloc_fragment tilelang.language.allocate.alloc_var tilelang.language.allocate.alloc_barrier tilelang.language.allocate.alloc_tmem tilelang.language.allocate.alloc_reducer tilelang.language.allocate.alloc_descriptor tilelang.language.allocate.alloc_wgmma_desc tilelang.language.allocate.alloc_tcgen05_smem_desc tilelang.language.allocate.alloc_tcgen05_instruction_desc tilelang.language.allocate.alloc_tcgen05_instr_desc tilelang.language.allocate.empty Module Contents --------------- .. py:function:: alloc_shared(shape, dtype, scope='shared.dyn') Allocate a shared memory buffer for inter-thread communication. :param shape: The shape of the buffer to allocate :type shape: tuple :param dtype: The data type of the buffer (e.g., 'float32', 'int32') :type dtype: str :param scope: The memory scope. Defaults to "shared.dyn" :type scope: str, optional :returns: A TVM buffer object allocated in shared memory :rtype: T.Buffer .. py:function:: alloc_local(shape, dtype, scope='local') Allocate a local memory buffer for thread-private storage. :param shape: The shape of the buffer to allocate :type shape: tuple :param dtype: The data type of the buffer (e.g., 'float32', 'int32') :type dtype: str :param scope: The memory scope. Defaults to "local" :type scope: str, optional :returns: A TVM buffer object allocated in local memory :rtype: T.Buffer .. py:function:: alloc_fragment(shape, dtype, scope='local.fragment') Allocate a fragment memory buffer for specialized operations. :param shape: The shape of the buffer to allocate :type shape: tuple :param dtype: The data type of the buffer (e.g., 'float32', 'int32') :type dtype: str :param scope: The memory scope. Defaults to "local.fragment" :type scope: str, optional :returns: A TVM buffer object allocated in fragment memory :rtype: T.Buffer .. py:function:: alloc_var(dtype: str, init: tvm.tir.PrimExpr | int | float, scope: str = 'local.var') -> tvm.tir.buffer.Buffer alloc_var(dtype: str, scope: str = 'local.var', *, init: tvm.tir.PrimExpr | int | float | None = None) -> tvm.tir.buffer.Buffer Allocate a single-element variable buffer. :param dtype: The data type of the buffer (e.g., 'float32', 'int32') :type dtype: str :param \*args: Optional positional arguments. A single positional string is treated as the scope for backward compatibility. A single non-string positional argument (or keyword ``init``) specifies the initializer. When two positional arguments are provided, they are interpreted as ``(init, scope)``. :param scope: The memory scope. Defaults to "local.var". Use as keyword argument for clarity when also providing an initializer. :type scope: str, optional :param init: The optional initializer value. When provided, the generated code will initialize the variable with this value instead of defaulting to zero. :type init: PrimExpr, optional .. rubric:: Examples a = T.alloc_var('int32', 1) # var with init 1 a = T.alloc_var('int32', 'local.var') # var with local.var scope a = T.alloc_var('int32', 1, 'local.var') # var with init 1 and local.var scope a = T.alloc_var('int32', 'local.var', init=1) # var with init 1 and local.var scope a = T.alloc_var('int32', init=1) # var with init 1 and local.var scope :returns: A TVM buffer object allocated as a single-element variable :rtype: T.Buffer .. py:function:: alloc_barrier(arrive_count) Allocate a barrier buffer. :param arrive_count: The number of threads that need to arrive at the barrier :type arrive_count: int :returns: A TVM buffer object allocated as a barrier :rtype: T.Buffer .. py:function:: alloc_tmem(shape, dtype) Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., TCGEN5.MMA). TMEM is a dedicated on-chip memory introduced in Hopper GPUs, designed to reduce register pressure and enable asynchronous, single-threaded MMA operations. It is organized as a 2D array of 512 columns by 128 rows (lanes), with each cell being 32 bits. Allocation is performed in units of columns, and every lane of a column is allocated together. Key properties and requirements: - The number of columns allocated must be a power of 2 and at least 32. - TMEM allocations are dynamic and must be explicitly deallocated. - Both allocation and deallocation must be performed by the same warp. - The base address of the TMEM allocation is stored in shared memory and used as the offset for TCGEN5.MMA accumulator tensors. - Only TCGEN5.MMA and specific TMEM load/store instructions can access TMEM; all pre-processing must occur before data is loaded into TMEM, and all post-processing after data is retrieved. - The number of columns allocated should not increase between any two allocations in the execution order within the CTA. :param num_cols: Number of columns to allocate in TMEM. Must be a power of 2 and >= 32 but less than or equal to 512. :type num_cols: int :returns: A TVM buffer object allocated in TMEM scope, suitable for use as an accumulator or operand in TCGEN5.MMA operations. :rtype: T.Buffer .. note:: - TMEM is only available on supported architectures (e.g., Hopper and later). - The buffer returned should be used according to TMEM access restrictions and deallocated appropriately. .. py:function:: alloc_reducer(shape, dtype, op='sum', replication=None) Allocate a reducer buffer. Modifications needs to conform with `op`, such as `op="sum"` requires `reducer[...] += ...` and `op="max"` requires `reducer[...] = T.max(reducer[...], ...)`. Only after T.fill with proper initializer the reduction may begin; only after T.finalize_reducer the partial results will be available. For `op="sum"`, filled value must be 0; for min and max, the filled initializer will become max or min clamper correspondingly. You may want to use `T.max_value` for min and `T.min_value` for max. :param shape: The shape of the buffer to allocate :type shape: tuple :param dtype: The data type of the buffer (e.g., 'float32', 'int32') :type dtype: str :param op: The reduce operation corresponded with the reducer :type op: str :param replication: Replication strategy, can be "all" or "none". Defaults to not specified, and the compiler will do whatever it want. :type replication: str | None :returns: A TVM buffer object allocated in thread-private storage, available to reduce values in T.Parallel loops. :rtype: T.Buffer .. py:data:: DescKind .. py:function:: alloc_descriptor(kind = 'wgmma', dtype = 'uint64') Allocate a descriptor buffer for WGMMA and TCGEN5.MMA. :param kind: The descriptor kind, one of "wgmma", "tcgen05" ("utcmma" as alias). :returns: A TVM buffer object allocated as a descriptor :rtype: T.Buffer .. py:function:: alloc_wgmma_desc(dtype = 'uint64') .. py:function:: alloc_tcgen05_smem_desc(dtype = 'uint64') .. py:function:: alloc_tcgen05_instruction_desc(dtype = 'uint32') .. py:function:: alloc_tcgen05_instr_desc(dtype = 'uint32') .. py:function:: empty(shape, dtype = 'float32')