tilelang.language.kernel

Kernel launching language interface in TileLang.

類別

FrameStack

A simple stack-like wrapper around a deque that provides

KernelLaunchFrame

KernelLaunchFrame is a custom TIRFrame that manages block/thread indices

函式

Kernel(*blocks[, threads, cluster_dims, is_cpu, prelude])

Tools to quickly construct a GPU kernel launch frame.

CUDASourceCodeKernel(*blocks[, threads, entry_name, ...])

Launch a kernel from CUDA source code or a CUDA source file.

get_thread_binding([dim])

Returns the thread binding for the given dimension.

get_thread_bindings()

Returns all three thread bindings.

get_block_binding([dim])

Returns the block binding for the given dimension.

get_block_bindings()

Returns all three block bindings.

get_thread_extent([dim])

Returns the thread extent for the given dimension.

get_thread_extents()

Returns all three thread extents.

get_block_extent([dim])

Returns the block extent for the given dimension.

get_block_extents()

Returns all three block extents.

Module Contents

class tilelang.language.kernel.FrameStack

A simple stack-like wrapper around a deque that provides push, pop, and top methods for convenience.

push(item)

Pushes an item onto the top of the stack.

pop()

Pops and returns the top of the stack, or returns None if the stack is empty.

top()

Returns the item on the top of the stack without removing it, or None if the stack is empty.

size()

Returns the number of items in the stack.

__len__()

Returns the number of items in the stack.

__bool__()

Allows truthy checks on the stack object itself, e.g., 'if stack: ...'

class tilelang.language.kernel.KernelLaunchFrame

基底類別:tvm.script.ir_builder.tir.frame.TIRFrame

KernelLaunchFrame is a custom TIRFrame that manages block/thread indices and handles the entry and exit of the kernel launch scope.

__enter__()

Enters the KernelLaunchFrame scope and pushes this frame onto the stack. Returns one Var if we detect exactly 5 frames (meaning there is a single block dimension), or a list of Vars otherwise.

回傳值型別:

tvm.tir.Var | list[tvm.tir.Var]

__exit__(ptype, value, trace)

Exits the KernelLaunchFrame scope and pops this frame from the stack, but only if it's indeed the topmost frame.

classmethod Current()

Returns the topmost (current) KernelLaunchFrame from the stack if it exists, or None if the stack is empty.

回傳值型別:

KernelLaunchFrame | None

get_block_extent(dim)

Returns the block extent for the given dimension. dim=0 corresponds to blockIdx.x, dim=1 to blockIdx.y, and dim=2 to blockIdx.z.

參數:

dim (int)

回傳值型別:

int

get_block_extents()

Returns the block extents for all three dimensions.

回傳值型別:

list[int]

get_thread_extent(dim)

Returns the thread extent for the given dimension. dim=0 corresponds to threadIdx.x, dim=1 to threadIdx.y, and dim=2 to threadIdx.z.

參數:

dim (int)

回傳值型別:

int

get_thread_extents()

Returns the thread extents for all three dimensions.

回傳值型別:

list[int]

get_thread_binding(dim=0)

Returns the thread binding for the given dimension. dim=0 corresponds to threadIdx.x, dim=1 to threadIdx.y, and dim=2 to threadIdx.z.

參數:

dim (int)

回傳值型別:

tvm.tir.Var

get_thread_bindings()

Returns the thread binding for the given dimension. dim=0 corresponds to threadIdx.x, dim=1 to threadIdx.y, and dim=2 to threadIdx.z.

回傳值型別:

list[tvm.tir.Var]

get_num_threads()

Returns the thread indices from the topmost frame.

回傳值型別:

int

get_block_binding(dim=0)

Returns the block binding for the given dimension. dim=0 corresponds to blockIdx.x, dim=1 to blockIdx.y, and dim=2 to blockIdx.z.

參數:

dim (int)

回傳值型別:

tvm.tir.Var

get_block_bindings()

Returns all three block bindings.

回傳值型別:

list[tvm.tir.Var]

property blocks: list[tvm.tir.Var]

Returns the block indices from the topmost frame.

回傳值型別:

list[tvm.tir.Var]

property threads: list[tvm.tir.Var]

Returns the thread indices from the topmost frame.

回傳值型別:

list[tvm.tir.Var]

property num_threads: int

Returns the total number of threads.

回傳值型別:

int

tilelang.language.kernel.Kernel(*blocks, threads=None, cluster_dims=None, is_cpu=False, prelude=None)

Tools to quickly construct a GPU kernel launch frame.

參數:
  • blocks (int) -- A list of extent, can be 1-3 dimension, representing gridDim.(x|y|z)

  • threads (int) -- A integer representing blockDim.x Or a list of integers representing blockDim.(x|y|z) if the value is -1, we skip the threadIdx.x binding.

  • cluster_dims (int | tuple[int, int, int] | list[int] | None) -- The cluster dimensions for SM90+ cluster launch. For example, use 2 or (2, 1, 1) to create 2-CTA clusters. When specified, the kernel will be launched using cudaLaunchKernelEx with cudaLaunchAttributeClusterDimension.

  • prelude (str) -- The import c code of the kernel, will be injected before the generated kernel code.

  • is_cpu (bool)

回傳:

res -- The result LaunchThreadFrame.

回傳值型別:

Tuple[frame.LaunchThreadFrame]

範例

Create a 1-D CUDA kernel launch and unpack the single block index:

with T.Kernel(T.ceildiv(N, 128), threads=128) as bx:
    # bx is the blockIdx.x binding (also iterable as (bx,))
    ...

Launch a 2-D grid while requesting two thread dimensions:

with T.Kernel(grid_x, grid_y, threads=(64, 2)) as (bx, by):
    tx, ty = T.get_thread_bindings()
    ...

Emit a CPU kernel where thread bindings are skipped:

with T.Kernel(loop_extent, is_cpu=True) as (i,):
    ...
tilelang.language.kernel.CUDASourceCodeKernel(*blocks, threads=None, source_code_or_path, entry_name='main_kernel', cluster_dims=None, prelude=None)

Launch a kernel from CUDA source code or a CUDA source file.

The code must follows the following rules: 1. The kernel source must be a valid CUDA kernel which can be correctly compiled under TileLang's context. 2. The kernel source must either contains only one __global__ function as an entry, or have a __global__ entry function named main_kernel.

參數:
  • source_code_or_path (str | os.PathLike[str]) -- Inline CUDA source code, or a path to a CUDA source file. If the argument resolves to an existing file, the file contents are loaded. Otherwise it is treated as inline CUDA source code.

  • blocks (int) -- A list of extent, can be 1-3 dimension, representing gridDim.(x|y|z)

  • entry_name (str | None) -- Optional name of the __global__ CUDA entry function inside the provided source. When specified, TileLang launches that external CUDA entry directly.

  • threads (int) -- A integer representing blockDim.x Or a list of integers representing blockDim.(x|y|z) if the value is -1, we skip the threadIdx.x binding.

  • cluster_dims (int | tuple[int, int, int] | list[int] | None) -- The cluster dimensions for SM90+ cluster launch. For example, use 2 or (2, 1, 1) to create 2-CTA clusters. When specified, the kernel will be launched using cudaLaunchKernelEx with cudaLaunchAttributeClusterDimension.

  • prelude (str) -- The import c code of the kernel, will be injected before the generated kernel code.

回傳值型別:

None

tilelang.language.kernel.get_thread_binding(dim=0)

Returns the thread binding for the given dimension.

參數:

dim (int)

回傳值型別:

tvm.tir.Var

tilelang.language.kernel.get_thread_bindings()

Returns all three thread bindings.

回傳值型別:

list[tvm.tir.Var]

tilelang.language.kernel.get_block_binding(dim=0)

Returns the block binding for the given dimension.

參數:

dim (int)

回傳值型別:

tvm.tir.Var

tilelang.language.kernel.get_block_bindings()

Returns all three block bindings.

回傳值型別:

list[tvm.tir.Var]

tilelang.language.kernel.get_thread_extent(dim=0)

Returns the thread extent for the given dimension.

參數:

dim (int)

回傳值型別:

int

tilelang.language.kernel.get_thread_extents()

Returns all three thread extents.

回傳值型別:

list[int]

tilelang.language.kernel.get_block_extent(dim=0)

Returns the block extent for the given dimension.

參數:

dim (int)

回傳值型別:

int

tilelang.language.kernel.get_block_extents()

Returns all three block extents.

回傳值型別:

list[int]