tilelang.transform.pass_config

Classes

PassConfigKey

Pass configuration keys for TileLang compiler.

Module Contents

class tilelang.transform.pass_config.PassConfigKey

Bases: str, enum.Enum

Pass configuration keys for TileLang compiler.
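Because the class derives from both str and enum.Enum, each member is itself a string and exposes the raw key through `.value`. The following is a minimal sketch of that behavior using a hypothetical stand-in enum, `DemoKey`, that copies two entries from this page so the snippet runs without TileLang installed. It is not the library's own definition.

```python
from enum import Enum


class DemoKey(str, Enum):
    """Hypothetical stand-in that copies two entries from PassConfigKey."""

    TL_SIMPLIFY = "tl.Simplify"
    TL_DISABLE_TMA_LOWER = "tl.disable_tma_lower"


# Members are str instances and compare equal to their raw key strings.
assert isinstance(DemoKey.TL_SIMPLIFY, str)
assert DemoKey.TL_SIMPLIFY == "tl.Simplify"

# Looking a member up by its value returns the same member object.
assert DemoKey("tl.disable_tma_lower") is DemoKey.TL_DISABLE_TMA_LOWER

# `.value` yields the plain string, the form used as a config dictionary key.
pass_configs = {
    DemoKey.TL_SIMPLIFY.value: True,
    DemoKey.TL_DISABLE_TMA_LOWER.value: True,
}
print(pass_configs)
```

Using `.value` keeps the dictionary keys as plain strings, which avoids depending on how an enum member is formatted or serialized.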

TL_SIMPLIFY = 'tl.Simplify'

Enable/disable TileLang simplification passes. Default: True

TL_DISABLE_DATA_RACE_CHECK = 'tl.disable_data_race_check'

Disable data race check in TileLang. Default: False

TL_DISABLE_WARP_SPECIALIZED = 'tl.disable_warp_specialized'

Disable warp specialization optimization. Default: False

TL_ENABLE_FAST_MATH = 'tl.enable_fast_math'

Enable fast math optimization. If enabled, --use_fast_math will be passed to nvcc. Default: False

TL_PTXAS_REGISTER_USAGE_LEVEL = 'tl.ptxas_register_usage_level'

The PTXAS register usage level in [0, 10], which controls the aggressiveness of optimizations that affect register usage. Default: None
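The documented range can be checked before the value is placed in a config dictionary. The sketch below uses a hypothetical helper, `check_register_usage_level`, and assumes the level is an integer. TileLang may validate this value differently or not at all.

```python
def check_register_usage_level(level):
    """Hypothetical validator for the documented range [0, 10].

    None is passed through unchanged because it is the documented default.
    """
    if level is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError(f"expected an int or None, got {type(level).__name__}")
    if not 0 <= level <= 10:
        raise ValueError(f"register usage level {level} is outside [0, 10]")
    return level


pass_configs = {
    "tl.ptxas_register_usage_level": check_register_usage_level(10),
}
print(pass_configs)
```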

TL_ENABLE_PTXAS_VERBOSE_OUTPUT = 'tl.enable_ptxas_verbose_output'

Enable ptxas verbose output. Default: False

TL_DEVICE_COMPILE_FLAGS = 'tl.device_compile_flags'

Additional device compiler flags passed to nvcc/NVRTC.

Accepts either a string (parsed with shell-like splitting) or a list of strings. Typical usage is to provide extra include paths, defines or ptxas options, e.g.:

  • "-I/opt/include -DMY_SWITCH=1 --ptxas-options=--verbose"

  • ["-I/opt/include", "-DMY_SWITCH=1", "--ptxas-options=--verbose"]

These flags are appended to the compiler options used in the tvm_ffi CUDA compile callback. Default: None
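The two accepted forms can be compared with the standard library. This sketch assumes that "shell-like splitting" behaves like Python's `shlex.split`. The helper `normalize_flags` is hypothetical and is not part of TileLang.

```python
import shlex

flags_as_string = "-I/opt/include -DMY_SWITCH=1 --ptxas-options=--verbose"
flags_as_list = ["-I/opt/include", "-DMY_SWITCH=1", "--ptxas-options=--verbose"]


def normalize_flags(flags):
    """Hypothetical helper: return a list of flags from either accepted form."""
    if flags is None:
        return []
    if isinstance(flags, str):
        # Shell-like splitting: whitespace separates flags, quotes group words.
        return shlex.split(flags)
    return list(flags)


# Under this assumption, both forms describe the same three flags.
assert normalize_flags(flags_as_string) == flags_as_list

# Quoting keeps a value that contains spaces together as one flag.
quoted = normalize_flags('-DMSG="hello world"')
print(quoted)
```

If a flag value contains spaces, the list form sidesteps quoting questions entirely.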

TL_CONFIG_INDEX_BITWIDTH = 'tl.config_index_bitwidth'

Bitwidth for configuration indices. Default: 32

TL_DISABLE_TMA_LOWER = 'tl.disable_tma_lower'

Disable TMA (Tensor Memory Accelerator) lowering. Default: False

TL_DISABLE_SAFE_MEMORY_ACCESS = 'tl.disable_safe_memory_legalize'

Disable safe memory access optimization. Default: False

TL_DISABLE_VECTORIZE_256 = 'tl.disable_vectorize_256'

Disable usage of LDG/STG 256. Default: False

TL_ENABLE_VECTORIZE_PLANNER_VERBOSE = 'tl.enable_vectorize_planner_verbose'

Enable verbose output for vectorize planner. When enabled, prints detailed information about each buffer’s inferred vector size and which buffer determines the final vectorization factor. Useful for debugging vectorization issues. Default: False

TL_DISABLE_WGMMA = 'tl.disable_wgmma'

Disable usage of Hopper WGMMA. Default: False

TL_DEBUG_MERGE_SHARED_MEMORY_ALLOCATIONS = 'tl.debug_merge_shared_memory_allocations'

Enable debug information for merge shared memory allocations. Default: False

TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE = 'tl.enable_aggressive_shared_memory_merge'

Enable aggressive merge of shared memory allocations. Default: False

TL_DISABLE_SHUFFLE_ELECT = 'tl.disable_shuffle_elect'

Disable shuffle election optimization. Default: False

TL_DISABLE_THREAD_STORAGE_SYNC = 'tl.disable_thread_storage_sync'

Disable the thread storage synchronization pass. When set to True, thread synchronization barriers (e.g., __syncthreads()) are no longer inserted automatically to coordinate shared memory accesses. This can help performance when manual synchronization is preferred or when no synchronization is needed. Default: False

TL_FORCE_LET_INLINE = 'tl.force_let_inline'

Force TileLang to inline let bindings during simplification. Default: False

TL_AST_PRINT_ENABLE = 'tl.ast_print_enable'

Enable TIR AST printing for debugging purposes. Default: False

TL_LAYOUT_VISUALIZATION_ENABLE = 'tl.layout_visualization_enable'

Enable layout inference visualization. Default: False

TL_LAYOUT_VISUALIZATION_FORMATS = 'tl.layout_visualization_formats'

Layout visualization formats. Acceptable values: "pdf", "png", "svg", "all"

TL_STORAGE_REWRITE_DETECT_INPLACE = 'tl.storage_rewrite_detect_inplace'

Control StorageRewrite inplace detection.

When False (the default), StorageRewrite keeps distinct temporaries for patterns such as dst[i] = f(src[i]), avoiding implicit aliasing:

```python
read = T.allocate([1], T.int32, "local.var")
write = T.allocate([1], T.int32, "local.var")
read_buf = T.Buffer((1,), T.int32, data=read, scope="local.var")
write_buf = T.Buffer((1,), T.int32, data=write, scope="local.var")
write_buf[0] = read_buf[0] * 2
f(write_buf[0])
```

Setting the flag to True allows StorageRewrite to reuse the read buffer for the write when it can prove the update is safely inplace, producing IR like:

```python
read = T.allocate([1], T.int32, "local.var")
read_buf = T.Buffer((1,), T.int32, data=read, scope="local.var")
read_buf[0] = read_buf[0] * 2
f(read_buf[0])
```

This reduces local memory usage but introduces aliasing between the buffers.

Usage:

```python
import tilelang
from tilelang.transform import PassContext, PassConfigKey

# `mod` is an existing IRModule to be transformed.
with PassContext(
    config={PassConfigKey.TL_STORAGE_REWRITE_DETECT_INPLACE.value: True}
):
    mod = tilelang.transform.StorageRewrite()(mod)
```

TIR_ENABLE_EQUIV_TERMS_IN_CSE = 'tir.enable_equiv_terms_in_cse_tir'

Enable equivalent terms in TIR Common Subexpression Elimination. Default: True

TIR_DISABLE_CSE = 'tir.disable_cse_tir'

Disable TIR Common Subexpression Elimination. Default: False

TIR_SIMPLIFY = 'tir.Simplify'

Enable/disable TIR simplification passes. Default: True

TIR_DISABLE_STORAGE_REWRITE = 'tir.disable_storage_rewrite'

Disable storage rewrite optimization. Default: False

TIR_DISABLE_VECTORIZE = 'tir.disable_vectorize'

Disable vectorization optimization. Default: False

TIR_USE_ASYNC_COPY = 'tir.use_async_copy'

Enable asynchronous memory copy operations. Default: True

TIR_ENABLE_DEBUG = 'tir.enable_debug'

Enable debug information in generated code. Default: False

TIR_MERGE_STATIC_SMEM = 'tir.merge_static_smem'

Merge static shared memory allocations. Default: True

TIR_ADD_LOWER_PASS = 'tir.add_lower_pass'

Additional lowering passes to be applied. Default: None

TIR_NOALIAS = 'tir.noalias'

Enable pointer non-aliasing assumptions. Default: True

CUDA_KERNELS_OUTPUT_DIR = 'cuda.kernels_output_dir'

Output directory for generated CUDA kernels. Default: empty string
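As a way of reading the defaults listed on this page together, the sketch below layers user overrides on top of a handful of documented defaults and rejects misspelled keys. The table `DOCUMENTED_DEFAULTS` and the helper `effective_config` are hypothetical. The table copies only a subset of the entries above, and it does not describe how TileLang resolves options internally.

```python
# A subset of the defaults documented on this page, copied for illustration.
DOCUMENTED_DEFAULTS = {
    "tl.Simplify": True,
    "tl.disable_warp_specialized": False,
    "tl.disable_tma_lower": False,
    "tl.config_index_bitwidth": 32,
    "tir.use_async_copy": True,
    "cuda.kernels_output_dir": "",
}


def effective_config(overrides):
    """Hypothetical helper: layer user overrides on top of documented defaults."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        # A misspelled key string would otherwise be ignored silently here.
        raise KeyError(f"unknown pass config keys: {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


config = effective_config({"tl.disable_tma_lower": True})
print(config["tl.disable_tma_lower"])
print(config["tl.config_index_bitwidth"])
```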