tilelang.transform.pass_config
Classes
PassConfigKey | Pass configuration keys for TileLang compiler.
Module Contents
- class tilelang.transform.pass_config.PassConfigKey
Bases: str, enum.Enum
Pass configuration keys for TileLang compiler.
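Because the class derives from both `str` and `enum.Enum`, each member behaves like its string value. The following is a minimal sketch of that behaviour using a stand-in enum: `DemoKey` is a hypothetical class that copies two keys from this page so the snippet runs without TileLang installed; it is not part of the library.

```python
import enum


class DemoKey(str, enum.Enum):
    """Hypothetical stand-in mirroring two PassConfigKey members."""

    TL_SIMPLIFY = "tl.Simplify"
    TL_DISABLE_TMA_LOWER = "tl.disable_tma_lower"


# Members of a str/Enum mix-in compare equal to their string value.
key_equals_string = DemoKey.TL_SIMPLIFY == "tl.Simplify"

# .value yields the plain string, a safe form for config dictionary keys.
config = {DemoKey.TL_DISABLE_TMA_LOWER.value: True}

# Looking a member up by its string value returns the member itself.
looked_up = DemoKey("tl.Simplify")
```

Using `.value` for dictionary keys, as the usage example for `TL_STORAGE_REWRITE_DETECT_INPLACE` on this page does, keeps the config free of enum objects.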
- TL_SIMPLIFY = 'tl.Simplify'
Enable/disable TileLang simplification passes. Default: True
- TL_DISABLE_DATA_RACE_CHECK = 'tl.disable_data_race_check'
Disable data race check in TileLang. Default: False
- TL_DISABLE_WARP_SPECIALIZED = 'tl.disable_warp_specialized'
Disable warp specialization optimization. Default: False
- TL_ENABLE_FAST_MATH = 'tl.enable_fast_math'
Enable fast math optimization. If enabled, --use_fast_math will be passed to nvcc. Default: False
- TL_PTXAS_REGISTER_USAGE_LEVEL = 'tl.ptxas_register_usage_level'
The PTXAS register usage level in [0, 10], which controls the aggressiveness of optimizations that affect register usage. Default: None
- TL_ENABLE_PTXAS_VERBOSE_OUTPUT = 'tl.enable_ptxas_verbose_output'
Enable ptxas verbose output. Default: False
- TL_DEVICE_COMPILE_FLAGS = 'tl.device_compile_flags'
Additional device compiler flags passed to nvcc/NVRTC.
Accepts either a string (parsed with shell-like splitting) or a list of strings. Typical usage is to provide extra include paths, defines, or ptxas options, e.g.:
"-I/opt/include -DMY_SWITCH=1 --ptxas-options=--verbose"
["-I/opt/include", "-DMY_SWITCH=1", "--ptxas-options=--verbose"]
These flags are appended to the compiler options used in the tvm_ffi CUDA compile callback. Default: None
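The two accepted forms can be normalized to a single list. The sketch below assumes Python's `shlex.split` is a fair model of the "shell-like splitting" mentioned above; `normalize_flags` is a hypothetical helper written for illustration, not a TileLang function.

```python
import shlex
from typing import List, Sequence, Union


def normalize_flags(flags: Union[str, Sequence[str], None]) -> List[str]:
    """Return compiler flags as a list of strings (hypothetical helper)."""
    if flags is None:
        return []
    if isinstance(flags, str):
        # Shell-like splitting: whitespace separates words, quotes group them.
        return shlex.split(flags)
    return [str(flag) for flag in flags]


from_string = normalize_flags(
    "-I/opt/include -DMY_SWITCH=1 --ptxas-options=--verbose"
)
from_list = normalize_flags(
    ["-I/opt/include", "-DMY_SWITCH=1", "--ptxas-options=--verbose"]
)
```

Under this model both forms yield the same three flags. A quoted argument in the string form, such as `-DNAME="a b"`, stays a single flag.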
- TL_CONFIG_INDEX_BITWIDTH = 'tl.config_index_bitwidth'
Bitwidth for configuration indices. Default: 32
- TL_DISABLE_TMA_LOWER = 'tl.disable_tma_lower'
Disable TMA (Tensor Memory Accelerator) lowering. Default: False
- TL_DISABLE_SAFE_MEMORY_ACCESS = 'tl.disable_safe_memory_legalize'
Disable safe memory access optimization. Default: False
- TL_DISABLE_VECTORIZE_256 = 'tl.disable_vectorize_256'
Disable usage of LDG/STG 256. Default: False
- TL_ENABLE_VECTORIZE_PLANNER_VERBOSE = 'tl.enable_vectorize_planner_verbose'
Enable verbose output for vectorize planner. When enabled, prints detailed information about each buffer’s inferred vector size and which buffer determines the final vectorization factor. Useful for debugging vectorization issues. Default: False
- TL_DISABLE_WGMMA = 'tl.disable_wgmma'
Disable usage of Hopper WGMMA. Default: False
- TL_DEBUG_MERGE_SHARED_MEMORY_ALLOCATIONS = 'tl.debug_merge_shared_memory_allocations'
Enable debug information for merge shared memory allocations. Default: False
- TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE = 'tl.enable_aggressive_shared_memory_merge'
Enable aggressive merge of shared memory allocations. Default: False
- TL_DISABLE_SHUFFLE_ELECT = 'tl.disable_shuffle_elect'
Disable shuffle election optimization. Default: False
- TL_DISABLE_THREAD_STORAGE_SYNC = 'tl.disable_thread_storage_sync'
Disable thread storage synchronization pass. When enabled, disables the automatic insertion of thread synchronization barriers (e.g., __syncthreads()) for shared memory access coordination. This can be useful for performance optimization in cases where manual synchronization is preferred or when synchronization is not needed. Default: False
- TL_FORCE_LET_INLINE = 'tl.force_let_inline'
Force TileLang to inline let bindings during simplification. Default: False
- TL_AST_PRINT_ENABLE = 'tl.ast_print_enable'
Enable TIR AST printing for debugging purposes. Default: False
- TL_LAYOUT_VISUALIZATION_ENABLE = 'tl.layout_visualization_enable'
Enable layout inference visualization. Default: False
- TL_LAYOUT_VISUALIZATION_FORMATS = 'tl.layout_visualization_formats'
Layout visualization formats. Acceptable values: "pdf", "png", "svg", "all".
- TL_STORAGE_REWRITE_DETECT_INPLACE = 'tl.storage_rewrite_detect_inplace'
Control StorageRewrite inplace detection.
When False (default), StorageRewrite keeps distinct temporaries for patterns such as `dst[i] = f(src[i])`, avoiding implicit aliasing:
```python
read = T.allocate([1], T.int32, "local.var")
write = T.allocate([1], T.int32, "local.var")
read_buf = T.Buffer((1,), T.int32, data=read, scope="local.var")
write_buf = T.Buffer((1,), T.int32, data=write, scope="local.var")
write_buf[0] = read_buf[0] * 2
f(write_buf[0])
```
Setting the flag to True allows StorageRewrite to reuse the read buffer for the write when it can prove the update is safely inplace, producing IR like:
```python
read = T.allocate([1], T.int32, "local.var")
read_buf = T.Buffer((1,), T.int32, data=read, scope="local.var")
read_buf[0] = read_buf[0] * 2
f(read_buf[0])
```
This reduces local memory usage but introduces aliasing between the buffers.
Usage:
```python
import tilelang
from tilelang.transform import PassContext, PassConfigKey

# `mod` is an existing IRModule to be transformed.
with PassContext(
    config={PassConfigKey.TL_STORAGE_REWRITE_DETECT_INPLACE.value: True}
):
    mod = tilelang.transform.StorageRewrite()(mod)
```
- TIR_ENABLE_EQUIV_TERMS_IN_CSE = 'tir.enable_equiv_terms_in_cse_tir'
Enable equivalent terms in TIR Common Subexpression Elimination. Default: True
- TIR_DISABLE_CSE = 'tir.disable_cse_tir'
Disable TIR Common Subexpression Elimination. Default: False
- TIR_SIMPLIFY = 'tir.Simplify'
Enable/disable TIR simplification passes. Default: True
- TIR_DISABLE_STORAGE_REWRITE = 'tir.disable_storage_rewrite'
Disable storage rewrite optimization. Default: False
- TIR_DISABLE_VECTORIZE = 'tir.disable_vectorize'
Disable vectorization optimization. Default: False
- TIR_USE_ASYNC_COPY = 'tir.use_async_copy'
Enable asynchronous memory copy operations. Default: True
- TIR_ENABLE_DEBUG = 'tir.enable_debug'
Enable debug information in generated code. Default: False
- TIR_MERGE_STATIC_SMEM = 'tir.merge_static_smem'
Merge static shared memory allocations. Default: True
- TIR_ADD_LOWER_PASS = 'tir.add_lower_pass'
Additional lowering passes to be applied. Default: None
- TIR_NOALIAS = 'tir.noalias'
Enable pointer non-aliasing assumptions. Default: True
- CUDA_KERNELS_OUTPUT_DIR = 'cuda.kernels_output_dir'
Output directory for generated CUDA kernels. Default: empty string
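The documented defaults can be gathered into a lookup table, for example to report only the settings a user has actually changed. This is a sketch with plain string keys copied from the entries above; `DOCUMENTED_DEFAULTS` and `non_default_items` are hypothetical names, the table covers only a subset of the keys, and none of it is part of TileLang.

```python
from typing import Any, Dict

# Subset of keys and defaults copied from this page.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "tl.Simplify": True,
    "tl.disable_warp_specialized": False,
    "tl.disable_tma_lower": False,
    "tl.config_index_bitwidth": 32,
    "tir.use_async_copy": True,
    "tir.merge_static_smem": True,
}


def non_default_items(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the entries of `config` that differ from the documented defaults.

    Keys that are absent from DOCUMENTED_DEFAULTS are always kept.
    """
    missing = object()  # sentinel that never equals a real config value
    return {
        key: value
        for key, value in config.items()
        if DOCUMENTED_DEFAULTS.get(key, missing) != value
    }


user_config = {
    "tl.disable_tma_lower": True,    # differs from the default (False)
    "tl.config_index_bitwidth": 32,  # same as the default
    "tl.enable_fast_math": True,     # not in the table, so kept
}
changed = non_default_items(user_config)
```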