tilelang.transform.pass_config
Classes
PassConfigKey | Pass configuration keys for TileLang compiler.
Module Contents
- class tilelang.transform.pass_config.PassConfigKey
Bases: str, enum.Enum
Pass configuration keys for the TileLang compiler.
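Because the class derives from both `str` and `enum.Enum`, each member carries its configuration string as `.value`. The sketch below uses a small stand-in enum so that it runs without TileLang installed; `PassConfigKeySketch` is a hypothetical name and copies only three keys from this page. It illustrates the member/string equivalence that makes `.value` usable as a dictionary key.

```python
import enum


class PassConfigKeySketch(str, enum.Enum):
    """Hypothetical stand-in mirroring three keys documented on this page."""

    TL_SIMPLIFY = "tl.Simplify"
    TL_DISABLE_TMA_LOWER = "tl.disable_tma_lower"
    TIR_DISABLE_VECTORIZE = "tir.disable_vectorize"


# A str-mixin enum member compares equal to its underlying string.
assert PassConfigKeySketch.TL_SIMPLIFY == "tl.Simplify"

# Build a config dict keyed by the plain strings obtained via `.value`.
config = {
    PassConfigKeySketch.TL_DISABLE_TMA_LOWER.value: True,
    PassConfigKeySketch.TIR_DISABLE_VECTORIZE.value: False,
}

# Recover the enum member from its string value.
member = PassConfigKeySketch("tl.disable_tma_lower")
```

With the real class, the same pattern would presumably apply after `from tilelang.transform import PassConfigKey`.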
- TL_SIMPLIFY = 'tl.Simplify'
Enable/disable TileLang simplification passes. Default: True
- TL_DYNAMIC_ALIGNMENT = 'tl.dynamic_alignment'
Memory alignment requirement for dynamic shapes. Default: 16
- TL_DISABLE_DYNAMIC_TAIL_SPLIT = 'tl.disable_dynamic_tail_split'
Disable dynamic tail splitting optimization. Default: False
- TL_DISABLE_WARP_SPECIALIZED = 'tl.disable_warp_specialized'
Disable warp specialization optimization. Default: False
- TL_DISABLE_FAST_MATH = 'tl.disable_fast_math'
Disable fast math optimization. Default: True. Will be deprecated in the 0.1.7 release.
- TL_ENABLE_FAST_MATH = 'tl.enable_fast_math'
Enable fast math optimization. Default: False. If enabled, --use_fast_math will be passed to nvcc.
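As a rough illustration of the behaviour described for `tl.enable_fast_math`, the sketch below derives extra nvcc flags from a pass configuration dictionary. `nvcc_flags_sketch` is a hypothetical helper written for this example, not a TileLang function; only the key string, its default, and the `--use_fast_math` flag come from this page.

```python
from typing import Dict, List


def nvcc_flags_sketch(pass_config: Dict[str, object]) -> List[str]:
    """Hypothetical helper: derive extra nvcc flags from a pass config dict."""
    flags: List[str] = []
    # The documented default for 'tl.enable_fast_math' is False.
    if pass_config.get("tl.enable_fast_math", False):
        flags.append("--use_fast_math")
    return flags


default_flags = nvcc_flags_sketch({})
fast_flags = nvcc_flags_sketch({"tl.enable_fast_math": True})
```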
- TL_PTXAS_REGISTER_USAGE_LEVEL = 'tl.ptxas_register_usage_level'
The PTXAS register usage level in [0, 10], which controls the aggressiveness of optimizations that affect register usage. Default: None
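Since the level is restricted to the range [0, 10] and defaults to None, a caller may want to check a value before placing it in the configuration. The following is a minimal sketch under that assumption; `check_register_usage_level` is a hypothetical helper, and whether TileLang itself performs such validation is not stated here.

```python
from typing import Optional


def check_register_usage_level(level: Optional[int]) -> Optional[int]:
    """Hypothetical validator for 'tl.ptxas_register_usage_level'."""
    if level is None:
        return None  # documented default: leave the level unset
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError("register usage level must be an int or None")
    if not 0 <= level <= 10:
        raise ValueError("register usage level must be in [0, 10]")
    return level


config = {"tl.ptxas_register_usage_level": check_register_usage_level(7)}
```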
- TL_ENABLE_PTXAS_VERBOSE_OUTPUT = 'tl.enable_ptxas_verbose_output'
Enable ptxas verbose output. Default: False
- TL_CONFIG_INDEX_BITWIDTH = 'tl.config_index_bitwidth'
Bitwidth for configuration indices. Default: 32
- TL_DISABLE_TMA_LOWER = 'tl.disable_tma_lower'
Disable TMA (Tensor Memory Accelerator) lowering. Default: False
- TL_DISABLE_SAFE_MEMORY_ACCESS = 'tl.disable_safe_memory_legalize'
Disable safe memory access optimization. Default: False
- TL_DISABLE_VECTORIZE_256 = 'tl.disable_vectorize_256'
Disable usage of LDG/STG 256. Default: False
- TL_DISABLE_WGMMA = 'tl.disable_wgmma'
Disable usage of Hopper WGMMA. Default: False
- TL_DEBUG_MERGE_SHARED_MEMORY_ALLOCATIONS = 'tl.debug_merge_shared_memory_allocations'
Enable debug information for merge shared memory allocations. Default: False
- TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE = 'tl.enable_aggressive_shared_memory_merge'
Enable aggressive merge of shared memory allocations. Default: False
- TL_DISABLE_SHUFFLE_ELECT = 'tl.disable_shuffle_elect'
Disable shuffle election optimization. Default: False
- TL_DISABLE_THREAD_STORAGE_SYNC = 'tl.disable_thread_storage_sync'
Disable thread storage synchronization pass. When enabled, disables the automatic insertion of thread synchronization barriers (e.g., __syncthreads()) for shared memory access coordination. This can be useful for performance optimization in cases where manual synchronization is preferred or when synchronization is not needed. Default: False
- TL_FORCE_LET_INLINE = 'tl.force_let_inline'
Force TileLang to inline let bindings during simplification. Default: False
- TL_LAYOUT_VISUALIZATION_ENABLE = 'tl.layout_visualization_enable'
Enable layout inference visualization. Default: False
- TL_LAYOUT_VISUALIZATION_FORMATS = 'tl.layout_visualization_formats'
Layout visualization formats. Acceptable values: "pdf", "png", "svg", "all"
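The two layout visualization keys are naturally set together. The sketch below builds such a configuration and rejects values outside the acceptable list; `layout_visualization_config` is a hypothetical helper, and it assumes the formats option takes a single string, which this page does not confirm.

```python
from typing import Dict

# Acceptable values as listed for 'tl.layout_visualization_formats'.
ACCEPTED_FORMATS = ("pdf", "png", "svg", "all")


def layout_visualization_config(fmt: str) -> Dict[str, object]:
    """Hypothetical helper: enable visualization with one validated format."""
    if fmt not in ACCEPTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {ACCEPTED_FORMATS}")
    return {
        "tl.layout_visualization_enable": True,
        "tl.layout_visualization_formats": fmt,  # assumed to be a single string
    }


config = layout_visualization_config("svg")
```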
- TL_STORAGE_REWRITE_DETECT_INPLACE = 'tl.storage_rewrite_detect_inplace'
Control StorageRewrite inplace detection.
When False (default), StorageRewrite keeps distinct temporaries for patterns such as dst[i] = f(src[i]), avoiding implicit aliasing:
```
read = T.allocate([1], "int32", "local.var")
write = T.allocate([1], "int32", "local.var")
read_buf = T.Buffer((1,), "int32", data=read, scope="local.var")
write_buf = T.Buffer((1,), "int32", data=write, scope="local.var")
write_buf[0] = read_buf[0] * 2
f(write_buf[0])
```
Setting the flag to True allows StorageRewrite to reuse the read buffer for the write when it can prove the update is safely inplace, producing IR like:
```
read = T.allocate([1], "int32", "local.var")
read_buf = T.Buffer((1,), "int32", data=read, scope="local.var")
read_buf[0] = read_buf[0] * 2
f(read_buf[0])
```
This reduces local memory usage but introduces aliasing between the buffers.
Usage:
```python
import tilelang
from tilelang.transform import PassContext, PassConfigKey

# `mod` is assumed to be an existing IRModule from earlier in the pipeline.
with PassContext(
    config={PassConfigKey.TL_STORAGE_REWRITE_DETECT_INPLACE.value: True}
):
    mod = tilelang.transform.StorageRewrite()(mod)
```
- TIR_ENABLE_EQUIV_TERMS_IN_CSE = 'tir.enable_equiv_terms_in_cse_tir'
Enable equivalent terms in TIR Common Subexpression Elimination. Default: True
- TIR_DISABLE_CSE = 'tir.disable_cse_tir'
Disable TIR Common Subexpression Elimination. Default: False
- TIR_SIMPLIFY = 'tir.Simplify'
Enable/disable TIR simplification passes. Default: True
- TIR_DISABLE_STORAGE_REWRITE = 'tir.disable_storage_rewrite'
Disable storage rewrite optimization. Default: False
- TIR_DISABLE_VECTORIZE = 'tir.disable_vectorize'
Disable vectorization optimization. Default: False
- TIR_USE_ASYNC_COPY = 'tir.use_async_copy'
Enable asynchronous memory copy operations. Default: True
- TIR_ENABLE_DEBUG = 'tir.enable_debug'
Enable debug information in generated code. Default: False
- TIR_MERGE_STATIC_SMEM = 'tir.merge_static_smem'
Merge static shared memory allocations. Default: True
- TIR_ADD_LOWER_PASS = 'tir.add_lower_pass'
Additional lowering passes to be applied. Default: None
- TIR_NOALIAS = 'tir.noalias'
Enable pointer non-aliasing assumptions. Default: True
- CUDA_KERNELS_OUTPUT_DIR = 'cuda.kernels_output_dir'
Output directory for generated CUDA kernels. Default: empty string
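To see how user overrides interact with the defaults listed on this page, the sketch below merges an override dictionary onto a small table of documented defaults. `DOCUMENTED_DEFAULTS` and `effective_config` are hypothetical names and cover only a handful of keys. This is a reading aid for the table, not a description of how TileLang resolves defaults internally.

```python
from typing import Dict

# A few defaults copied from the entries on this page (not exhaustive).
DOCUMENTED_DEFAULTS: Dict[str, object] = {
    "tl.Simplify": True,
    "tl.dynamic_alignment": 16,
    "tl.config_index_bitwidth": 32,
    "tl.disable_tma_lower": False,
    "tir.use_async_copy": True,
    "cuda.kernels_output_dir": "",
}


def effective_config(overrides: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical helper: overlay user overrides on the documented defaults."""
    merged = dict(DOCUMENTED_DEFAULTS)  # copy so the table is never mutated
    merged.update(overrides)
    return merged


config = effective_config({"tl.disable_tma_lower": True})
```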