tilelang.transform.pass_config
==============================

.. py:module:: tilelang.transform.pass_config

Classes
-------

.. autoapisummary::

   tilelang.transform.pass_config.PassConfigKey

Module Contents
---------------

.. py:class:: PassConfigKey

   Bases: :py:obj:`str`, :py:obj:`enum.Enum`

   Pass configuration keys for the TileLang compiler.

   .. py:attribute:: TL_SIMPLIFY
      :value: 'tl.Simplify'

      Enable/disable TileLang simplification passes. Default: True

   .. py:attribute:: TL_DYNAMIC_ALIGNMENT
      :value: 'tl.dynamic_alignment'

      Memory alignment requirement for dynamic shapes. Default: 16

   .. py:attribute:: TL_DISABLE_DYNAMIC_TAIL_SPLIT
      :value: 'tl.disable_dynamic_tail_split'

      Disable dynamic tail splitting optimization. Default: False

   .. py:attribute:: TL_DISABLE_WARP_SPECIALIZED
      :value: 'tl.disable_warp_specialized'

      Disable warp specialization optimization. Default: False

   .. py:attribute:: TL_DISABLE_FAST_MATH
      :value: 'tl.disable_fast_math'

      Disable fast math optimization. Default: True. Will be deprecated in the 0.1.7 release.

   .. py:attribute:: TL_ENABLE_FAST_MATH
      :value: 'tl.enable_fast_math'

      Enable fast math optimization. Default: False. If enabled, --use_fast_math will be passed to nvcc.

   .. py:attribute:: TL_PTXAS_REGISTER_USAGE_LEVEL
      :value: 'tl.ptxas_register_usage_level'

      The PTXAS register usage level in [0, 10], which controls the aggressiveness of optimizations that affect register usage. Default: None

   .. py:attribute:: TL_ENABLE_PTXAS_VERBOSE_OUTPUT
      :value: 'tl.enable_ptxas_verbose_output'

      Enable ptxas verbose output. Default: False

   .. py:attribute:: TL_CONFIG_INDEX_BITWIDTH
      :value: 'tl.config_index_bitwidth'

      Bitwidth for configuration indices. Default: 32

   .. py:attribute:: TL_DISABLE_TMA_LOWER
      :value: 'tl.disable_tma_lower'

      Disable TMA (Tensor Memory Accelerator) lowering. Default: False

   .. py:attribute:: TL_DISABLE_SAFE_MEMORY_ACCESS
      :value: 'tl.disable_safe_memory_legalize'

      Disable safe memory access optimization. Default: False

   .. py:attribute:: TL_DISABLE_VECTORIZE_256
      :value: 'tl.disable_vectorize_256'

      Disable usage of LDG/STG 256. Default: False

   .. py:attribute:: TL_DISABLE_WGMMA
      :value: 'tl.disable_wgmma'

      Disable usage of Hopper WGMMA. Default: False

   .. py:attribute:: TL_DEBUG_MERGE_SHARED_MEMORY_ALLOCATIONS
      :value: 'tl.debug_merge_shared_memory_allocations'

      Enable debug information for merging shared memory allocations. Default: False

   .. py:attribute:: TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE
      :value: 'tl.enable_aggressive_shared_memory_merge'

      Enable aggressive merge of shared memory allocations. Default: False

   .. py:attribute:: TL_DISABLE_SHUFFLE_ELECT
      :value: 'tl.disable_shuffle_elect'

      Disable shuffle election optimization. Default: False

   .. py:attribute:: TL_DISABLE_THREAD_STORAGE_SYNC
      :value: 'tl.disable_thread_storage_sync'

      Disable thread storage synchronization pass. When enabled, disables the
      automatic insertion of thread synchronization barriers (e.g., __syncthreads())
      for shared memory access coordination. This can be useful for performance
      optimization in cases where manual synchronization is preferred or when
      synchronization is not needed. Default: False

   .. py:attribute:: TL_FORCE_LET_INLINE
      :value: 'tl.force_let_inline'

      Force TileLang to inline let bindings during simplification. Default: False

   .. py:attribute:: TL_LAYOUT_VISUALIZATION_ENABLE
      :value: 'tl.layout_visualization_enable'

      Enable layout inference visualization. Default: False

   .. py:attribute:: TL_LAYOUT_VISUALIZATION_FORMATS
      :value: 'tl.layout_visualization_formats'

      Layout visualization formats. Acceptable values: "pdf", "png", "svg", "all"

   .. py:attribute:: TL_STORAGE_REWRITE_DETECT_INPLACE
      :value: 'tl.storage_rewrite_detect_inplace'

      Control StorageRewrite in-place detection.

      When False (default), StorageRewrite keeps distinct temporaries for patterns
      such as `dst[i] = f(src[i])`, avoiding implicit aliasing:

      ```
      read = T.allocate([1], "int32", "local.var")
      write = T.allocate([1], "int32", "local.var")
      read_buf = T.Buffer((1,), "int32", data=read, scope="local.var")
      write_buf = T.Buffer((1,), "int32", data=write, scope="local.var")
      write_buf[0] = read_buf[0] * 2
      f(write_buf[0])
      ```

      Setting the flag to True allows StorageRewrite to reuse the `read` buffer
      for the write when it can prove the update is safely in-place, producing IR
      like:

      ```
      read = T.allocate([1], "int32", "local.var")
      read_buf = T.Buffer((1,), "int32", data=read, scope="local.var")
      read_buf[0] = read_buf[0] * 2
      f(read_buf[0])
      ```

      This reduces local memory usage but introduces aliasing between the buffers.

      Usage:

      ```python
      import tilelang
      from tilelang.transform import PassContext, PassConfigKey

      # Enable in-place detection only for the passes run inside this context.
      with PassContext(
          config={PassConfigKey.TL_STORAGE_REWRITE_DETECT_INPLACE.value: True}
      ):
          mod = tilelang.transform.StorageRewrite()(mod)
      ```

   .. py:attribute:: TIR_ENABLE_EQUIV_TERMS_IN_CSE
      :value: 'tir.enable_equiv_terms_in_cse_tir'

      Enable equivalent terms in TIR Common Subexpression Elimination. Default: True

   .. py:attribute:: TIR_DISABLE_CSE
      :value: 'tir.disable_cse_tir'

      Disable TIR Common Subexpression Elimination. Default: False

   .. py:attribute:: TIR_SIMPLIFY
      :value: 'tir.Simplify'

      Enable/disable TIR simplification passes. Default: True

   .. py:attribute:: TIR_DISABLE_STORAGE_REWRITE
      :value: 'tir.disable_storage_rewrite'

      Disable storage rewrite optimization. Default: False

   .. py:attribute:: TIR_DISABLE_VECTORIZE
      :value: 'tir.disable_vectorize'

      Disable vectorization optimization. Default: False

   .. py:attribute:: TIR_USE_ASYNC_COPY
      :value: 'tir.use_async_copy'

      Enable asynchronous memory copy operations. Default: True

   .. py:attribute:: TIR_ENABLE_DEBUG
      :value: 'tir.enable_debug'

      Enable debug information in generated code. Default: False

   .. py:attribute:: TIR_MERGE_STATIC_SMEM
      :value: 'tir.merge_static_smem'

      Merge static shared memory allocations. Default: True

   .. py:attribute:: TIR_ADD_LOWER_PASS
      :value: 'tir.add_lower_pass'

      Additional lowering passes to be applied. Default: None

   .. py:attribute:: TIR_NOALIAS
      :value: 'tir.noalias'

      Enable pointer non-aliasing assumptions. Default: True

   .. py:attribute:: CUDA_KERNELS_OUTPUT_DIR
      :value: 'cuda.kernels_output_dir'

      Output directory for generated CUDA kernels. Default: empty string
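Because ``PassConfigKey`` is a ``str``-based ``enum.Enum``, each member's ``.value`` is the plain string key that a pass configuration dictionary expects. The sketch below illustrates one way to assemble such a dictionary. It is not part of TileLang: the enum is a local stand-in that copies three of the key strings listed above so the example runs without TileLang installed, and ``build_pass_config`` is a hypothetical helper introduced only for illustration.

```python
from enum import Enum


class PassConfigKey(str, Enum):
    """Local stand-in (assumption): mirrors three keys from the list above."""

    TL_DISABLE_TMA_LOWER = "tl.disable_tma_lower"
    TL_DISABLE_WARP_SPECIALIZED = "tl.disable_warp_specialized"
    TL_DYNAMIC_ALIGNMENT = "tl.dynamic_alignment"


def build_pass_config(overrides):
    """Hypothetical helper: return a dict keyed by plain strings.

    Accepts enum members or raw strings as keys and rejects anything
    that is not a known key, so typos fail early.
    """
    known = {member.value for member in PassConfigKey}
    config = {}
    for key, value in overrides.items():
        # Check the enum type first: members are also instances of str.
        name = key.value if isinstance(key, PassConfigKey) else str(key)
        if name not in known:
            raise KeyError(f"unknown pass config key: {name}")
        config[name] = value
    return config


config = build_pass_config(
    {
        PassConfigKey.TL_DISABLE_TMA_LOWER: True,  # enum member as key
        "tl.dynamic_alignment": 8,                 # raw string as key
    }
)
print(config)
```

With the real enum imported from ``tilelang.transform``, a dictionary built this way could presumably be passed as ``config=`` to ``PassContext``, in the same manner as the usage shown for ``TL_STORAGE_REWRITE_DETECT_INPLACE``.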