tilelang.transform ================== .. py:module:: tilelang.transform .. autoapi-nested-parse:: Wrapping transformations. Submodules ---------- .. toctree:: :maxdepth: 1 /autoapi/tilelang/transform/add_bufstore_wrapper/index /autoapi/tilelang/transform/pass_config/index /autoapi/tilelang/transform/simplify/index Functions --------- .. autoapisummary:: tilelang.transform.get_pass_context tilelang.transform.ClusterPlanning tilelang.transform.PipelinePlanning tilelang.transform.LayoutInference tilelang.transform.LowerTileOp tilelang.transform.InjectSoftwarePipeline tilelang.transform.FrontendLegalize tilelang.transform.LegalizeNegativeIndex tilelang.transform.InjectAssumes tilelang.transform.LowerHopperIntrin tilelang.transform.WarpSpecializedPipeline tilelang.transform.RewriteWgmmaSync tilelang.transform.ThreadSync tilelang.transform.ThreadPartialSync tilelang.transform.IfStmtBinding tilelang.transform.MergeIfStmt tilelang.transform.MultiVersionBuffer tilelang.transform.WarpSpecialized tilelang.transform.AnnotateWarpGroupRegAlloc tilelang.transform.InjectTmaBarrier tilelang.transform.InjectFenceProxy tilelang.transform.LegalizeVectorizedLoop tilelang.transform.LegalizeSafeMemoryAccess tilelang.transform.MakePackedAPI tilelang.transform.AnnotateDeviceRegions tilelang.transform.SplitHostDevice tilelang.transform.AnnotateReadOnlyParams tilelang.transform.VectorizeLoop tilelang.transform.InjectPTXAsyncCopy tilelang.transform.LowerDeviceStorageAccessInfo tilelang.transform.ConfigIndexBitwidth tilelang.transform.FlattenBuffer tilelang.transform.EliminateStorageSyncForMBarrier tilelang.transform.MergeSharedMemoryAllocations tilelang.transform.LowerL2Persistent tilelang.transform.PersistThreadblock tilelang.transform.AlignDynamicSharedMemoryAllocations tilelang.transform.LowerSharedBarrier tilelang.transform.PlanAndUpdateBufferAllocationLocation tilelang.transform.StorageRewrite tilelang.transform.LowerOpaqueBlock tilelang.transform.LowerThreadAllreduce tilelang.transform.LowerIntrin tilelang.transform.LowerDeviceKernelLaunch tilelang.transform.LowerSharedTmem tilelang.transform.LayoutReducer Package Contents ---------------- .. py:function:: get_pass_context() Get the current pass context .. py:function:: ClusterPlanning() ClusterPlanning :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: PipelinePlanning() infer the fragment/shared memory layout :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: LayoutInference() LayoutInference :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: LowerTileOp() LowerTileOp :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: InjectSoftwarePipeline() InjectSoftwarePipeline :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: FrontendLegalize() FrontendLegalize :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: LegalizeNegativeIndex() Legalize negative indices in buffer loads. :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: InjectAssumes() Inject Assumes Returns: ------- fpass : tvm.transform.Pass The result pass .. py:function:: LowerHopperIntrin() LowerHopperIntrin :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: WarpSpecializedPipeline() WarpSpecializedPipeline :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: RewriteWgmmaSync() RewriteWgmmaSync :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: ThreadSync(storage_scope) Insert sync between parallel read/write of shared buffers. :param storage_scope: The target storage scope. :type storage_scope: str :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: ThreadPartialSync(storage_scope) Insert partial sync. :param storage_scope: The target storage scope. :type storage_scope: str :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: IfStmtBinding() IfStmtBinding :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: MergeIfStmt() MergeIfStmt :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: MultiVersionBuffer() WarpSpecializedPipeline :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: WarpSpecialized() WarpSpecializedPipeline :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: AnnotateWarpGroupRegAlloc() Inject set_max_nreg calls into warp-specialized functions. This pass analyzes the function to collect register hints from set_max_nreg and no_set_max_nreg calls, then injects appropriate set_max_nreg calls into producer and consumer branches of warp-specialized code. :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: InjectTmaBarrier() InjectTmaBarrier :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: InjectFenceProxy() InjectFenceProxy :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: LegalizeVectorizedLoop() LegalizeLoopVectorize :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: LegalizeSafeMemoryAccess() LegalizeLoopVectorize :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: MakePackedAPI() MakePackedAPI :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: AnnotateDeviceRegions() AnnotateDeviceRegions :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: SplitHostDevice() Split host/device functions even for empty kernels. :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: AnnotateReadOnlyParams() Annotate read-only handle parameters for PrimFuncs. Adds attribute `tl.readonly_param_indices` listing param indices that are never written, enabling CUDA codegen to emit `const` qualifiers to unlock read-only cache loads. :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: VectorizeLoop(enable_vectorize = True) VectorizeLoop :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: InjectPTXAsyncCopy() Rewrite global to shared memory copy on CUDA with asynchronous copy. :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: LowerDeviceStorageAccessInfo() Lower attached storage access information on device. :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. note:: Run this pass after all storage access analysis finish. .. py:function:: ConfigIndexBitwidth() Config index bitwidth. :returns: * **fpass** (*tvm.transform.Pass*) -- The result pass * *----* .. py:function:: FlattenBuffer() FlattenBuffer :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: EliminateStorageSyncForMBarrier() EliminateStorageSyncForMBarrier .. py:function:: MergeSharedMemoryAllocations(enable_aggressive_merge = False, align_bytes = 16) MergeSharedMemoryAllocations :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: LowerL2Persistent() LowerL2Persistent .. py:function:: PersistThreadblock() PersistThreadblock .. py:function:: AlignDynamicSharedMemoryAllocations(align_bytes = 16) AlignDynamicSharedMemoryAllocations :param align_bytes: The alignment bytes. :type align_bytes: int .. py:function:: LowerSharedBarrier() LowerSharedBarrier .. py:function:: PlanAndUpdateBufferAllocationLocation() Plan and update buffer allocation locations within PrimFuncs. :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: StorageRewrite() StorageRewrite :returns: **fpass** -- The result pass :rtype: tvm.transform.Pass .. py:function:: LowerOpaqueBlock() LowerOpaqueBlock .. py:function:: LowerThreadAllreduce() LowerThreadAllreduce .. py:function:: LowerIntrin() LowerIntrin .. py:function:: LowerDeviceKernelLaunch() Create and return a transform pass that lowers device kernel launch constructs to target-specific IR. This pass transforms high-level device kernel launch and related intrinsics into lower-level IR suitable for backend code generation and device-side lowering. :returns: The transform pass that performs device kernel launch lowering. :rtype: tvm.transform.Pass .. py:function:: LowerSharedTmem() LowerSharedTmem .. py:function:: LayoutReducer() Return a TVM transform pass that performs layout reduction/normalization. This wrapper delegates to the underlying FFI implementation and returns a pass object suitable for use in a PassContext or pass pipeline. The pass is intended to simplify or reduce tensor/layout-related representations during relay/tile transformations. :returns: The transform pass object produced by the FFI backend.