tilelang.autotuner.param¶

The auto-tune parameters.

Attributes¶

`BEST_CONFIG_PATH`
`FUNCTION_PATH`
`OUT_IDX_PATH`
`LATENCY_PATH`
`DEVICE_KERNEL_PATH`
`HOST_KERNEL_PATH`
`EXECUTABLE_PATH`
`KERNEL_LIB_PATH`
`KERNEL_CUBIN_PATH`
`KERNEL_PY_PATH`
`PARAMS_PATH`

Classes¶

`CompileArgs`	Compile arguments for the auto-tuner. Detailed description can be found in tilelang.jit.compile.
`ProfileArgs`	Profile arguments for the auto-tuner.
`AutotuneResult`	Results from auto-tuning process.

Module Contents¶

tilelang.autotuner.param.BEST_CONFIG_PATH = 'best_config.json'¶

tilelang.autotuner.param.FUNCTION_PATH = 'function.pkl'¶

tilelang.autotuner.param.OUT_IDX_PATH = 'out_idx.json'¶

tilelang.autotuner.param.LATENCY_PATH = 'latency.json'¶

tilelang.autotuner.param.DEVICE_KERNEL_PATH = 'device_kernel.cu'¶

tilelang.autotuner.param.HOST_KERNEL_PATH = 'host_kernel.cu'¶

tilelang.autotuner.param.EXECUTABLE_PATH = 'executable.so'¶

tilelang.autotuner.param.KERNEL_LIB_PATH = 'kernel_lib.so'¶

tilelang.autotuner.param.KERNEL_CUBIN_PATH = 'kernel.cubin'¶

tilelang.autotuner.param.KERNEL_PY_PATH = 'kernel.py'¶

tilelang.autotuner.param.PARAMS_PATH = 'params.pkl'¶
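The constants above are bare file names, so they are presumably joined onto a result directory when a tuning result is saved or loaded. A minimal sketch of that idea follows. The constants are redeclared locally so the snippet runs without tilelang installed; the helper `artifact_paths` and the directory name `autotune_cache/some_key` are hypothetical, and placing every file in a single directory is an assumption rather than something this page states.

```python
from pathlib import Path
from typing import Dict

# File names copied from the module attributes documented above.
BEST_CONFIG_PATH = "best_config.json"
FUNCTION_PATH = "function.pkl"
LATENCY_PATH = "latency.json"
PARAMS_PATH = "params.pkl"


def artifact_paths(result_dir: Path) -> Dict[str, Path]:
    """Hypothetical helper: map a short label to each file inside result_dir."""
    names = {
        "best_config": BEST_CONFIG_PATH,
        "function": FUNCTION_PATH,
        "latency": LATENCY_PATH,
        "params": PARAMS_PATH,
    }
    return {label: result_dir / name for label, name in names.items()}


# Example: resolve the files for one (made-up) result directory.
paths = artifact_paths(Path("autotune_cache") / "some_key")
print(paths["latency"])
```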

class tilelang.autotuner.param.CompileArgs¶

Compile arguments for the auto-tuner. Detailed description can be found in tilelang.jit.compile.

out_idx¶: List of output tensor indices.

execution_backend¶: Execution backend to use for kernel execution (default: "auto").

target¶: Compilation target, either as a string or a TVM Target object (default: "auto").

target_host¶: Target host for cross-compilation (default: None).

verbose¶: Whether to enable verbose output (default: False).

pass_configs¶: Additional keyword arguments to pass to the Compiler PassContext.

Refer to `tilelang.PassConfigKey` for supported options.

out_idx: list[int] | int | None = None¶

execution_backend: Literal['auto', 'tvm_ffi', 'cython', 'nvrtc', 'torch'] = 'auto'¶

target: Literal['auto', 'cuda', 'hip'] = 'auto'¶

target_host: str | tvm.target.Target = None¶

verbose: bool = False¶

pass_configs: dict[str, Any] | None = None¶

compile_program(program)¶

Parameters: program (tvm.tir.PrimFunc)

__hash__()¶
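This page lists `__hash__()` without a description. One plausible motivation is that fields such as `out_idx` (a list) and `pass_configs` (a dict) are not hashable by default, so a custom hash is needed before the arguments can serve as a cache key. The sketch below illustrates that general technique on a stand-in dataclass; `CompileArgsSketch` is hypothetical, and the hashing scheme (stable JSON plus SHA-256) is an assumption, not the tilelang implementation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class CompileArgsSketch:
    """Stand-in mirroring a few documented fields; not the tilelang class."""

    out_idx: Optional[Union[List[int], int]] = None
    execution_backend: str = "auto"
    target: str = "auto"
    verbose: bool = False
    pass_configs: Optional[Dict[str, Any]] = None

    def __hash__(self) -> int:
        # Serialize with sorted keys so equal settings yield the same digest
        # regardless of dict insertion order.
        payload = json.dumps(asdict(self), sort_keys=True, default=str)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return int(digest, 16)


a = CompileArgsSketch(out_idx=[-1], pass_configs={"x": 1, "y": 2})
b = CompileArgsSketch(out_idx=[-1], pass_configs={"y": 2, "x": 1})
print(hash(a) == hash(b))
```

Because the digest is derived from the field values rather than object identity, two separately constructed instances with the same settings map to the same key.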

class tilelang.autotuner.param.ProfileArgs¶

Profile arguments for the auto-tuner.

warmup¶: Number of warmup iterations.

rep¶: Number of repetitions for timing.

timeout¶: Maximum time per configuration.

backend¶: Profiler backend - "event" (CUDA events), "cupti", or "cudagraph".

supply_type¶: Type of tensor supply mechanism.

ref_prog¶: Reference program for correctness validation.

supply_prog¶: Supply program for input tensors.

out_idx¶: Union[List[int], int] = -1

supply_type¶: tilelang.TensorSupplyType = tilelang.TensorSupplyType.Auto

ref_prog¶: Callable = None

supply_prog¶: Callable = None

rtol¶: float = 1e-2

atol¶: float = 1e-2

max_mismatched_ratio¶: float = 0.01

skip_check¶: bool = False

manual_check_prog¶: Callable = None

cache_input_tensors¶: bool = True

warmup: int = 25¶

rep: int = 100¶

timeout: int = 30¶

backend: Literal['event', 'cupti', 'cudagraph'] = 'event'¶

supply_type: tilelang.TensorSupplyType¶

ref_prog: Callable = None¶

supply_prog: Callable = None¶

rtol: float = 0.01¶

atol: float = 0.01¶

max_mismatched_ratio: float = 0.01¶

skip_check: bool = False¶

manual_check_prog: Callable = None¶

cache_input_tensors: bool = True¶
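The `rtol`, `atol`, and `max_mismatched_ratio` fields suggest an allclose-style comparison that tolerates a small fraction of outliers. The sketch below shows one conventional reading of those three knobs in plain Python. The function `within_tolerance` is hypothetical, and the per-element rule `|actual - expected| <= atol + rtol * |expected|` is an assumption borrowed from the usual allclose definition; this page does not confirm how tilelang combines them.

```python
from typing import Sequence


def within_tolerance(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float = 0.01,
    atol: float = 0.01,
    max_mismatched_ratio: float = 0.01,
) -> bool:
    """Return True if the fraction of mismatched elements is small enough."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    if not expected:
        return True  # nothing to compare
    # An element counts as mismatched when it falls outside the
    # combined absolute + relative tolerance band.
    mismatched = sum(
        1 for a, e in zip(actual, expected) if abs(a - e) > atol + rtol * abs(e)
    )
    return mismatched / len(expected) <= max_mismatched_ratio


expected = [1.0] * 200
one_outlier = [2.0] + [1.0] * 199      # 0.5% of elements mismatch
three_outliers = [2.0] * 3 + [1.0] * 197  # 1.5% of elements mismatch
print(within_tolerance(one_outlier, expected))
print(within_tolerance(three_outliers, expected))
```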

__hash__()¶

class tilelang.autotuner.param.AutotuneResult¶

Results from auto-tuning process.

latency¶: Best achieved execution latency.

config¶: Configuration that produced the best result.

ref_latency¶: Reference implementation latency.

libcode¶: Generated library code.

func¶: Optimized function.

kernel¶: Compiled kernel function.

latency: float | None = None¶

config: dict | None = None¶

ref_latency: float | None = None¶

libcode: str | None = None¶

func: Callable | None = None¶

kernel: Callable | None = None¶

save_to_disk(path, verbose=False)¶

Persist autotune result to disk using atomic directory rename.

All files are written into a temporary staging directory under the shared namespace staging root. Once complete, the staging directory is atomically renamed to path so that concurrent readers never see a half-written result.
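The stage-then-rename pattern described above can be sketched with the standard library alone. The helper `save_dir_atomically` is hypothetical and is not the tilelang implementation; in particular, it stages next to the destination rather than under a shared staging root, and it assumes a POSIX filesystem where renaming a directory onto a path that does not yet exist is atomic.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def save_dir_atomically(path: Path, files: Dict[str, bytes]) -> None:
    """Hypothetical helper: write files into a staging dir, then rename it to path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Stage beside the destination so the rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=path.parent))
    try:
        for name, data in files.items():
            (staging / name).write_bytes(data)
        # Readers see either no directory at all or the complete one.
        os.rename(staging, path)
    except BaseException:
        # Clean up the partial staging directory on any failure.
        shutil.rmtree(staging, ignore_errors=True)
        raise


# Usage: write a tiny result directory inside a throwaway temp root.
with tempfile.TemporaryDirectory() as root:
    dest = Path(root) / "result"
    save_dir_atomically(dest, {"latency.json": b'{"latency": 0.42}'})
    saved_names = sorted(p.name for p in dest.iterdir())
    leftover = sorted(p.name for p in Path(root).iterdir())
```

If the destination already exists and is not empty, `os.rename` raises an error on POSIX and the staging directory is removed, so an existing result is never partially overwritten.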

Parameters:

path (pathlib.Path)
verbose (bool)

classmethod load_from_disk(path, compile_args)¶

Parameters:

path (pathlib.Path)
compile_args (CompileArgs)

Return type:

AutotuneResult