tilelang.profiler

The profiler and utilities for converting to torch.

Submodules

tilelang.profiler.bench

Classes

Profiler

A profiler class for benchmarking and validating kernel implementations.

Package Contents

class tilelang.profiler.Profiler

A profiler class for benchmarking and validating kernel implementations.

params: List of kernel parameters defining the input/output specifications

result_idx: Indices indicating which parameters are output tensors

supply_type: Type of tensor supply to use (e.g., random or zero-filled tensors)

adapter: Optional kernel adapter for interfacing with different backends

params: list[tilelang.engine.param.KernelParam]

result_idx: list[int]

supply_type: tilelang.utils.tensor.TensorSupplyType

adapter: tilelang.jit.adapter.BaseKernelAdapter | None = None

__post_init__(): Initializes the tensor supply after dataclass initialization
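The fields above describe a dataclass whose __post_init__ sets up the tensor supply once the fields are populated. The sketch below mimics that pattern in plain Python so it runs without tilelang or torch. SupplyKind, ToyParam, ToyProfiler, and get_inputs are hypothetical stand-ins, and the rule that only non-output parameters (those not listed in result_idx) receive generated data is an assumption made for illustration, not a statement about the library's internals.

```python
import random
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class SupplyKind(Enum):
    # Hypothetical stand-in for tilelang.utils.tensor.TensorSupplyType.
    ZEROS = "zeros"
    UNIFORM = "uniform"


@dataclass
class ToyParam:
    # Hypothetical stand-in for KernelParam; only the element count is kept.
    numel: int


@dataclass
class ToyProfiler:
    params: List[ToyParam]
    result_idx: List[int]
    supply_type: SupplyKind
    # Not a constructor argument: filled in by __post_init__.
    supply: Optional[Callable[[ToyParam], List[float]]] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Choose the data generator once, after the dataclass fields are set.
        if self.supply_type is SupplyKind.ZEROS:
            self.supply = lambda p: [0.0] * p.numel
        else:
            self.supply = lambda p: [random.uniform(-1.0, 1.0) for _ in range(p.numel)]

    def get_inputs(self) -> List[List[float]]:
        # Assumption for this sketch: output parameters (listed in result_idx)
        # are produced by the kernel, so no input data is generated for them.
        return [self.supply(p) for i, p in enumerate(self.params) if i not in self.result_idx]
```

For example, ToyProfiler([ToyParam(4)] * 3, result_idx=[2], supply_type=SupplyKind.ZEROS).get_inputs() returns two zero-filled lists of length 4, one per non-output parameter.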

with_default_adapter(adapter)

Parameters: adapter (tilelang.jit.adapter.BaseKernelAdapter)
Return type: Profiler

assert_allclose(reference_program, input_tensors=None, atol=0.01, rtol=0.01, max_mismatched_ratio=0.01)

Validates kernel output against a reference implementation.

Parameters:

reference_program (Callable) – Reference implementation to compare against
input_tensors (list[torch.Tensor] | None) – Optional pre-generated input tensors
atol (float) – Absolute tolerance for comparison
rtol (float) – Relative tolerance for comparison
max_mismatched_ratio (float) – Maximum allowed fraction of elements that may fall outside the tolerances
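The three tolerance parameters can be read together: an element is mismatched if it lies outside the tolerance band, and validation fails only if the fraction of mismatched elements exceeds max_mismatched_ratio. The pure-Python sketch below illustrates that reading. It assumes the torch.isclose criterion |actual - expected| <= atol + rtol * |expected| and counts NaNs as mismatches; the helper name check_close_with_ratio is hypothetical, and the library's exact rules (for example, NaN handling) may differ.

```python
import math
from typing import Sequence


def check_close_with_ratio(
    actual: Sequence[float],
    expected: Sequence[float],
    atol: float = 0.01,
    rtol: float = 0.01,
    max_mismatched_ratio: float = 0.01,
) -> None:
    """Raise AssertionError if too many elements fall outside tolerance."""
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} vs {len(expected)}")
    mismatched = 0
    for a, e in zip(actual, expected):
        # Same per-element test as torch.isclose (NaN counts as a mismatch).
        if math.isnan(a) or math.isnan(e) or abs(a - e) > atol + rtol * abs(e):
            mismatched += 1
    ratio = mismatched / max(len(expected), 1)
    # A few outliers are tolerated as long as they stay under the ratio.
    if ratio > max_mismatched_ratio:
        raise AssertionError(
            f"{mismatched}/{len(expected)} elements mismatched "
            f"(ratio {ratio:.4f} > {max_mismatched_ratio})"
        )
```

With the defaults, one bad element out of 100 (ratio 0.01) still passes, while two (ratio 0.02) fail.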

manual_assert_close(reference_program, input_tensors=None, manual_check_prog=None)

Validates kernel output against a reference implementation using a user-supplied check function instead of the built-in tolerance comparison.

Parameters:

reference_program (Callable) – Reference implementation to compare against
input_tensors (list[torch.Tensor] | None) – Optional pre-generated input tensors
manual_check_prog (Callable) – Custom function that compares the kernel outputs with the reference outputs
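When a single tolerance is not the right test (for example, when only the worst-case error matters), a custom check lets the caller define the comparison. The sketch below shows the idea with plain lists standing in for tensors. It assumes manual_check_prog is called once per (kernel output, reference output) pair and signals failure by raising; manual_compare and max_abs_diff_check are hypothetical names, not part of tilelang.

```python
from typing import Callable, List, Sequence

Tensor = List[float]  # stand-in for torch.Tensor in this sketch


def manual_compare(
    kernel: Callable[..., List[Tensor]],
    reference_program: Callable[..., List[Tensor]],
    inputs: Sequence[Tensor],
    manual_check_prog: Callable[[Tensor, Tensor], None],
) -> None:
    # Run both implementations on the same inputs.
    lib_outs = kernel(*inputs)
    ref_outs = reference_program(*inputs)
    assert len(lib_outs) == len(ref_outs), "output count differs"
    # Delegate the per-output comparison entirely to the user's callable,
    # which is expected to raise on failure.
    for lhs, rhs in zip(lib_outs, ref_outs):
        manual_check_prog(lhs, rhs)


def max_abs_diff_check(lhs: Tensor, rhs: Tensor, limit: float = 1e-3) -> None:
    # Example custom criterion: bound the worst-case absolute error.
    assert len(lhs) == len(rhs), "shape mismatch"
    worst = max((abs(a - b) for a, b in zip(lhs, rhs)), default=0.0)
    assert worst <= limit, f"max |diff| {worst} exceeds {limit}"
```

Here manual_compare passes silently when the two programs agree within the custom criterion and raises AssertionError otherwise.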

assert_consistent(repeat=10)

Checks that the kernel produces consistent output across multiple runs.

Parameters: repeat (int) – Number of times to repeat the consistency check
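A consistency check catches nondeterminism such as race conditions: the same inputs are run repeatedly and every result is compared with the first one. In the sketch below, check_consistent is a hypothetical helper that takes a zero-argument callable with its inputs already bound. It requires exact equality, which is stricter than a tolerance-based comparison the library may use instead.

```python
from typing import Callable, List


def check_consistent(run: Callable[[], List[float]], repeat: int = 10) -> None:
    """Run `run` repeatedly on fixed inputs and require identical outputs."""
    reference = run()
    for i in range(repeat):
        out = run()
        # Exact equality: a deterministic kernel should reproduce its
        # output exactly, not just within a tolerance.
        assert out == reference, f"run {i + 1} differs from the first run"
```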

run_once(func=None)

Parameters: func (Callable | None)

do_bench(func=None, warmup=25, rep=100, n_warmup=1, n_repeat=1, input_tensors=None, backend='event', quantiles=None, return_mode='mean')

Benchmarks the execution time of a given function.

Parameters:

func (Callable | None) – Function to benchmark (uses adapter if None)
warmup (int) – Warmup time in milliseconds
rep (int) – Number of repetitions for timing
n_warmup (int) – Number of warmup iterations
n_repeat (int) – Number of timing iterations
input_tensors (list[torch.Tensor]) – Optional pre-generated input tensors
backend (Literal['event', 'cupti']) – Which profiling backend to use
quantiles (list[float] | None)
return_mode (Literal['min', 'max', 'mean', 'median']) – Statistic used to aggregate the measured times

Returns:

Execution time in milliseconds, aggregated according to return_mode (the mean by default)

Return type:

float
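The sketch below shows the warmup-then-measure structure behind a benchmark and how return_mode selects the reported statistic. It is a CPU wall-clock approximation using time.perf_counter, not the library's method: the profiler itself selects a timing backend ('event' or 'cupti'), and asynchronous GPU launches would need explicit synchronization before a host-side timer means anything. The helper name bench is hypothetical, only the iteration-count parameters are modeled, and quantiles are omitted.

```python
import statistics
import time
from typing import Callable


def bench(
    func: Callable[[], object],
    n_warmup: int = 1,
    n_repeat: int = 10,
    return_mode: str = "mean",
) -> float:
    """Return a wall-clock statistic of func's runtime, in milliseconds."""
    aggregators = {
        "min": min,
        "max": max,
        "mean": statistics.mean,
        "median": statistics.median,
    }
    if return_mode not in aggregators:
        raise ValueError(f"unknown return_mode: {return_mode!r}")
    if n_repeat < 1:
        raise ValueError("n_repeat must be at least 1")

    # Warmup runs are executed but not timed (caches, lazy initialization).
    for _ in range(n_warmup):
        func()

    # Time each repetition separately so any statistic can be computed.
    times_ms = []
    for _ in range(n_repeat):
        start = time.perf_counter()
        func()
        times_ms.append((time.perf_counter() - start) * 1e3)

    return float(aggregators[return_mode](times_ms))
```

Keeping the per-iteration samples, rather than timing the whole loop once, is what makes min, max, and median available alongside the mean.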

property func

__call__(*args, **kwds)

Parameters:

args (Any)
kwds (Any)

Return type:

Any