tilelang.profiler

The profiler, together with utilities for converting to torch.

Submodules

Classes

Profiler

A profiler class for benchmarking and validating kernel implementations.

Package Contents

class tilelang.profiler.Profiler

A profiler class for benchmarking and validating kernel implementations.

params

List of kernel parameters defining the input/output specifications

result_idx

Indices indicating which parameters are output tensors

supply_type

Type of tensor supply to use (e.g., random values or zeros)

adapter

Optional kernel adapter for interfacing with different backends

params: list[tilelang.engine.param.KernelParam]
result_idx: list[int]
supply_type: tilelang.utils.tensor.TensorSupplyType
adapter: tilelang.jit.adapter.BaseKernelAdapter | None = None
__post_init__()

Initialize tensor supply after dataclass initialization
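To make the roles of params, result_idx and supply_type concrete, here is a plain-Python sketch of input generation, assuming the profiler creates data only for parameters whose positions are not listed in result_idx. ParamSpec, SupplyKind, make_supply and build_inputs are hypothetical stand-ins, not tilelang APIs, and flat lists of floats replace torch tensors.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class SupplyKind(Enum):
    # Hypothetical stand-in for tilelang.utils.tensor.TensorSupplyType.
    RANDOM = "random"
    ZEROS = "zeros"


@dataclass
class ParamSpec:
    # Hypothetical stand-in for tilelang.engine.param.KernelParam.
    shape: tuple[int, ...]


def make_supply(kind: SupplyKind, seed: int = 0) -> Callable[[ParamSpec], list]:
    """Return a function that builds a flat list of values for one parameter."""
    rng = random.Random(seed)

    def supply(param: ParamSpec) -> list:
        numel = 1
        for dim in param.shape:
            numel *= dim
        if kind is SupplyKind.ZEROS:
            return [0.0] * numel
        return [rng.uniform(-1.0, 1.0) for _ in range(numel)]

    return supply


def build_inputs(params: list[ParamSpec], result_idx: list[int], supply) -> list[list]:
    """Create data for every non-output parameter (assumes non-negative result_idx)."""
    return [supply(p) for i, p in enumerate(params) if i not in result_idx]


# A C = A @ B style signature: parameters 0 and 1 are inputs, 2 is the output.
params = [ParamSpec((2, 3)), ParamSpec((3, 4)), ParamSpec((2, 4))]
inputs = build_inputs(params, result_idx=[2], supply=make_supply(SupplyKind.ZEROS))
print([len(t) for t in inputs])  # [6, 12]
```

Keeping the supply as a separate callable lets the fill strategy change without touching the logic that walks the parameter list.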

with_default_adapter(adapter)
Parameters:

adapter (tilelang.jit.adapter.BaseKernelAdapter)

Return type:

Profiler

assert_allclose(reference_program, input_tensors=None, atol=0.01, rtol=0.01, max_mismatched_ratio=0.01)

Validates kernel output against a reference implementation.

Parameters:
  • reference_program (Callable) – Reference implementation to compare against

  • input_tensors (list[torch.Tensor] | None) – Optional pre-generated input tensors

  • atol (float) – Absolute tolerance for comparison

  • rtol (float) – Relative tolerance for comparison

  • max_mismatched_ratio – Maximum allowed ratio of mismatched elements

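A plain-Python sketch of how atol, rtol and max_mismatched_ratio can interact, assuming the element-wise criterion used by torch.allclose (|out - ref| <= atol + rtol * |ref|) and assuming the check fails only when the fraction of failing elements exceeds max_mismatched_ratio. allclose_with_ratio is a hypothetical helper, not tilelang's implementation.

```python
import math


def allclose_with_ratio(out, ref, atol=0.01, rtol=0.01, max_mismatched_ratio=0.01):
    """Raise AssertionError if too many elements fall outside the tolerance.

    Hypothetical helper: `out` and `ref` are flat sequences of floats of equal length.
    Returns the observed mismatch ratio when the check passes.
    """
    if len(out) != len(ref):
        raise AssertionError(f"length mismatch: {len(out)} vs {len(ref)}")
    mismatched = 0
    for a, b in zip(out, ref):
        # Per-element test in the style of torch.allclose; NaN always counts as a mismatch.
        if math.isnan(a) or math.isnan(b) or abs(a - b) > atol + rtol * abs(b):
            mismatched += 1
    ratio = mismatched / max(len(ref), 1)
    if ratio > max_mismatched_ratio:
        raise AssertionError(
            f"{mismatched}/{len(ref)} elements mismatched "
            f"(ratio {ratio:.4f} > {max_mismatched_ratio})"
        )
    return ratio


ref = [1.0] * 200
out = ref.copy()
out[0] = 2.0  # one outlier: ratio 1/200 = 0.005 <= 0.01, so the check passes
print(allclose_with_ratio(out, ref))  # 0.005
```

A ratio-based threshold tolerates a handful of outliers, which can help when a low-precision kernel is compared with a higher-precision reference.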
manual_assert_close(reference_program, input_tensors=None, manual_check_prog=None)

Validates kernel output against a reference implementation, delegating the comparison to a user-supplied check instead of fixed tolerances.

Parameters:
  • reference_program (Callable) – Reference implementation to compare against

  • input_tensors (list[torch.Tensor] | None) – Optional pre-generated input tensors

  • manual_check_prog (Callable) – Custom function that performs the output comparison
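manual_check_prog replaces the fixed-tolerance comparison with caller-defined logic. Its calling convention is not documented on this page; the sketch below assumes it is called with the list of kernel outputs and the list of reference outputs, and that it signals failure by raising. check_max_abs_error, run_manual_check and add_kernel are hypothetical and operate on flat lists of floats rather than torch tensors.

```python
from typing import Callable, Sequence

Outputs = Sequence[Sequence[float]]


def check_max_abs_error(outs: Outputs, refs: Outputs, limit: float = 1e-3) -> None:
    """Hypothetical custom check: every output element within `limit` of the reference."""
    assert len(outs) == len(refs), "output count differs from reference"
    for k, (out, ref) in enumerate(zip(outs, refs)):
        worst = max((abs(a - b) for a, b in zip(out, ref)), default=0.0)
        assert worst <= limit, f"output {k}: max abs error {worst} > {limit}"


def run_manual_check(kernel: Callable, reference: Callable, inputs: list,
                     check: Callable[[Outputs, Outputs], None]) -> None:
    """Hypothetical driver: run both programs, then hand their outputs to `check`."""
    outs = kernel(*inputs)
    refs = reference(*inputs)
    check(outs, refs)  # assumed convention: (kernel outputs, reference outputs)


def add_kernel(x, y):
    # Stand-in "kernel" returning a list with a single output.
    return [[a + b for a, b in zip(x, y)]]


run_manual_check(add_kernel, add_kernel, [[1.0, 2.0], [3.0, 4.0]], check_max_abs_error)
```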

assert_consistent(repeat=10)

Checks for kernel consistency across multiple runs.

Parameters:

repeat – Number of times to repeat the consistency check
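A sketch of a repeat-and-compare consistency check, assuming each run is compared against the first run on the same inputs. Exact equality is used here for simplicity; assert_consistent_sketch and deterministic are hypothetical and work on plain Python lists.

```python
from typing import Callable, Sequence


def assert_consistent_sketch(func: Callable[..., Sequence[float]], inputs: list,
                             repeat: int = 10) -> None:
    """Run `func` `repeat` more times and require each result to match the first run."""
    first = list(func(*inputs))
    for run in range(repeat):
        again = list(func(*inputs))
        # Exact comparison for simplicity; a tolerance-based comparison would also fit.
        assert again == first, f"run {run + 1} differs from the first run"


def deterministic(x):
    return [v * 2 for v in x]


assert_consistent_sketch(deterministic, [[1.0, 2.0, 3.0]])
```

A kernel with a race condition or an uninitialized read can return different results across launches, which this kind of check can surface.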

run_once(func=None)
Parameters:

func (Callable | None)

do_bench(func=None, warmup=25, rep=100, n_warmup=1, n_repeat=1, input_tensors=None, backend='event', quantiles=None, return_mode='mean')

Benchmarks the execution time of a given function.

Parameters:
  • func (Callable | None) – Function to benchmark (uses adapter if None)

  • warmup (int) – Warmup time in milliseconds

  • rep (int) – Number of repetitions for timing

  • n_warmup (int) – Number of warmup iterations

  • n_repeat (int) – Number of timing iterations

  • input_tensors (list[torch.Tensor]) – Optional pre-generated input tensors

  • backend (Literal['event', 'cupti']) – Which profiling backend to use

  • quantiles (list[float] | None) – Optional quantiles of the measured times to report

  • return_mode (Literal['min', 'max', 'mean', 'median']) – Statistic used to summarize the measured times

Returns:

Execution time in milliseconds, summarized according to return_mode (the mean by default)

Return type:

float
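The sketch below mimics the overall shape of a benchmark loop like do_bench on the CPU with time.perf_counter: n_warmup untimed calls, n_repeat timed calls, then either the requested quantiles or a single statistic chosen by return_mode. It is an analogue under stated assumptions, not tilelang's implementation: the real method times GPU work through the 'event' or 'cupti' backend and also accepts warmup and rep, which this sketch ignores. Returning a list when quantiles is given is an assumption borrowed from Triton's do_bench; bench_sketch and work are hypothetical.

```python
import statistics
import time
from typing import Callable, Literal, Optional, Union


def bench_sketch(
    fn: Callable[[], object],
    n_warmup: int = 1,
    n_repeat: int = 10,
    quantiles: Optional[list[float]] = None,
    return_mode: Literal["min", "max", "mean", "median"] = "mean",
) -> Union[float, list[float]]:
    """CPU wall-clock analogue of a benchmark loop; all times are in milliseconds."""
    for _ in range(n_warmup):
        fn()  # untimed warmup calls (caches, lazy initialization)
    times_ms = []
    for _ in range(n_repeat):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1e3)
    if quantiles is not None:
        # Simple index-based quantiles over the sorted samples (no interpolation).
        ordered = sorted(times_ms)
        return [ordered[min(int(q * len(ordered)), len(ordered) - 1)] for q in quantiles]
    aggregate = {"min": min, "max": max, "mean": statistics.mean, "median": statistics.median}
    return aggregate[return_mode](times_ms)


def work():
    return sum(i * i for i in range(10_000))


print(f"{bench_sketch(work, n_repeat=20, return_mode='median'):.3f} ms")
p50, p90 = bench_sketch(work, n_repeat=20, quantiles=[0.5, 0.9])
```

The median is often preferred over the mean for short kernels because it is less sensitive to occasional slow outliers.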

property func
__call__(*args, **kwds)
Parameters:
  • args (Any)

  • kwds (Any)

Return type:

Any