tilelang.profiler.bench
=======================

.. py:module:: tilelang.profiler.bench

.. autoapi-nested-parse::

   Profiler and benchmarking utilities for PyTorch functions.


Attributes
----------

.. autoapisummary::

   tilelang.profiler.bench.IS_CUDA
   tilelang.profiler.bench.device
   tilelang.profiler.bench.Event


Classes
-------

.. autoapisummary::

   tilelang.profiler.bench.suppress_stdout_stderr


Functions
---------

.. autoapisummary::

   tilelang.profiler.bench.do_bench


Module Contents
---------------

.. py:class:: suppress_stdout_stderr

   Context manager to suppress stdout and stderr output.

   Source: https://github.com/deepseek-ai/DeepGEMM/blob/main/deep_gemm/testing/bench.py


   .. py:method:: __enter__()


   .. py:method:: __exit__(*_)


.. py:data:: IS_CUDA

.. py:data:: device
   :value: 'cuda:0'


.. py:data:: Event

.. py:function:: do_bench(fn, warmup = 25, rep = 100, _n_warmup = 0, _n_repeat = 0, quantiles = None, fast_flush = True, backend = 'event', return_mode = 'mean')

   Benchmark the runtime of a PyTorch function with L2 cache management.

   This function provides accurate GPU kernel timing by:

   - Clearing the L2 cache between runs for consistent measurements
   - Calculating warmup and repeat counts automatically from the estimated
     kernel runtime
   - Supporting two profiling backends: CUDA events or CUPTI
   - Offering flexible result aggregation (mean, median, min, max, or quantiles)

   :param fn: Function to benchmark
   :param warmup: Target warmup time in milliseconds (default: 25)
   :param rep: Target total benchmark time in milliseconds (default: 100)
   :param _n_warmup: Manual override for the number of warmup iterations (default: 0, meaning automatic)
   :param _n_repeat: Manual override for the number of benchmark iterations (default: 0, meaning automatic)
   :param quantiles: Quantiles of the runtime distribution to compute, given as fractions in [0, 1] (e.g., ``[0.5, 0.95]``; default: None)
   :param fast_flush: If True, flush the L2 cache with an int32 buffer instead of an int8 buffer, which is faster (default: True)
   :param backend: Profiler backend, either ``"event"`` (CUDA events) or ``"cupti"`` (default: ``"event"``)
   :param return_mode: Result aggregation method: ``"mean"``, ``"median"``, ``"min"``, or ``"max"`` (default: ``"mean"``)
   :returns: Runtime in milliseconds as a float, or a list of quantile values if ``quantiles`` is specified
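
The automatic warmup and repeat counts of ``do_bench`` can be illustrated with a small pure-Python sketch. It assumes, without confirming it against the tilelang source, that each count is the ``warmup`` or ``rep`` time budget divided by an estimated per-call runtime, that at least one iteration always runs, and that a positive ``_n_warmup`` or ``_n_repeat`` replaces the computed value. The helper ``estimate_iteration_counts`` and its parameter names are hypothetical and not part of ``tilelang``.

.. code-block:: python

   from typing import Tuple


   def estimate_iteration_counts(
       estimate_ms: float,
       warmup: float = 25.0,
       rep: float = 100.0,
       n_warmup_override: int = 0,
       n_repeat_override: int = 0,
   ) -> Tuple[int, int]:
       """Hypothetical helper: turn time budgets (ms) into iteration counts.

       Assumed behaviour based on the do_bench parameter descriptions;
       not taken from the tilelang implementation.
       """
       if estimate_ms <= 0:
           raise ValueError("estimate_ms must be positive")
       # Fit as many calls as possible into each budget, but run at least once.
       n_warmup = max(1, int(warmup / estimate_ms))
       n_repeat = max(1, int(rep / estimate_ms))
       # An override of 0 means "automatic"; any positive value wins.
       if n_warmup_override > 0:
           n_warmup = n_warmup_override
       if n_repeat_override > 0:
           n_repeat = n_repeat_override
       return n_warmup, n_repeat

Under these assumptions, a kernel estimated at 0.5 ms gets 50 warmup and 200 timed iterations with the default budgets, while a kernel slower than the 100 ms budget still runs once in each phase.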
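
The ``return_mode`` and ``quantiles`` options of ``do_bench`` reduce the per-run timings to the returned value. The sketch below assumes, following the ``:returns:`` description, that ``quantiles`` takes precedence over ``return_mode``. It interpolates linearly between order statistics (the default method of ``numpy.quantile`` and ``torch.quantile``); whether ``do_bench`` computes quantiles the same way is an assumption. The function ``aggregate_times`` is hypothetical.

.. code-block:: python

   import math
   import statistics
   from typing import List, Optional, Sequence, Union


   def _linear_quantile(sorted_ms: Sequence[float], q: float) -> float:
       # Position q * (n - 1) in the sorted data, interpolated between neighbours.
       pos = q * (len(sorted_ms) - 1)
       lo = math.floor(pos)
       hi = min(lo + 1, len(sorted_ms) - 1)
       return sorted_ms[lo] + (sorted_ms[hi] - sorted_ms[lo]) * (pos - lo)


   def aggregate_times(
       times_ms: Sequence[float],
       quantiles: Optional[Sequence[float]] = None,
       return_mode: str = "mean",
   ) -> Union[float, List[float]]:
       """Hypothetical helper: reduce per-run timings (ms) to one result."""
       if not times_ms:
           raise ValueError("times_ms must not be empty")
       # Assumed precedence: when quantiles are requested, return a list.
       if quantiles is not None:
           if any(not 0.0 <= q <= 1.0 for q in quantiles):
               raise ValueError("quantiles must lie in [0, 1]")
           ordered = sorted(times_ms)
           return [_linear_quantile(ordered, q) for q in quantiles]
       reducers = {
           "mean": statistics.fmean,
           "median": statistics.median,
           "min": min,
           "max": max,
       }
       if return_mode not in reducers:
           raise ValueError(f"unsupported return_mode: {return_mode!r}")
       return float(reducers[return_mode](times_ms))

For timings of ``[4.0, 1.0, 3.0, 2.0]`` ms, ``return_mode="mean"`` yields 2.5 and ``quantiles=[0.5, 1.0]`` yields ``[2.5, 4.0]``.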
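
The ``suppress_stdout_stderr`` context manager is credited to DeepGEMM, and its code is not reproduced here. A common way to build such a context manager, sketched below as an assumption rather than as that implementation, is to redirect both the Python stream objects and the underlying file descriptors 1 and 2 to ``os.devnull``, so that output written directly to the descriptors by C or C++ code is silenced as well as ``print()`` output. The class name ``SuppressOutput`` is hypothetical.

.. code-block:: python

   import os
   import sys


   class SuppressOutput:
       """Hypothetical sketch of a stdout/stderr suppressor at the fd level."""

       def __enter__(self):
           # Push out anything already buffered so it is not swallowed.
           sys.stdout.flush()
           sys.stderr.flush()
           self._null = open(os.devnull, "w")
           # Keep copies of the original descriptors so they can be restored.
           self._saved_fds = (os.dup(1), os.dup(2))
           self._saved_streams = (sys.stdout, sys.stderr)
           # Point fds 1 and 2 at the null device: catches writes from native code.
           os.dup2(self._null.fileno(), 1)
           os.dup2(self._null.fileno(), 2)
           # Replace the Python stream objects too: catches print() even when
           # sys.stdout has been redirected to an in-memory buffer.
           sys.stdout = sys.stderr = self._null
           return self

       def __exit__(self, *_):
           # Send whatever was buffered while suppressed to the null device.
           self._null.flush()
           sys.stdout, sys.stderr = self._saved_streams
           # Restore the original descriptors and release the saved copies.
           for saved, target in zip(self._saved_fds, (1, 2)):
               os.dup2(saved, target)
               os.close(saved)
           self._null.close()
           return False  # do not swallow exceptions raised inside the block

Usage: ``with SuppressOutput(): noisy_call()``. Output produced inside the block, including ``os.write(1, ...)`` calls, is discarded, and both streams are restored on exit even if the block raises.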