
Add LinearBatchEnsemble folding benchmark #23

Open: tippered1-debug wants to merge 1 commit into yandex-research:main from tippered1-debug:benchmark-fold-linear-batchensemble


@tippered1-debug


Adds a standalone benchmark for inference-only folding of LinearBatchEnsemble.

The script materializes an equivalent LinearEnsemble by folding the BatchEnsemble scaling vectors (r over input features, s over output features) into per-submodel weights:

folded.weight[k, i, o] = r[k, i] * weight[o, i] * s[k, o]
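A minimal sketch of this folding in plain PyTorch follows. It assumes the BatchEnsemble layer computes y = ((x * r) @ weight.T) * s + bias, with x of shape (batch, k, in), weight (out, in), r (k, in), s (k, out) and a per-submodel bias (k, out). These shapes and the helper names batchensemble_forward, fold_weights and folded_forward are hypothetical. They were chosen to match the indexing in the formula, not taken from the benchmark script or the library. Under these assumptions the bias is already per submodel, so it passes through unchanged.

```python
import torch


def batchensemble_forward(x, weight, r, s, bias=None):
    # Assumed shapes: x (batch, k, in), weight (out, in), r (k, in), s (k, out), bias (k, out).
    y = (x * r) @ weight.T  # per-submodel input scaling, then the shared matmul
    y = y * s  # per-submodel output scaling
    if bias is not None:
        y = y + bias
    return y


def fold_weights(weight, r, s):
    # folded[k, i, o] = r[k, i] * weight[o, i] * s[k, o]  ->  shape (k, in, out)
    return r[:, :, None] * weight.T[None, :, :] * s[:, None, :]


def folded_forward(x, folded_weight, bias=None):
    # One (in, out) matrix per submodel: y[b, k, o] = sum_i x[b, k, i] * W[k, i, o]
    y = torch.einsum("bki,kio->bko", x, folded_weight)
    if bias is not None:
        y = y + bias
    return y


# Equivalence check before any timing (float64 keeps rounding noise far below the tolerances).
torch.manual_seed(0)
batch, k, d_in, d_out = 32, 4, 16, 8
opts = dict(dtype=torch.float64)
x = torch.randn(batch, k, d_in, **opts)
weight = torch.randn(d_out, d_in, **opts)
r = torch.randn(k, d_in, **opts)
s = torch.randn(k, d_out, **opts)
bias = torch.randn(k, d_out, **opts)

torch.testing.assert_close(
    folded_forward(x, fold_weights(weight, r, s), bias),
    batchensemble_forward(x, weight, r, s, bias),
)
```

The equivalence holds because s[k, o] does not depend on i and can be pulled inside the sum over input features, so both forms compute the same sum for every batch element, submodel and output.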

Before timing, it checks numerical equivalence with torch.testing.assert_close, then times the original and folded layers in both eager and torch.compile modes.
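The timing side could look roughly like the harness below. It is a hypothetical helper (time_per_call does not come from the PR) that assumes median wall-clock time per call is the metric of interest. The actual script may measure differently. Two details matter for a fair eager vs. compiled comparison. Warmup calls absorb compilation when the module is wrapped with torch.compile. Also, CUDA and MPS run kernels asynchronously, so the device must be synchronized before the clock is read.

```python
import time

import torch


def _sync(device: torch.device) -> None:
    # Kernels on CUDA/MPS run asynchronously; wait for them before reading the clock.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    elif device.type == "mps":
        torch.mps.synchronize()


@torch.inference_mode()
def time_per_call(fn, x, warmup=10, iters=50):
    """Median wall-clock seconds per call of fn(x), measured after warmup calls."""
    for _ in range(warmup):
        fn(x)  # for a torch.compile-wrapped fn, the first calls trigger compilation
    _sync(x.device)
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(x)
        _sync(x.device)
        times.append(time.perf_counter() - start)
    times.sort()
    return times[len(times) // 2]


# Usage with a stand-in module; the benchmark would pass the original and folded layers instead.
t = time_per_call(torch.nn.Linear(16, 8), torch.randn(32, 16), warmup=2, iters=5)
```

For the compiled mode, the same helper would be called on torch.compile(layer) instead of layer. The median is used rather than the mean so that a few slow outlier iterations do not dominate the result.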

The benchmark lives outside the public API and changes neither training behavior nor the model implementation. The goal is to make the latency/memory tradeoff measurable before considering any inference helper.

I ran:

.venv/bin/python -m py_compile benchmarks/benchmark_fold_linear_batchensemble.py
.venv/bin/python -m ruff check benchmarks/benchmark_fold_linear_batchensemble.py
.venv/bin/python -m ruff format --check benchmarks/benchmark_fold_linear_batchensemble.py
.venv/bin/python benchmarks/benchmark_fold_linear_batchensemble.py --device cpu --quick
.venv/bin/python benchmarks/benchmark_fold_linear_batchensemble.py --device mps --quick
.venv/bin/python benchmarks/benchmark_fold_linear_batchensemble.py --device all --quick --output results.json

In my local quick runs, the folded layer in eager mode was about 1.3x–1.7x faster than the original on CPU and about 1.2x–2.6x faster on MPS for the tested shapes. Compiled timings were less stable, and I did not have CUDA available locally. The script also reports the extra parameter memory needed for the materialized per-submodel weights.
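The memory side of the tradeoff is simple arithmetic. This rough sketch assumes a shared weight of shape (out, in), scaling vectors r of shape (k, in) and s of shape (k, out), and a per-submodel bias of shape (k, out) that both variants keep. Folding replaces weight, r and s with one (k, in, out) tensor, so the weight storage grows roughly k-fold. param_counts is a hypothetical helper, and the script's own accounting may differ, for example if it keeps both layers in memory.

```python
def param_counts(d_in, d_out, k, bias=True):
    """Parameter counts (BatchEnsemble, folded) for one layer under the assumed shapes."""
    bias_params = k * d_out if bias else 0  # per-submodel bias, unchanged by folding
    batchensemble = d_out * d_in + k * d_in + k * d_out + bias_params  # weight + r + s
    folded = k * d_in * d_out + bias_params  # one (in, out) matrix per submodel
    return batchensemble, folded


before, after = param_counts(512, 512, 32)
extra = after - before
print(extra, extra * 4 / 2**20)  # float32: 4 bytes per parameter -> prints "8093696 30.875"
```

So for a 512x512 layer with k = 32, folding adds about 31 MiB of float32 parameters per layer. This is why the memory column matters alongside the speedups.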

