fix(leaderboard): record strategy hyperparameters in the run fingerprint by ajbarea · Pull Request #82 · ajbarea/velocity-fl

ajbarea · 2026-06-01T13:34:18Z

The bug (latent leaderboard-correctness gap)

Both DB producers built the run config with config["strategy"] = strategy_name(strat) — only the strategy name. So a strategy's hyperparameters (Krum f, FedProx mu, TrimmedMean k, MultiKrum/Bulyan f/m, …) were never recorded. config_fingerprint's docstring claims "strategy … their params … participates in the hash", but they couldn't — they weren't stored.

Consequence: Krum f=2 and f=3 share a fingerprint, so every per-fingerprint board (accuracy, wall-clock, comm-cost, pareto, robustness) silently averages different experiments into one row. It also left hyperparameter_sage (roadmapped) with no hyperparameter data to mine.

Found while picking up hyperparameter_sage — verifying the data foundation before building on it.

The fix

strategy.strategy_params(strat) -> dict (mirrors strategy_name; dataclass fields → dict; {} for paramless FedAvg/FedMedian/ArKrum, so their fingerprints are unchanged — no churn).
Extract mcp_app._real_run_config(strategy, **fields) from the heavy _execute_real_training, recording both the name (for the runs.strategy column + display) and strategy_params. The existing canonical-JSON config_fingerprint then resolves params with no fingerprint-logic change. Extracting it also makes the fix unit-testable without running real training.
Only run_real_training is touched — run_demo takes a strategy string and parse_strategy can't parameterise a bare name (Krum needs f → falls back to FedAvg), so it's paramless-only: no bug, no change (YAGNI).
Corrected the config_fingerprint docstring to state the mechanism.

No MCP tool-signature or CLI change → no surface-hash re-pin, no docs catalog churn.

Testing

TDD: strategy_params over every strategy shape; _real_run_config records params and Krum f=2 ≠ f=3 fingerprints (the regression guard for exactly this bug).
Whole-repo make lint (ruff + ty) green; full suite 381 passed, 9 skipped.

Research

The Sage guard-rails this unblocks (quote sample size + variance, hard-fail on too-few seeds) are statistically grounded: hyperparameter-choice variance markedly affects ML benchmark results (arXiv:2103.03098). hyperparameter_sage itself stays on the ROADMAP — its "matched runs" semantics + thresholds are a design pass worth a human eye.

The DB producers stored only the strategy name (strategy_name), so a strategy's hyperparameters (Krum f, FedProx mu, TrimmedMean k, ...) never reached the run config. config_fingerprint — despite its docstring — couldn't distinguish them, so Krum f=2 and f=3 shared a fingerprint and every per-fingerprint leaderboard silently averaged different experiments into one row. Add strategy.strategy_params() (dataclass fields -> dict; {} for paramless strategies, so their fingerprints are unchanged) and extract mcp_app._real_run_config, which records both the name (for the runs.strategy column) and strategy_params so the existing canonical-JSON fingerprint resolves them. Only run_real_training carries parameterised strategies (run_demo's string input is paramless-only, so it has no bug and is unchanged). No MCP/CLI surface change. Unblocks the roadmapped hyperparameter_sage.

codspeed-hq · 2026-06-01T13:42:41Z

Merging this PR will not alter performance

✅ 41 untouched benchmarks
⏩ 10 skipped benchmarks¹

_{Comparing feat/strategy-params-fingerprint (ac0c279) with main (eec467b)}

10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

ajbarea enabled auto-merge (squash) June 1, 2026 13:34

ajbarea merged commit 6f26cc3 into main Jun 1, 2026
6 checks passed

ajbarea deleted the feat/strategy-params-fingerprint branch June 1, 2026 13:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(leaderboard): record strategy hyperparameters in the run fingerprint#82

fix(leaderboard): record strategy hyperparameters in the run fingerprint#82
ajbarea merged 1 commit into
mainfrom
feat/strategy-params-fingerprint

ajbarea commented Jun 1, 2026

Uh oh!

Uh oh!

codspeed-hq Bot commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ajbarea commented Jun 1, 2026

The bug (latent leaderboard-correctness gap)

The fix

Testing

Research

Uh oh!

Uh oh!

codspeed-hq Bot commented Jun 1, 2026

Merging this PR will not alter performance

Footnotes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant