feat(reproduce): velocity reproduce — re-run an archived sweep by ajbarea · Pull Request #80 · ajbarea/velocity-fl

ajbarea · 2026-06-01T11:18:51Z

What

velocity reproduce <archive.zip> [--out] [--check] [--tolerance] — the inverse of velocity archive (#79), closing the reproducibility loop.

read_archive recovers the per-run RunSpecs + original comparison.json straight from the crate zip (no extraction needed).
reproduce_archive re-runs them via the existing run_sweep (DRY).
--check compares each run's reproduced final loss against the archived value within --tolerance and exits non-zero on a real mismatch.
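
As an illustration of the "no extraction needed" point, here is a minimal sketch of parsing a JSON member straight out of a zip with the standard library. It is not the PR's read_archive: the helper name read_json_member, the location of comparison.json at the zip root, and the payload shape are assumptions made for this example only.

```python
import io
import json
import zipfile


def read_json_member(archive, member):
    """Parse one JSON member of a zip without extracting it to disk.

    Hypothetical helper for illustration; `archive` may be a path or a
    seekable file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fh:
            return json.load(fh)


# Build a tiny in-memory zip so the sketch runs standalone.
# The member name and payload shape are placeholders, not the real crate layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    payload = {"runs": [{"name": "a", "final_loss": 0.25}]}
    zf.writestr("comparison.json", json.dumps(payload))

buf.seek(0)
comparison = read_json_member(buf, "comparison.json")
print(comparison["runs"][0]["final_loss"])
```

Reading members through zipfile.ZipFile.open avoids a temporary directory and the cleanup that comes with it, which is presumably why the archive is consumed in place.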

Research-grounded decisions (2026-06)

Reproduction, not replication (ACM/NISO): re-running the same config + code is a "reproduction" → the command is reproduce.
Tolerance, not bit-exact: ML / float aggregation isn't bitwise reproducible across runs or hardware, so --check uses a relative tolerance (math.isclose) rather than equality; a bit-exact check would emit false failures.
nan-safe: pydantic serializes an in-memory nan loss to JSON null, so a run that is present with a null loss is read back as nan and does not false-mismatch a reproduced nan, whereas a run that is absent from the archive is a real mismatch. (This asymmetry was caught by round-trip testing, not assumed.)
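
The tolerance and nan rules above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's code: the function names archived_loss and loss_matches, the dict-of-losses input, and the example tolerance are all hypothetical. The one detail that is certain is that math.isclose(nan, nan) is False, so the nan case needs explicit handling.

```python
import math


def archived_loss(archived_runs, run_name):
    """Look up an archived loss (hypothetical helper).

    Returns None if the run is absent, nan if the stored value is
    JSON null (parsed as None), otherwise the stored float.
    """
    if run_name not in archived_runs:
        return None
    value = archived_runs[run_name]
    return math.nan if value is None else float(value)


def loss_matches(archived, reproduced, rel_tol):
    """Compare an archived loss with a reproduced one (hypothetical helper)."""
    if archived is None:
        # Run missing from the archive: always a real mismatch.
        return False
    if math.isnan(archived) or math.isnan(reproduced):
        # math.isclose(nan, nan) is False, so treat nan == nan explicitly.
        return math.isnan(archived) and math.isnan(reproduced)
    return math.isclose(archived, reproduced, rel_tol=rel_tol)


# Archived values as they might look after json parsing: null became None.
archived_runs = {"a": 0.5, "b": None}

print(loss_matches(archived_loss(archived_runs, "a"), 0.5 + 1e-9, rel_tol=1e-6))
print(loss_matches(archived_loss(archived_runs, "b"), math.nan, rel_tol=1e-6))
print(loss_matches(archived_loss(archived_runs, "c"), math.nan, rel_tol=1e-6))
```

Keeping "absent" (None) distinct from "present but null" (nan) is what preserves the asymmetry described above: only the former is reported as a mismatch when the reproduced loss is also nan.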

Testing

TDD throughout: spec recovery, end-to-end re-run on the offline stub sweep, tolerance + nan logic (synthetic SweepResult), CLI happy-path + --check.
Documented in docs/cli.md — satisfying the roster guard (the lesson from feat(archive): velocity archive — RO-Crate reproducibility bundle #79).
Whole-repo make lint (ruff + ty) green; full suite 379 passed, 9 skipped; end-to-end CLI smoke (sweep → archive → reproduce --check) exits 0.

YAGNI: scoped to the archive↔reproduce loop; replication (new data) and DOI/Zenodo automation are deliberately out of scope.

The inverse of `velocity archive`, closing the reproducibility loop.

read_archive recovers the per-run RunSpecs (and original comparison.json) straight from the crate zip; reproduce_archive re-runs them via the existing run_sweep (DRY).

`velocity reproduce <archive.zip> [--out] [--check] [--tolerance]`. --check compares each run's reproduced final loss against the archived value within a relative tolerance and exits non-zero on a real mismatch.

Tolerance-based, not bit-exact (ML/float aggregation isn't bitwise reproducible across runs/hardware) and nan-safe (pydantic serializes an in-memory nan loss to JSON null, so a present run with null loss is read as nan and doesn't false-mismatch a reproduced nan).

Documented in docs/cli.md (roster guard). TDD throughout; full suite 379 passed.

research(2026-06): ACM/NISO "reproduced" = same config+code re-executed; numerical non-determinism in ML -> tolerance not equality.

codspeed-hq · 2026-06-01T11:30:24Z

Merging this PR will not alter performance

✅ 41 untouched benchmarks
⏩ 10 skipped benchmarks¹

Comparing feat/reproduce (45b9d00) with main (d9b90f1)

¹ 10 benchmarks were skipped, so the baseline results were used instead.

ajbarea enabled auto-merge (squash) June 1, 2026 11:19

ajbarea disabled auto-merge June 1, 2026 11:20

docs(readme): list archive + reproduce in CLI reference + quickstart

45b9d00

ajbarea enabled auto-merge (squash) June 1, 2026 11:21

ajbarea merged commit eec467b into main Jun 1, 2026
5 checks passed

ajbarea deleted the feat/reproduce branch June 1, 2026 11:23

ajbarea merged 2 commits into main from feat/reproduce
