
feat(reproduce): velocity reproduce — re-run an archived sweep#80

Merged
ajbarea merged 2 commits into main from feat/reproduce on Jun 1, 2026

@ajbarea ajbarea commented Jun 1, 2026

What

velocity reproduce <archive.zip> [--out] [--check] [--tolerance] — the inverse of velocity archive (#79), closing the reproducibility loop.

  • read_archive recovers the per-run RunSpecs + original comparison.json straight from the crate zip (no extraction needed).
  • reproduce_archive re-runs them via the existing run_sweep (DRY).
  • --check compares each run's reproduced final loss against the archived value within --tolerance and exits non-zero on a real mismatch.
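Reading the crate "straight from the zip" can be sketched with the standard library's zipfile module. This is only an illustration of the no-extraction idea, not the project's read_archive: the helper names and the runs/<name>/spec.json layout used in the usage note are hypothetical, and only comparison.json is a member name taken from this PR.

```python
import json
import zipfile


def read_json_member(archive, member):
    """Parse one JSON member directly from the zip, without extracting it to disk."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fh:
            return json.load(fh)


def list_members(archive, suffix):
    """Return the sorted member names that end with `suffix`."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(name for name in zf.namelist() if name.endswith(suffix))


def read_specs(archive, suffix="spec.json"):
    """Map each matching member name to its parsed JSON (hypothetical crate layout)."""
    return {name: read_json_member(archive, name) for name in list_members(archive, suffix)}
```

For example, given a crate that holds comparison.json plus one runs/<name>/spec.json per run (an assumed layout), `read_json_member(path, "comparison.json")` returns the archived comparison and `read_specs(path)` returns the per-run spec dictionaries, with nothing written to disk.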

Research-grounded decisions (2026-06)

  • Reproduction, not replication (ACM/NISO): re-running the same config + code is a "reproduction" → the command is reproduce.
  • Tolerance, not bit-exact: ML / float aggregation isn't bitwise reproducible across runs or hardware, so --check compares with a relative tolerance (math.isclose) rather than equality; a bit-exact check would emit false failures.
  • nan-safe: pydantic serializes an in-memory nan loss to JSON null, so a run that is present with a null loss is read back as nan and does not false-mismatch a reproduced nan. An absent run is a real mismatch. (This asymmetry was caught by round-trip testing, not assumed.)
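The tolerance and nan rules above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names, the dict-based inputs, and the default rel_tol of 1e-6 are assumptions, and "absent" is read here as "missing from the archived record". Only math.isclose and the null-to-nan convention come from the description.

```python
import math


def archived_loss(raw):
    """Convert an archived JSON value to a float; JSON null (None) stands for nan."""
    return math.nan if raw is None else float(raw)


def losses_match(archived, reproduced, rel_tol=1e-6):
    """Compare two losses with a relative tolerance, treating nan as equal only to nan."""
    if math.isnan(archived) or math.isnan(reproduced):
        return math.isnan(archived) and math.isnan(reproduced)
    return math.isclose(archived, reproduced, rel_tol=rel_tol)


def find_mismatches(archived_runs, reproduced_runs, rel_tol=1e-6):
    """Return the names of reproduced runs that disagree with the archived record.

    archived_runs maps run name -> raw JSON loss (number or None);
    reproduced_runs maps run name -> float loss. Both shapes are hypothetical.
    """
    mismatches = []
    for name, loss in reproduced_runs.items():
        if name not in archived_runs:
            mismatches.append(name)  # no archived entry at all: a real mismatch
        elif not losses_match(archived_loss(archived_runs[name]), loss, rel_tol):
            mismatches.append(name)
    return mismatches
```

The explicit nan branch is needed because math.isclose(nan, nan) is False, so a plain tolerance check would report a mismatch for a run that reproduced its nan loss.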

Testing

  • TDD throughout: spec recovery, end-to-end re-run on the offline stub sweep, tolerance + nan logic (synthetic SweepResult), CLI happy-path + --check.
  • Documented in docs/cli.md — satisfying the roster guard (the lesson from feat(archive): velocity archive — RO-Crate reproducibility bundle #79).
  • Whole-repo make lint (ruff + ty) green; full suite 379 passed, 9 skipped; end-to-end CLI smoke (sweep → archive → reproduce --check) exits 0.

YAGNI: scoped to the archive↔reproduce loop; replication (new data) and DOI/Zenodo automation are deliberately out of scope.

The inverse of `velocity archive`, closing the reproducibility loop.
read_archive recovers the per-run RunSpecs (and original comparison.json)
straight from the crate zip; reproduce_archive re-runs them via the existing
run_sweep (DRY).

`velocity reproduce <archive.zip> [--out] [--check] [--tolerance]`. --check
compares each run's reproduced final loss against the archived value within a
relative tolerance and exits non-zero on a real mismatch. Tolerance-based, not
bit-exact (ML/float aggregation isn't bitwise reproducible across runs/hardware)
and nan-safe (pydantic serializes an in-memory nan loss to JSON null, so a
present run with null loss is read as nan and doesn't false-mismatch a
reproduced nan).

Documented in docs/cli.md (roster guard). TDD throughout; full suite 379 passed.

research(2026-06): ACM/NISO "reproduced" = same config+code re-executed;
numerical non-determinism in ML -> tolerance not equality.
@ajbarea ajbarea enabled auto-merge (squash) June 1, 2026 11:19
@ajbarea ajbarea disabled auto-merge June 1, 2026 11:20
@ajbarea ajbarea enabled auto-merge (squash) June 1, 2026 11:21
@ajbarea ajbarea merged commit eec467b into main Jun 1, 2026
5 checks passed
@ajbarea ajbarea deleted the feat/reproduce branch June 1, 2026 11:23
codspeed-hq Bot commented Jun 1, 2026

Merging this PR will not alter performance

✅ 41 untouched benchmarks
⏩ 10 skipped benchmarks [1]


Comparing feat/reproduce (45b9d00) with main (d9b90f1)


Footnotes

  1. 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.
