
tr_tst2 baseline: backfill AJRFT after eq_tst2 drift fix #190

@k-yoshimi


Summary

test_run/baselines/tr_tst2/metrics.json is missing the AJRFT scalar key. Since commit 24b1b12f ("test+tr: backfill AJRFT in regression dump") and the libtrapi.so Python wrapper update from PR #187, libtrapi.so emits 14 scalars (including AJRFT), but the tst2 baseline still has only 13. Once eq_tst2 is fixed and tr_tst2 runs in CI, the Layer-1 equivalence test will fail with:

compare_metrics FAIL: scalars.AJRFT: missing
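The failure mode can be sketched as follows. This is a hypothetical reimplementation, not the project's compare_metrics: it assumes metrics.json holds a top-level "scalars" object mapping names to floats, and that values are compared with an absolute tolerance of 1e-10 (the threshold named in the acceptance criteria below). A key emitted by the current code but absent from the stored baseline is reported as missing before any numeric comparison happens.

```python
import math


def compare_scalars(baseline: dict, current: dict, atol: float = 1e-10) -> list[str]:
    """Hypothetical sketch: return failure messages (an empty list means PASS)."""
    failures = []
    base = baseline.get("scalars", {})
    cur = current.get("scalars", {})
    for name in sorted(cur):
        if name not in base:
            # Emitted by the current code but not present in the stale baseline.
            failures.append(f"scalars.{name}: missing")
            continue
        # Purely absolute comparison (an assumption about the real test).
        if not math.isclose(cur[name], base[name], rel_tol=0.0, abs_tol=atol):
            failures.append(f"scalars.{name}: {cur[name]!r} != {base[name]!r} (atol={atol})")
    return failures


# Placeholder scalar names and values, for illustration only.
baseline = {"scalars": {"OTHER_SCALAR": 1.0}}
current = {"scalars": {"OTHER_SCALAR": 1.0, "AJRFT": 0.5}}
for msg in compare_scalars(baseline, current):
    print("compare_metrics FAIL:", msg)
```

Because the missing-key check runs first, the baseline fails regardless of how close the other 13 scalars are.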

Why this wasn't fixed in the original AJRFT PR

tr_tst2 depends on eq_tst2, which has its own pre-existing ~3e-9 physics drift on the current develop branch (eq_tst2 regression observed on clavius on 2026-05-10; likely caused by upstream eq changes since the baseline was last regenerated). Regenerating the tr_tst2 baseline therefore requires either:

  • Resolving the eq_tst2 drift first, OR
  • Manually patching in just the AJRFT: 0.0 line (risky: it assumes the actual value is 0, and the in-house code review of commit 24b1b12f explicitly cautioned against this)

The decision was to leave tst2 on its existing stale baseline (the status quo: the test SKIPs locally because eqdata is missing, which hides the issue) and to address the problem as a separate concern.
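A quick arithmetic check shows why the eq_tst2 drift has to be resolved first. Assuming the equivalence test applies an absolute tolerance of 1e-10 (whether the real check is absolute or relative is an assumption here), a ~3e-9 shift is roughly 30 times larger than the tolerance, so any value that inherits it fails the comparison. The reference value below is a placeholder.

```python
import math

ATOL = 1e-10  # assumed absolute tolerance, taken from the acceptance criteria
DRIFT = 3e-9  # approximate eq_tst2 drift reported in this issue

reference = 1.0            # placeholder baseline value
drifted = reference + DRIFT

ratio = DRIFT / ATOL       # about 30: the drift exceeds the tolerance ~30x
within_tol = math.isclose(drifted, reference, rel_tol=0.0, abs_tol=ATOL)
print(f"drift/tolerance ~ {ratio:.0f}, within tolerance: {within_tol}")
```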

Acceptance

  1. Resolve the eq_tst2 drift (separate work: identify which develop change introduced the ~3e-9 shift, and determine whether the eq physics changed or the eq baselines are stale)
  2. Regenerate the eq_tst2 baseline on a Linux box with the current eq (see $CLAUDE_MEMORY_DIR/memory/reference_clavius_baseline_regen.md)
  3. Then regenerate the tr_tst2 baseline (it will include AJRFT automatically via the fix in 24b1b12f)
  4. Verify that pytest python/trlib/tests/test_equivalence.py --forked --timeout=120 --timeout-method=signal -v reports both test_iter01 AND test_tst2 as PASSED at 1e-10, with no SKIP
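Before committing the regenerated tr_tst2 baseline, a guard along these lines could catch a stale file early. It is a hypothetical helper (check_baseline is not part of the repo), assuming the same {"scalars": {...}} layout in metrics.json; the expected count of 14 and the AJRFT key come from this issue.

```python
import json
import tempfile
from pathlib import Path

EXPECTED_SCALARS = 14  # per this issue: libtrapi.so now emits 14 scalars


def check_baseline(path: Path, required: tuple[str, ...] = ("AJRFT",)) -> list[str]:
    """Hypothetical pre-commit check for a regenerated metrics.json."""
    scalars = json.loads(path.read_text())["scalars"]
    problems = [f"missing scalar: {key}" for key in required if key not in scalars]
    if len(scalars) != EXPECTED_SCALARS:
        problems.append(f"expected {EXPECTED_SCALARS} scalars, found {len(scalars)}")
    return problems


if __name__ == "__main__":
    # Simulate the current stale baseline: 13 placeholder scalars, no AJRFT.
    with tempfile.TemporaryDirectory() as tmp:
        stale = Path(tmp) / "metrics.json"
        stale.write_text(json.dumps({"scalars": {f"S{i}": 0.0 for i in range(13)}}))
        print(check_baseline(stale))
```

The check only validates the key set; it deliberately says nothing about the AJRFT value, which must come from an actual regeneration rather than a hand-written 0.0.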

Related

🤖 Filed via post-push retrospective on 2026-05-11.
