
fix(backfill): filter universe writes by current constituents + escalate preflight#138

Merged
cipher813 merged 1 commit into main from fix/backfill-respect-constituents on May 2, 2026

@cipher813
Owner

Summary

Closes the prune+backfill loop that recreated 7 S&P churn-out stragglers on every Saturday SF run. 2026-05-02 redrive #6 surfaced the loop:

  1. Pre-MorningEnrich prune (PR #134, "fix(morning_enrich): refresh constituents + prune stragglers before write", absent_days=5) drops stragglers ✓
  2. Phase 1 step 8 (builders.backfill) loads ALL predictor/price_cache/*.parquet and writes EVERY ticker to ArcticDB universe — including the ones we just pruned, because their parquet files still exist (kept for historical lookup)
  3. Loop closes; Backtester preflight ~2 hours later trips on the 8-day-stale rows

Two fixes

builders/backfill.py: constituents-filter on writes

Load current constituents via the latest_weekly.json pointer and filter universe_tickers against it. Churn-out tickers keep their price_cache parquet (history kept), but no arctic row is written for them. If a ticker returns to the S&P, it reappears in constituents and backfill picks it back up automatically. Backfill hard-fails if the constituents load fails (no silent recreate-everything fallback).
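
A minimal sketch of the filter described above, not the actual builders/backfill.py code. The `read_json` callable, the `constituents_key` and `tickers` field names, and both function names are assumptions made for illustration; only the `market_data/latest_weekly.json` pointer path and the `universe_tickers` name come from this PR.

```python
from __future__ import annotations

from typing import Callable, Iterable


class ConstituentsLoadError(RuntimeError):
    """Raised rather than silently writing every cached ticker."""


def load_current_constituents(read_json: Callable[[str], dict]) -> set[str]:
    # Assumed layout: the pointer names the latest snapshot's key, and the
    # snapshot carries a "tickers" list. Both field names are hypothetical.
    try:
        pointer = read_json("market_data/latest_weekly.json")
        snapshot = read_json(pointer["constituents_key"])
        tickers = set(snapshot["tickers"])
    except Exception as exc:
        raise ConstituentsLoadError(f"constituents load failed: {exc!r}") from exc
    if not tickers:
        raise ConstituentsLoadError("constituents list is empty")
    return tickers


def filter_universe_tickers(
    universe_tickers: Iterable[str],
    read_json: Callable[[str], dict],
    dry_run: bool = False,
) -> list[str]:
    if dry_run:
        # Dry runs skip the filter so local smoke tests need no S3 access.
        return sorted(universe_tickers)
    current = load_current_constituents(read_json)
    # Churn-outs stay in the parquet cache but are dropped from the write list.
    return sorted(t for t in universe_tickers if t in current)
```

Passing the reader in as a callable keeps the sketch testable without S3; a real implementation would presumably wrap an S3 client instead.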

sf_preflight.check_universe_drift: WARN → FAIL

Now returns FAIL when any straggler is more than 5 days stale. This forces operators to drop stragglers BEFORE launching recovery SFs that skip MorningEnrich (otherwise we burn a 120-min Backtester spot to re-discover them). The result includes a remediation hint.
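
The escalation can be sketched roughly as follows. This is an illustration under assumptions, not the real sf_preflight code: the function signature, the `CheckResult` type, and the remediation wording are hypothetical (the actual hint points at the prune CLI, whose command is not shown in this PR). The summary string mirrors the format of the validation output in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

PRUNE_AFTER_DAYS = 5  # the ">5 days stale" threshold from the description


@dataclass
class CheckResult:
    status: str  # "OK" or "FAIL"
    message: str


def check_universe_drift(
    last_dates: dict[str, date],  # ticker -> last date present in arctic
    constituents: set[str],
    today: date,
) -> CheckResult:
    # A straggler is an arctic symbol that is no longer a constituent.
    stragglers = {t: d for t, d in last_dates.items() if t not in constituents}
    prunable = sorted(
        t for t, d in stragglers.items() if (today - d).days > PRUNE_AFTER_DAYS
    )
    fresh = len(stragglers) - len(prunable)
    summary = (
        f"{len(stragglers)} arctic stragglers; "
        f"{len(prunable)} would be pruned, {fresh} too fresh to drop"
    )
    if prunable:
        # Placeholder hint text; the real result points at the prune CLI.
        hint = f"prune {', '.join(prunable)} before launching the SF"
        return CheckResult("FAIL", f"{summary}. Remediation: {hint}")
    return CheckResult("OK", summary)
```

Stragglers that are still too fresh to drop leave the check at OK, which matches the "1 too fresh to drop" line in the validation output.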

Validation against current state (post manual prune)

[OK]   arctic_connectivity              ArcticDB reachable; universe library has 904 symbols
[OK]   universe_drift                   1 arctic stragglers; 0 would be pruned, 1 too fresh to drop
[OK]   backfill_source_freshness        Backfill source (2026-05-01) ≥ arctic (2026-05-01)
... Predicted SF outcome: PASS

Test plan

  • 3 new tests in test_backfill_no_regression.py: loop closure, hard-fail on constituents-load failure, dry-run doesn't require S3
  • Existing 21 backfill tests still pass (mocks updated)
  • Updated test for check_universe_drift to expect FAIL on stragglers + remediation hint
  • Full suite: 406 passed
  • Live verification: next Saturday SF that runs MorningEnrich → prune drops stragglers → backfill respects the prune and doesn't recreate them → no Backtester preflight halt

Why not also add prune to Backtester?

Considered but rejected — the issue isn't Backtester missing a prune step, it's that the EXISTING DataPhase1 prune was getting undone by DataPhase1's own backfill. Fixing the loop in one place is cleaner than duplicating prune across modules.

🤖 Generated with Claude Code

…ate preflight

Closes the prune+backfill loop that recreated 7 S&P churn-out
stragglers on every Saturday SF run. 2026-05-02 redrive #6 surfaced
the loop: pre-MorningEnrich prune (PR #134, absent_days=5) drops
stragglers ✓; Phase 1 step 8 (builders.backfill) loads ALL
predictor/price_cache/*.parquet files and writes EVERY ticker back to
ArcticDB universe — including the ones we just pruned, because their
parquet files still exist (kept for historical lookup). Loop closes;
Backtester preflight (~2 hours later) trips on the 8-day-stale rows.

## Fix 1: backfill respects current constituents

In ``builders.backfill``, load current constituents via the
``market_data/latest_weekly.json`` pointer and filter
``universe_tickers`` against it. Tickers absent from constituents
(churn-outs) keep their price_cache parquet (history kept) but get
NO arctic row. If a ticker comes back to the S&P later, it
appears in constituents and backfill picks it up automatically.

Hard-fails on constituents-load failure (vs silently writing
everything) per feedback_no_silent_fails. Skipped in dry_run so
local smoke tests don't need S3 access.

## Fix 2: sf_preflight escalates straggler detection

``check_universe_drift`` now returns FAIL (not OK) when any straggler
is "old enough to prune" (>5 days stale). Forces operators to drop
stragglers BEFORE launching recovery SFs that skip MorningEnrich
(would otherwise burn a 120-min Backtester spot to re-discover them).
Result includes a remediation hint pointing at the prune CLI.

Validation against current state (post manual prune of 7):
  [OK]   universe_drift     1 arctic stragglers; 0 would be pruned

3 new tests in test_backfill_no_regression.py:
- backfill_skips_tickers_absent_from_constituents (the loop closure)
- backfill_hard_fails_when_constituents_load_fails (no silent
  recreate-everything fallback)
- backfill_dry_run_does_not_filter_by_constituents (CI / smoke
  doesn't need S3)

Existing test scaffolding updated to mock _load_current_constituents
across both backfill test files. 406 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 2e5502b into main May 2, 2026
1 check passed
@cipher813 cipher813 deleted the fix/backfill-respect-constituents branch May 2, 2026 18:53
cipher813 added a commit that referenced this pull request May 6, 2026
Companion to alpha-engine-backtester PR #138 (concordance Lambda
implementation). Inserts the Saturday SF state that fires the
weekly cheap-model concordance pipeline.

SF chain after this PR:

  ... → EvalJudge{Weekly,FirstSaturday} → EvalRollingMean
      → CheckSkipRationaleClustering → RationaleClustering
      → CheckSkipReplayConcordance → ReplayConcordance
      → SaturdayHealthCheck → ...

Independent skip-gates per observability path:

- {"skip_eval_judge": true}        — bypass judge only
- {"skip_rationale_clustering":...} — bypass clustering only (was
  rerouted from SaturdayHealthCheck to CheckSkipReplayConcordance
  in this PR; the two are independent agent-justification signals)
- {"skip_replay_concordance": true} — bypass concordance only

Each Catch lands at SaturdayHealthCheck (eval observability —
failures must NOT halt the pipeline).
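
As a rough sketch of the gate-plus-Catch shape described above, the two new states could look like the Amazon States Language fragment built below. The helper names, the `IsPresent` guard, and the ARN placeholder are assumptions for illustration; the state names, the skip flag, the 900s timeout, and the Catch target come from this commit message.

```python
import json


def skip_gate(flag: str, skip_to: str, run_state: str) -> dict:
    """Choice state: jump to skip_to when the input flag is present and true."""
    return {
        "Type": "Choice",
        "Choices": [
            {
                "And": [
                    {"Variable": f"$.{flag}", "IsPresent": True},
                    {"Variable": f"$.{flag}", "BooleanEquals": True},
                ],
                "Next": skip_to,
            }
        ],
        "Default": run_state,
    }


def observability_task(resource_arn: str, next_state: str, timeout_s: int) -> dict:
    """Task state whose failures route onward instead of halting the pipeline."""
    return {
        "Type": "Task",
        "Resource": resource_arn,
        "TimeoutSeconds": timeout_s,
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "SaturdayHealthCheck"}],
        "Next": next_state,
    }


# Placeholder ARN: region and account are deliberately left unfilled.
CONCORDANCE_ARN = (
    "arn:aws:lambda:REGION:ACCOUNT:function:alpha-engine-replay-concordance:live"
)

states = {
    "CheckSkipReplayConcordance": skip_gate(
        "skip_replay_concordance", "SaturdayHealthCheck", "ReplayConcordance"
    ),
    "ReplayConcordance": observability_task(
        CONCORDANCE_ARN, "SaturdayHealthCheck", 900
    ),
}

print(json.dumps(states, indent=2))
```

Because each gate only inspects its own flag, skipping one observability path leaves the others untouched.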

Default ReplayConcordance payload pins production cadence:

  - target_models: ["claude-haiku-4-5"] (Sonnet→Haiku concordance)
  - window_days: 56 (8 weeks trailing)
  - max_artifacts: 150 (fits 900s Lambda timeout at ~3-5 sec/call)
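
The timeout fit can be checked with quick arithmetic. The sketch below assumes the calls run strictly one after another with no other overhead, which this commit message does not state.

```python
MAX_ARTIFACTS = 150
LAMBDA_TIMEOUT_S = 900


def worst_case_runtime_s(n_artifacts: int, per_call_s: float) -> float:
    # Assumes sequential calls and ignores cold start and I/O overhead.
    return n_artifacts * per_call_s


for per_call_s in (3, 5):
    total = worst_case_runtime_s(MAX_ARTIFACTS, per_call_s)
    print(f"{per_call_s}s/call -> {total:.0f}s, headroom {LAMBDA_TIMEOUT_S - total:.0f}s")
# 3s/call -> 450s, headroom 450s
# 5s/call -> 750s, headroom 150s

# Per-call latency at which 150 artifacts would exactly hit the timeout.
print(LAMBDA_TIMEOUT_S / MAX_ARTIFACTS)  # 6.0
```

Under that assumption, the cap leaves about 150 seconds of slack at the slow end of the quoted range.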

Operator overrides via SF input parameters for ad-hoc runs against
different target models (e.g. claude-sonnet-4-6 self-test).

IAM updates:

- github-actions-lambda-deploy.json: alpha-engine-replay-concordance
  added to LambdaUpdate + LambdaInvokeCanary (asymmetric-IAM-grant
  antipattern compliance; this is the 4th time this shape has come up,
  and the durable fix, the CreateFunction grant from PR #165, already
  covers create+update).
- deploy_step_function.sh: SF role inline LambdaInvoke list updated
  with the new function ARN so SF can invoke it.

Tests:

- TestStatesPresent: CheckSkipReplayConcordance + ReplayConcordance
  required.
- TestSkipRationaleClustering: rerouted assertion (now lands at
  CheckSkipReplayConcordance, not SaturdayHealthCheck).
- TestRationaleClustering: success + Catch reroutes (now
  CheckSkipReplayConcordance).
- TestSkipReplayConcordance: skip_replay_concordance flag → SaturdayHealthCheck.
- TestReplayConcordance: live alias, payload required fields
  (end_time_iso, target_models, window_days, max_artifacts), 900s
  timeout matches Lambda cap, success + Catch routes, retry posture.

Suite 451 → 459.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
