feat(preflight): add sf_preflight.py — Saturday SF dry-rehearsal#136

Merged
cipher813 merged 2 commits into main from feat/sf-preflight on May 2, 2026

@cipher813

Summary

Sample output (run against current state, post-merge of #130-#135)

[OK]   arctic_connectivity              ArcticDB reachable; universe library has 904 symbols
[OK]   constituents_fetch               Wikipedia OK: 903 tickers (503 S&P 500 + 400 S&P 400)
[OK]   universe_drift                   1 arctic stragglers; 0 would be pruned, 1 too fresh to drop
[OK]   universe_sample_freshness        Sampled 20 symbols, all within 5d of today
[WARN] polygon_grouped_coverage         POLYGON_API_KEY not set — skipped (will run on spot/EC2)
[WARN] predicted_missing_from_closes    Skipped: polygon check skipped (no API key locally)
[OK]   backfill_source_freshness        Backfill source (2026-05-01) ≥ arctic (2026-05-01)
[OK]   postflight_contracts             All postflight contract files present + parseable
 Predicted SF outcome: PASS with 2 warning(s)

Design notes (macOS workarounds)

  • ArcticDB libs initialized once in check_arctic_connectivity and reused via PreflightContext — re-initializing adb.Arctic() crashes Aws::S3::S3Client::S3Client on macOS.
  • Checks ordered with arctic_connectivity first so its bundled AWS SDK loads before boto3 (which gets pulled in by collectors imports).
  • Polygon skips gracefully (WARN, not FAIL) when POLYGON_API_KEY is unset — supports laptop preflight without the spot's .env.

Test plan

  • 18 tests in test_sf_preflight.py: happy path + each failure mode each check catches + orchestrator isolation
  • Full suite green: 394 passed
  • Manual run against current state predicts PASS — ready to redrive SF

Followups (not in this PR)

🤖 Generated with Claude Code

cipher813 and others added 2 commits May 2, 2026 08:17
Predicts whether the Saturday SF would succeed BEFORE launching a spot.
Today's recovery cycle (5 SF redrives, ~5 polygon API calls each) burned
free-tier quota and operator hours discovering bugs sequentially. This
module simulates the critical pre-Phase-1 path against real S3 + ArcticDB
state and reports per-step pass/fail in ~30s with 1 polygon call total.

Eight independent checks, mapped to today's incident stack:

  PR #130 (backfill regression)         → check_backfill_source_freshness
  PR #131 (polygon coverage flake)      → check_polygon_grouped_coverage
  PR #132 (missing-from-closes scoping) → check_predicted_missing_from_closes
  PR #133 (freshness scan scoping)      → check_universe_sample_freshness
  PR #134 (workflow ordering)           → check_universe_drift
  PR #135 (return shape)                → check_constituents_fetch
  Postflight contracts                  → check_postflight_contracts
  ArcticDB reachability                 → check_arctic_connectivity

Each check is a pure function taking a PreflightContext and returning a
CheckResult. The orchestrator runs them all (catching per-check
exceptions so one failure doesn't abort the suite) and emits
human-readable or JSON output. Exit code 1 on any failure.
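
The check/orchestrator contract described above could look roughly like the sketch below. It is not the code from sf_preflight.py: the fields on PreflightContext and CheckResult, and the helper names run_checks, exit_code, and render, are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PreflightContext:
    # Hypothetical shared state; the real context's fields are not shown here.
    shared: Dict[str, object] = field(default_factory=dict)


@dataclass
class CheckResult:
    name: str
    status: str  # "OK", "WARN", or "FAIL"
    message: str


Check = Callable[[PreflightContext], CheckResult]


def run_checks(checks: List[Check], ctx: PreflightContext) -> List[CheckResult]:
    """Run every check; an exception in one becomes a FAIL row, not an abort."""
    results = []
    for check in checks:
        try:
            results.append(check(ctx))
        except Exception as exc:
            name = check.__name__
            if name.startswith("check_"):
                name = name[len("check_"):]
            results.append(CheckResult(name, "FAIL", f"{type(exc).__name__}: {exc}"))
    return results


def exit_code(results: List[CheckResult]) -> int:
    """Warnings do not fail the run; any FAIL does."""
    return 1 if any(r.status == "FAIL" for r in results) else 0


def render(results: List[CheckResult], as_json: bool = False) -> str:
    """Emit either JSON or one aligned line per check."""
    if as_json:
        return json.dumps([vars(r) for r in results])
    return "\n".join(
        f"[{r.status}]".ljust(7) + f"{r.name:<33}{r.message}" for r in results
    )
```

Converting exceptions into FAIL rows keeps the report complete: a crash in one check still leaves the remaining checks visible in the same run.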

Two macOS-specific design notes:

1. ArcticDB libs are initialized once in check_arctic_connectivity and
   reused across downstream checks via the context — re-initializing
   adb.Arctic() crashes Aws::S3::S3Client::S3Client on macOS.
2. Checks are ordered with arctic_connectivity FIRST so its bundled AWS
   SDK loads before boto3 (which gets pulled in by collectors imports).
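
One way to express the initialize-once rule from note 1 is sketched below. The connect factory stands in for whatever wraps adb.Arctic() in the real module, and the arctic field and plain string return values are assumptions kept minimal for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PreflightContext:
    # Hypothetical zero-argument factory; real code would construct the
    # ArcticDB handle here. Only the connectivity check may call it.
    connect: Callable[[], Any]
    arctic: Optional[Any] = None


def check_arctic_connectivity(ctx: PreflightContext) -> str:
    """Create the handle exactly once and park it on the context."""
    if ctx.arctic is None:
        ctx.arctic = ctx.connect()
    return "OK"


def check_universe_drift(ctx: PreflightContext) -> str:
    """Downstream checks reuse the handle and never construct their own."""
    if ctx.arctic is None:
        return "FAIL"  # the connectivity check has to run first
    return "OK"
```

Because downstream checks only read ctx.arctic, running arctic_connectivity first satisfies both the single-initialization rule and the ordering rule.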

Polygon check skips gracefully (WARN, not FAIL) when POLYGON_API_KEY
is unset — supports laptop-side preflight where the .env isn't loaded.
On the spot the key is present and the check fires.
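
The skip-to-WARN guard might be sketched as follows. The fetch_grouped callable, the min_symbols threshold, and the tuple return shape are hypothetical; only the behavior of warning instead of failing when POLYGON_API_KEY is unset comes from the description above.

```python
import os
from typing import Callable, Mapping, Tuple


def check_polygon_grouped_coverage(
    fetch_grouped: Callable[[str], int],
    env: Mapping[str, str] = os.environ,
    min_symbols: int = 1,  # hypothetical threshold, not taken from the PR
) -> Tuple[str, str]:
    """Return (status, message); a missing key is a WARN, never a FAIL."""
    api_key = env.get("POLYGON_API_KEY")
    if not api_key:
        # No network call is made on this path.
        return ("WARN", "POLYGON_API_KEY not set; skipped")
    count = fetch_grouped(api_key)
    if count < min_symbols:
        return ("FAIL", f"grouped endpoint returned {count} symbols")
    return ("OK", f"grouped endpoint returned {count} symbols")
```

Passing env explicitly keeps the guard testable without touching the process environment.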

18 tests in tests/test_sf_preflight.py — happy path + each failure mode
each check is designed to catch + orchestrator isolation.

394 tests total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI runs without POLYGON_API_KEY in env, so the no-key skip-to-WARN
guard short-circuited the 3 polygon-coverage tests before they
reached the mocked client. Set the env var via monkeypatch so the
guard passes through to the polygon mock. Also add explicit test
for the no-key path.
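
The test fix described above could be sketched like this. polygon_check and FakePolygon are stand-ins invented for the example, not the real check or mock; the monkeypatch argument is pytest's built-in fixture, used only through setenv and delenv.

```python
import os


def polygon_check(client) -> str:
    """Stand-in for the real check: WARN without a key, else query the client."""
    if not os.environ.get("POLYGON_API_KEY"):
        return "WARN"
    return "OK" if client.grouped_daily() else "FAIL"


class FakePolygon:
    """Minimal mock that records how often it was called."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def grouped_daily(self):
        self.calls += 1
        return self.rows


def test_coverage_reaches_mock(monkeypatch):
    # Without this line, CI (which has no key) short-circuits to WARN
    # before the mocked client is ever touched.
    monkeypatch.setenv("POLYGON_API_KEY", "test-key")
    client = FakePolygon(rows=[{"T": "AAPL"}])
    assert polygon_check(client) == "OK"
    assert client.calls == 1


def test_no_key_skips(monkeypatch):
    # Explicit coverage for the no-key path, independent of the host env.
    monkeypatch.delenv("POLYGON_API_KEY", raising=False)
    client = FakePolygon(rows=[])
    assert polygon_check(client) == "WARN"
    assert client.calls == 0
```

Setting and deleting the variable inside each test makes the outcome the same on a laptop with a key and on CI without one.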

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 8dd8042 into main May 2, 2026
1 check passed
@cipher813 cipher813 deleted the feat/sf-preflight branch May 2, 2026 15:22
cipher813 added a commit that referenced this pull request May 28, 2026
…across all artifacts (#339)

Closes the gap surfaced 2026-05-28: current-state probe answers
'is the artifact present now?' but operators also need 'did it
land last weekend? are there gaps in the producer's history?'
Filed per the same feedback memory observe_mode_unconditional_gates:
absence-of-artifact is the failure mode, and a single-cycle
absence could be a false positive, whereas a multi-cycle gap is a
real producer regression.

Adds:

- event['mode']='historical' dispatch in handler(). Routes to a
  new _handle_historical(s3, now, started_at, lookback_overrides)
  path that walks the registry, probes the last N cycles per
  artifact, and writes _freshness_monitor/history.json (page 26
  will surface per-row history expanders + gap counts).
- New EB cron alpha-engine-freshness-monitor-historical-cron
  (daily 04:00 UTC, off-peak) wired in deploy.sh --bootstrap.
- Default lookback: 12 saturday_sf + 30 weekday_sf/eod_sf cycles
  (~3 months each). continuous skipped (current-state covers).
  Tunable via event['lookback'] override.
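
A calendar-naive cycle-date resolver consistent with the defaults above might look like the sketch below. The function names, the choice to count today as a cycle, and the shape of the lookback override are assumptions; the cadence names and default counts are taken from the list.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

DEFAULT_LOOKBACK = {"saturday_sf": 12, "weekday_sf": 30, "eod_sf": 30}


def last_saturdays(today: date, n: int) -> List[date]:
    """Most recent n Saturdays, newest first (weekday(): Saturday == 5)."""
    latest = today - timedelta(days=(today.weekday() - 5) % 7)
    return [latest - timedelta(weeks=i) for i in range(n)]


def last_weekdays(today: date, n: int) -> List[date]:
    """Most recent n Mon-Fri dates, newest first; holidays are NOT excluded."""
    out: List[date] = []
    day = today
    while len(out) < n:
        if day.weekday() < 5:
            out.append(day)
        day -= timedelta(days=1)
    return out


def cycle_dates(
    cadence: str, today: date, lookback: Optional[Dict[str, int]] = None
) -> List[date]:
    """Resolve the dates to probe; continuous and zero-count cadences yield none."""
    n = (lookback or {}).get(cadence, DEFAULT_LOOKBACK.get(cadence, 0))
    if cadence == "continuous" or n <= 0:
        return []
    if cadence == "saturday_sf":
        return last_saturdays(today, n)
    return last_weekdays(today, n)
```

For example, cycle_dates("saturday_sf", date(2026, 5, 28), {"saturday_sf": 2}) resolves to May 23 and May 16, 2026.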

403/404/NoSuchKey normalization: S3 returns 403 (not 404) for
missing keys when the Lambda lacks s3:ListBucket. Treat both as
cleanly-absent (no error_code in output) so page 26 doesn't show
spurious '403 errors' on legitimately-absent historical cycles.
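
The normalization could be sketched as a small pure function. The output shape and the function name are assumptions; with boto3 the code string would typically be read from a ClientError's response["Error"]["Code"].

```python
from typing import Dict, Optional

# Codes treated as "key is simply not there", per the description above.
ABSENT_CODES = {"403", "404", "NoSuchKey"}


def normalize_probe_error(error_code: Optional[str]) -> Dict[str, object]:
    """Map an S3 error code (or None on success) to a probe record."""
    if error_code is None:
        return {"present": True}
    if error_code in ABSENT_CODES:
        # Cleanly absent: deliberately no error_code field in the output.
        return {"present": False}
    # Anything else is a genuine error worth surfacing.
    return {"present": False, "error_code": error_code}
```

The trade-off is that a real permissions problem on an existing key also reads as absent, which is acceptable only while the Lambda is known to lack s3:ListBucket.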

9 new unit tests cover: saturday/weekday/eod cycle-date
resolution, continuous skip, zero-count short-circuit,
date/trading_day/no-placeholder template rendering, and handler
mode-dispatch.

Live smoke (post-deploy + manual invoke):
  n_artifacts=51, n_cycles_probed=474, duration=10.08s

Surfaced 1 real finding for follow-up: several artifacts use
calendar-vs-trading-day-anchored templates that don't match
producer behavior. research_signals is registered as
signals/{date}/signals.json with cadence=saturday_sf, but the
producer mostly writes to Friday trading-day keys (2026-05-22,
2026-05-15, etc.). The historical probe correctly reports the
Saturday keys as absent — which IS the right answer given the
registry template. ROADMAP follow-up filed separately to audit
all registry templates for calendar-vs-trading-day mismatch.
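
The mismatch can be reproduced with a small rendering sketch. The {date} template string is the one quoted above; the {trading_day} placeholder semantics (most recent Mon-Fri on or before the cycle date) and the helper names are assumptions.

```python
from datetime import date, timedelta


def prior_weekday(day: date) -> date:
    """Most recent Mon-Fri on or before day (calendar-naive, ignores holidays)."""
    while day.weekday() >= 5:
        day -= timedelta(days=1)
    return day


def render_key(template: str, cycle: date) -> str:
    """Fill {date} / {trading_day}; templates with no placeholder pass through."""
    return template.format(
        date=cycle.isoformat(),
        trading_day=prior_weekday(cycle).isoformat(),
    )


saturday_cycle = date(2026, 5, 23)
probed = render_key("signals/{date}/signals.json", saturday_cycle)
written = render_key("signals/{trading_day}/signals.json", saturday_cycle)
# probed  -> signals/2026-05-23/signals.json (what the registry asks for)
# written -> signals/2026-05-22/signals.json (where the producer wrote)
```

Under these assumptions, a saturday_sf artifact registered with {date} is always probed one day after the Friday key the producer actually wrote.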

Calendar-naive by design — NYSE holidays surface as
false-positive absent days but operators can interpret in
context. Calendar-aware backfill is a P3 follow-up if the
noise becomes worth the dependency lift.

Composes with the OBSERVATION_REGISTRY arc (#349/#351/#352/#355
+ #135/#136/#137).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>