Add preflight checks to weekly_collector entrypoint#27
Closed
cipher813 wants to merge 1 commit into
Addresses the class of failure surfaced 2026-04-14: daily_append silently not writing to ArcticDB for two weekdays because arcticdb wasn't in the deploy image and the import error was swallowed. A preflight check catches this in ~1s instead of letting the pipeline run to "success" with stale data.

Pattern D (simplest): inline _preflight() in weekly_collector.py, called from main() after config load, before run_weekly(). No new files, no shared library. If the check pattern proves valuable across the other modules, we can extract a common helper later, but the per-repo checks are small enough (~30 LOC) that inlining is fine for now.

Scoped to external-world handshakes (env vars, S3, ArcticDB), NOT correctness of the collection itself. The hardened collectors from PRs #24 + #25 still own data-integrity hard-fails.

Checks by mode
- daily: AWS_REGION env, S3 bucket reachable, ArcticDB universe library readable, SPY freshness ≤ 4 days (covers Fri → Tue long weekend + buffer). 4-day stale SPY would have caught today's bug on 2026-04-14 instead of letting Friday's write look healthy until Saturday.
- phase1: AWS_REGION + FRED_API_KEY + POLYGON_API_KEY + S3 reachable.
- phase2: AWS_REGION + FMP_API_KEY + EDGAR_IDENTITY + S3 reachable.

Failures raise RuntimeError. main() already exits 1 on any SystemExit path, and flow-doctor (#26, once deployed) will dispatch the corresponding ERROR log as email + GitHub issue.

Out of scope (tracked)
- Same pattern in predictor inference + training, research Lambda, executor entrypoints, backtester. Rolling out after this first consumer proves the shape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Superseded by the shared-library approach. Preflight logic moved to |
cipher813 added a commit that referenced this pull request Apr 14, 2026
Replaces the inline log_config.py (committed 2026-04-14 in PR #26) and the proposed-then-closed inline _preflight (PR #27) with the shared library. alpha-engine-lib is now the single source of truth for these cross-cutting concerns; drift across consumer repos becomes impossible by construction.

Changes
- requirements.txt: add alpha-engine-lib @ git+…@v0.1.0 with [arcticdb,flow_doctor] extras. Drop the direct flow-doctor pin; it is pulled in transitively by the extra.
- preflight.py: new module with DataPreflight(BasePreflight). Composes mode-specific check sequences (daily / phase1 / phase2) on top of the shared primitives. ~40 LOC.
- weekly_collector.py:
  - Import setup_logging from alpha_engine_lib.logging (was local log_config). Pass flow-doctor.yaml path explicitly since the lib version is path-parametric (each consumer has its own yaml).
  - Call DataPreflight(...).run() at the top of main(), after config load, before run_weekly().
- log_config.py: deleted. The lib version is now the sole copy.

Test plan
- [x] pytest tests/: 41 pass
- [x] Import smoke: `from preflight import DataPreflight` works against locally-installed alpha-engine-lib v0.1.0
- [ ] EC2: pip install -r requirements.txt must succeed. The git+ URL requires the existing ~/.netrc PAT to have Contents: read on cipher813/alpha-engine-lib. If it doesn't, the install fails loud and visibly (no silent fall-through).
- [ ] Next weekday DailyData run: confirm "Pre-flight OK (mode=daily)" appears before "COLLECTING: daily closes".
- [ ] Forced failure test: `FRED_API_KEY= python weekly_collector.py --phase 1 --dry-run` should raise at preflight.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
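To make the "composes mode-specific check sequences" idea concrete, here is a rough sketch of how such a class could be laid out. The real `BasePreflight` in alpha-engine-lib is not shown in this thread, so `StandInBasePreflight`, `require_env`, and `ENV_BY_MODE` below are hypothetical stand-ins and not the library's API. Only the env-var checks are sketched; the S3 and ArcticDB probes are omitted.

```python
import logging
import os
from typing import Callable, List, Mapping, Optional

logger = logging.getLogger(__name__)


class StandInBasePreflight:
    """Hypothetical stand-in, NOT the alpha_engine_lib BasePreflight API."""

    def checks(self) -> List[Callable[[], None]]:
        raise NotImplementedError

    def run(self) -> None:
        # Run every check in order; the first failure raises and stops the run.
        for check in self.checks():
            check()


def require_env(name: str, env: Mapping[str, str]) -> Callable[[], None]:
    """Hypothetical shared primitive: build a check that fails on an unset or empty var."""
    def check() -> None:
        if not env.get(name):
            raise RuntimeError(f"Pre-flight failed: missing env var {name}")
    return check


class DataPreflight(StandInBasePreflight):
    # Env vars per mode, as listed in the description of PR #27.
    ENV_BY_MODE = {
        "daily": ("AWS_REGION",),
        "phase1": ("AWS_REGION", "FRED_API_KEY", "POLYGON_API_KEY"),
        "phase2": ("AWS_REGION", "FMP_API_KEY", "EDGAR_IDENTITY"),
    }

    def __init__(self, mode: str, env: Optional[Mapping[str, str]] = None) -> None:
        if mode not in self.ENV_BY_MODE:
            raise ValueError(f"unknown preflight mode: {mode!r}")
        self.mode = mode
        self.env = os.environ if env is None else env

    def checks(self) -> List[Callable[[], None]]:
        return [require_env(name, self.env) for name in self.ENV_BY_MODE[self.mode]]

    def run(self) -> None:
        super().run()
        logger.info("Pre-flight OK (mode=%s)", self.mode)
```

Passing `env` explicitly is only a testing convenience in this sketch; by default the class reads `os.environ`.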
Summary
Addresses the class of failure surfaced 2026-04-14: daily_append silently not writing to ArcticDB for two weekdays because `arcticdb` wasn't in the deploy image and the import error was swallowed. A preflight check catches this in ~1s instead of letting the pipeline run to "success" with stale data.
Pattern D (simplest): inline `_preflight()` in `weekly_collector.py`, called from `main()` after config load, before `run_weekly()`. No new files, no shared library. If the check pattern proves valuable across the other modules, we can extract a common helper later — but the per-repo checks are small enough (~30 LOC) that inlining is fine for now.
Scoped to external-world handshakes (env vars, S3, ArcticDB) — not correctness of the collection itself. The hardened collectors from #24 + #25 still own data-integrity hard-fails.
Checks by mode
The 4-day SPY staleness threshold covers Fri → Tue long-weekend runs with a 1-day buffer. It would have caught today's bug, but only after a delay: ArcticDB last wrote 2026-04-12, so preflight would have computed age=2d on 2026-04-14 (passes), age=3d on 2026-04-15 (passes), age=4d on 2026-04-16 (passes), and age=5d on 2026-04-17 (fires). I picked 4 deliberately: the buffer should absorb long weekends but not allow a full week of silent failure. Open to tightening to 2–3 days if false positives on long weekends prove rare.
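The staleness arithmetic above can be checked with a few lines of Python. This is a sketch that assumes age is counted in calendar days and that the check fires only when age exceeds the threshold; the names `MAX_SPY_AGE_DAYS`, `spy_age_days`, and `is_stale` are illustrative, not taken from the codebase.

```python
from datetime import date

MAX_SPY_AGE_DAYS = 4  # threshold proposed in this PR


def spy_age_days(last_write: date, today: date) -> int:
    """Age of the newest SPY row in calendar days (assumed convention)."""
    return (today - last_write).days


def is_stale(last_write: date, today: date, max_age_days: int = MAX_SPY_AGE_DAYS) -> bool:
    """True when the data is older than the threshold, i.e. preflight should fire."""
    return spy_age_days(last_write, today) > max_age_days


if __name__ == "__main__":
    last_write = date(2026, 4, 12)
    for day in (14, 15, 16, 17):
        today = date(2026, 4, day)
        verdict = "fires" if is_stale(last_write, today) else "passes"
        print(today.isoformat(), f"age={spy_age_days(last_write, today)}d", verdict)
    # 2026-04-14 age=2d passes
    # 2026-04-15 age=3d passes
    # 2026-04-16 age=4d passes
    # 2026-04-17 age=5d fires
```

With a threshold of 3 instead, the same last-write date would fire on 2026-04-16.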
Failure path
`_preflight` raises `RuntimeError` → caller propagates → `main()` exits non-zero → SSM command exits non-zero (set -eo pipefail from #24) → Step Function `Catch [States.ALL] → HandleFailure` fires. Once #26 is deployed, flow-doctor dispatches email + GitHub issue for the ERROR log.
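The Python end of that chain could be sketched as follows. This assumes `main()` turns a `RuntimeError` into exit code 1 and logs it at ERROR level so a log-based dispatcher can pick it up; the `preflight` and `run_weekly` parameters are injected here only to keep the example self-contained, and the SSM / Step Function side is not modelled.

```python
import logging
import sys
from typing import Callable

logger = logging.getLogger("weekly_collector")


def main(preflight: Callable[[], None], run_weekly: Callable[[], None]) -> int:
    """Return a process exit code: 0 on success, 1 if preflight or collection raises."""
    try:
        preflight()   # raises RuntimeError on a failed handshake
        run_weekly()  # never reached when preflight fails
    except RuntimeError:
        # ERROR-level log with traceback, so the failure is visible rather than swallowed.
        logger.exception("weekly_collector failed")
        return 1
    return 0


def _failing_preflight() -> None:
    raise RuntimeError("Pre-flight failed (mode=daily): ArcticDB not importable")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Demo only: exits with status 1, which is what the calling shell would observe.
    sys.exit(main(_failing_preflight, lambda: None))
```

Letting the exception propagate uncaught would also produce a non-zero exit in CPython; catching it here just makes the exit code and the ERROR log explicit.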
Out of scope (tracked)
Test plan
🤖 Generated with Claude Code