EOD Step Function definition #5
Merged
cipher813 added a commit that referenced this pull request on May 1, 2026
* feat(ci): wire deploy.yml + deploy-infrastructure.yml into system-wide changelog

Adds a final step to both deploy workflows that calls the append-changelog composite action in alpha-engine-docs. Each deploy, successful or failed, now emits one JSON entry to s3://alpha-engine-research/changelog/.

A merge that touches both surfaces produces two distinct entries:
- deploy.yml → Phase 2 Lambda image rebuild + alias bump
- deploy-infrastructure.yml → SF + CF stamp re-deploy

The two entries are distinguished by the deploy_workflow field, so the materialized CHANGELOG.md can show them as separate items under the same SHA.

Uses if: always() + a ternary on job.status so failed deploys are also recorded in the log; the failure signal is itself a useful provenance record.

Companions: alpha-engine-docs PR #3 (composite action + aggregator), alpha-engine-data PR #120 (IAM grant, already merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(orchestration): SNS→S3 changelog incident mirror Lambda

Adds a small Lambda, subscribed to the alpha-engine-alerts SNS topic, that mirrors every alert as one JSON entry under s3://alpha-engine-research/changelog/incidents/. This closes the event-mining loop alongside the deploy-side log: both "what shipped" and "what failed" now feed the same time-ordered changelog.

Why
The 2026-05-01 weekday SF timeout cascade is the canonical example. The deploy log records the 4 PRs that fixed it, but it never captured the original SNS alert email at 06:01 PT, i.e. the failure event itself. With this Lambda, that alert would have landed at changelog/incidents/2026/05/01T13-01-XX_alpha-engine-alerts_*.json with the full subject + body, queryable months later for retro mining ("show me every SF failure incident this quarter").

Resources added (4)
- ChangelogIncidentMirrorRole: minimal; PutObject scoped to changelog/incidents/* + AWSLambdaBasicExecutionRole for logs.
- ChangelogIncidentMirrorFunction: python3.12, arm64, 256 MB, 30s timeout. Inline ZipFile (~50 lines). Reads SNS Records, builds a JSON entry, and calls S3 PutObject. Handles malformed timestamps gracefully by falling back to "now".
- ChangelogIncidentMirrorSubscription: SNS subscription on AlertsTopic with Protocol: lambda.
- ChangelogIncidentMirrorPermission: Lambda::Permission letting SNS invoke the function.

Schema (matches the deploy-side action's event_type discriminator)
{
  "ts_utc": ...,
  "event_type": "incident",
  "source": "alpha-engine-alerts",
  "subject": "...",
  "summary": "...",      // first 240 chars of subject or message line 1
  "details": "...",      // full message body
  "sns_message_id": "...",
  "topic_arn": "..."
}

Apply state
Already applied live via aws cloudformation execute-change-set and smoke-tested with one SNS publish: the entry landed at s3://alpha-engine-research/changelog/incidents/2026/05/01T15-52-57_* within 2s, the schema validated, and the test entry was then cleaned up. This PR codifies the source-of-truth template.

Companions
- alpha-engine-docs PR #5 (event_type schema + aggregator support)
- Future: flow-doctor S3 notifier, manual CLI helper.

Note on template description
The template's docstring says "Does NOT manage Lambda functions or IAM roles." Strictly, it now manages one of each: a narrow exception for the SNS mirror, because it is tightly coupled to AlertsTopic, which is defined here. The docstring is not updated in this commit; will revisit if a second exception lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
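The mirror's per-record logic can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the inline ZipFile from the template: the SNS `Timestamp` format, the `ts_utc` string format, reading "summary" as subject-if-present-else-first-message-line, and the key suffix after the source name (the commit elides it as `*`; the sketch uses the SNS MessageId as a hypothetical stand-in) are all assumptions. `put_object` is a hypothetical injected callable so the sketch runs without AWS; a deployed function would instead call boto3's `s3.put_object(Bucket=..., Key=..., Body=...)`.

```python
import json
from datetime import datetime, timezone

SOURCE = "alpha-engine-alerts"
PREFIX = "changelog/incidents/"
SUMMARY_MAX = 240  # "first 240 chars of subject or message line 1"


def parse_sns_timestamp(raw):
    """Parse an SNS Timestamp such as '2026-05-01T13:01:07.000Z'; fall back to now."""
    try:
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")
        return parsed.replace(tzinfo=timezone.utc)
    except (TypeError, ValueError):
        # Missing or malformed timestamp: use the current time instead of failing.
        return datetime.now(timezone.utc)


def build_entry(sns):
    """Build one incident entry from the 'Sns' object of an SNS Lambda record."""
    ts = parse_sns_timestamp(sns.get("Timestamp"))
    subject = sns.get("Subject") or ""
    message = sns.get("Message") or ""
    first_line = message.splitlines()[0] if message else ""
    entry = {
        "ts_utc": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),  # assumed string format
        "event_type": "incident",
        "source": SOURCE,
        "subject": subject,
        "summary": (subject or first_line)[:SUMMARY_MAX],
        "details": message,
        "sns_message_id": sns.get("MessageId", ""),
        "topic_arn": sns.get("TopicArn", ""),
    }
    return ts, entry


def handler(event, context, put_object=None):
    """Mirror every SNS record as one JSON object; return the keys it wrote."""
    written = []
    for record in event.get("Records", []):
        ts, entry = build_entry(record.get("Sns", {}))
        suffix = entry["sns_message_id"] or "unknown"  # hypothetical key suffix
        key = f"{PREFIX}{ts:%Y/%m/%dT%H-%M-%S}_{SOURCE}_{suffix}.json"
        body = json.dumps(entry).encode("utf-8")
        if put_object is not None:
            put_object(Key=key, Body=body)
        written.append(key)
    return {"written": written}
```

Injecting the writer keeps the record-to-entry mapping testable without AWS credentials; only the final `put_object` call touches S3.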
cipher813 added a commit that referenced this pull request on May 2, 2026
PR #134's pre-MorningEnrich preflight calls ``constituents.collect()`` and reads ``cons_result.get("tickers", [])`` to feed prune_delisted_tickers' ``constituents_override`` and the daily_closes request list. But collect() returned only ``{"status": "ok", "count": N}``, with no tickers, so the preflight got [] and MorningEnrich aborted with "No tickers available".

2026-05-02 SF redrive #5 was the live failure: prune correctly dropped all 8 stragglers (the architectural fix worked!), but no tickers were then fed to daily_closes, and the whole MorningEnrich step exited 1.

Add ``tickers`` to both happy-path returns (ok + ok_dry_run). The change is additive, with no breakage:
- ``_run_phase1`` (the only other caller) previously round-tripped to S3 to re-read what it had just written; it now uses ``const_result["tickers"]`` directly.
- The dry-run fork in _run_phase1 (which separately called the private ``_fetch_constituents``) is also collapsed.

A contract test in tests/test_constituents_sector_map.py locks both return shapes, guarding against this exact regression class sneaking back. 376 tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
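The return-shape contract the fix establishes can be sketched as follows. `collect` here is a hypothetical stand-in with an injected fetcher, not the real `constituents.collect()`; the literal status strings `"ok"` and `"ok_dry_run"` are assumed from the commit's "(ok + ok_dry_run)" wording, and the S3 write is omitted.

```python
from typing import Callable, Iterable


def collect(fetch_constituents: Callable[[], Iterable[str]], dry_run: bool = False) -> dict:
    """Hypothetical stand-in for constituents.collect() with the fixed return shape."""
    tickers = list(fetch_constituents())
    if dry_run:
        # Dry run skips the S3 write but still returns the tickers.
        return {"status": "ok_dry_run", "count": len(tickers), "tickers": tickers}
    # The real function writes constituents to S3 here (omitted in this sketch).
    return {"status": "ok", "count": len(tickers), "tickers": tickers}


def preflight_tickers(cons_result: dict) -> list:
    """Mirror of the preflight read; an empty list is what aborted MorningEnrich."""
    tickers = cons_result.get("tickers", [])
    if not tickers:
        raise RuntimeError("No tickers available")
    return tickers


def check_return_contract(result: dict) -> None:
    """Contract check in the spirit of tests/test_constituents_sector_map.py."""
    assert result["status"] in {"ok", "ok_dry_run"}
    assert isinstance(result["tickers"], list)
    assert result["count"] == len(result["tickers"])
```

Feeding the pre-fix shape `{"status": "ok", "count": 2}` to `preflight_tickers` raises, which is exactly the regression a return-shape contract test is meant to catch before it reaches a live redrive.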
cipher813 added a commit that referenced this pull request on May 18, 2026
…y path) (#259)

ROADMAP "Friday shell-run — per-module dry-path activation" owed-item #1.

Under the Friday shell_run, the DataPhase1/MorningEnrich + RAGIngestion spot states now boot the spot for real, run their EXISTING preflight, then exit 0 with ZERO external API data fetch and ZERO S3/ArcticDB/config/email/SNS writes. This catches bootstrap-class breakage (lib-pin drift, sys.path collision, stale ArcticDB symbol, SSM timeout, Dockerfile/image gap) ~12h before the real Saturday run. It reuses the existing preflight substrate; no parallel preflight was written.

Where the gate sits / zero-fetch zero-write proof:
- weekly_collector.py: new `--preflight-only` argparse flag. main() exits HERE, with `raise SystemExit(0)` immediately after the existing `DataPreflight(config["bucket"], mode).run()` and strictly BEFORE `run_weekly(config, args)`. run_weekly() is the SOLE function in the module that performs ANY collector fetch (polygon/FMP/FRED/yfinance) or ANY S3/ArcticDB/parquet/config/module-health write, so gating in front of it makes every fetch/write code path statically unreachable. The preflight itself only does read-only/auth probes (S3 HEAD, polygon/FRED reference-data auth calls that fetch no collector data, ArcticDB list_libraries) plus a self-cleaning S3 PUT+DELETE sentinel under preflight/ (the preflight's own liveness probe, not a data write). The ordering is pinned by an AST-source test.
- rag/pipelines/run_weekly_ingestion.sh: new `--preflight-only` flag. Exits 0 after Step 0 (`python -m rag.preflight`: check_env_vars + check_s3_bucket HEAD; read-only, zero fetch, zero write) and strictly BEFORE Step 1 (ingest_sec_filings). Every ingest_* pipeline, Voyage embedding call, and Postgres/pgvector + parquet write lives in Steps 1-9, all unreachable once the guard exits.
- infrastructure/spot_data_weekly.sh: new `--preflight-only` flag sets PREFLIGHT_ONLY=1, a MODIFIER orthogonal to RUN_MODE, so it composes with the data path AND --rag-only. A dedicated data-path block runs `weekly_collector.py --morning-enrich --preflight-only` and/or `weekly_collector.py --phase 1 --preflight-only` (gated by the existing DO_MORNING_ENRICH/DO_PHASE1 split), then exits 0 before the real WORKLOADS heredoc: no prune (prune-audit JSON write), no RAG, no CloudWatch heartbeat, no S3 log upload.

--rag-only --preflight-only behavior: runs ONLY the RAG-path preflight (boot + SSM secret fetch so rag.preflight's check_env_vars sees the secrets + `run_weekly_ingestion.sh --preflight-only` = step-0-only + exit 0). No real RAG ingestion, no rag-ingestion heartbeat. `--preflight-only` alone runs ONLY the DataPhase1/MorningEnrich preflight.

Universe-freshness tolerance note (ROADMAP owed-item #5): the Friday shell-run uses the phase1 / morning_enrich preflight modes. Per preflight.py::DataPreflight.run, NEITHER mode runs check_arcticdb_fresh; both only run _check_arcticdb_libraries_present (a presence read, not a freshness gate). morning_enrich deliberately omits freshness (it is part of what *makes* ArcticDB fresh), and phase1 *populates* ArcticDB. The only freshness gate (check_arcticdb_fresh macro/SPY 4d) lives in the "daily" mode, which the Saturday/Friday data path never selects. So a Friday run that predates Friday's settled polygon aggregate does NOT spuriously fail on a Thursday last bar, and no --preflight-only-scoped tolerance code is required for the data path. This is documented inline so that a future mode-mapping change re-audits the invariant.

Tests: new tests/test_preflight_only_dry_path.py (10 tests; static greps + AST-source assertions, matching the existing test_spot_data_weekly_run_modes.py / test_weekly_collector_preflight_mode_mapping.py convention) pins: flag parsing on all 3 files, the exit-0-after-preflight-before-fetch/write ordering invariant, --rag-only --preflight-only step-0-only behavior, and the no-prune/no-RAG/no-heartbeat/no-S3-upload hard invariant. Full suite: 1229 passed, 1 skipped (pre-existing). bash -n clean on both shell scripts. No new deps, no secrets.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
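The gate-ordering idea behind weekly_collector.py can be sketched in Python: a `--preflight-only` flag that raises `SystemExit(0)` after the preflight and before the fetch/write entry point, plus an `ast`-based check in the spirit of the AST-source ordering test. `run_preflight`, `exit_precedes_fetch`, the config dict, and the stub bodies are hypothetical; the real module calls `DataPreflight(config["bucket"], mode).run()` and `run_weekly(config, args)`.

```python
import argparse
import ast


def run_preflight(config):
    """Hypothetical stand-in for DataPreflight(config["bucket"], mode).run()."""
    return None


def run_weekly(config, args):
    """Stand-in for the sole fetch/write entry point; unreachable in the dry path."""
    raise RuntimeError("fetch/write path reached")


def main(argv=None):
    parser = argparse.ArgumentParser(prog="weekly_collector.py")
    parser.add_argument("--preflight-only", action="store_true")
    args = parser.parse_args(argv)
    config = {"bucket": "example-bucket"}  # hypothetical config
    run_preflight(config)
    if args.preflight_only:
        raise SystemExit(0)  # exit after preflight, strictly before run_weekly
    run_weekly(config, args)


def exit_precedes_fetch(source, fetch_name="run_weekly"):
    """True if a `raise SystemExit(...)` sits on an earlier line than every fetch call."""
    exit_lines, fetch_lines = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Raise) and isinstance(node.exc, ast.Call):
            func = node.exc.func
            if isinstance(func, ast.Name) and func.id == "SystemExit":
                exit_lines.append(node.lineno)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id == fetch_name:
                fetch_lines.append(node.lineno)
    return bool(exit_lines and fetch_lines) and min(exit_lines) < min(fetch_lines)
```

Checking line order on the parsed source, rather than executing `main()`, lets a test assert the fetch/write path is unreachable under the flag without mocking any collector or AWS client.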
Summary
Deployment
After merge, create the Step Function in AWS:
```bash
aws stepfunctions create-state-machine \
  --name alpha-engine-eod-pipeline \
  --definition file://infrastructure/step_function_eod.json \
  --role-arn arn:aws:iam::711398986525:role/alpha-engine-step-function-role
```
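Before running the create command, a cheap local check can catch dangling transitions in the definition file. This is a hypothetical helper, not an AWS tool: it only checks top-level `StartAt`, `Next`, `Default`, `Choices[].Next`, and `Catch[].Next` targets plus the Next-or-End rule, and it does not descend into Parallel/Map branches, so it complements rather than replaces the validation Step Functions performs when the state machine is created.

```python
import json
from pathlib import Path

# State types that legitimately have neither "Next" nor "End".
NO_TRANSITION_TYPES = {"Succeed", "Fail", "Choice"}


def check_asl(definition):
    """Return a list of structural problems in a top-level ASL definition."""
    states = definition.get("States")
    if not isinstance(states, dict) or not states:
        return ["missing or empty States"]
    problems = []
    start = definition.get("StartAt")
    if start not in states:
        problems.append(f"StartAt {start!r} is not a defined state")
    for name, state in states.items():
        # Collect every transition target this state can jump to.
        targets = []
        if "Next" in state:
            targets.append(state["Next"])
        if "Default" in state:
            targets.append(state["Default"])
        targets += [rule.get("Next") for rule in state.get("Choices", [])]
        targets += [catcher.get("Next") for catcher in state.get("Catch", [])]
        for target in targets:
            if target not in states:
                problems.append(f"{name} -> {target!r} is not a defined state")
        if ("Next" not in state and state.get("End") is not True
                and state.get("Type") not in NO_TRANSITION_TYPES):
            problems.append(f"{name} has neither Next nor End")
    return problems


def check_file(path="infrastructure/step_function_eod.json"):
    """Load the definition file referenced by the deploy command and check it."""
    return check_asl(json.loads(Path(path).read_text()))
```

Running `check_file()` from the repo root and requiring an empty list before `create-state-machine` is one way to fail fast locally on a mistyped state name.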
Test plan