EOD Step Function definition by cipher813 · Pull Request #5 · cipher813/alpha-engine-data

cipher813 · 2026-04-07T22:26:27Z

Summary

New Step Function: alpha-engine-eod-pipeline
Triggered by daemon shutdown (not cron/timer)
PostMarketData (micro EC2) → EODReconcile (trading EC2) → StopInstance
PostMarketData failure is non-blocking (EOD falls back to IB Gateway prices)
Instance always stops, even on failure (ForceStopInstance fallback)
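
The flow above can be sketched as an Amazon States Language definition built in Python. This is only an illustration under stated assumptions, not the contents of infrastructure/step_function_eod.json: the resource ARNs are placeholders, the `ResultPath` names are invented, and routing `ForceStopInstance` into a `Fail` state is a guess at how the failure would be surfaced. The state names come from the summary.

```python
import json


def build_eod_definition(post_market: str, reconcile: str, stop: str) -> dict:
    """Build a sketch of the EOD state machine (all resources are placeholders)."""
    catch_all = ["States.ALL"]
    return {
        "Comment": "Sketch only: PostMarketData -> EODReconcile -> StopInstance",
        "StartAt": "PostMarketData",
        "States": {
            "PostMarketData": {
                "Type": "Task",
                "Resource": post_market,
                # Non-blocking: on any error, continue to EODReconcile anyway.
                "Catch": [{"ErrorEquals": catch_all,
                           "ResultPath": "$.post_market_error",  # hypothetical
                           "Next": "EODReconcile"}],
                "Next": "EODReconcile",
            },
            "EODReconcile": {
                "Type": "Task",
                "Resource": reconcile,
                # On failure, still stop the instance via the fallback state.
                "Catch": [{"ErrorEquals": catch_all,
                           "ResultPath": "$.reconcile_error",  # hypothetical
                           "Next": "ForceStopInstance"}],
                "Next": "StopInstance",
            },
            "StopInstance": {"Type": "Task", "Resource": stop, "End": True},
            "ForceStopInstance": {"Type": "Task", "Resource": stop,
                                  "Next": "PipelineFailed"},
            # Assumption: report the run as failed after the forced stop.
            "PipelineFailed": {"Type": "Fail", "Error": "EODReconcileFailed"},
        },
    }


def every_path_stops(defn: dict,
                     stop_states=("StopInstance", "ForceStopInstance")) -> bool:
    """Return True if every path from StartAt visits a stop state.

    Follows both Next and Catch transitions; assumes the graph is acyclic.
    """
    states = defn["States"]

    def walk(name: str, stopped: bool) -> bool:
        state = states[name]
        stopped = stopped or name in stop_states
        successors = [c["Next"] for c in state.get("Catch", [])]
        if "Next" in state:
            successors.append(state["Next"])
        if not successors:  # terminal state (End or Fail)
            return stopped
        return all(walk(nxt, stopped) for nxt in successors)

    return walk(defn["StartAt"], False)


if __name__ == "__main__":
    definition = build_eod_definition("arn:post", "arn:reconcile", "arn:stop")
    print(json.dumps(definition, indent=2))
    print(every_path_stops(definition))
```

`every_path_stops` is a cheap static check of the "instance always stops" property. It does not model a failure of the stop task itself.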

Deployment

After merge, create the Step Function in AWS:
aws stepfunctions create-state-machine --name alpha-engine-eod-pipeline --definition file://infrastructure/step_function_eod.json --role-arn arn:aws:iam::711398986525:role/alpha-engine-step-function-role

Test plan

Step Function created in AWS console
Daemon triggers it on shutdown
Full pipeline completes: PostMarketData → EOD → StopInstance


* feat(ci): wire deploy.yml + deploy-infrastructure.yml into system-wide changelog

  Adds a final step to both deploy workflows that calls the append-changelog composite action in alpha-engine-docs. Each successful (or failed) deploy now emits one JSON to s3://alpha-engine-research/changelog/.

  Two distinct entries per merge that touches both surfaces:
  - deploy.yml → Phase 2 Lambda image rebuild + alias bump
  - deploy-infrastructure.yml → SF + CF stamp re-deploy

  Distinguished by the deploy_workflow field on each entry, so the materialized CHANGELOG.md can show both as separate items under the same SHA.

  Uses if: always() + ternary on job.status so failed deploys also register in the log; the failure signal is itself a useful provenance record.

  Companion: alpha-engine-docs PR #3 (composite action + aggregator), alpha-engine-data PR #120 (IAM grant, already merged).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(orchestration): SNS→S3 changelog incident mirror Lambda

  Adds a small Lambda subscribed to the alpha-engine-alerts SNS topic that mirrors every alert as one JSON entry under s3://alpha-engine-research/changelog/incidents/. Closes the event-mining loop alongside the deploy-side log: now both "what shipped" and "what failed" feed the same time-ordered changelog.

  Why

  The 2026-05-01 weekday SF timeout cascade is the canonical example. The deploy log records the 4 PRs that fixed it, but it never captured the original SNS alert email at 06:01 PT, the failure event itself. With this Lambda, that alert would have landed at changelog/incidents/2026/05/01T13-01-XX_alpha-engine-alerts_*.json with full subject + body, queryable months later for retro mining ("show me every SF failure incident this quarter").

  Resources added (4)

  - ChangelogIncidentMirrorRole: minimal. PutObject scoped to changelog/incidents/* + AWSLambdaBasicExecutionRole for logs.
  - ChangelogIncidentMirrorFunction: python3.12, arm64, 256 MB, 30s timeout. Inline ZipFile (~50 lines). Reads SNS Records, builds a JSON entry, S3 PutObject. No-ops cleanly on malformed timestamps (falls back to "now").
  - ChangelogIncidentMirrorSubscription: SNS subscription on AlertsTopic with Protocol: lambda.
  - ChangelogIncidentMirrorPermission: Lambda::Permission letting SNS invoke the function.

  Schema (matches the deploy-side action's event_type discriminator)

    {
      "ts_utc": ...,
      "event_type": "incident",
      "source": "alpha-engine-alerts",
      "subject": "...",
      "summary": "...",   // first 240 chars of subject or message line 1
      "details": "...",   // full message body
      "sns_message_id": "...",
      "topic_arn": "..."
    }

  Apply state

  Already applied live via aws cloudformation execute-change-set; smoke-tested with one SNS publish. The entry landed at s3://alpha-engine-research/changelog/incidents/2026/05/01T15-52-57_* within 2s, schema validated, then cleaned up. This PR is the codification of the source-of-truth template.

  Companions

  - alpha-engine-docs PR #5 (event_type schema + aggregator support)
  - Future: flow-doctor S3 notifier, manual CLI helper.

  Note on template description

  The template's docstring says "Does NOT manage Lambda functions or IAM roles." Strictly we now manage one of each: a narrow exception for the SNS-mirror because it's tightly coupled to AlertsTopic defined here. Not updating the doc this commit; will revisit if a second exception lands.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
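
A minimal sketch of what such a mirror handler might look like, following the schema and key layout described in the commit message. It is not the inline ZipFile from the template. The `ts_utc` string format, the use of the SNS message ID as the key suffix (the source only shows `*`), and the helper names `parse_ts` and `build_entry` are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone

BUCKET = "alpha-engine-research"
PREFIX = "changelog/incidents"
SOURCE = "alpha-engine-alerts"


def parse_ts(raw, now):
    """Parse an SNS timestamp such as 2026-05-01T13:01:07.123Z.

    Falls back to `now` when the value is missing or malformed.
    """
    try:
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")
        return parsed.replace(tzinfo=timezone.utc)
    except (TypeError, ValueError):
        return now


def build_entry(record, now=None):
    """Return (s3_key, entry_dict) for one SNS record."""
    now = now or datetime.now(timezone.utc)
    sns = record.get("Sns", {})
    subject = sns.get("Subject") or ""
    message = sns.get("Message") or ""
    first_line = message.splitlines()[0] if message else ""
    ts = parse_ts(sns.get("Timestamp"), now)
    message_id = sns.get("MessageId", "unknown")
    entry = {
        "ts_utc": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),  # format is an assumption
        "event_type": "incident",
        "source": SOURCE,
        "subject": subject,
        "summary": (subject or first_line)[:240],
        "details": message,
        "sns_message_id": message_id,
        "topic_arn": sns.get("TopicArn", ""),
    }
    # Key suffix is hypothetical; the source elides it as "*".
    key = f"{PREFIX}/{ts:%Y/%m}/{ts:%dT%H-%M-%S}_{SOURCE}_{message_id}.json"
    return key, entry


def handler(event, context):
    """Lambda entry point: write one JSON object per SNS record."""
    import boto3  # imported lazily so the pure helpers work without it

    s3 = boto3.client("s3")
    for record in event.get("Records", []):
        key, entry = build_entry(record)
        s3.put_object(Bucket=BUCKET, Key=key,
                      Body=json.dumps(entry).encode("utf-8"),
                      ContentType="application/json")
    return {"mirrored": len(event.get("Records", []))}
```

Keeping `build_entry` free of AWS calls lets the schema and the timestamp fallback be unit-tested without mocking S3.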

PR #134's pre-MorningEnrich preflight calls ``constituents.collect()`` and reads ``cons_result.get("tickers", [])`` to feed prune_delisted_tickers' ``constituents_override`` and the daily_closes request list. But collect() was returning only ``{"status": "ok", "count": N}`` (no tickers), so the preflight got [] and MorningEnrich aborted with "No tickers available".

2026-05-02 SF redrive #5 was the live failure: prune correctly dropped all 8 stragglers (architectural fix worked!), but then no tickers got fed to daily_closes. The whole MorningEnrich step exited 1.

Add ``tickers`` to both happy-path returns (ok + ok_dry_run). Additive, no breakage:
- ``_run_phase1`` (the only other caller) previously round-tripped to S3 to re-read what it just wrote; it now uses ``const_result["tickers"]`` directly.
- The dry-run fork in _run_phase1 (which separately called the private ``_fetch_constituents``) is also collapsed.

Contract test in tests/test_constituents_sector_map.py locks both return shapes: sneaks-back protection for this exact regression class. 376 tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
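
The return-shape contract described above could be illustrated roughly as follows. This is a stand-in, not the repository's `constituents.collect()`: the `fetch` callable, the de-duplication step, and the `preflight_tickers` helper are hypothetical. Only the `status`, `count`, and `tickers` keys and the "No tickers available" message come from the commit message.

```python
def collect(fetch, dry_run=False):
    """Stand-in collector returning the shape described in the commit message.

    `fetch` is a hypothetical callable that yields ticker strings.
    """
    tickers = sorted(set(fetch()))
    status = "ok_dry_run" if dry_run else "ok"
    # Both happy-path returns carry the ticker list, not just the count.
    return {"status": status, "count": len(tickers), "tickers": tickers}


def preflight_tickers(cons_result):
    """Mimic the caller: read tickers and abort if there are none."""
    tickers = cons_result.get("tickers", [])
    if not tickers:
        raise RuntimeError("No tickers available")
    return tickers


def check_contract(result):
    """Contract check: the keys exist and count agrees with the list."""
    assert {"status", "count", "tickers"} <= set(result)
    assert result["count"] == len(result["tickers"])


if __name__ == "__main__":
    for dry in (False, True):
        res = collect(lambda: ["MSFT", "AAPL", "MSFT"], dry_run=dry)
        check_contract(res)
        print(res["status"], preflight_tickers(res))
```

A result without the `tickers` key, which is the pre-fix shape, makes `preflight_tickers` raise. That is the failure the contract test guards against.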

…y path) (#259)

ROADMAP "Friday shell-run - per-module dry-path activation" owed-item #1.

Under the Friday shell_run, the DataPhase1/MorningEnrich + RAGIngestion spot states now boot the spot for real, run their EXISTING preflight, then exit 0 with ZERO external API data fetch and ZERO S3/ArcticDB/config/email/SNS writes. This catches bootstrap-class breakage (lib-pin drift, sys.path collision, stale ArcticDB symbol, SSM timeout, Dockerfile/image gap) ~12h before the real Saturday run. Reuses the existing preflight substrate; no parallel preflight written.

Where the gate sits / zero-fetch zero-write proof:

- weekly_collector.py: new `--preflight-only` argparse flag. main() exits HERE: `raise SystemExit(0)` immediately after the existing `DataPreflight(config["bucket"], mode).run()` and strictly BEFORE `run_weekly(config, args)`. run_weekly() is the SOLE function in the module that performs ANY collector fetch (polygon/FMP/FRED/yfinance) or ANY S3/ArcticDB/parquet/config/module-health write, so gating in front of it makes every fetch/write code path statically unreachable. The preflight itself only does read-only/auth probes (S3 HEAD, polygon/FRED reference-data auth calls that fetch no collector data, ArcticDB list_libraries) plus a self-cleaning S3 PUT+DELETE sentinel under preflight/ (the preflight's own liveness probe, not a data write). Ordering pinned by an AST-source test.
- rag/pipelines/run_weekly_ingestion.sh: new `--preflight-only` flag. Exits 0 after Step 0 (`python -m rag.preflight`: check_env_vars + check_s3_bucket HEAD; read-only, zero fetch, zero write) and strictly BEFORE Step 1 (ingest_sec_filings). Every ingest_* pipeline, Voyage embedding call, and Postgres/pgvector + parquet write lives in Steps 1-9, all unreachable once the guard exits.
- infrastructure/spot_data_weekly.sh: new `--preflight-only` flag sets PREFLIGHT_ONLY=1, a MODIFIER orthogonal to RUN_MODE so it composes with the data path AND --rag-only. A dedicated data-path block runs `weekly_collector.py --morning-enrich --preflight-only` and/or `weekly_collector.py --phase 1 --preflight-only` (gated by the existing DO_MORNING_ENRICH/DO_PHASE1 split) then exit 0 before the real WORKLOADS heredoc: no prune (prune-audit JSON write), no RAG, no CloudWatch heartbeat, no S3 log upload.

--rag-only --preflight-only behavior: runs ONLY the RAG-path preflight (boot + SSM secret fetch so rag.preflight's check_env_vars sees them + `run_weekly_ingestion.sh --preflight-only` = step-0-only + exit 0). No real RAG ingestion, no rag-ingestion heartbeat. `--preflight-only` alone runs ONLY the DataPhase1/MorningEnrich preflight.

Universe-freshness tolerance note (ROADMAP owed-item #5): the Friday shell-run uses the phase1 / morning_enrich preflight modes. Per preflight.py::DataPreflight.run, NEITHER mode runs check_arcticdb_fresh; they only do _check_arcticdb_libraries_present (a presence read, not a freshness gate). morning_enrich deliberately omits freshness (it is part of what *makes* ArcticDB fresh); phase1 *populates* ArcticDB. The only freshness gate (check_arcticdb_fresh macro/SPY 4d) lives in the "daily" mode, which the Saturday/Friday data path never selects. So a Friday run predating Friday's settled polygon aggregate does NOT spuriously fail on a Thursday-last-bar, and no --preflight-only-scoped tolerance code is required for the data path. Documented inline so a future mode-mapping change re-audits this invariant.

Tests: new tests/test_preflight_only_dry_path.py (10 tests, static greps + AST-source assertions, matching the existing test_spot_data_weekly_run_modes.py / test_weekly_collector_preflight_mode_mapping.py convention) pins: flag parsing on all 3 files, the exit-0-after-preflight-before-fetch/write ordering invariant, --rag-only --preflight-only step-0-only behavior, and the no-prune/no-RAG/no-heartbeat/no-S3-upload hard invariant.

Full suite: 1229 passed, 1 skipped (pre-existing). bash -n clean on both shell scripts. No new deps, no secrets.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
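
The gate ordering described for weekly_collector.py can be sketched as below. This is a simplified illustration, not the module's real `main()`: the `preflight` and `run_weekly` callables are injected stand-ins, and only the flag names and the exit-0-after-preflight-before-`run_weekly` ordering are taken from the commit message.

```python
import argparse


def main(argv, preflight, run_weekly):
    """Run the preflight, then either exit 0 or continue to the real workload.

    `preflight` and `run_weekly` are hypothetical callables injected so the
    ordering can be exercised without touching any external service.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--phase", type=int, default=None)
    parser.add_argument("--morning-enrich", action="store_true")
    parser.add_argument("--preflight-only", action="store_true")
    args = parser.parse_args(argv)

    # The existing preflight always runs first (read-only probes).
    preflight()

    # Gate: exit before the only function that fetches or writes.
    if args.preflight_only:
        raise SystemExit(0)

    run_weekly(args)


if __name__ == "__main__":
    calls = []
    try:
        main(["--phase", "1", "--preflight-only"],
             preflight=lambda: calls.append("preflight"),
             run_weekly=lambda args: calls.append("run_weekly"))
    except SystemExit as exc:
        print("exit code:", exc.code, "calls:", calls)
```

Because the exit sits between the two calls, a dry run records only the preflight. The workload is never invoked, so no fetch or write path can execute.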

Add EOD Step Function definition: post-market data + reconcile + instance stop

d613c58

cipher813 merged commit 46aa5ca into main Apr 7, 2026
1 check passed

cipher813 deleted the feat/eod-step-function branch April 7, 2026 22:28

cipher813 mentioned this pull request May 2, 2026

fix(constituents): include tickers in collect() return dict #135

Merged

4 tasks

cipher813 mentioned this pull request May 18, 2026

feat(data): spot_data_weekly.sh --preflight-only (Friday shell-run dry path) #259

Merged

cipher813 mentioned this pull request May 18, 2026

feat(sf): shell-run keystone — spot --preflight-only + Lambda --dry-run instead of skip #260

Merged

cipher813 merged 1 commit into main from feat/eod-step-function
