Add GitHub Actions auto-deploy on push to main by cipher813 · Pull Request #12 · cipher813/alpha-engine-data

cipher813 · 2026-04-10T21:54:21Z

Automates the Phase 2 Lambda deploy via GitHub Actions OIDC. Every merge to main that touches collectors/**, polygon_client.py, Dockerfile, or requirements* triggers a build + ECR push + Lambda update for alpha-engine-data-collector.
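As a rough illustration of the trigger condition, the sketch below checks a list of changed files against the four path patterns named above. It uses Python's `fnmatch`, which only approximates GitHub's own path-filter semantics, and the function name `triggers_deploy` is hypothetical, not something taken from the workflow.

```python
from fnmatch import fnmatchcase

# Patterns copied from the PR description. fnmatch is an approximation of
# GitHub's path filter, not the same matcher.
DEPLOY_PATHS = ("collectors/**", "polygon_client.py", "Dockerfile", "requirements*")


def triggers_deploy(changed_files, patterns=DEPLOY_PATHS):
    """Return True if any changed file matches any deploy path pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )
```

Under these assumptions, a push that only edits `README.md` would not start a deploy, while one that edits `collectors/prices.py` or `requirements.txt` would.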

Note: DataPhase1 runs as EC2 SSM on the micro instance (not Lambda), so this workflow does not help with Phase 1 drift. Phase 1 still requires a manual git pull on the micro after changes to Phase1-only code. Follow-up task TBD.

Uses the same github-actions-lambda-deploy IAM role as alpha-engine-research PR #9 and alpha-engine-predictor PR #11.

Fallback: existing bash infrastructure/deploy.sh still works manually.

Automates the Phase 2 Lambda deploy (alpha-engine-data-collector) via GitHub Actions OIDC. Every merge to main that touches the collectors code, polygon client, Dockerfile, or requirements triggers a build + ECR push + Lambda update.

Note: DataPhase1 runs as EC2 SSM (not Lambda) on the micro instance, so this workflow does not help with Phase 1 drift. Phase 1 still requires a manual git pull on the micro instance after any collectors/prices.py or collectors/constituents.py change. Separate follow-up: add a scheduled pull job or EC2 user-data script to automate that too.

Mechanism: same as alpha-engine-research + alpha-engine-predictor workflows. Uses the pre-existing github-actions-lambda-deploy IAM role scoped to this repo + research + predictor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The Phase 2 Lambda deploy workflow has been failing on every push to main since #12 merged yesterday:

    ERROR: failed to calculate checksum of ref ... "/config.yaml": not found

Root cause: the Dockerfile copies config.yaml into the image, but config.yaml is gitignored per the repo's security policy (it contains bucket names / prefixes we keep out of the public repo), so there is literally nothing to copy during the GitHub Actions checkout.

The Lambda handler (lambda/handler.py) already falls back to a hardcoded default when config.yaml is absent:

    config = {
        "bucket": "alpha-engine-research",
        "market_data": {"s3_prefix": "market_data/"},
    }

So the COPY was dead weight AND a build breakage. Dropped the line and left an inline comment explaining why it must not come back.

Also copied store/ into the image. It was added in #13 as a shared home for the S3 parquet loader, and collectors/macro imports it at module top level. The Lambda handler doesn't import macro directly (Phase 2 only calls alternative.collect), but weekly_collector.py is in the image and does `from collectors import macro` at the top, so any future handler refactor that touches weekly_collector would bomb on `ModuleNotFoundError: store` without this.

Added 'store/**' to the deploy.yml paths filter so a future change to the shared loader actually retriggers the deploy workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
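The fallback behaviour described in this commit message can be sketched as follows. This is a minimal illustration, not the actual contents of lambda/handler.py: the function name `load_config`, the `DEFAULT_CONFIG` constant, and the use of PyYAML to parse the file are all assumptions. Only the default dictionary itself comes from the text above.

```python
import copy
from pathlib import Path

# Default values quoted in the commit message above.
DEFAULT_CONFIG = {
    "bucket": "alpha-engine-research",
    "market_data": {"s3_prefix": "market_data/"},
}


def load_config(path="config.yaml"):
    """Return the parsed config file, or the hardcoded default if it is absent."""
    config_file = Path(path)
    if not config_file.is_file():
        # config.yaml is gitignored, so it is missing from CI-built images.
        return copy.deepcopy(DEFAULT_CONFIG)
    # Assumption: the real handler parses YAML with PyYAML. Imported lazily
    # so the fallback path does not depend on the package being installed.
    import yaml
    with config_file.open() as fh:
        return yaml.safe_load(fh) or copy.deepcopy(DEFAULT_CONFIG)
```

Returning a deep copy keeps callers from mutating the shared default by accident.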

Fixes the canary IAM gap that has caused every auto-deploy since #12 to report red even when the Lambda update itself succeeded:

    AccessDeniedException: User: .../github-actions-lambda-deploy/... is not authorized to perform: lambda:InvokeFunction on resource: arn:aws:lambda:us-east-1:711398986525:function:alpha-engine-data-collector:live

The github-actions-lambda-deploy OIDC role was created ad-hoc when the GitHub Actions auto-deploy workflow was introduced in #12. It had ECR push + Lambda UpdateFunctionCode/UpdateAlias, but not InvokeFunction. As a result, infrastructure/deploy.sh's post-update canary step (aws lambda invoke with dry_run=true) failed with AccessDenied, the rollback then also failed silently because the script's || true swallowed the error, and every deploy since has been leaving the alias stranded on whatever version just got published with no safety net. Three deploys in a row (#13, #14, #16) all looked like failures despite the underlying Lambda being updated.

This PR does two things:

1. Adds infrastructure/iam/ as the new home for version-controlled IAM policies. It's intentionally low-ceremony: flat JSON files, one per role, applied via a small idempotent shell script. No CloudFormation, no Terraform. For a 5-module infra-light project, a flat directory is the right amount of rigor. Migrate to CFN later if the blast radius grows.

2. Adds a new LambdaInvokeCanary statement to the existing deploy-role policy, granting lambda:InvokeFunction on all 5 alpha-engine Lambdas and their aliases/versions. Scoped narrowly to the same functions the role already has UpdateFunctionCode on, so the blast radius is unchanged: an attacker with ECR push + UpdateFunctionCode can already run arbitrary code in these Lambdas.

Applied live via `infrastructure/iam/apply.sh github-actions-lambda-deploy` before committing, so the next deploy workflow run actually passes the canary step.

Also cleaned up the orphaned old `deploy-lambdas` inline policy (the new file's name is `github-actions-lambda-deploy-policy`, matching the convention of filename == role name).

Why this matters beyond tonight: every IAM change from here on is diffable, reviewable, and recoverable. If a future PR drops a permission, code review catches it at PR time instead of surfacing as a mysterious AccessDenied in production.

Follow-up: the deploy.sh script's rollback-on-canary-failure logic still uses `|| true` to swallow errors silently, which is why the stranded alias never got rolled back. That's a separate PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
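To make the shape of the LambdaInvokeCanary statement concrete, here is a hedged sketch that builds such a statement for a list of function names. The helper name `invoke_canary_statement` is hypothetical, and the real JSON file under infrastructure/iam/ may be laid out differently. The account ID and region are the ones shown in the error message above.

```python
import json


def invoke_canary_statement(function_names, account_id, region="us-east-1"):
    """Build an IAM statement allowing lambda:InvokeFunction on the given
    functions and on any alias or version of them."""
    resources = []
    for name in function_names:
        base = f"arn:aws:lambda:{region}:{account_id}:function:{name}"
        resources.append(base)         # the unqualified function
        resources.append(f"{base}:*")  # any qualifier, e.g. the "live" alias
    return {
        "Sid": "LambdaInvokeCanary",
        "Effect": "Allow",
        "Action": "lambda:InvokeFunction",
        "Resource": resources,
    }


if __name__ == "__main__":
    # Only alpha-engine-data-collector is named in this PR thread; the other
    # four Lambdas would be appended to this list.
    statement = invoke_canary_statement(
        ["alpha-engine-data-collector"], account_id="711398986525"
    )
    print(json.dumps(statement, indent=2))
```

Generating the resource list from function names keeps the invoke scope tied to an explicit list of functions, which is the scoping argument the commit message makes.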

…e seed + backfill (#206)

* feat(signal_returns): write calibrator-v1 context on score_performance seed + backfill

  Root-cause closure for the 2026-05-09 Saturday SF evaluator P0 (weight_optimizer ERROR: "None of [Index(['quant_score','qual_score'])] are in the [columns]"; auto-rollback Sharpe -42.2% vs baseline).

  Producer audit revealed two parallel writers diverged silently after research migration #12 (2026-05-08):

  * scoring/performance_tracker.py::record_new_buy_scores writes ALL 5 canonical context columns, but has zero production callers.
  * collectors/signal_returns.py::_seed_score_performance is the actual production writer (runs weekly in DataPhase1) and only wrote (symbol, score_date, score, price_on_date). The 5 canonical columns (quant_score, qual_score, conviction, sector_modifier, market_regime) were never populated.

  Single-fact-single-writer rebuild:

  * _seed_score_performance now extracts the 5 context fields from the same signals.json payload that drives the BUY filter: a single source-of-truth fetch per signals.json, no second round-trip.
  * New _backfill_score_context repairs legacy rows whose canonical columns are NULL. UPDATE-WHERE-NULL so re-runs are no-ops once every row has a source.
  * _ensure_score_performance_schema mirrors research migration #12 defensively in case DataPhase1 ever fires against a fresh research.db before research's cold-start migrations run.

  Composes with backtester #176 (PR-day consumer-side coalesce fix). With this PR the producer becomes authoritative; the next backtester PR can retire the S3 round-trip in weight_optimizer.load_with_subscores.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(signal_returns): drift gate: canonical context coverage CW gauge

  Locks the producer-side contract established in the previous commit: after seed + backfill complete, query score_performance for rows with score_date >= 2026-05-17 (first Sat SF after this PR merges) and emit the coverage percentage as a CloudWatch gauge:

      AlphaEngine/Data/score_performance_canonical_coverage_pct

  Coverage = fraction of post-cutover rows with ALL 5 canonical context columns populated (quant_score, qual_score, conviction, sector_modifier, market_regime). 100% is the contract; the gauge is always emitted (including 100.0) so alarm baselines stay continuous.

  Mirrors the chronic-gap drift detection pattern at weekly_collector.py:_check_chronic_gap_polygon_recovery: same best-effort emit, same observability-not-load-bearing posture. A follow-up alpha-engine-lib transparency_inventory entry can wire this into the substrate health alarm if desired; the metric itself is the drift signal.

  Tripwire test asserts _CANONICAL_CONTEXT_COLUMNS stays in lockstep with the seed INSERT. Adding a 6th column to one without the other would make the drift gate blind to that field.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
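The UPDATE-WHERE-NULL backfill and the coverage calculation described in these two commits might look roughly like the sketch below. It assumes a SQLite `score_performance` table keyed by (symbol, score_date) with ISO-formatted date strings. The function names `backfill_context` and `canonical_coverage_pct` are hypothetical stand-ins for `_backfill_score_context` and the gauge query, whose real code is not shown in this thread. Returning 100.0 when no post-cutover rows exist is also an assumption.

```python
import sqlite3

# The five canonical context columns listed in the commit message.
CANONICAL_CONTEXT_COLUMNS = (
    "quant_score", "qual_score", "conviction", "sector_modifier", "market_regime",
)


def backfill_context(conn, symbol, score_date, context):
    """Fill NULL canonical columns for one row; return the number of rows touched.

    Columns that already hold a value are left unchanged, and a row with no
    NULL canonical columns is skipped entirely, so re-runs are no-ops.
    """
    assignments = ", ".join(
        f"{col} = COALESCE({col}, :{col})" for col in CANONICAL_CONTEXT_COLUMNS
    )
    any_null = " OR ".join(f"{col} IS NULL" for col in CANONICAL_CONTEXT_COLUMNS)
    params = {col: context.get(col) for col in CANONICAL_CONTEXT_COLUMNS}
    params.update(symbol=symbol, score_date=score_date)
    cursor = conn.execute(
        f"UPDATE score_performance SET {assignments} "
        f"WHERE symbol = :symbol AND score_date = :score_date AND ({any_null})",
        params,
    )
    return cursor.rowcount


def canonical_coverage_pct(conn, cutover_date):
    """Percentage of rows on or after cutover_date with all 5 columns populated."""
    all_present = " AND ".join(
        f"{col} IS NOT NULL" for col in CANONICAL_CONTEXT_COLUMNS
    )
    total, covered = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({all_present}), 0) "
        "FROM score_performance WHERE score_date >= ?",
        (cutover_date,),
    ).fetchone()
    # Assumption: with no post-cutover rows, report 100.0 so the gauge is
    # still emitted with a value.
    return 100.0 if total == 0 else 100.0 * covered / total
```

Building both the backfill and the coverage query from the same column tuple is one way to keep them in lockstep, which is the property the tripwire test is described as protecting.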

cipher813 merged commit 941a2d5 into main Apr 10, 2026
1 check passed

cipher813 deleted the feat/github-actions-deploy branch April 10, 2026 22:01

cipher813 mentioned this pull request Apr 11, 2026

Fix Phase 2 Lambda deploy: drop dead COPY config.yaml, copy store/ #14

Merged

4 tasks

cipher813 mentioned this pull request Apr 11, 2026

Version-control IAM policies; add lambda:InvokeFunction to deploy role #17

Merged

6 tasks

cipher813 mentioned this pull request May 10, 2026

feat(signal_returns): write calibrator-v1 context on score_performance seed + backfill #206

Merged

5 tasks

cipher813 merged 1 commit into main from feat/github-actions-deploy
