Copy ssm_secrets.py into Phase 2 Lambda image (unblocks DataPhase2) by cipher813 · Pull Request #16 · cipher813/alpha-engine-data

cipher813 · 2026-04-11T01:52:54Z

Summary

Tonight's pipeline rerun passed DataPhase1, RAG, and Research cleanly (validating #13 macro-breadth fix + #15 Step Function HandleFailure fix end-to-end), but DataPhase2 failed with:

```
Runtime.ImportModuleError: No module named 'ssm_secrets'
```

Root cause

`lambda/handler.py:22` does `from ssm_secrets import load_secrets` at module top-level, but the Dockerfile never copied `ssm_secrets.py` into the image. The pre-#14 image was built manually at a time when this import didn't exist, so it ran fine for ages. #14's GitHub Actions auto-deploy rebuilt the image from the current Dockerfile, published version 2, and flipped the `live` alias before the canary step failed on an unrelated IAM permission — so the alias is now stuck on the broken v2.

Verified:
```
aws lambda get-alias --function-name alpha-engine-data-collector --name live
FunctionVersion: 2
LastModified: 2026-04-11T00:51:38Z (mid-#14-deploy)
```

Fix

Add `COPY ssm_secrets.py ${LAMBDA_TASK_ROOT}/` to the Dockerfile. Audited both `lambda/handler.py` and `collectors/alternative.py` import graphs via AST walk — the only first-party modules pulled in at handler import time are `collectors` (already copied) and `ssm_secrets` (now copied). `ssm_secrets.py` itself only imports stdlib (`logging`, `os`), so no transitive issues.
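The import audit described above can be approximated with a short AST walk. This is a minimal sketch, not the script used for this PR: `top_level_imports`, the `first_party` set, and the stand-in handler source are assumptions for illustration. Only the `from ssm_secrets import load_secrets` line is taken from the PR. It inspects module-level statements only, since those are what run at Lambda import time; imports nested inside `try` or `if` blocks would need a deeper walk.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the root package names imported by module-level statements."""
    roots: set[str] = set()
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as `from . import x`
            roots.add(node.module.split(".")[0])
    return roots


# Hypothetical stand-in for lambda/handler.py.
handler_source = """
import json
import logging
from collectors import alternative
from ssm_secrets import load_secrets
"""

first_party = {"collectors", "ssm_secrets"}  # assumed first-party module names
copied_into_image = {"collectors"}           # what the Dockerfile copied before the fix

missing = (top_level_imports(handler_source) & first_party) - copied_into_image
print(sorted(missing))  # ['ssm_secrets']
```

Running the same check against every module that the handler imports gives the transitive picture; here that would include `collectors/alternative.py` and `ssm_secrets.py`.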

Also added `ssm_secrets.py` and `lambda/**` to the deploy workflow's `paths` filter — both were missing, which means changes to `handler.py` or the secrets loader would silently not retrigger a Lambda rebuild.
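To illustrate why the missing filter entries mattered, here is a rough sketch of how a `paths` filter decides whether a push triggers the workflow. The glob translation is a simplified approximation of GitHub's filter-pattern semantics, and the `before` list is hypothetical; only the two added entries (`ssm_secrets.py`, `lambda/**`) come from this PR.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a path glob: '**' spans directories, '*' stays in one segment."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def triggers_deploy(changed: list[str], paths: list[str]) -> bool:
    """True if any changed file matches any pattern in the filter."""
    regexes = [glob_to_regex(p) for p in paths]
    return any(r.fullmatch(f) for f in changed for r in regexes)


# Hypothetical filter contents before this PR.
before = ["Dockerfile", "collectors/**"]
after = before + ["ssm_secrets.py", "lambda/**"]

print(triggers_deploy(["lambda/handler.py"], before))  # False
print(triggers_deploy(["lambda/handler.py"], after))   # True
```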

Test plan

- `pytest tests/ -q`: 61 passed
- Gitleaks pre-commit pass
- AST audit of handler + alternative imports confirms `ssm_secrets` is the only missing first-party dep
- After merge + auto-deploy, verify `aws lambda invoke --function-name alpha-engine-data-collector:live --payload '{"phase": 2, "dry_run": true}' out.json` returns OK
- Redrive the failed Step Function execution `manual-recovery-20260411T010521Z` from DataPhase2

Known follow-up (NOT in scope)

The GitHub Actions OIDC role `github-actions-lambda-deploy` lacks `lambda:InvokeFunction` + (implicitly) `lambda:UpdateAlias` on `alpha-engine-data-collector:live`. That's why `infrastructure/deploy.sh`'s canary step has failed after every auto-deploy since #14 landed, and the post-canary rollback path also silently fails — stranding the alias on whatever version just got published. The Lambda still gets updated because `UpdateFunctionCode` succeeds first; the canary is a post-update safety check. Two fixes possible:

- Add `lambda:InvokeFunction` + `lambda:UpdateAlias` on the data-collector Lambda ARN to the OIDC role
- Short-circuit the canary step when running in CI (`[ -n "$CI" ] && skip`)

Either way, that's a separate infrastructure PR. Tonight's manual canary can run from the local AWS creds after this PR merges.
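The canary and rollback behaviour discussed above can be sketched as follows. This is not the logic in `infrastructure/deploy.sh`; `run_canary_step` and its injected callables are hypothetical, and the stub raising `PermissionError` merely stands in for the AccessDenied response. The sketch shows the CI short-circuit option and lets a rollback failure propagate rather than hiding it.

```python
import os
from typing import Callable, Mapping


def run_canary_step(
    invoke_canary: Callable[[], None],
    rollback_alias: Callable[[], None],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Run the post-update canary; roll the alias back if the canary fails."""
    if env.get("CI"):
        return "skipped"  # short-circuit: no canary when running in CI
    try:
        invoke_canary()
    except Exception as canary_error:
        # If rollback_alias() itself raises, that error propagates, so a
        # stranded alias shows up in the deploy log instead of being swallowed.
        rollback_alias()
        raise RuntimeError("canary failed; alias rolled back") from canary_error
    return "passed"


def denied() -> None:
    # Stub for a role that lacks lambda:InvokeFunction.
    raise PermissionError("not authorized to perform: lambda:InvokeFunction")


rolled_back: list[str] = []
print(run_canary_step(denied, lambda: rolled_back.append("previous"), env={"CI": "true"}))
# skipped
```

With an empty `env` (a local run), the same call performs the rollback and then raises `RuntimeError`, so the deploy reports red with the alias already restored.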

🤖 Generated with Claude Code

Tonight's pipeline rerun passed DataPhase1 + RAG + Research cleanly (validating the #13 and #15 fixes) but DataPhase2 failed with:

    Runtime.ImportModuleError: No module named 'ssm_secrets'

Root cause: lambda/handler.py:22 does `from ssm_secrets import load_secrets` at module top-level, but the Dockerfile never copied ssm_secrets.py into the image. The old pre-#14 image was apparently built manually at a time when this import didn't exist, so it ran fine. #14's GitHub Actions auto-deploy rebuilt the image from the current Dockerfile, published version 2, and flipped the `live` alias before the canary step failed on an unrelated IAM permission (see follow-up note below), so the alias is now stuck on the broken v2.

Fix: add `COPY ssm_secrets.py ${LAMBDA_TASK_ROOT}/` to the Dockerfile. Audited lambda/handler.py + collectors/alternative.py import graphs via AST walk: the only first-party modules pulled in at handler import time are `collectors` (already copied) and `ssm_secrets` (now copied). ssm_secrets.py itself only imports stdlib (logging, os), so it doesn't transitively pull anything else.

Also added `ssm_secrets.py` and `lambda/**` to the deploy workflow's paths filter. Both were missing, which means changes to handler.py or the secrets loader would silently not retrigger a Lambda rebuild.

Known follow-up (NOT in scope for this PR): the GitHub Actions OIDC role `github-actions-lambda-deploy` lacks `lambda:InvokeFunction` + `lambda:UpdateAlias` on `alpha-engine-data-collector:live`. That's why infrastructure/deploy.sh's canary step fails after every successful build, and why the post-canary rollback also fails silently, leaving the alias stranded on whatever version just got published. Two options: (a) add those perms to the OIDC role, or (b) short-circuit the canary step in CI. Separate infrastructure PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fixes the canary IAM gap that has caused every auto-deploy since #12 to report red even when the Lambda update itself succeeded:

    AccessDeniedException: User: .../github-actions-lambda-deploy/... is not authorized to perform: lambda:InvokeFunction on resource: arn:aws:lambda:us-east-1:711398986525:function:alpha-engine-data-collector:live

The github-actions-lambda-deploy OIDC role was created ad-hoc when the GitHub Actions auto-deploy workflow was introduced in #12. It had ECR push + Lambda UpdateFunctionCode/UpdateAlias, but not InvokeFunction, so infrastructure/deploy.sh's post-update canary step (aws lambda invoke with dry_run=true) failed with AccessDenied, the rollback then also failed silently because the script's || true swallowed the error, and every deploy since has been leaving the alias stranded on whatever version just got published with no safety net. Three deploys in a row (#13, #14, #16) all looked like failures despite the underlying Lambda being updated.

This PR does two things:

1. Adds infrastructure/iam/ as the new home for version-controlled IAM policies. It's intentionally low-ceremony: flat JSON files, one per role, applied via a small idempotent shell script. No CloudFormation, no Terraform. For a 5-module infra-light project, a flat directory is the right amount of rigor. Migrate to CFN later if the blast radius grows.

2. Adds a new LambdaInvokeCanary statement to the existing deploy-role policy, granting lambda:InvokeFunction on all 5 alpha-engine Lambdas and their aliases/versions. Scoped narrowly to the same functions the role already has UpdateFunctionCode on, so the blast radius is unchanged: an attacker with ECR push + UpdateFunctionCode can already run arbitrary code in these Lambdas.

Applied live via `infrastructure/iam/apply.sh github-actions-lambda-deploy` before committing, so the next deploy workflow run actually passes the canary step.

Also cleaned up the orphaned old `deploy-lambdas` inline policy (the new file's name is `github-actions-lambda-deploy-policy`, matching the convention of filename == role name).

Why this matters beyond tonight: every IAM change from here on is diffable, reviewable, and recoverable. If a future PR drops a permission, code review catches it at PR time instead of surfacing as a mysterious AccessDenied in production.

Follow-up: the deploy.sh script's rollback-on-canary-failure logic still uses `|| true` to swallow errors silently, which is why the stranded alias never got rolled back. That's a separate PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
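As an illustration of the LambdaInvokeCanary statement described above, the sketch below builds one policy statement covering a function plus its aliases and versions. It is an assumption about the statement's shape, not the contents of the real file under infrastructure/iam/; `invoke_statement` is a hypothetical helper. The account ID and region are taken from the AccessDenied ARN quoted above, and only the data collector is listed because the other four function names are not given here.

```python
import json

ACCOUNT_ID = "711398986525"  # from the AccessDenied ARN quoted above
REGION = "us-east-1"


def invoke_statement(function_names: list[str]) -> dict:
    """Build an Allow statement for lambda:InvokeFunction on the given functions."""
    resources: list[str] = []
    for name in function_names:
        base = f"arn:aws:lambda:{REGION}:{ACCOUNT_ID}:function:{name}"
        # Unqualified ARN plus a wildcard qualifier for any alias or version.
        resources += [base, f"{base}:*"]
    return {
        "Sid": "LambdaInvokeCanary",
        "Effect": "Allow",
        "Action": "lambda:InvokeFunction",
        "Resource": resources,
    }


statement = invoke_statement(["alpha-engine-data-collector"])
print(json.dumps(statement, indent=2))
```

Listing both the unqualified ARN and the `:*` form matters because the canary invokes the `live` alias, which is a qualified ARN.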

…143)

Stage 3 of the RAG cleanup arc. Lib v0.3.0 (alpha-engine-lib PR #16, merged + tagged) introduced alpha_engine_lib.rag with the consolidated db/embeddings/retrieval/schema. Stage 2 (alpha-engine-research PR #97) migrated research's side. This PR migrates data's.

Changes:

Pipeline imports updated (5 files):
- rag/pipelines/{ingest_theses,ingest_8k_filings,ingest_sec_filings,ingest_earnings_finnhub,filing_change_detection}.py: `from rag.{embeddings,retrieval,db}` → `from alpha_engine_lib.rag.{...}`

Deleted (now redundant; code lives in lib):
- rag/db.py (was the canonical drift source; its register_vector fix was the basis for the lib version)
- rag/embeddings.py
- rag/retrieval.py
- rag/schema.sql

Kept:
- rag/__init__.py: minimal namespace placeholder
- rag/pipelines/: ingestion pipelines stay here (data's versions inline the signals lookup; canonical location decision deferred to a later cleanup arc pending production SF state verification)
- rag/preflight.py: already uses alpha_engine_lib.logging; no changes needed

Lib pin bumped:
- requirements.txt: alpha-engine-lib v0.2.4 → v0.3.0
- Extras: added [rag] alongside [arcticdb,flow_doctor]
- Note flagged on the direct pgvector/psycopg2-binary pins above the lib line: these become redundant once the [rag] extra soaks; will drop in a follow-up PR rather than ripping them out today

Verified:
- All 433 data tests pass
- alpha_engine_lib.rag imports resolve (retrieve, is_available, embed_texts)
- rag.preflight imports cleanly

Companion:
- alpha-engine-lib PR #16 (merged + tagged v0.3.0)
- alpha-engine-research PR #97 (Stage 2, research migration)
- Stage 4 (deferred): pipelines canonical-location decision

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cipher813 merged commit 32c6285 into main Apr 11, 2026
1 check passed

cipher813 deleted the fix/dockerfile-copy-ssm-secrets branch April 11, 2026 01:54

cipher813 mentioned this pull request Apr 11, 2026: Version-control IAM policies; add lambda:InvokeFunction to deploy role #17 (Merged, 6 tasks)

cipher813 mentioned this pull request May 3, 2026: refactor(rag): migrate shared RAG code to alpha-engine-lib (v0.3.0) #143 (Merged, 2 tasks)

cipher813 merged 1 commit into main from fix/dockerfile-copy-ssm-secrets
