feat: VWAP data plumbing + SSM secrets + push scripts #2
Merged
…cripts
VWAP: polygon_client now extracts the vw field from grouped-daily, and daily_closes includes a VWAP column in parquet. Enables the executor's VWAP discount entry trigger.
SSM: all modules load secrets from AWS SSM Parameter Store (/alpha-engine/*) at startup via ssm_secrets.py, eliminating the need to push .env to each target.
Scripts: push-secrets.sh (Lambda+EC2), push-configs.sh (config files to EC2), seed-ssm.sh (migrate .env to SSM), add-ssm-policy.sh (IAM permissions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
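A minimal sketch of the VWAP extraction described above, assuming the grouped-daily payload carries a `results` list whose bars use the keys `T` (ticker), `c` (close) and `vw` (volume-weighted average price). The function name `extract_closes_with_vwap` and the output column names are hypothetical and are not taken from polygon_client.py.

```python
from typing import Any


def extract_closes_with_vwap(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a grouped-daily style payload into rows with close and VWAP.

    Assumption: each bar has "T" and "c"; "vw" may be absent, in which
    case the row carries None rather than raising.
    """
    rows = []
    for bar in payload.get("results") or []:
        vwap = bar.get("vw")
        rows.append({
            "symbol": bar["T"],
            "close": float(bar["c"]),
            "vwap": float(vwap) if vwap is not None else None,
        })
    return rows


# Made-up sample payload, for illustration only.
sample = {"results": [{"T": "AAPL", "c": 190.0, "vw": 189.5}, {"T": "XYZ", "c": 5.0}]}
rows = extract_closes_with_vwap(sample)
```

Keeping a missing `vw` as `None` (instead of dropping the row) is one possible choice; it leaves the close usable even when no VWAP is available for that bar.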
cipher813 added a commit that referenced this pull request Apr 28, 2026
* perf(migrate-vwap): threadpool the per-symbol fan-out
Sequential migration hit SSM's 1-hour timeout at 60% complete (542/904
symbols). Each symbol is read → reorder columns → write back, all S3
round-trips, all GIL-released — perfect fit for the thread-pool fan-out
pattern daily_append already uses for its Phase 2 writes.
- One ThreadPoolExecutor across every target symbol
- Worker count env-overridable via MIGRATE_UNIVERSE_VWAP_WORKERS (default 16),
same shape as DAILY_APPEND_WRITE_WORKERS — prod can tune without redeploy
- Per-symbol outcome dict captures read/write errors instead of raising,
so one bad symbol can't abort the batch
- Aggregation runs on the main thread (counter mutation stays
single-threaded; no locks needed)
- Summary includes elapsed_seconds + workers so SSM-timeout-vs-finish
postmortems can see actual runtime
Tests:
- test_migration_uses_threadpool_executor — source-text invariant
- test_migration_workers_env_overridable — env-var override invariant
- test_migration_threaded_all_writes_succeed — N=20 functional check that
every result lands in the right bucket (catches generator-double-iter
regressions)
- test_migration_threaded_summary_includes_elapsed_and_workers — ops field
- test_migration_threaded_aggregates_mixed_outcomes — mixed-outcome run
(canonical + needs-fix + read-fail + write-fail) all aggregate correctly
- All 280 existing tests still pass
Expected runtime on 904-symbol universe: ~10-15 min at 16 workers (down
from sequential ~120 min). The migration is idempotent — running it
against a partially-migrated universe (already-canonical symbols skipped)
just resumes from where the prior run died.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
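The fan-out described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the migration script itself: `migrate_symbol`, `migrate_all`, the status names and the "sorted column order" stand-in for the canonical order are all hypothetical. Only the env var name `MIGRATE_UNIVERSE_VWAP_WORKERS`, its default of 16, and the `elapsed_seconds` / `workers` summary fields come from the commit message.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def migrate_symbol(symbol, read, write):
    """Return an outcome dict; never raise, so one bad symbol cannot abort the batch."""
    try:
        columns = read(symbol)
    except Exception as exc:
        return {"symbol": symbol, "status": "read_error", "error": str(exc)}
    canonical = sorted(columns)  # hypothetical stand-in for the canonical column order
    if columns == canonical:
        return {"symbol": symbol, "status": "already_canonical"}
    try:
        write(symbol, canonical)
    except Exception as exc:
        return {"symbol": symbol, "status": "write_error", "error": str(exc)}
    return {"symbol": symbol, "status": "migrated"}


def migrate_all(symbols, read, write):
    workers = int(os.environ.get("MIGRATE_UNIVERSE_VWAP_WORKERS", "16"))
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() materialises the results exactly once.
        outcomes = list(pool.map(lambda s: migrate_symbol(s, read, write), symbols))
    # Aggregation happens here, on the main thread, so no locks are needed.
    summary = {"workers": workers, "elapsed_seconds": round(time.monotonic() - started, 3)}
    for outcome in outcomes:
        summary[outcome["status"]] = summary.get(outcome["status"], 0) + 1
    return summary


# In-memory stand-in for the S3 round-trips.
store = {"A": ["close", "vwap"], "B": ["vwap", "close"]}


def fake_read(symbol):
    if symbol == "C":
        raise KeyError(symbol)  # simulated read failure
    return list(store[symbol])


summary = migrate_all(["A", "B", "C"], fake_read, store.__setitem__)
```

Because already-canonical symbols are skipped, re-running `migrate_all` over the same store only touches what is still unmigrated, which is the idempotence property the commit relies on.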
* feat(saturday-sf): add morning_enrich step + reschedule cron 00→09 UTC Sat
Two coupled changes that prepare the Saturday Step Function for the
post-Tier-4 re-enable.
(1) Add `--morning-enrich` step to `infrastructure/spot_data_weekly.sh`
BEFORE Phase 1 + builders.prune_delisted_tickers.
Polygon's grouped-daily aggregate for date T isn't fully settled
until calendar day T+1. The Friday weekday-SF run (Friday ~13:05 PT
via systemd timer + the weekday-SF MorningEnrich Lambda) collects
daily_closes pre-settlement, so Friday's row in S3 + ArcticDB may
carry stale / partial polygon data.
By the time the Saturday SF kicks off (09:00 UTC Sat; see (2) below),
polygon's Friday data IS settled. This step calls
`weekly_collector.py --morning-enrich` (same code path the weekday
SF MorningEnrich Lambda uses, exists since alpha-engine-data#91)
to refetch Friday's daily_closes via polygon and re-append to
ArcticDB so all downstream Saturday work (Phase 1 prices, RAG,
predictor training, backtester) reads polygon-authoritative
Friday closes.
Hard-fail on morning_enrich failure: the bundle aborts so RAG +
Phase 1 don't run on stale upstream data. Matches the
no-silent-fails posture for unstable system state.
(2) Reschedule EventBridge rule cron from `(0 0 ? * SAT *)` to
`(0 9 ? * SAT *)`.
Old: Sat 00:00 UTC = Fri ~5pm PT. Polygon's Friday data is NOT yet
settled, so the morning_enrich step (above) would refetch stale /
not-yet-settled data, defeating its purpose.
New: Sat 09:00 UTC = 02:00 AM PT Sat (PDT) / 01:00 AM PT (PST).
Polygon T+1 settle complete by 09:00 UTC; morning_enrich pulls
authoritative Friday closes; downstream work reads correct data.
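The UTC-to-Pacific arithmetic behind the old and new schedules can be checked with a short sketch. It uses fixed offsets (PDT = UTC-7, PST = UTC-8) rather than a timezone database, and the helper name `local_fire_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")


def local_fire_time(utc_hour: int, tz: timezone) -> str:
    """Render a Saturday UTC fire time as weekday + wall-clock time in tz."""
    # 2026-05-02 is a Saturday; only the weekday and hour matter here.
    fire = datetime(2026, 5, 2, utc_hour, 0, tzinfo=timezone.utc)
    return fire.astimezone(tz).strftime("%a %H:%M %Z")


old_pdt = local_fire_time(0, PDT)  # cron(0 0 ? * SAT *)
new_pdt = local_fire_time(9, PDT)  # cron(0 9 ? * SAT *)
new_pst = local_fire_time(9, PST)
```

The old rule fires while it is still Friday afternoon in Pacific time, whereas the new rule fires in the early hours of Saturday under either offset.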
Updated in:
- infrastructure/deploy_step_function.sh:216 (put-rule schedule
+ description) + line 277 (echo'd summary)
- infrastructure/cloudformation/alpha-engine-orchestration.yaml:111
(CFN template ScheduleExpression + Description)
The live AWS rule (currently DISABLED at cron(0 0 ? * SAT *)) still
needs an `aws events put-rule` to apply this change + an
`enable-rule` to start firing. CLI-side step ordering: this PR
captures IaC intent; live rule update happens after merge.
Tests: 280 unit tests pass (no test surface for the spot script
itself; bash -n syntax-check clean). morning_enrich-related tests
(test_weekly_collector_morning_enrich.py, test_daily_closes_source_modes.py)
verify the underlying CLI semantics this step depends on.
Companion changes (separate repos):
- alpha-engine-backtester: flip use_vectorized_sweep default-on
(PR #123-ish, this same session)
- alpha-engine-docs: SYSTEM_STATE.md Tier 4 deploy entry + Sat-fill
addition + new cron rationale
- CLI ops: aws events put-rule --schedule-expression "cron(0 9 ? * SAT *)"
+ aws events enable-rule (final step that actually starts firing)
Closes ROADMAP P0 "Re-enable Saturday SF EventBridge after Tier 4
lands" (added 2026-04-27, 5-PR Tier 4 deployment arc closes here).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 25, 2026
…hase 0.2) (#308)
* feat(cost-telemetry): wire news event extraction + lib v0.32→v0.33 (Phase 0.2)
Closes the largest previously-untracked LLM cost slice in the system (~$20–60/mo, the dominant ~20–60% of monthly Anthropic spend per the Phase 0 audit at ``alpha-engine-docs/private/prompt-caching-investigation-260525.md`` §1.1). News-article event extraction in ``collectors/nlp/event_extraction.py:167`` fires 100–300 Haiku calls per RAGIngestion run; this PR records every response's tokens + server-tool fees into a per-run JSONL flushed once to S3 at the canonical research-side cost-raw partition.
**Zero-coupling pattern:** new ``rag/pipelines/_cost_telemetry.py`` provides ``wrap_client_for_cost_telemetry(client, buffer)``, a proxy that records every ``messages.create()`` response into a buffer without changing the response shape returned. ``AnthropicEventExtractor`` is UNCHANGED: telemetry composes at the client-construction layer in ``_run_nlp`` rather than polluting the extractor with cost-tracking concerns.
**Single S3 chokepoint:** rows land at ``s3://alpha-engine-research/decision_artifacts/_cost_raw/{date}/{date}/data:news_event_extraction.jsonl``, the same partition the research-side ``aggregate_costs.py`` already scans. The daily parquet now sums data's rows alongside research's under one ``by_agent_id`` breakdown; the dashboard cost panel (Phase 0.3) will show every site in one view.
**Buffered + flushed once** rather than per-call to keep S3 PutObject volume sane: 100–300 calls per run → 1 PutObject at end-of-pipeline.
**Fail-loud at flush** per ``[[feedback_no_silent_fails]]``: S3 PutObject failure raises ``CostBufferFlushError`` and fails the pipeline, because a silent miss on the dominant cost slice would defeat the Phase 0 visibility goal. Per-call recording failures (malformed response shape) are logged but do NOT propagate: the event extractor's primary deliverable must survive a cost-telemetry hiccup.
**Lib pin v0.32.0 → v0.33.0** in requirements.txt + Dockerfile to consume the lifted ``alpha_engine_lib.cost.record_anthropic_call`` (alpha-engine-lib #69, the SOTA chokepoint that data + executor are consumers #2 + #3 of after morning-signal originated the pattern).
**Tests:** new ``tests/test_news_cost_telemetry.py`` covers buffer record/flush/empty/error paths + proxy passthrough + per-call recording failure isolation + factory naming convention. 9 new tests, suite 1480 → 1489 passing, zero regressions.
Closes ROADMAP Phase 0.2 row "News event extraction (data, $20–60/mo, dominant slice)". First production exercise will land on the next Saturday SF (2026-05-30) when RAGIngestion fires.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(news-cost-telemetry): drop moto dep, use in-memory S3 mock
CI failure on PR #308: ``ModuleNotFoundError: No module named 'moto'``. This repo's CI installs only ``requirements.txt + pytest`` (no dev-extras file), and the rest of the suite avoids moto by using a minimal in-memory S3 mock (``_InMemoryS3`` in ``tests/test_news_aggregates.py``). Mirror that pattern here so the cost-telemetry tests run cleanly in the unmodified CI environment. Functional coverage unchanged: all 9 tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
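The proxy-plus-buffer pattern described in this commit can be sketched roughly as below. The names `wrap_client_for_cost_telemetry` and `CostBufferFlushError` appear in the commit message, but everything else here is an assumption made for illustration: the `CostBuffer` class, the recorded fields, the `put_object` callable and the fake client are hypothetical and do not reflect the actual contents of `_cost_telemetry.py`.

```python
import json
import logging
from types import SimpleNamespace

log = logging.getLogger("cost_telemetry_sketch")


class CostBufferFlushError(RuntimeError):
    """Raised when the single end-of-run flush fails (fail-loud)."""


class CostBuffer:
    def __init__(self):
        self.rows = []

    def record(self, response):
        usage = response.usage  # AttributeError on a malformed response
        self.rows.append({"input_tokens": usage.input_tokens,
                          "output_tokens": usage.output_tokens})

    def flush(self, put_object):
        """Write all rows in one call; put_object is a hypothetical uploader."""
        body = "".join(json.dumps(row) + "\n" for row in self.rows)
        try:
            put_object(body)
        except Exception as exc:
            raise CostBufferFlushError(str(exc)) from exc


class _MessagesProxy:
    def __init__(self, messages, buffer):
        self._messages = messages
        self._buffer = buffer

    def create(self, **kwargs):
        response = self._messages.create(**kwargs)
        try:
            self._buffer.record(response)
        except Exception:
            # Per-call recording failures are logged, never propagated.
            log.warning("cost telemetry record failed", exc_info=True)
        return response  # the response is returned unchanged


class _ClientProxy:
    def __init__(self, client, buffer):
        self._client = client
        self.messages = _MessagesProxy(client.messages, buffer)

    def __getattr__(self, name):
        return getattr(self._client, name)  # everything else passes through


def wrap_client_for_cost_telemetry(client, buffer):
    return _ClientProxy(client, buffer)


# Demo with a fake client standing in for the real SDK client.
class _FakeMessages:
    def create(self, **kwargs):
        return SimpleNamespace(usage=SimpleNamespace(input_tokens=10, output_tokens=3))


buffer = CostBuffer()
client = wrap_client_for_cost_telemetry(SimpleNamespace(messages=_FakeMessages()), buffer)
response = client.messages.create(model="demo-model", messages=[])
uploaded = []
buffer.flush(uploaded.append)  # one upload for the whole run
```

Because the wrapping happens where the client is constructed, the code that calls `messages.create()` needs no knowledge of the buffer, which is the zero-coupling property the commit describes.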
cipher813 added a commit that referenced this pull request May 25, 2026
…lusion fix
- WAVE4_SLIM_DELETION_RUNBOOK.md Gate #2 rewritten: the original "≥1 clean Saturday-SF parity observation" was made unreachable by the by-design adjustment-policy mismatch (slim raw close vs ArcticDB auto-adjusted close), and the `passed` flag in alpha_engine_lib/reconcile.py:62-68 requires `n_cells_over_epsilon == 0` which dividend-scale deltas cannot satisfy at any reasonable epsilon (even data #305's 1e-2). The actual safety case is 6 weeks of consumer-side ArcticDB-primary production soak since the 2026-04-14 cutover with no correctness alerts. Backup-prefix date bumped 260523 → 260525.
- tests/test_wave4_slim_arctic_parity.py: `_production_py_files()` was rglob'ing the entire repo including .venv/, build/, dist/, .git/. Hit on .venv joblib test fixture (big5-encoded) → UnicodeDecodeError before any actual production-code scan could run. Added _EXCLUDED_PREFIXES tuple. Suite 1484 passing post-fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
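The exclusion fix in the second bullet can be sketched as below. Only the name `_EXCLUDED_PREFIXES` and the four directory names come from the commit message; matching on the first path component and the `production_py_files` signature are assumptions made for this illustration.

```python
import tempfile
from pathlib import Path

_EXCLUDED_PREFIXES = (".venv", "build", "dist", ".git")


def production_py_files(repo_root: Path) -> list[Path]:
    """List *.py files under repo_root, skipping excluded top-level directories."""
    files = []
    for path in repo_root.rglob("*.py"):
        rel = path.relative_to(repo_root)
        if rel.parts[0] in _EXCLUDED_PREFIXES:
            continue  # never open vendored or generated files
        files.append(path)
    return sorted(files)


# Throwaway tree: one production file plus two files that must be skipped.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("pkg/a.py", ".venv/lib/b.py", "build/c.py"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("x = 1\n")
    found = [p.relative_to(root).as_posix() for p in production_py_files(root)]
```

Filtering before any file is read is what avoids the `UnicodeDecodeError`: the non-UTF-8 fixture under `.venv/` is never opened.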
cipher813 added a commit that referenced this pull request May 25, 2026
…ks [merge-gated 5/23] (#269)
* chore(wave4): delete slim writer + load_slim_cache API + consumer fallbacks
PR4 (data) of the predictor/price_cache_slim deletion arc: TERMINAL state. Staged as DRAFT: the slim writer must keep running through the 5/23 Saturday-SF so the WAVE4_PARITY_METRIC streams emit; merge is gated on that parity read (see WAVE4_SLIM_DELETION_RUNBOOK.md).
Code removed:
- collectors/slim_cache.py (the 2y-slice writer): deleted.
- weekly_collector.py: slim_cache import + the '# 3. Slim cache' collect block + the --only 'slim' choice + docstrings.
- store/parquet_loader.py: load_slim_cache + SLIM_CACHE_PREFIX + now-unused ThreadPoolExecutor import. load_parquet_from_s3 KEPT (re-exported via features.compute._load_parquet_from_s3, consumed by builders/backfill.py; caught by the suite, not the within-file grep).
- collectors/macro._load_breadth_prices: ArcticDB-only (slim fallback + parity dual-read removed); preserves the no-null breadth contract (None -> key omitted) on ArcticDB failure, matching pre-Wave-4 single-source-unavailable behaviour.
- features/compute._load_price_source: ArcticDB-only (universe + macro libs); slim fallback + parity removed; no-data contract preserved.
Tests reworked to the ArcticDB-only terminal state; the PR0b parity-harness/consumer-lock file is repurposed into a permanent regression guard (slim functional surface must never return; lib's own test_reconcile/test_arcticdb still cover the substrate). Full data suite 1375 passing.
Added WAVE4_SLIM_DELETION_RUNBOOK.md: the gated, manual S3 prefix deletion (byte-equal backup -> aws s3 rm, Wave-5 precedent). NOT executed by CI or this PR.
DRAFT until 2026-05-23 Saturday-SF parity confirms slim<->ArcticDB equivalence across breadth/compute/exit_timing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(runbook) + test(slim-guard): consumer-side soak gate + .venv exclusion fix
- WAVE4_SLIM_DELETION_RUNBOOK.md Gate #2 rewritten: the original "≥1 clean Saturday-SF parity observation" was made unreachable by the by-design adjustment-policy mismatch (slim raw close vs ArcticDB auto-adjusted close), and the `passed` flag in alpha_engine_lib/reconcile.py:62-68 requires `n_cells_over_epsilon == 0` which dividend-scale deltas cannot satisfy at any reasonable epsilon (even data #305's 1e-2). The actual safety case is 6 weeks of consumer-side ArcticDB-primary production soak since the 2026-04-14 cutover with no correctness alerts. Backup-prefix date bumped 260523 → 260525.
- tests/test_wave4_slim_arctic_parity.py: `_production_py_files()` was rglob'ing the entire repo including .venv/, build/, dist/, .git/. Hit on .venv joblib test fixture (big5-encoded) → UnicodeDecodeError before any actual production-code scan could run. Added _EXCLUDED_PREFIXES tuple. Suite 1484 passing post-fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
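Why the original parity gate could not pass can be illustrated with a toy reconcile check. The function below is a hypothetical stand-in, not the code in alpha_engine_lib/reconcile.py; it only mirrors the stated rule that `passed` requires `n_cells_over_epsilon == 0`. The prices are made-up numbers chosen to show a dividend-sized gap between raw and adjusted closes.

```python
def reconcile(raw_closes, adjusted_closes, epsilon):
    """Count cells whose absolute difference exceeds epsilon."""
    over = sum(
        1 for raw, adj in zip(raw_closes, adjusted_closes)
        if abs(raw - adj) > epsilon
    )
    return {"n_cells_over_epsilon": over, "passed": over == 0}


# Illustrative values only: the first two rows pre-date a hypothetical
# dividend, so the adjusted series sits below the raw series there.
raw = [100.00, 101.00, 102.00]
adjusted = [99.50, 100.50, 102.00]
result = reconcile(raw, adjusted, epsilon=1e-2)
```

With gaps of this size, any epsilon small enough to catch real data errors also flags every adjusted row, so a zero-cell requirement can never be met while the two sources follow different adjustment policies.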
Summary
- `polygon_client.py` now extracts `vw` (VWAP) field from grouped-daily response; `daily_closes.py` includes VWAP column in parquet, which enables executor's VWAP discount entry trigger
- `ssm_secrets.py` loads all `/alpha-engine/*` params from AWS SSM Parameter Store at startup, replacing manual .env push workflow. Wired into Lambda handler + weekly_collector
- `push-secrets.sh` (all Lambdas + EC2), `push-configs.sh` (config files to EC2), `seed-ssm.sh` (migrate .env → SSM), `add-ssm-policy.sh` (IAM permissions)
Cross-repo dependencies
Test plan
- `bash infrastructure/seed-ssm.sh --dry-run` lists all secrets
- `bash infrastructure/push-configs.sh --dry-run` lists config files
🤖 Generated with Claude Code
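For the SSM loading mentioned in the summary, a minimal sketch of a startup loader might look like the following. It assumes boto3's SSM `get_parameters_by_path` paginator; the function name `load_ssm_secrets`, the mapping from parameter name to environment variable (last path segment), and the choice to let existing environment values win are all assumptions for illustration, not the behaviour of `ssm_secrets.py`. The fake client and the `DEMO_SSM_SKETCH_KEY` parameter exist only so the sketch can run without AWS access.

```python
import os


def load_ssm_secrets(prefix="/alpha-engine/", client=None):
    """Copy parameters under prefix into os.environ; return the names loaded."""
    if client is None:
        import boto3  # imported lazily so a fake client can be injected
        client = boto3.client("ssm")
    names = []
    paginator = client.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=prefix, Recursive=True, WithDecryption=True):
        for param in page["Parameters"]:
            env_name = param["Name"].rsplit("/", 1)[-1]
            # Assumption: a value already set in the environment takes precedence.
            os.environ.setdefault(env_name, param["Value"])
            names.append(env_name)
    return sorted(names)  # names only, so values never end up in logs


class FakeSSM:
    """Stand-in for the boto3 SSM client, returning one page."""

    def get_paginator(self, operation_name):
        return self

    def paginate(self, **kwargs):
        yield {"Parameters": [
            {"Name": "/alpha-engine/DEMO_SSM_SKETCH_KEY", "Value": "demo"},
        ]}


loaded = load_ssm_secrets(client=FakeSSM())
```

Returning only the parameter names keeps the loader safe to log at startup, which can help when diagnosing a missing secret on a given target.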