
feat: VWAP data plumbing + SSM secrets + push scripts #2

Merged
cipher813 merged 1 commit into main from feat/vwap-ssm-secrets on Apr 6, 2026

@cipher813
Owner

Summary

  • VWAP: polygon_client.py now extracts vw (VWAP) field from grouped-daily response; daily_closes.py includes VWAP column in parquet — enables executor's VWAP discount entry trigger
  • SSM secrets: ssm_secrets.py loads all /alpha-engine/* params from AWS SSM Parameter Store at startup, replacing manual .env push workflow. Wired into Lambda handler + weekly_collector
  • Push scripts: push-secrets.sh (all Lambdas + EC2), push-configs.sh (config files to EC2), seed-ssm.sh (migrate .env → SSM), add-ssm-policy.sh (IAM permissions)
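
The SSM loading described in the summary could look roughly like the sketch below. It is not the repository's `ssm_secrets.py`: the function name `load_ssm_secrets`, exporting each parameter under its last path segment, and leaving already-set environment variables untouched are all assumptions made for illustration. The client is passed in so the function can be exercised without AWS; in production it would presumably be `boto3.client("ssm")`.

```python
import os


def load_ssm_secrets(ssm_client, prefix="/alpha-engine/", overwrite=False):
    """Copy every parameter under `prefix` into os.environ.

    Hypothetical helper: `ssm_client` is expected to behave like
    boto3.client("ssm"). Returns the environment variable names that were set.
    """
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    loaded = []
    for page in paginator.paginate(Path=prefix, Recursive=True, WithDecryption=True):
        for param in page.get("Parameters", []):
            # "/alpha-engine/SOME_KEY" -> "SOME_KEY" (assumed naming convention)
            env_name = param["Name"].rsplit("/", 1)[-1]
            if overwrite or env_name not in os.environ:
                os.environ[env_name] = param["Value"]
                loaded.append(env_name)
    return loaded
```

Not overwriting by default keeps a local `.env` or an explicit export authoritative during development, which is one plausible design choice rather than the confirmed behaviour.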

Cross-repo dependencies

  • cipher813/alpha-engine — reads VWAP from daily_closes parquet, uses ssm_secrets.py
  • cipher813/alpha-engine-research — uses ssm_secrets.py
  • cipher813/alpha-engine-predictor — uses ssm_secrets.py

Test plan

  • 56 tests pass locally
  • Verify VWAP column appears in next daily_closes parquet (after Saturday pipeline)
  • Verify bash infrastructure/seed-ssm.sh --dry-run lists all secrets
  • Verify bash infrastructure/push-configs.sh --dry-run lists config files

🤖 Generated with Claude Code

…cripts

VWAP: polygon_client now extracts vw field from grouped-daily, daily_closes
includes VWAP column in parquet. Enables executor VWAP discount entry trigger.
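
As a rough illustration of the VWAP plumbing, the sketch below pulls `vw` out of a grouped-daily style payload. The `results` list and the `T`, `c`, `v`, `vw` keys follow Polygon's grouped-daily response; the function name `rows_from_grouped_daily` and the output column names are hypothetical and not taken from `polygon_client.py` or `daily_closes.py`.

```python
def rows_from_grouped_daily(payload):
    """Turn a grouped-daily style payload into one row per ticker.

    Hypothetical helper. `vw` is read with .get() because a bar may
    omit it; such rows carry vwap=None instead of raising.
    """
    rows = []
    for bar in payload.get("results") or []:
        rows.append(
            {
                "ticker": bar["T"],
                "close": bar["c"],
                "volume": bar["v"],
                "vwap": bar.get("vw"),
            }
        )
    return rows
```

A list of dicts in this shape can be handed to a DataFrame constructor and written to parquet, so the VWAP column appears alongside the close.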

SSM: all modules load secrets from AWS SSM Parameter Store (/alpha-engine/*)
at startup via ssm_secrets.py, eliminating need to push .env to each target.

Scripts: push-secrets.sh (Lambda+EC2), push-configs.sh (config files to EC2),
seed-ssm.sh (migrate .env to SSM), add-ssm-policy.sh (IAM permissions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit aa2b188 into main Apr 6, 2026
1 check passed
@cipher813 cipher813 deleted the feat/vwap-ssm-secrets branch April 6, 2026 21:05
@cipher813 cipher813 restored the feat/vwap-ssm-secrets branch April 7, 2026 13:57
cipher813 added a commit that referenced this pull request Apr 28, 2026
* perf(migrate-vwap): threadpool the per-symbol fan-out

Sequential migration hit SSM's 1-hour timeout at 60% complete (542/904
symbols). Each symbol is read → reorder columns → write back, all S3
round-trips, all GIL-released — perfect fit for the thread-pool fan-out
pattern daily_append already uses for its Phase 2 writes.

- One ThreadPoolExecutor across every target symbol
- Worker count env-overridable via MIGRATE_UNIVERSE_VWAP_WORKERS (default 16),
  same shape as DAILY_APPEND_WRITE_WORKERS — prod can tune without redeploy
- Per-symbol outcome dict captures read/write errors instead of raising,
  so one bad symbol can't abort the batch
- Aggregation runs on the main thread (counter mutation stays single-
  threaded; no locks needed)
- Summary includes elapsed_seconds + workers so SSM-timeout-vs-finish
  postmortems can see actual runtime
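
The fan-out shape listed above can be sketched as follows. This is a minimal illustration, not the migration script itself: `run_migration` and `migrate_one` are hypothetical names, and the per-symbol outcome is collapsed to a single status string plus a generic `error` status rather than separate read and write error buckets.

```python
import os
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_migration(symbols, migrate_one, default_workers=16):
    """Run migrate_one(symbol) for every symbol on a thread pool."""
    workers = int(os.environ.get("MIGRATE_UNIVERSE_VWAP_WORKERS", default_workers))
    symbols = list(symbols)  # materialise once so a generator is not consumed twice
    started = time.monotonic()

    def safe_migrate(symbol):
        # Capture failures in the outcome dict so one bad symbol cannot abort the batch.
        try:
            return {"symbol": symbol, "status": migrate_one(symbol)}
        except Exception as exc:
            return {"symbol": symbol, "status": "error", "error": repr(exc)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(safe_migrate, symbols))

    # Aggregation happens here, on the calling thread, so no locks are needed.
    counts = Counter(outcome["status"] for outcome in outcomes)
    return {
        "total": len(symbols),
        "counts": dict(counts),
        "failed": [o["symbol"] for o in outcomes if o["status"] == "error"],
        "workers": workers,
        "elapsed_seconds": round(time.monotonic() - started, 3),
    }
```

Workers only return values and never touch shared counters, which is what lets the summary be built without synchronisation.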

Tests:
- test_migration_uses_threadpool_executor — source-text invariant
- test_migration_workers_env_overridable — env-var override invariant
- test_migration_threaded_all_writes_succeed — N=20 functional check that
  every result lands in the right bucket (catches generator-double-iter
  regressions)
- test_migration_threaded_summary_includes_elapsed_and_workers — ops field
- test_migration_threaded_aggregates_mixed_outcomes — mixed-outcome run
  (canonical + needs-fix + read-fail + write-fail) all aggregate correctly
- All 280 existing tests still pass

Expected runtime on 904-symbol universe: ~10-15 min at 16 workers (down
from sequential ~120 min). The migration is idempotent — running it
against a partially-migrated universe (already-canonical symbols skipped)
just resumes from where the prior run died.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(saturday-sf): add morning_enrich step + reschedule cron 00→09 UTC Sat

Two coupled changes that prepare the Saturday Step Function for the
post-Tier-4 re-enable.

(1) Add `--morning-enrich` step to `infrastructure/spot_data_weekly.sh`
    BEFORE Phase 1 + builders.prune_delisted_tickers.

    Polygon's grouped-daily aggregate for date T isn't fully settled
    until calendar day T+1. The Friday weekday-SF run (Friday ~13:05 PT
    via systemd timer + the weekday-SF MorningEnrich Lambda) collects
    daily_closes pre-settlement, so Friday's row in S3 + ArcticDB may
    carry stale / partial polygon data.

    By the time the Saturday SF kicks off (09:00 UTC Sat; see change (2) below),
    polygon's Friday data IS settled. This step calls
    `weekly_collector.py --morning-enrich` (same code path the weekday
    SF MorningEnrich Lambda uses, exists since alpha-engine-data#91)
    to refetch Friday's daily_closes via polygon and re-append to
    ArcticDB so all downstream Saturday work (Phase 1 prices, RAG,
    predictor training, backtester) reads polygon-authoritative
    Friday closes.

    Hard-fail on morning_enrich failure: the bundle aborts so RAG +
    Phase 1 don't run on stale upstream data. Matches the no-silent-
    fails posture for unstable system state.

(2) Reschedule EventBridge rule cron from `(0 0 ? * SAT *)` to
    `(0 9 ? * SAT *)`.

    Old: Sat 00:00 UTC = Fri ~5pm PT. Polygon's Friday data NOT yet
    settled — morning_enrich step (above) would refetch stale / not-
    yet-settled data, defeating its purpose.

    New: Sat 09:00 UTC = 02:00 AM PT Sat (PDT) / 01:00 AM PT (PST).
    Polygon T+1 settle complete by 09:00 UTC; morning_enrich pulls
    authoritative Friday closes; downstream work reads correct data.

    Updated in:
      - infrastructure/deploy_step_function.sh:216 (put-rule schedule
        + description) + line 277 (echo'd summary)
      - infrastructure/cloudformation/alpha-engine-orchestration.yaml:111
        (CFN template ScheduleExpression + Description)

    The live AWS rule (currently DISABLED at cron(0 0 ? * SAT *)) still
    needs an `aws events put-rule` to apply this change + an
    `enable-rule` to start firing. CLI-side step ordering: this PR
    captures IaC intent; live rule update happens after merge.
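
The UTC-to-Pacific arithmetic behind the reschedule can be checked with a few lines of standard-library Python. The helper name `to_pacific` and the sample dates (Saturdays in May and January 2026) are chosen here for illustration only.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def to_pacific(year, month, day, hour_utc):
    """Render a UTC wall-clock hour as weekday, time and zone in US Pacific."""
    utc = datetime(year, month, day, hour_utc, tzinfo=timezone.utc)
    return utc.astimezone(PACIFIC).strftime("%a %H:%M %Z")


print(to_pacific(2026, 5, 30, 0))  # old schedule during daylight time
print(to_pacific(2026, 5, 30, 9))  # new schedule during daylight time
print(to_pacific(2026, 1, 3, 9))   # new schedule during standard time
```

This prints `Fri 17:00 PDT`, `Sat 02:00 PDT` and `Sat 01:00 PST`, matching the old and new times quoted in the commit message.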

Tests: 280 unit tests pass (no test surface for the spot script
itself; bash -n syntax-check clean). morning_enrich-related tests
(test_weekly_collector_morning_enrich.py, test_daily_closes_source_modes.py)
verify the underlying CLI semantics this step depends on.

Companion changes (separate repos):
  - alpha-engine-backtester: flip use_vectorized_sweep default-on
    (PR #123-ish, this same session)
  - alpha-engine-docs: SYSTEM_STATE.md Tier 4 deploy entry + Sat-fill
    addition + new cron rationale
  - CLI ops: aws events put-rule --schedule-expression "cron(0 9 ? * SAT *)"
    + aws events enable-rule (final step that actually starts firing)

Closes ROADMAP P0 "Re-enable Saturday SF EventBridge after Tier 4
lands" (added 2026-04-27, 5-PR Tier 4 deployment arc closes here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 deleted the feat/vwap-ssm-secrets branch May 18, 2026 15:33
cipher813 added a commit that referenced this pull request May 25, 2026
…hase 0.2) (#308)

* feat(cost-telemetry): wire news event extraction + lib v0.32→v0.33 (Phase 0.2)

Closes the largest previously-untracked LLM cost slice in the system
(~$20–60/mo, the dominant ~20–60% of monthly Anthropic spend per the
Phase 0 audit at ``alpha-engine-docs/private/prompt-caching-investigation-260525.md``
§1.1). News-article event extraction in
``collectors/nlp/event_extraction.py:167`` fires 100–300 Haiku calls
per RAGIngestion run; this PR records every response's tokens +
server-tool fees into a per-run JSONL flushed once to S3 at the
canonical research-side cost-raw partition.

**Zero-coupling pattern:** new ``rag/pipelines/_cost_telemetry.py``
provides ``wrap_client_for_cost_telemetry(client, buffer)`` — proxy
that records every ``messages.create()`` response into a buffer
without changing the response shape returned. ``AnthropicEventExtractor``
is UNCHANGED — telemetry composes at the client-construction layer in
``_run_nlp`` rather than polluting the extractor with cost-tracking
concerns.
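
One way such a proxy could be built is sketched below. Only the name `wrap_client_for_cost_telemetry` comes from the commit message; the proxy classes, the recorded fields, and the use of a plain list as the buffer are assumptions, and the real `_cost_telemetry.py` may differ. The attributes read from the response (`model`, `usage.input_tokens`, `usage.output_tokens`) are the ones the Anthropic SDK exposes on a message.

```python
import logging

logger = logging.getLogger(__name__)


class _MessagesProxy:
    """Forwards to client.messages and records usage from each response."""

    def __init__(self, messages, buffer):
        self._messages = messages
        self._buffer = buffer

    def create(self, *args, **kwargs):
        response = self._messages.create(*args, **kwargs)
        try:
            usage = response.usage
            self._buffer.append(
                {
                    "model": response.model,
                    "input_tokens": usage.input_tokens,
                    "output_tokens": usage.output_tokens,
                }
            )
        except Exception:
            # A malformed response must not break the caller; log and move on.
            logger.exception("cost telemetry: could not record response")
        return response  # returned unchanged

    def __getattr__(self, name):
        return getattr(self._messages, name)


class _ClientProxy:
    """Looks like the wrapped client, except `messages` is instrumented."""

    def __init__(self, client, buffer):
        self._client = client
        self.messages = _MessagesProxy(client.messages, buffer)

    def __getattr__(self, name):
        return getattr(self._client, name)


def wrap_client_for_cost_telemetry(client, buffer):
    # Hypothetical body; `buffer` only needs an append() method in this sketch.
    return _ClientProxy(client, buffer)
```

Because the extractor only ever sees an object with the same `messages.create()` surface, it needs no changes, which is the zero-coupling property described above.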

**Single S3 chokepoint:** rows land at
``s3://alpha-engine-research/decision_artifacts/_cost_raw/{date}/{date}/data:news_event_extraction.jsonl``
— same partition the research-side ``aggregate_costs.py`` already
scans. The daily parquet now sums data's rows alongside research's
under one ``by_agent_id`` breakdown; the dashboard cost panel
(Phase 0.3) will show every site in one view.

**Buffered + flushed once** rather than per-call to keep S3 PutObject
volume sane: 100–300 calls per run → 1 PutObject at end-of-pipeline.

**Fail-loud at flush** per ``[[feedback_no_silent_fails]]``: S3
PutObject failure raises ``CostBufferFlushError`` and fails the
pipeline — silent miss on the dominant cost slice would defeat the
Phase 0 visibility goal. Per-call recording failures (malformed
response shape) are logged but do NOT propagate — the event extractor's
primary deliverable must survive a cost-telemetry hiccup.
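
The buffer-then-flush behaviour and the fail-loud rule might be sketched like this. `CostBufferFlushError` is the name used in the commit message; the `CostBuffer` class, its `append`/`flush` methods, and the choice to skip the write when the buffer is empty are assumptions. `s3_client` is expected to behave like `boto3.client("s3")`.

```python
import json


class CostBufferFlushError(RuntimeError):
    """Raised when the buffered rows could not be written to S3."""


class CostBuffer:
    """Collects one dict per LLM call and writes them as a single JSONL object."""

    def __init__(self):
        self.rows = []

    def append(self, row):
        self.rows.append(row)

    def flush(self, s3_client, bucket, key):
        """Write all rows with one PutObject; return the number written."""
        if not self.rows:
            return 0  # assumed: nothing recorded means nothing to upload
        body = "\n".join(json.dumps(row, sort_keys=True) for row in self.rows) + "\n"
        try:
            s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
        except Exception as exc:
            # Fail loud: a lost cost file should fail the pipeline, not vanish.
            raise CostBufferFlushError(f"failed to write s3://{bucket}/{key}") from exc
        written = len(self.rows)
        self.rows.clear()
        return written
```

Rows are cleared only after a successful write, so a caller that catches the error could retry the flush without losing data.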

**Lib pin v0.32.0 → v0.33.0** in requirements.txt + Dockerfile to
consume the lifted ``alpha_engine_lib.cost.record_anthropic_call``
(alpha-engine-lib #69, the SOTA chokepoint that data + executor are
consumers #2 + #3 of after morning-signal originated the pattern).

**Tests:** new ``tests/test_news_cost_telemetry.py`` covers buffer
record/flush/empty/error paths + proxy passthrough + per-call
recording failure isolation + factory naming convention. 9 new tests,
suite 1480 → 1489 passing, zero regressions.

Closes ROADMAP Phase 0.2 row "News event extraction (data, $20–60/mo,
dominant slice)" — first production exercise will land on the next
Saturday SF (2026-05-30) when RAGIngestion fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(news-cost-telemetry): drop moto dep, use in-memory S3 mock

CI failure on PR #308: ``ModuleNotFoundError: No module named 'moto'``
— this repo's CI installs only ``requirements.txt + pytest`` (no
dev-extras file), and the rest of the suite avoids moto by using a
minimal in-memory S3 mock (``_InMemoryS3`` in
``tests/test_news_aggregates.py``). Mirror that pattern here so the
cost-telemetry tests run cleanly in the unmodified CI environment.
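
For readers unfamiliar with the pattern, a minimal in-memory stand-in for an S3 client might look like the sketch below. It is not the `_InMemoryS3` class from `tests/test_news_aggregates.py`; the class name `InMemoryS3` and the decision to raise a plain `KeyError` for a missing object (boto3 raises a `ClientError`) are simplifications assumed here.

```python
import io


class InMemoryS3:
    """Dict-backed stand-in covering only put_object and get_object."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body, **kwargs):
        # Accept str or bytes-like bodies, as callers commonly pass either.
        data = Body.encode("utf-8") if isinstance(Body, str) else bytes(Body)
        self.objects[(Bucket, Key)] = data
        return {}

    def get_object(self, Bucket, Key, **kwargs):
        if (Bucket, Key) not in self.objects:
            raise KeyError(f"no such key: s3://{Bucket}/{Key}")
        # Mirror the response shape: the payload is read from response["Body"].
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}
```

Keyword names match the boto3 call signature, so code under test can call it exactly as it would the real client.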

Functional coverage unchanged — all 9 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 25, 2026
…lusion fix

- WAVE4_SLIM_DELETION_RUNBOOK.md Gate #2 rewritten: the original
  "≥1 clean Saturday-SF parity observation" was made unreachable by the
  by-design adjustment-policy mismatch (slim raw close vs ArcticDB
  auto-adjusted close), and the `passed` flag in
  alpha_engine_lib/reconcile.py:62-68 requires `n_cells_over_epsilon == 0`
  which dividend-scale deltas cannot satisfy at any reasonable epsilon
  (even data #305's 1e-2). The actual safety case is 6 weeks of
  consumer-side ArcticDB-primary production soak since the 2026-04-14
  cutover with no correctness alerts. Backup-prefix date bumped
  260523 → 260525.

- tests/test_wave4_slim_arctic_parity.py: `_production_py_files()` was
  rglob'ing the entire repo including .venv/, build/, dist/, .git/.
  Hit on .venv joblib test fixture (big5-encoded) →
  UnicodeDecodeError before any actual production-code scan could run.
  Added _EXCLUDED_PREFIXES tuple. Suite 1484 passing post-fix.
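
The exclusion fix in the second bullet can be sketched as follows. `_EXCLUDED_PREFIXES` is the name from the commit message, but its contents here are inferred from the directories listed above, and `production_py_files` is an illustrative stand-in for the test helper, not a copy of it.

```python
from pathlib import Path

# Assumed contents, based on the directories named in the commit message.
_EXCLUDED_PREFIXES = (".venv", "build", "dist", ".git")


def production_py_files(repo_root):
    """Return repo .py files, skipping anything under an excluded top-level dir."""
    root = Path(repo_root)
    files = []
    for path in sorted(root.rglob("*.py")):
        top_level = path.relative_to(root).parts[0]
        if top_level in _EXCLUDED_PREFIXES:
            continue  # vendored or generated code, never scanned
        files.append(path)
    return files
```

Filtering on the first path component means files are skipped before they are opened, so a non-UTF-8 fixture inside `.venv/` can no longer raise `UnicodeDecodeError` during the scan.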

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 25, 2026
…ks [merge-gated 5/23] (#269)

* chore(wave4): delete slim writer + load_slim_cache API + consumer fallbacks

PR4 (data) of the predictor/price_cache_slim deletion arc — TERMINAL
state. Staged as DRAFT: the slim writer must keep running through the
5/23 Saturday-SF so the WAVE4_PARITY_METRIC streams emit; merge is
gated on that parity read (see WAVE4_SLIM_DELETION_RUNBOOK.md).

Code removed:
- collectors/slim_cache.py (the 2y-slice writer) — deleted.
- weekly_collector.py — slim_cache import + the '# 3. Slim cache'
  collect block + the --only 'slim' choice + docstrings.
- store/parquet_loader.py — load_slim_cache + SLIM_CACHE_PREFIX +
  now-unused ThreadPoolExecutor import. load_parquet_from_s3 KEPT
  (re-exported via features.compute._load_parquet_from_s3, consumed by
  builders/backfill.py — caught by the suite, not the within-file grep).
- collectors/macro._load_breadth_prices — ArcticDB-only (slim fallback
  + parity dual-read removed); preserves the no-null breadth contract
  (None -> key omitted) on ArcticDB failure, matching pre-Wave-4
  single-source-unavailable behaviour.
- features/compute._load_price_source — ArcticDB-only (universe +
  macro libs); slim fallback + parity removed; no-data contract
  preserved.

Tests reworked to the ArcticDB-only terminal state; the PR0b
parity-harness/consumer-lock file is repurposed into a permanent
regression guard (slim functional surface must never return — lib's
own test_reconcile/test_arcticdb still cover the substrate). Full data
suite 1375 passing.

Added WAVE4_SLIM_DELETION_RUNBOOK.md — the gated, manual S3 prefix
deletion (byte-equal backup -> aws s3 rm, Wave-5 precedent). NOT
executed by CI or this PR.

DRAFT until 2026-05-23 Saturday-SF parity confirms slim<->ArcticDB
equivalence across breadth/compute/exit_timing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(runbook) + test(slim-guard): consumer-side soak gate + .venv exclusion fix

- WAVE4_SLIM_DELETION_RUNBOOK.md Gate #2 rewritten: the original
  "≥1 clean Saturday-SF parity observation" was made unreachable by the
  by-design adjustment-policy mismatch (slim raw close vs ArcticDB
  auto-adjusted close), and the `passed` flag in
  alpha_engine_lib/reconcile.py:62-68 requires `n_cells_over_epsilon == 0`
  which dividend-scale deltas cannot satisfy at any reasonable epsilon
  (even data #305's 1e-2). The actual safety case is 6 weeks of
  consumer-side ArcticDB-primary production soak since the 2026-04-14
  cutover with no correctness alerts. Backup-prefix date bumped
  260523 → 260525.

- tests/test_wave4_slim_arctic_parity.py: `_production_py_files()` was
  rglob'ing the entire repo including .venv/, build/, dist/, .git/.
  Hit on .venv joblib test fixture (big5-encoded) →
  UnicodeDecodeError before any actual production-code scan could run.
  Added _EXCLUDED_PREFIXES tuple. Suite 1484 passing post-fix.
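
The first bullet's point about the `passed` flag can be illustrated with a toy comparison. The functions below are a hypothetical sketch assuming an absolute-difference check per cell; they are not the code in `alpha_engine_lib/reconcile.py`, and the price series are made-up numbers.

```python
def count_cells_over_epsilon(left, right, epsilon):
    """Count positions where the two series differ by more than epsilon."""
    return sum(1 for a, b in zip(left, right) if abs(a - b) > epsilon)


def parity_passed(left, right, epsilon):
    # Mirrors the stated rule: pass only when no cell exceeds epsilon.
    return count_cells_over_epsilon(left, right, epsilon) == 0


# Made-up example: a dividend shifts the adjusted history before the ex-date.
raw_close = [100.0, 101.0, 102.0]
adjusted_close = [99.5, 100.5, 102.0]

print(count_cells_over_epsilon(raw_close, adjusted_close, 1e-2))
print(parity_passed(raw_close, adjusted_close, 1e-2))
```

With these values two cells differ by 0.5, so the count is 2 and the check fails even at an epsilon of 1e-2, which is why a zero-cells gate cannot be met when one side is raw and the other is adjusted.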

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>