feat(price-cache): Wave 3 PR1 producer write-both predictor/ → reference/ #270
Merged
…ence/

ROADMAP P1 "predictor/ S3 namespace rationalization Wave 3": start the write-both soak that migrates the 10y price_cache parquet tree from predictor/price_cache/ (under the predictor module's namespace) to reference/price_cache/ (long-lived data-module references). Mirrors the shape of Wave 1's predictor/daily_closes/ -> staging/daily_closes/ but uses write-both + soak instead of hard-cutover because this writer only rewrites STALE tickers: a hard cut would leave fresh tickers in legacy and the new prefix incomplete for a full yfinance refresh cycle. CLAUDE.md S3 Contract Safety mandates the write-both + >=1 week soak for any path change of this shape.

## What ships in PR1 (producer-side only, zero reader changes)

- builders/_price_cache_writeboth.py (new): the single chokepoint. `price_cache_write_prefixes(primary)` returns [legacy, new] for the production default and [primary] for any custom string. Legacy is ordered first so a fail-loud on the legacy write preserves pre-Wave-3 failure semantics: the new prefix never silently masks a legacy write error.
- collectors/prices.py: yfinance refresh upload now writes both prefixes.
- collectors/fred_history.py: FRED backfill upload now writes both prefixes.
- weekly_collector.py: chronic-gap self-heal patch writes both prefixes (the get_object read stays on legacy since readers haven't migrated).
- infrastructure/backfill_reference_price_cache.sh (new): one-shot `aws s3 sync` operator script to seed reference/price_cache/ with the ~934 objects currently in predictor/price_cache/. Idempotent; --dry-run supported. Run ONCE as part of PR1's deploy.
- tests/test_price_cache_writeboth.py (new, 7 tests): helper contract (legacy default returns both, custom returns single, ordering pinned) + each of the 3 production writers exercised end-to-end with stubbed s3 + recording asserts that BOTH keys land per ticker with identical bodies.
- tests/test_fred_history_fetcher.py: updated the pre-existing test_uploads_to_s3_when_not_dry_run from asserting a single upload to asserting write-both behavior. Required by zero-tolerance test policy.

## What does NOT ship in PR1

- Reader migrations: ~10 read sites across alpha-engine-data, alpha-engine-predictor, alpha-engine-backtester, alpha-engine-dashboard stay on the legacy prefix. PR3+ migrates them with legacy fallback.
- IAM grant expansion to cover reference/price_cache/*: PR2 mirrors Wave 1 #120's IAM pattern on the alpha-engine repo's alpha-engine-s3-access.json.
- builders/daily_append.py:_load_parquet_warmup (reader, not writer): migrates in PR3.
- sector_map.json (separate concern: write-once-per-Saturday, not part of the stale-ticker churn). Handled at cutover or PR3.
- The cutover itself: PR4 will flip primary -> reference/, drop the legacy entry from price_cache_write_prefixes, retire reader fallbacks, and `aws s3 rm --recursive` the legacy prefix. Gated on >=1 week of clean write-both observation.

## Soak contract

PR1 merge -> deploy this commit live -> run the backfill script ONCE to seed the new prefix -> next Saturday SF firing's first write to both prefixes starts the soak clock -> after >=4 Saturday firings (matches Wave 4's discipline) with no parity divergence, PR3 reader migrations go in, then PR4 cutover.

## Tests

pytest tests/ -q -> 1387 passed, 1 skipped, 0 failed

Composes with: ROADMAP Wave 4 slim-deletion arc currently in flight (institutional pattern for data-tier prefix changes: dual-read / dual-write + lib reconcile observation), Wave 1 PR #112 (template), S3 Contract Safety in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
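The write-both chokepoint described in this commit can be sketched roughly as follows. This is a minimal illustration, not the contents of builders/_price_cache_writeboth.py: the constant names, the default argument, the `write_both` wrapper, and the dict standing in for S3 are all assumptions made for the example. Only the prefix strings, the legacy-first ordering, and the "custom string returns a single prefix" rule come from the description above.

```python
from typing import Callable, List

# Assumed constant names; the PR names the prefixes, not these identifiers.
LEGACY_PREFIX = "predictor/price_cache/"
NEW_PREFIX = "reference/price_cache/"


def price_cache_write_prefixes(primary: str = LEGACY_PREFIX) -> List[str]:
    """Return every prefix a price_cache write should land under."""
    if primary == LEGACY_PREFIX:
        # Production default: legacy first, so a legacy failure raises
        # before the new prefix is touched.
        return [LEGACY_PREFIX, NEW_PREFIX]
    # Custom prefix (tests, overrides): write only where the caller asked.
    return [primary]


def write_both(put: Callable[[str, bytes], None], filename: str, body: bytes,
               primary: str = LEGACY_PREFIX) -> List[str]:
    """Hypothetical wrapper: write the same body under each prefix in order."""
    written = []
    for prefix in price_cache_write_prefixes(primary):
        key = prefix + filename
        put(key, body)  # any exception propagates: fail loud, no masking
        written.append(key)
    return written


# Usage with an in-memory dict standing in for the S3 bucket.
store = {}
keys = write_both(store.__setitem__, "AAPL.parquet", b"parquet-bytes")
print(keys)
```

Because the loop stops at the first exception, a failing legacy write never leaves an object under the new prefix that could hide the error, which is the failure semantic the commit message calls out.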
This was referenced May 19, 2026
cipher813 added a commit that referenced this pull request May 19, 2026
…igration + sector_map.json write-both gap fix (ROADMAP L1401) (#272)

Wave-3 reader-side follow-on to producer write-both PR1 (alpha-engine-data#270, shipped 2026-05-19). Adds the read-side companion helper, migrates the simplest reader site (sf_preflight), and patches a PR1 gap: sector_map.json was only being written to `data/` + `predictor/price_cache/`, so readers that hit `reference/price_cache/sector_map.json` first (e.g. alpha-engine-backtester#230) would see a stale snapshot after PR4 deletes legacy.

Added
- `builders/_price_cache_writeboth.price_cache_read_prefixes()`: companion to `price_cache_write_prefixes`. Read order = WRITE order REVERSED: `[reference/, predictor/]`, so the new prefix is consulted first and legacy is the soak-window fallback. PR4 cutover edits the helper + the write helper in lockstep.
- Re-exported alongside the existing helper.

Reader migrated
- `sf_preflight.py` backfill-source-freshness check: iterates the two prefixes via the helper instead of hardcoding the legacy key. Net behavior change: a missing legacy key alone no longer fails the check (post-PR4 state); only when BOTH prefixes are unreadable does the check report `fail`.

Producer-side gap fix
- `collectors/constituents.py`: sector_map.json now written to 3 keys: `data/` + `predictor/price_cache/` + (NEW) `reference/price_cache/`. PR1 #270's write-both helper scoped only the ticker-parquet writes (yfinance / FRED / chronic-gap), so the separately-emitted sector_map.json from this collector wasn't reaching the new prefix on an ongoing basis. The one-shot backfill 2026-05-19 22:13Z made `reference/price_cache/sector_map.json` fresh as of that moment; without this fix it would have gone stale after the next weekly producer run.

Sites NOT migrated in this PR (audit/scoping)
- `builders/backfill.py` (`_load_full_cache`: LIST + bulk read)
- `builders/daily_append.py` (`PRICE_CACHE_PREFIX` constant + many per-ticker GETs)
- `collectors/slim_cache.py` (default-arg propagation)

More invasive; warrant a focused follow-up PR with the helper already in place. Each site should iterate `price_cache_read_prefixes()` and break on first hit.

Audit miss caught
- `collectors/constituents.py:114` initially read as a Wave-3 reader site (per the L1401 list) but is actually a WRITER site; the L1401 entry was mislabeled. The catch surfaced the missing 3rd write target above; net better state than the original scope.

Tests (+4 new, suite 1387 → 1391 green)
- `test_read_helper_default_returns_new_first_legacy_second`: pins new-first read order.
- `test_read_helper_explicit_legacy_returns_new_first_legacy_second`: explicit-legacy is identical to default (production semantic).
- `test_read_helper_custom_prefix_returns_single`: test/override opt-out matches the write-side semantic.
- `test_sector_map_writes_to_all_three_paths`: pins the 3-key write contract on `constituents.collect()`; byte-equal bodies.

Composes with
- alpha-engine-data#270 (producer write-both, this PR's prerequisite).
- alpha-engine#197 (IAM ARN add for `reference/price_cache/`).
- alpha-engine-backtester#230 (Wave-3 backtester reader, sibling PR).
- Wave-3 PR4 cutover (drops the fallback branch in both helpers; the one-line edit at that time).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
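The "iterate `price_cache_read_prefixes()` and break on first hit" pattern from this commit might look like the sketch below. It is illustrative only: the constant names, the default argument, the `read_first_hit` function, and the dict used in place of S3 `get_object` calls are assumptions. The new-first, legacy-second order and the single-prefix result for a custom string are taken from the commit message.

```python
from typing import Dict, List, Optional

# Assumed constant names; only the prefix strings come from the PR.
LEGACY_PREFIX = "predictor/price_cache/"
NEW_PREFIX = "reference/price_cache/"


def price_cache_read_prefixes(primary: str = LEGACY_PREFIX) -> List[str]:
    """Read order is the write order reversed: new prefix first."""
    if primary == LEGACY_PREFIX:
        return [NEW_PREFIX, LEGACY_PREFIX]
    # Custom prefix (tests, overrides): read only from where the caller asked.
    return [primary]


def read_first_hit(store: Dict[str, bytes], filename: str,
                   primary: str = LEGACY_PREFIX) -> Optional[bytes]:
    """Hypothetical reader: return the first body found, or None if no prefix has it."""
    for prefix in price_cache_read_prefixes(primary):
        body = store.get(prefix + filename)
        if body is not None:
            return body  # first hit wins; legacy is consulted only as fallback
    return None


# During the soak window a ticker may exist only under the legacy prefix.
bucket = {"predictor/price_cache/AAPL.parquet": b"legacy-bytes"}
print(read_first_hit(bucket, "AAPL.parquet"))
```

A reader written this way only reports a miss when both prefixes lack the object, which matches the `sf_preflight.py` behavior change described above.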
ROADMAP P1 "`predictor/` S3 namespace rationalization Wave 3": start the write-both soak for the 10y price_cache parquet tree migration `predictor/price_cache/` → `reference/price_cache/`. Mirrors Wave 1 (`predictor/daily_closes/` → `staging/daily_closes/`) but uses write-both + soak instead of hard-cutover because this writer rewrites only stale tickers: a hard cut would leave fresh tickers in legacy and the new prefix incomplete for a full yfinance refresh cycle. CLAUDE.md S3 Contract Safety mandates write-both + ≥1 week soak.

## What ships in PR1 (producer-side only, zero reader changes)

- `builders/_price_cache_writeboth.py` (new): the single chokepoint. `price_cache_write_prefixes(primary)` returns `[legacy, new]` for the production default and `[primary]` for any custom string. Legacy is ordered first so a fail-loud on the legacy write preserves pre-Wave-3 failure semantics: the new prefix never silently masks a legacy write error.
- `collectors/prices.py`: yfinance refresh upload writes both prefixes.
- `collectors/fred_history.py`: FRED backfill upload writes both prefixes.
- `weekly_collector.py`: chronic-gap self-heal patch writes both prefixes (the `get_object` read stays on legacy since readers haven't migrated).
- `infrastructure/backfill_reference_price_cache.sh` (new, +x): one-shot `aws s3 sync` operator script to seed `reference/price_cache/` with the ~934 objects currently in `predictor/price_cache/`. Idempotent, `--dry-run` supported. Run ONCE as part of PR1's deploy.
- `tests/test_price_cache_writeboth.py` (new, 7 tests): helper contract + each of the 3 production writers exercised end-to-end with stubbed s3 + recording asserts that BOTH keys land per ticker with identical bodies.
- `tests/test_fred_history_fetcher.py`: updated the pre-existing `test_uploads_to_s3_when_not_dry_run` from asserting a single upload to asserting write-both. Required by CLAUDE.md zero-tolerance test policy.

## What does NOT ship in PR1

- Reader migrations: ~10 read sites across alpha-engine-data, alpha-engine-predictor, alpha-engine-backtester, alpha-engine-dashboard stay on the legacy prefix. PR3+ migrates them with legacy fallback.
- IAM grant expansion to cover `reference/price_cache/*`: PR2 in `alpha-engine` mirrors Wave 1 #120 (feat(iam): grant github-actions-lambda-deploy access to changelog/ prefix).
- `builders/daily_append.py:_load_parquet_warmup` (reader, not writer): migrates in PR3.
- `sector_map.json`: separate write-once-per-Saturday concern, not part of the stale-ticker churn. Handled at cutover or PR3.
- The cutover itself: PR4 will flip primary → `reference/`, drop legacy from `price_cache_write_prefixes`, retire reader fallbacks, and `aws s3 rm` the legacy prefix. Gated on ≥1 week of clean write-both observation.

## Soak contract

PR1 merge → deploy live → run the backfill script ONCE to seed → next Saturday SF firing's first write-to-both starts the soak clock → after ≥4 Saturday firings (matches Wave 4 discipline) with no parity divergence, PR3 reader migrations land, then PR4 cutover.
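The PR does not say how "no parity divergence" is observed during the soak. One way to check it, sketched here purely as an assumption, is to compare relative key names and content digests under the two prefixes. The function name, the SHA-256 comparison, and the dict standing in for a bucket listing are all hypothetical.

```python
import hashlib
from typing import Dict, List


def parity_divergences(objects: Dict[str, bytes],
                       legacy: str = "predictor/price_cache/",
                       new: str = "reference/price_cache/") -> List[str]:
    """Hypothetical soak check: list every mismatch between the two prefixes."""

    def digests(prefix: str) -> Dict[str, str]:
        # Map key-relative-to-prefix -> SHA-256 of the object body.
        return {
            key[len(prefix):]: hashlib.sha256(body).hexdigest()
            for key, body in objects.items()
            if key.startswith(prefix)
        }

    old, fresh = digests(legacy), digests(new)
    problems = []
    for name in sorted(set(old) | set(fresh)):
        if name not in fresh:
            problems.append(f"missing in new: {name}")
        elif name not in old:
            problems.append(f"missing in legacy: {name}")
        elif old[name] != fresh[name]:
            problems.append(f"body differs: {name}")
    return problems


# In-memory stand-in for a bucket: one matching ticker, one unseeded ticker.
bucket = {
    "predictor/price_cache/AAPL.parquet": b"same",
    "reference/price_cache/AAPL.parquet": b"same",
    "predictor/price_cache/MSFT.parquet": b"only-legacy",
}
print(parity_divergences(bucket))
```

An empty list after each Saturday firing would correspond to a clean observation; any entry would reset or block the soak, depending on how the team chooses to enforce the contract.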
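## Operational deploy step (post-merge)

Run `infrastructure/backfill_reference_price_cache.sh` ONCE to seed `reference/price_cache/`; `--dry-run` is supported.

(Idempotent, safe to re-run; `aws s3 sync` skips matched objects.)

## Tests

`pytest tests/ -q` → 1387 passed, 1 skipped, 0 failed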
Composes with the in-flight Wave 4 slim-deletion arc (institutional pattern for data-tier prefix changes — dual-read/write + observation), Wave 1 PR #112 (template), and CLAUDE.md S3 Contract Safety.
🤖 Generated with Claude Code