
feat(price-cache): Wave 3 PR1 producer write-both predictor/ → reference/ #270

Merged
cipher813 merged 1 commit into main from feat/wave3-pr1-producer-write-both
May 19, 2026

@cipher813 (Owner)

ROADMAP P1 "predictor/ S3 namespace rationalization Wave 3": start the write-both soak for migrating the 10y price_cache parquet tree from predictor/price_cache/ to reference/price_cache/. This mirrors Wave 1 (predictor/daily_closes/ → staging/daily_closes/) but uses write-both + soak instead of a hard cutover, because this writer rewrites only stale tickers: a hard cut would leave fresh tickers in legacy and the new prefix incomplete for a full yfinance refresh cycle. CLAUDE.md S3 Contract Safety mandates write-both + a ≥1 week soak.

What ships in PR1 (producer-side only — zero reader changes)

  • builders/_price_cache_writeboth.py (new): the single chokepoint. price_cache_write_prefixes(primary) returns [legacy, new] for the production default and [primary] for any custom string. Legacy ordered first so a fail-loud on the legacy write preserves pre-Wave-3 failure semantics — the new prefix never silently masks a legacy write error.
  • collectors/prices.py: yfinance refresh upload writes both prefixes.
  • collectors/fred_history.py: FRED backfill upload writes both prefixes.
  • weekly_collector.py: chronic-gap self-heal patch writes both prefixes (the get_object read stays on legacy since readers haven't migrated).
  • infrastructure/backfill_reference_price_cache.sh (new, +x): one-shot aws s3 sync operator script to seed reference/price_cache/ with the ~934 objects currently in predictor/price_cache/. Idempotent, --dry-run supported. Run ONCE as part of PR1's deploy.
  • tests/test_price_cache_writeboth.py (new, 7 tests): pins the helper contract and exercises each of the 3 production writers end-to-end against a stubbed, recording s3 client, asserting that BOTH keys land per ticker with identical bodies.
  • tests/test_fred_history_fetcher.py: updated the pre-existing test_uploads_to_s3_when_not_dry_run to assert write-both instead of a single upload. Required by the CLAUDE.md zero-tolerance test policy.
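A minimal Python sketch of the write-both chokepoint described above, assuming the prefix values named in this PR. The constant names, the `write_ticker` wrapper, the `put(key, body)` callable, and the `<prefix><TICKER>.parquet` key layout are hypothetical illustrations, not the module's actual code. It shows why legacy-first ordering matters: an exception on the legacy put stops the loop before the new prefix is touched.

```python
from typing import Callable, List

# Hypothetical constant names; the PR gives the prefix values, not the identifiers.
LEGACY_PREFIX = "predictor/price_cache/"
NEW_PREFIX = "reference/price_cache/"


def price_cache_write_prefixes(primary: str = LEGACY_PREFIX) -> List[str]:
    """Prefixes a price_cache writer should upload to.

    Production default -> [legacy, new]; any custom prefix (tests,
    overrides) -> [primary] only.
    """
    if primary == LEGACY_PREFIX:
        # Legacy first: if the legacy put raises, the loop in write_ticker
        # never reaches the new prefix, so pre-Wave-3 failure semantics hold.
        return [LEGACY_PREFIX, NEW_PREFIX]
    return [primary]


def write_ticker(
    put: Callable[[str, bytes], None],
    ticker: str,
    body: bytes,
    primary: str = LEGACY_PREFIX,
) -> List[str]:
    """Hypothetical writer: upload one ticker's bytes under every prefix."""
    written: List[str] = []
    for prefix in price_cache_write_prefixes(primary):
        key = f"{prefix}{ticker}.parquet"  # assumed key layout
        put(key, body)  # no try/except: errors propagate (fail loud)
        written.append(key)
    return written
```

In production, `put` would wrap something like boto3's `s3.put_object(Bucket=..., Key=key, Body=body)`; passing a callable keeps the sketch testable without AWS.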

What does NOT ship in PR1

  • Reader migrations (~10 sites across 4 repos) — stay on legacy. PR3+ migrates with legacy fallback.
  • IAM expansion to cover reference/price_cache/*: PR2 in alpha-engine, mirroring Wave 1's feat(iam): grant github-actions-lambda-deploy access to changelog/ prefix #120.
  • builders/daily_append.py:_load_parquet_warmup (reader, not writer) — migrates in PR3.
  • sector_map.json — separate write-once-per-Saturday concern, not part of the stale-ticker churn. Handled at cutover or PR3.
  • Cutover itself: PR4 will flip primary → reference/, drop legacy from price_cache_write_prefixes, retire reader fallbacks, and aws s3 rm the legacy prefix. Gated on ≥1 week of clean write-both observation.

Soak contract

PR1 merge → deploy live → run the backfill script ONCE to seed → the next Saturday SF firing's first write to both prefixes starts the soak clock → after ≥4 Saturday firings (matching Wave 4's discipline) with no parity divergence, PR3 reader migrations land, then the PR4 cutover.
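The PR does not define "parity divergence" precisely. One plausible check, sketched below under that assumption, strips each prefix from a listing to get relative keys, then compares key sets and ETags. The function names and the `{key: etag}` input shape are hypothetical; building the listings (for example via a paginated `list_objects_v2`) is left out.

```python
from typing import Dict, List


def relativize(listing: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Map full keys under `prefix` to prefix-relative keys, keeping ETags."""
    return {key[len(prefix):]: etag for key, etag in listing.items() if key.startswith(prefix)}


def parity_divergence(legacy: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {relative_key: etag} listings; all-empty lists mean parity."""
    legacy_keys, new_keys = set(legacy), set(new)
    return {
        "missing_in_new": sorted(legacy_keys - new_keys),
        "missing_in_legacy": sorted(new_keys - legacy_keys),
        "etag_mismatch": sorted(k for k in legacy_keys & new_keys if legacy[k] != new[k]),
    }


def is_clean(report: Dict[str, List[str]]) -> bool:
    """True when no category reports any divergent key."""
    return not any(report.values())
```

ETag equality is a reasonable content proxy for single-part uploads; multipart objects can carry different ETags for identical bytes, so a stricter check might compare sizes or checksums instead.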

Operational deploy step (post-merge)

bash infrastructure/backfill_reference_price_cache.sh

(Idempotent — safe to re-run; aws s3 sync skips matched objects.)

Tests

pytest tests/ -q
→ 1387 passed, 1 skipped, 0 failed

Composes with the in-flight Wave 4 slim-deletion arc (institutional pattern for data-tier prefix changes — dual-read/write + observation), Wave 1 PR #112 (template), and CLAUDE.md S3 Contract Safety.

🤖 Generated with Claude Code


ROADMAP P1 "predictor/ S3 namespace rationalization Wave 3" — start the
write-both soak that migrates the 10y price_cache parquet tree from
predictor/price_cache/ (under the predictor module's namespace) to
reference/price_cache/ (long-lived data-module references). Mirrors the
shape of Wave 1's predictor/daily_closes/ -> staging/daily_closes/ but
uses write-both + soak instead of hard-cutover because this writer only
rewrites STALE tickers — a hard cut would leave fresh tickers in legacy
and the new prefix incomplete for a full yfinance refresh cycle. CLAUDE.md
S3 Contract Safety mandates the write-both + >=1 week soak for any path
change of this shape.

## What ships in PR1 (producer-side only — zero reader changes)

- builders/_price_cache_writeboth.py (new): the single chokepoint.
  `price_cache_write_prefixes(primary)` returns [legacy, new] for the
  production default and [primary] for any custom string. Legacy ordered
  first so a fail-loud on the legacy write preserves pre-Wave-3 failure
  semantics — the new prefix never silently masks a legacy write error.
- collectors/prices.py: yfinance refresh upload now writes both prefixes.
- collectors/fred_history.py: FRED backfill upload now writes both prefixes.
- weekly_collector.py: chronic-gap self-heal patch writes both prefixes
  (the get_object read stays on legacy since readers haven't migrated).
- infrastructure/backfill_reference_price_cache.sh (new): one-shot
  `aws s3 sync` operator script to seed reference/price_cache/ with the
  ~934 objects currently in predictor/price_cache/. Idempotent; --dry-run
  supported. Run ONCE as part of PR1's deploy.
- tests/test_price_cache_writeboth.py (new, 7 tests): pins the helper
  contract (legacy default returns both, custom returns single, ordering
  pinned) and exercises each of the 3 production writers end-to-end
  against a stubbed, recording s3 client, asserting that BOTH keys land
  per ticker with identical bodies.
- tests/test_fred_history_fetcher.py: updated the pre-existing
  test_uploads_to_s3_when_not_dry_run from asserting a single upload to
  asserting write-both behavior. Required by zero-tolerance test policy.
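The stubbed, recording s3 test style can be sketched as follows, assuming the writers call a boto3-style `put_object(Bucket=..., Key=..., Body=...)`. `RecordingS3`, `upload_ticker`, and the key layout are hypothetical stand-ins, not the contents of tests/test_price_cache_writeboth.py.

```python
from typing import Dict, List

# Hypothetical stand-in for a production writer; the real collectors
# derive these prefixes from price_cache_write_prefixes().
WRITE_PREFIXES = ["predictor/price_cache/", "reference/price_cache/"]


def upload_ticker(s3, bucket: str, ticker: str, body: bytes) -> None:
    for prefix in WRITE_PREFIXES:
        s3.put_object(Bucket=bucket, Key=f"{prefix}{ticker}.parquet", Body=body)


class RecordingS3:
    """Stub that records calls made with boto3's put_object keyword names."""

    def __init__(self) -> None:
        self.puts: List[Dict[str, object]] = []

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> dict:
        self.puts.append({"Bucket": Bucket, "Key": Key, "Body": Body})
        return {}


def test_each_ticker_lands_under_both_prefixes_with_identical_bodies() -> None:
    s3 = RecordingS3()
    bodies = {"AAPL": b"aapl-bytes", "MSFT": b"msft-bytes"}
    for ticker, body in bodies.items():
        upload_ticker(s3, "example-bucket", ticker, body)

    recorded = {put["Key"]: put["Body"] for put in s3.puts}
    for ticker, body in bodies.items():
        legacy = recorded[f"predictor/price_cache/{ticker}.parquet"]
        new = recorded[f"reference/price_cache/{ticker}.parquet"]
        assert legacy == new == body
    # Exactly two writes per ticker: no duplicates, no stray keys.
    assert len(s3.puts) == 2 * len(bodies)
```

Recording every call, rather than only the last one, is what lets the test catch a writer that silently skips or duplicates one of the two prefixes.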

## What does NOT ship in PR1

- Reader migrations: ~10 read sites across alpha-engine-data,
  alpha-engine-predictor, alpha-engine-backtester, alpha-engine-dashboard
  stay on the legacy prefix. PR3+ migrates them with legacy fallback.
- IAM grant expansion to cover reference/price_cache/* — PR2 mirrors
  Wave 1 #120's IAM pattern on the alpha-engine repo's
  alpha-engine-s3-access.json.
- builders/daily_append.py:_load_parquet_warmup (reader, not writer) —
  migrates in PR3.
- sector_map.json (separate concern — write-once-per-Saturday, not part
  of the stale-ticker churn). Handled at cutover or PR3.
- The cutover itself: PR4 will flip primary -> reference/, drop the
  legacy entry from price_cache_write_prefixes, retire reader fallbacks,
  and `aws s3 rm --recursive` the legacy prefix. Gated on >=1 week of
  clean write-both observation.

## Soak contract

PR1 merge -> deploy this commit live -> run the backfill script ONCE to
seed the new prefix -> next Saturday SF firing's first write to both
prefixes starts the soak clock -> after >=4 Saturday firings (matches
Wave 4's discipline) with no parity divergence, PR3 reader migrations
go in, then PR4 cutover.

## Tests

  pytest tests/ -q  -> 1387 passed, 1 skipped, 0 failed

Composes with: ROADMAP Wave 4 slim-deletion arc currently in flight
(institutional pattern for data-tier prefix changes — dual-read /
dual-write + lib reconcile observation), Wave 1 PR #112 (template),
S3 Contract Safety in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit 5f0099a into main on May 19, 2026
1 check passed
cipher813 deleted the feat/wave3-pr1-producer-write-both branch on May 19, 2026 at 22:11
cipher813 added a commit that referenced this pull request May 19, 2026
…igration + sector_map.json write-both gap fix (ROADMAP L1401) (#272)

Wave-3 reader-side follow-on to producer write-both PR1
(alpha-engine-data#270, shipped 2026-05-19). Adds the read-side
companion helper, migrates the simplest reader site (sf_preflight),
and patches a PR1 gap: sector_map.json was only being written to
`data/` + `predictor/price_cache/` — readers that hit
`reference/price_cache/sector_map.json` first (e.g.
alpha-engine-backtester#230) would see a stale snapshot after PR4
deletes legacy.

Added
- `builders/_price_cache_writeboth.price_cache_read_prefixes()` —
  companion to `price_cache_write_prefixes`. Read order = WRITE
  order REVERSED: `[reference/, predictor/]` so the new prefix is
  consulted first, legacy is the soak-window fallback. PR4 cutover
  edits the helper + the write helper in lockstep.
- Re-exported alongside the existing helper.
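A sketch of the read/write pairing, assuming the same prefix values as PR1. Here the read helper is derived by reversing the write helper, which is one way to keep the two in lockstep by construction; the commit describes the PR4 edit as touching both helpers, so the actual module may define them separately. The constant names are hypothetical.

```python
from typing import List

LEGACY_PREFIX = "predictor/price_cache/"
NEW_PREFIX = "reference/price_cache/"


def price_cache_write_prefixes(primary: str = LEGACY_PREFIX) -> List[str]:
    # Legacy first on write (fail-loud ordering from PR1).
    return [LEGACY_PREFIX, NEW_PREFIX] if primary == LEGACY_PREFIX else [primary]


def price_cache_read_prefixes(primary: str = LEGACY_PREFIX) -> List[str]:
    # Read order is write order reversed: new prefix first, legacy as the
    # soak-window fallback. A custom prefix stays a single-element list.
    return list(reversed(price_cache_write_prefixes(primary)))
```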

Reader migrated
- `sf_preflight.py` backfill-source-freshness check — iterates the
  two prefixes via the helper instead of hardcoding the legacy key.
  Net behavior change: a missing legacy key alone no longer fails
  the check (post-PR4 state); only when BOTH prefixes are unreadable
  does the check report `fail`.

Producer-side gap fix
- `collectors/constituents.py` — sector_map.json now written to
  3 keys: `data/` + `predictor/price_cache/` + (NEW)
  `reference/price_cache/`. PR1 #270's write-both helper scoped only
  the ticker-parquet writes (yfinance / FRED / chronic-gap), so the
  separately-emitted sector_map.json from this collector wasn't
  reaching the new prefix on an ongoing basis. The one-shot backfill
  2026-05-19 22:13Z made `reference/price_cache/sector_map.json`
  fresh as of that moment; without this fix it would have gone stale
  after the next weekly producer run.
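A sketch of the 3-key sector_map.json write, assuming each target key is `<prefix>sector_map.json` (the exact key under `data/` is not stated here). Serializing once and reusing the same bytes is what makes a byte-equal assertion like `test_sector_map_writes_to_all_three_paths` hold by construction. `write_sector_map`, `sector_map_keys`, and `SECTOR_MAP_PREFIXES` are hypothetical names.

```python
import json
from typing import Callable, Dict, List

# Assumed prefixes; the third entry is the target this commit adds.
SECTOR_MAP_PREFIXES = [
    "data/",
    "predictor/price_cache/",
    "reference/price_cache/",
]


def sector_map_keys() -> List[str]:
    """Hypothetical: one sector_map.json key per target prefix."""
    return [f"{prefix}sector_map.json" for prefix in SECTOR_MAP_PREFIXES]


def write_sector_map(put: Callable[[str, bytes], None], sector_map: Dict[str, str]) -> List[str]:
    # Serialize once so every target receives byte-identical content.
    body = json.dumps(sector_map, sort_keys=True).encode("utf-8")
    keys = sector_map_keys()
    for key in keys:
        put(key, body)
    return keys
```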

Sites NOT migrated in this PR (audit/scoping)
- `builders/backfill.py` (`_load_full_cache` — LIST + bulk read)
- `builders/daily_append.py` (`PRICE_CACHE_PREFIX` constant + many
  per-ticker GETs)
- `collectors/slim_cache.py` (default-arg propagation)

These sites are more invasive and warrant a focused follow-up PR now
that the helper is in place. Each should iterate
`price_cache_read_prefixes()` and break on the first hit.
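The iterate-and-break-on-first-hit reader pattern, combined with the sf_preflight semantics above (a miss under one prefix is tolerated; failure only when every prefix misses), can be sketched as below. `read_first_hit`, `PriceCacheMiss`, and a `get(key)` callable that raises `KeyError` on a missing object are assumptions; a boto3 reader would need to translate S3's `NoSuchKey` error into that signal.

```python
from typing import Callable, List, Tuple


class PriceCacheMiss(Exception):
    """Raised only when the object is missing under every prefix."""


def read_first_hit(
    get: Callable[[str], bytes],
    relative_key: str,
    prefixes: List[str],
) -> Tuple[str, bytes]:
    """Return (full_key, body) from the first prefix that has the object.

    `prefixes` is expected in read order, e.g. the output of
    price_cache_read_prefixes(): new prefix first, legacy as fallback.
    """
    for prefix in prefixes:
        key = prefix + relative_key
        try:
            return key, get(key)
        except KeyError:
            # Assumed "not found" signal: fall through to the next prefix.
            continue
    raise PriceCacheMiss(f"{relative_key!r} not found under any of {prefixes}")
```

This sketch deliberately catches only the not-found signal, so other errors (for example an access denial on the new prefix) stay loud instead of silently falling back to legacy.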

Audit miss caught
- `collectors/constituents.py:114` initially read as a Wave-3
  reader site (per the L1401 list) but is actually a WRITER site —
  the L1401 entry was mislabeled. The catch surfaced the missing
  3rd write target above; net better state than the original scope.

Tests (+4 new, suite 1387 → 1391 green)
- `test_read_helper_default_returns_new_first_legacy_second` —
  pins new-first read order.
- `test_read_helper_explicit_legacy_returns_new_first_legacy_second`
  — explicit-legacy is identical to default (production semantic).
- `test_read_helper_custom_prefix_returns_single` — test/override
  opt-out matches the write-side semantic.
- `test_sector_map_writes_to_all_three_paths` — pins the 3-key
  write contract on `constituents.collect()`; byte-equal bodies.

Composes with
- alpha-engine-data#270 (producer write-both, this PR's prerequisite).
- alpha-engine#197 (IAM ARN add for `reference/price_cache/`).
- alpha-engine-backtester#230 (Wave-3 backtester reader, sibling PR).
- Wave-3 PR4 cutover (drops the fallback branch in both helpers, a
  one-line edit at that time).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>