feat(wave3): price_cache_read_prefixes helper + sf_preflight reader migration + sector_map.json write-both gap fix (ROADMAP L1401)#272

Merged: cipher813 merged 2 commits into main from feat/wave3-data-readers on May 19, 2026

@cipher813 (Owner)

ROADMAP: L1401 — predictor/ S3 namespace rationalization Wave 3 — data-repo half of the PR3+ reader migrations + producer-side sector_map.json gap fix.

Three things in one PR

1. Read-side helper (new)

builders/_price_cache_writeboth.price_cache_read_prefixes() is the companion to the existing price_cache_write_prefixes. Read order is the write order REVERSED, [reference/, predictor/], so the new prefix is consulted first and the legacy prefix is the soak-window fallback. The PR4 cutover is a one-line change to each helper, made in lockstep.
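For readers without the diff open, here is a minimal sketch of how the two helpers could relate. The prefix strings, the optional `prefix` argument, and the function bodies are assumptions inferred from this description and the test names below, not the shipped code.

```python
from typing import List, Optional

# Assumed prefix values; the real constants live in the repo.
LEGACY_PREFIX = "predictor/price_cache/"
NEW_PREFIX = "reference/price_cache/"


def price_cache_write_prefixes(prefix: Optional[str] = None) -> List[str]:
    """Write-both targets: legacy first, then the new prefix (assumed order)."""
    if prefix is None or prefix == LEGACY_PREFIX:
        return [LEGACY_PREFIX, NEW_PREFIX]
    # A custom prefix (tests / overrides) opts out of write-both.
    return [prefix]


def price_cache_read_prefixes(prefix: Optional[str] = None) -> List[str]:
    """Read order is the write order reversed: new prefix first, legacy fallback."""
    return list(reversed(price_cache_write_prefixes(prefix)))
```

Deriving the read list from the write list is one way to keep the two in lockstep; the real helpers may instead hold two explicit lists that are edited together at cutover.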

2. sf_preflight backfill-source-freshness check

The check now iterates the prefixes returned by the helper instead of hardcoding the legacy key. Behavior change: a missing legacy key alone no longer fails the check (the expected post-PR4 state); the check reports fail only when BOTH prefixes are unreadable.
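A rough sketch of the new check shape, for illustration only. The function name, its parameters, the injected `last_modified` lookup, and the "ok" status string are all hypothetical; only the iterate-and-break behavior and the `fail` outcome when both prefixes are unreadable come from this PR.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional


def check_backfill_source_freshness(
    last_modified: Callable[[str], Optional[datetime]],  # returns None if unreadable
    prefixes: Iterable[str],
    relative_key: str,
    max_age: timedelta,
    now: datetime,
) -> str:
    """Judge freshness on the first readable prefix; fail only if none is readable."""
    for prefix in prefixes:
        modified = last_modified(prefix + relative_key)
        if modified is None:
            continue  # unreadable under this prefix, try the next one
        # First hit wins: do not consult later prefixes.
        return "ok" if now - modified <= max_age else "fail"
    return "fail"  # every prefix was unreadable
```

Injecting the lookup keeps the sketch runnable without S3; in the real check the lookup would presumably be a HEAD or GET against the bucket.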

3. collectors/constituents.py — sector_map.json write-both gap fix (caught by audit)

sector_map.json is now written to 3 keys: data/ + predictor/price_cache/ + (NEW) reference/price_cache/. PR1 #270's write-both helper covered only the ticker-parquet writes (yfinance / FRED / chronic-gap), so the separately-emitted sector_map.json from this collector wasn't reaching the new prefix on an ongoing basis. The 2026-05-19 22:13Z one-shot backfill made it fresh at that moment; without this fix it would have gone stale after the next weekly producer run, and the read-new-first contract (already shipped in alpha-engine-backtester#230) would silently serve stale data.
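The three-key write can be pictured roughly as below. This is a sketch under assumptions: `write_sector_map`, the injected `put_object` callable, and the exact `data/` key are hypothetical; only the three prefixes and the byte-equal requirement come from this PR.

```python
import json
from typing import Callable, Dict, List

# Assumed full keys; only the three prefixes are stated in the PR.
SECTOR_MAP_KEYS: List[str] = [
    "data/sector_map.json",
    "predictor/price_cache/sector_map.json",
    "reference/price_cache/sector_map.json",
]


def write_sector_map(
    put_object: Callable[[str, bytes], None],
    sector_map: Dict[str, str],
) -> List[str]:
    """Serialize once, then write the identical bytes to every target key."""
    body = json.dumps(sector_map, sort_keys=True).encode("utf-8")
    for key in SECTOR_MAP_KEYS:
        put_object(key, body)
    return list(SECTOR_MAP_KEYS)
```

Serializing once before the loop is what makes the bodies byte-equal across keys, which is the property the new test pins.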

Sites NOT migrated in this PR (deferred to a focused follow-up)

The remaining 3 reader sites are more invasive and warrant their own PR now that the helper is in place:

| Site | Why deferred |
| --- | --- |
| builders/backfill.py (_load_full_cache) | LIST + bulk read across the full 10y tree; needs a careful list-from-new + add-legacy-only-extras pattern |
| builders/daily_append.py | PRICE_CACHE_PREFIX constant threaded through many per-ticker GETs; mechanical but wide |
| collectors/slim_cache.py | default-arg propagation; single touch but cross-cuts the slim Wave-4 retirement path, so it needs sequencing |

Each remaining site should iterate price_cache_read_prefixes() and break on first hit (mirrors sf_preflight here).
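For the backfill.py LIST case, the "list-from-new + add-legacy-only-extras" pattern might look something like this. It is a sketch of the idea, not the planned implementation: `merge_listings` and its arguments are hypothetical, and it assumes both listings have already been fetched as full key strings (Python 3.9+ for `str.removeprefix`).

```python
from typing import Dict, Iterable


def merge_listings(
    new_keys: Iterable[str],
    legacy_keys: Iterable[str],
    new_prefix: str,
    legacy_prefix: str,
) -> Dict[str, str]:
    """Map each relative name to the key to read, preferring the new prefix."""
    chosen: Dict[str, str] = {}
    for key in new_keys:
        chosen[key.removeprefix(new_prefix)] = key
    for key in legacy_keys:
        # Only objects missing under the new prefix fall back to legacy.
        chosen.setdefault(key.removeprefix(legacy_prefix), key)
    return chosen
```

The per-ticker GET sites are simpler: they can iterate price_cache_read_prefixes() and stop at the first prefix that returns the object.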

Audit miss caught

collectors/constituents.py:114 was initially listed as a Wave-3 reader site (per the L1401 "data ×5" list) but is actually a writer site — the ROADMAP entry was mislabeled. Catching this surfaced the missing 3rd write target (the producer-side fix in §3 above). Net result: better than the original scope — the read-new-first contract in alpha-engine-backtester#230 is now safe.

Tests (+4 new, suite 1387 → 1391 green)

  • test_read_helper_default_returns_new_first_legacy_second — pins read order.
  • test_read_helper_explicit_legacy_returns_new_first_legacy_second — passing the legacy prefix explicitly is identical to the default (production semantic).
  • test_read_helper_custom_prefix_returns_single — test/override opt-out (mirrors write-side).
  • test_sector_map_writes_to_all_three_paths — locks the 3-key write contract with byte-equal bodies.

Composes with

  • alpha-engine-data#270 (producer write-both, this PR's prerequisite).
  • alpha-engine#197 (IAM ARN add for reference/price_cache/).
  • alpha-engine-backtester#230 (Wave-3 backtester reader, sibling PR — and the consumer that needed §3's fix).
  • Wave-3 PR4 cutover (drops the fallback branch in both helpers).

🤖 Generated with Claude Code

…igration + sector_map.json write-both gap fix (ROADMAP L1401)

Wave-3 reader-side follow-on to producer write-both PR1
(alpha-engine-data#270, shipped 2026-05-19). Adds the read-side
companion helper, migrates the simplest reader site (sf_preflight),
and patches a PR1 gap: sector_map.json was only being written to
`data/` + `predictor/price_cache/` — readers that hit
`reference/price_cache/sector_map.json` first (e.g.
alpha-engine-backtester#230) would see a stale snapshot after PR4
deletes legacy.

Added
- `builders/_price_cache_writeboth.price_cache_read_prefixes()` —
  companion to `price_cache_write_prefixes`. Read order = WRITE
  order REVERSED: `[reference/, predictor/]` so the new prefix is
  consulted first, legacy is the soak-window fallback. PR4 cutover
  edits the helper + the write helper in lockstep.
- Re-exported alongside the existing helper.

Reader migrated
- `sf_preflight.py` backfill-source-freshness check — iterates the
  two prefixes via the helper instead of hardcoding the legacy key.
  Net behavior change: a missing legacy key alone no longer fails
  the check (post-PR4 state); only when BOTH prefixes are unreadable
  does the check report `fail`.

Producer-side gap fix
- `collectors/constituents.py` — sector_map.json now written to
  3 keys: `data/` + `predictor/price_cache/` + (NEW)
  `reference/price_cache/`. PR1 #270's write-both helper scoped only
  the ticker-parquet writes (yfinance / FRED / chronic-gap), so the
  separately-emitted sector_map.json from this collector wasn't
  reaching the new prefix on an ongoing basis. The one-shot backfill
  2026-05-19 22:13Z made `reference/price_cache/sector_map.json`
  fresh as of that moment; without this fix it would have gone stale
  after the next weekly producer run.

Sites NOT migrated in this PR (audit/scoping)
- `builders/backfill.py` (`_load_full_cache` — LIST + bulk read)
- `builders/daily_append.py` (`PRICE_CACHE_PREFIX` constant + many
  per-ticker GETs)
- `collectors/slim_cache.py` (default-arg propagation)
- More invasive; warrant a focused follow-up PR with the helper
  already in place. Each site should iterate
  `price_cache_read_prefixes()` and break on first hit.

Audit miss caught
- `collectors/constituents.py:114` initially read as a Wave-3
  reader site (per the L1401 list) but is actually a WRITER site —
  the L1401 entry was mislabeled. The catch surfaced the missing
  3rd write target above; net better state than the original scope.

Tests (+4 new, suite 1387 → 1391 green)
- `test_read_helper_default_returns_new_first_legacy_second` —
  pins new-first read order.
- `test_read_helper_explicit_legacy_returns_new_first_legacy_second`
  — explicit-legacy is identical to default (production semantic).
- `test_read_helper_custom_prefix_returns_single` — test/override
  opt-out matches the write-side semantic.
- `test_sector_map_writes_to_all_three_paths` — pins the 3-key
  write contract on `constituents.collect()`; byte-equal bodies.

Composes with
- alpha-engine-data#270 (producer write-both, this PR's prerequisite).
- alpha-engine#197 (IAM ARN add for `reference/price_cache/`).
- alpha-engine-backtester#230 (Wave-3 backtester reader, sibling PR).
- Wave-3 PR4 cutover (drops the fallback branch in both helpers; the
  one-line edit at that time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit b7aa598 into main May 19, 2026
1 check passed
@cipher813 cipher813 deleted the feat/wave3-data-readers branch May 19, 2026 23:58