feat(wave3): PR3-wave-2 invasive reader migrations — backfill + daily_append + slim_cache #273

Merged: cipher813 merged 1 commit into main from feat/wave3-pr3-wave2-invasive-readers on May 20, 2026

@cipher813 (Owner) commented:

Completes the data-repo side of Wave-3 PR3 reader migration (ROADMAP L1401). PR3-wave-1 (#272) shipped the read-side helper price_cache_read_prefixes and the easy sf_preflight.py consumer; this PR is the wave-2 follow-up the L1401 entry explicitly carved out.

What's in this PR

Three sites named as "invasive" follow-ups in L1401's PR3-wave-1 banner now go through the Wave-3 read fallback chain:

  • builders/backfill._load_full_cache: swaps its hand-rolled paginator for the new list_price_cache_keys helper. The new prefix is consulted first; the legacy fallback fills gaps during the soak; keys are deduplicated by {ticker}.parquet basename so each ticker is fetched once.
  • builders/daily_append._load_parquet_warmup: iterates price_cache_read_prefixes for the single-key parquet lookup. NoSuchKey on the new prefix falls through to legacy; a key absent from BOTH prefixes still degrades to None (the graceful path from PR #78, "fix(features): per-feature graceful degrade, no whole-row dropna"). Non-404 errors still hard-fail on the first prefix that raises, so NoSilentFails is preserved.
  • collectors/slim_cache.collect: switches its _list_parquets paginator to the same chokepoint. The now-unused private helper is deleted.
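
For reviewers who want the single-key fallback semantics in one place, here is a minimal sketch of the loop described for _load_parquet_warmup. It is not the code in this PR: load_warmup_sketch, the load_one callable (standing in for load_parquet_from_s3) and the MISSING_CODES set are hypothetical names. The only assumption about errors is that they carry a botocore-style response["Error"]["Code"].

```python
from typing import Any, Callable, Optional, Sequence

# Error codes treated as "key not found" (assumption: botocore-style codes).
MISSING_CODES = {"404", "NoSuchKey"}


def _error_code(exc: Exception) -> str:
    """Read response["Error"]["Code"] if the exception carries one."""
    response = getattr(exc, "response", None) or {}
    return str(response.get("Error", {}).get("Code", ""))


def load_warmup_sketch(
    load_one: Callable[[str], Any],   # hypothetical per-prefix loader
    prefixes: Sequence[str],          # fallback chain, new prefix first
    ticker: str,
) -> Optional[Any]:
    """Return the first frame found along the chain, or None if absent everywhere."""
    for prefix in prefixes:
        key = f"{prefix}{ticker}.parquet"
        try:
            return load_one(key)
        except Exception as exc:
            if _error_code(exc) in MISSING_CODES:
                continue  # not under this prefix: try the next one
            raise         # anything else (e.g. AccessDenied) fails loudly
    return None           # absent from every prefix: graceful degrade
```

The loop re-raises on the first prefix that fails with a non-missing error rather than trying the next prefix, which is what keeps an AccessDenied on the new prefix from being masked by a successful legacy read.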

New chokepoint

list_price_cache_keys(s3, bucket, primary) is added to builders/_price_cache_writeboth.py so every aggregate-listing site goes through the same Wave-3 helper as the write-side path. Single-key callers continue to inline price_cache_read_prefixes (matches the predictor #181 pattern; the data repo's only single-key reader is _load_parquet_warmup above).
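
A rough sketch of what an aggregate-listing helper with these semantics might look like, for illustration only. The two prefix strings come from the commit message; list_keys_sketch and read_prefixes_sketch are hypothetical stand-ins for list_price_cache_keys and price_cache_read_prefixes. The rule that either default prefix expands to the full chain while any other prefix opts out is an assumption about how "primary" is interpreted.

```python
from typing import Any

NEW_PREFIX = "reference/price_cache/"     # new location (from the commit message)
LEGACY_PREFIX = "predictor/price_cache/"  # legacy location (from the commit message)


def read_prefixes_sketch(primary: str) -> list[str]:
    """Hypothetical stand-in for price_cache_read_prefixes."""
    if primary in (NEW_PREFIX, LEGACY_PREFIX):
        return [NEW_PREFIX, LEGACY_PREFIX]  # soak-period fallback chain
    return [primary]                        # custom prefix opts out of the chain


def list_keys_sketch(s3: Any, bucket: str, primary: str) -> list[str]:
    """List parquet keys across the chain, deduplicated by basename."""
    chosen: dict[str, str] = {}  # "{ticker}.parquet" -> full key
    paginator = s3.get_paginator("list_objects_v2")
    for prefix in read_prefixes_sketch(primary):
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if not key.endswith(".parquet"):
                    continue
                basename = key.rsplit("/", 1)[-1]
                # First prefix wins: a later prefix never overrides an earlier hit.
                chosen.setdefault(basename, key)
    return list(chosen.values())
```

With a ticker present under both prefixes, only the reference/price_cache/ key is returned; a ticker present only under predictor/price_cache/ still appears, which is the gap-filling behaviour wanted during the soak.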

Tests

6 new behavioural tests following the established 3-test discipline:

  • list_price_cache_keys: prefers-new / falls-back-legacy / explicit-custom-opt-out
  • _load_parquet_warmup: prefers-new / falls-back-legacy / absent-in-both-degrades-to-None

Backfill is covered transitively through the helper's tests. The pre-existing _load_parquet_warmup error-path tests (404 / NoSuchKey / AccessDenied / empty-frame) still pass: they monkeypatch load_parquet_from_s3, which is the loop's per-prefix call.

Suite: 1387 → 1399 green (+12 net).

PR4 cutover

Post-soak (earliest ~2026-05-26 = 1 week after data #270's 5/19 producer write-both went live), the PR4 cutover drops the legacy entry from BOTH price_cache_write_prefixes and price_cache_read_prefixes in one edit; the PRICE_CACHE_PREFIX = PRICE_CACHE_LEGACY_PREFIX alias in daily_append.py gets deleted in the same edit. The tests added here flip to single-prefix expectations at that time.

🤖 Generated with Claude Code

…_append + slim_cache (ROADMAP L1401)

Completes the data-repo side of Wave-3 PR3 reader migration. Three sites
called out as invasive in the L1401 PR3 follow-up plan now consult the
new `reference/price_cache/` prefix first and fall back to legacy
`predictor/price_cache/` during the soak — the PR4 cutover then drops
the legacy entry in a one-line edit of `_price_cache_writeboth.py`.

- New helper `list_price_cache_keys(s3, bucket, primary)` extends the
  Wave-3 chokepoint to aggregate-list sites; iterates the read-prefix
  fallback chain and deduplicates by `{ticker}.parquet` basename
  (first-prefix-wins). Custom prefixes opt out of the chain, matching
  the existing single-key helper semantics.
- `builders/backfill._load_full_cache` swaps its hand-rolled paginator
  for `list_price_cache_keys`; the production default exercises the
  new prefix first, custom-prefix callers (tests / ad-hoc invocations)
  keep single-prefix listing.
- `builders/daily_append._load_parquet_warmup` iterates
  `price_cache_read_prefixes` for the single-key lookup. NoSuchKey on
  the new prefix falls through to legacy; absent from BOTH prefixes
  still degrades to None (the PR #78 graceful path). Non-404 errors
  still hard-fail on the first prefix that raises — NoSilentFails
  preserved.
- `collectors/slim_cache.collect` switches its `_list_parquets` paginator
  to the same chokepoint; the now-unused private helper is deleted.
- 6 new behavioural tests: `list_price_cache_keys` 3-test discipline
  (prefers-new / falls-back-legacy / explicit-custom-opt-out) +
  `_load_parquet_warmup` matching 3-test set. Backfill is covered
  transitively through the helper's tests. Suite 1387 → 1399 green.

PR4 cutover (post-soak, earliest ~2026-05-26) drops the legacy entry
from both `price_cache_write_prefixes` and `price_cache_read_prefixes`
in one edit; the `PRICE_CACHE_PREFIX = PRICE_CACHE_LEGACY_PREFIX` alias
in `daily_append.py` gets deleted in the same edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit 7433c64 into main on May 20, 2026 (1 check passed)
cipher813 deleted the feat/wave3-pr3-wave2-invasive-readers branch on May 20, 2026 00:22
cipher813 added a commit that referenced this pull request on May 20, 2026:
…y rollback chain since PR #254 (#274)

The Phase 2 Lambda CI deploy has been failing on every push since
2026-05-18 18:20Z. PR #254 (per-collector value-range validation,
merged 5/16) added top-level imports

  from validators.price_validator import (...)

to collectors/alternative.py + collectors/fundamentals.py, but did NOT
add `COPY validators/` to the Dockerfile. Every push that touched a
deploy.yml-triggering path since then built a fresh image, pushed v88,
ran the canary, hit `No module named 'validators'` at module load,
and rolled back to v87. 10 consecutive failed deploys (5/18-18:20Z
through 5/20-00:25Z) sat unnoticed because the canary correctly rolled
back so prod (v87) was unaffected — but the latent break blocked ANY
new code from ever reaching `live`. Surfaced by the Wave-3 PR3-wave-2
deploy (#273) when Brian saw the rollback in the CI log.

Fix: one-line `COPY validators/ ${LAMBDA_TASK_ROOT}/validators/` next
to the other application-code COPY directives.

Regression guard: new
tests/test_dockerfile_copies_match_deployed_imports.py
with two assertions:

1. `test_dockerfile_copies_validators_for_collectors_imports`:
   explicit pin for the validators/ case so the failure surfaces as
   a clear test name if a future refactor drops the COPY.

2. `test_every_toplevel_local_import_in_lambda_code_is_dockerfile_copied`:
   generic ast-scan over every deployed Python file
   (lambda/, weekly_collector.py, polygon_client.py, collectors/,
   store/, validators/). Module-scope `from <local_pkg> import ...`
   / `import <local_pkg>` where <local_pkg> is a directory with
   __init__.py at the repo root MUST appear in the Dockerfile's
   COPY directives — or in the explicitly-non-deployed allowlist
   (`tests`, `builders`, `infrastructure`, `rag`, `features`). New
   top-level imports of un-COPY'd packages fail in CI now, not in
   the post-merge canary.
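
The generic scan can be sketched along these lines. This is a simplified illustration, not the test that was added: toplevel_imports, dockerfile_copied_dirs and missing_copies are hypothetical names, the scan only looks at direct children of the module body (so imports nested under module-level try/if blocks are missed), and the COPY parser only handles the plain `COPY <src> <dest>` form without flags.

```python
import ast
import re


def toplevel_imports(source: str) -> set[str]:
    """Root package names imported by statements directly at module scope."""
    roots: set[str] = set()
    for node in ast.parse(source).body:  # direct children only
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            roots.add(node.module.split(".")[0])
    return roots


def dockerfile_copied_dirs(dockerfile: str) -> set[str]:
    """First path component of each simple `COPY <src> <dest>` source."""
    copied: set[str] = set()
    for line in dockerfile.splitlines():
        match = re.match(r"\s*COPY\s+(\S+)\s+\S+", line)
        if match:
            src = match.group(1).removeprefix("./").rstrip("/")
            copied.add(src.split("/")[0])
    return copied


def missing_copies(
    source: str,
    local_packages: set[str],  # directories with __init__.py at the repo root
    dockerfile: str,
    allowlist: set[str],       # packages that are deliberately not deployed
) -> set[str]:
    """Local packages imported at module scope but neither COPY'd nor allowlisted."""
    imported_local = toplevel_imports(source) & local_packages
    return imported_local - dockerfile_copied_dirs(dockerfile) - allowlist
```

A test built on this would assert that missing_copies(...) is empty for every deployed file, so an un-COPY'd package shows up as a named set in the CI failure rather than as a `No module named ...` error in the canary.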

Suite 1399 → 1401 green.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>