feat(wave3): PR3-wave-2 invasive reader migrations — backfill + daily_append + slim_cache #273
Merged
…_append + slim_cache (ROADMAP L1401)
Completes the data-repo side of Wave-3 PR3 reader migration. Three sites
called out as invasive in the L1401 PR3 follow-up plan now consult the
new `reference/price_cache/` prefix first and fall back to legacy
`predictor/price_cache/` during the soak — the PR4 cutover then drops
the legacy entry in a one-line edit of `_price_cache_writeboth.py`.
- New helper `list_price_cache_keys(s3, bucket, primary)` extends the
Wave-3 chokepoint to aggregate-list sites; iterates the read-prefix
fallback chain and deduplicates by `{ticker}.parquet` basename
(first-prefix-wins). Custom prefixes opt out of the chain, matching
the existing single-key helper semantics.
- `builders/backfill._load_full_cache` swaps its hand-rolled paginator
for `list_price_cache_keys`; the production default exercises the
new prefix first, custom-prefix callers (tests / ad-hoc invocations)
keep single-prefix listing.
- `builders/daily_append._load_parquet_warmup` iterates
`price_cache_read_prefixes` for the single-key lookup. NoSuchKey on
the new prefix falls through to legacy; absent from BOTH prefixes
still degrades to None (the PR #78 graceful path). Non-404 errors
still hard-fail on the first prefix that raises — NoSilentFails
preserved.
- `collectors/slim_cache.collect` switches its `_list_parquets` paginator
to the same chokepoint; the now-unused private helper is deleted.
- 6 new behavioural tests: `list_price_cache_keys` 3-test discipline
(prefers-new / falls-back-legacy / explicit-custom-opt-out) +
`_load_parquet_warmup` matching 3-test set. Backfill is covered
transitively through the helper's tests. Suite 1387 → 1399 green.
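A minimal sketch of the aggregate-listing chokepoint described above. The two prefix strings and the `list_price_cache_keys(s3, bucket, primary)` signature come from this PR. `_read_chain` is a hypothetical stand-in for `price_cache_read_prefixes`, whose real signature is not shown here. The rule that non-`.parquet` objects are skipped is also an assumption. The listing uses boto3's standard `list_objects_v2` paginator.

```python
from typing import List

# Prefix values come from the PR description; the chain logic is a hypothetical sketch.
PRICE_CACHE_NEW_PREFIX = "reference/price_cache/"
PRICE_CACHE_LEGACY_PREFIX = "predictor/price_cache/"
_DEFAULT_PREFIXES = (PRICE_CACHE_NEW_PREFIX, PRICE_CACHE_LEGACY_PREFIX)


def _read_chain(primary: str) -> List[str]:
    """Hypothetical stand-in for price_cache_read_prefixes.

    A default primary expands to [new, legacy]; a custom primary opts out
    of the chain and is listed on its own.
    """
    if primary in _DEFAULT_PREFIXES:
        return list(_DEFAULT_PREFIXES)
    return [primary]


def list_price_cache_keys(s3, bucket: str, primary: str) -> List[str]:
    """List parquet keys across the read chain; the first prefix wins per basename."""
    paginator = s3.get_paginator("list_objects_v2")
    seen = set()
    keys: List[str] = []
    for prefix in _read_chain(primary):
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if not key.endswith(".parquet"):
                    continue  # assumption: non-parquet objects are ignored
                basename = key.rsplit("/", 1)[-1]  # "{ticker}.parquet"
                if basename in seen:
                    continue  # already served by an earlier (newer) prefix
                seen.add(basename)
                keys.append(key)
    return keys
```

Because deduplication keys on the basename, a ticker that exists under both prefixes is returned once, from `reference/price_cache/`. A ticker that only exists under the legacy prefix still appears, which is what lets the soak period fill gaps.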
PR4 cutover (post-soak, earliest ~2026-05-26) drops the legacy entry
from both `price_cache_write_prefixes` and `price_cache_read_prefixes`
in one edit; the `PRICE_CACHE_PREFIX = PRICE_CACHE_LEGACY_PREFIX` alias
in `daily_append.py` gets deleted in the same edit.
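One way the "single edit drops legacy from both lists" property could hold is for both accessors to read one shared tuple. The sketch below assumes that layout for `_price_cache_writeboth.py`, which this PR does not show. The tuple name and `write_price_cache` are hypothetical, and the custom-prefix opt-out is left out for brevity. `put_object(Bucket=..., Key=..., Body=...)` mirrors boto3's S3 `put_object` keyword arguments.

```python
from typing import Callable, List, Optional, Sequence

PRICE_CACHE_NEW_PREFIX = "reference/price_cache/"
PRICE_CACHE_LEGACY_PREFIX = "predictor/price_cache/"

# Hypothetical: one tuple drives both directions during the soak, so the
# PR4 cutover is the one-line edit that deletes the legacy entry here.
_PRICE_CACHE_PREFIXES = (PRICE_CACHE_NEW_PREFIX, PRICE_CACHE_LEGACY_PREFIX)


def price_cache_write_prefixes() -> List[str]:
    """Every prefix a producer writes to (write-both while legacy is listed)."""
    return list(_PRICE_CACHE_PREFIXES)


def price_cache_read_prefixes() -> List[str]:
    """Reader order: new prefix first, legacy as the soak-period fallback."""
    return list(_PRICE_CACHE_PREFIXES)


def write_price_cache(
    put_object: Callable[..., object],
    bucket: str,
    ticker: str,
    body: bytes,
    prefixes: Optional[Sequence[str]] = None,
) -> List[str]:
    """Write one ticker's parquet under each write prefix; return the keys written."""
    targets = prefixes if prefixes is not None else price_cache_write_prefixes()
    keys = []
    for prefix in targets:
        key = f"{prefix}{ticker}.parquet"
        put_object(Bucket=bucket, Key=key, Body=body)
        keys.append(key)
    return keys
```

Under this layout, deleting `PRICE_CACHE_LEGACY_PREFIX` from `_PRICE_CACHE_PREFIXES` changes writers and readers together. That is the one-line cutover the commit message describes.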
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 20, 2026
…y rollback chain since PR #254 (#274)

The Phase 2 Lambda CI deploy has been failing on every push since 2026-05-18 18:20Z. PR #254 (per-collector value-range validation, merged 5/16) added top-level `from validators.price_validator import (...)` imports to `collectors/alternative.py` and `collectors/fundamentals.py`, but did NOT add `COPY validators/` to the Dockerfile. Since then, every push that touched a deploy.yml-triggering path built a fresh image, pushed v88, ran the canary, hit `No module named 'validators'` at module load, and rolled back to v87.

10 consecutive failed deploys (5/18-18:20Z through 5/20-00:25Z) went unnoticed because the canary correctly rolled back, so prod (v87) was unaffected. However, the latent break blocked ANY new code from ever reaching `live`. It surfaced during the Wave-3 PR3-wave-2 deploy (#273), when Brian saw the rollback in the CI log.

Fix: a one-line `COPY validators/ ${LAMBDA_TASK_ROOT}/validators/` next to the other application-code COPY directives.

Regression guard: new `tests/test_dockerfile_copies_match_deployed_imports.py` with two assertions:

1. `test_dockerfile_copies_validators_for_collectors_imports`: an explicit pin for the validators/ case, so the failure surfaces under a clear test name if a future refactor drops the COPY.
2. `test_every_toplevel_local_import_in_lambda_code_is_dockerfile_copied`: a generic ast-scan over every deployed Python file (lambda/, weekly_collector.py, polygon_client.py, collectors/, store/, validators/). A module-scope `from <local_pkg> import ...` or `import <local_pkg>`, where <local_pkg> is a directory with an __init__.py at the repo root, MUST appear in the Dockerfile's COPY directives or in the explicitly non-deployed allowlist (`tests`, `builders`, `infrastructure`, `rag`, `features`).

New top-level imports of un-COPY'd packages now fail in CI instead of in the post-merge canary. Suite 1399 → 1401 green.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
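Below is a minimal sketch of the generic ast-scan that the second regression test describes. It is not the test file's actual code. The helper names are hypothetical, and the COPY parser is deliberately simple: it ignores line continuations and JSON-array `COPY` syntax. "Module scope" is taken to mean the statements directly in `ast.parse(...).body`, so imports nested inside functions, `try:` blocks, or `if` blocks are not scanned.

```python
import ast
from pathlib import Path
from typing import Iterable, Set

# Allowlist copied from the commit message: local packages that are never deployed.
NON_DEPLOYED_ALLOWLIST = {"tests", "builders", "infrastructure", "rag", "features"}


def dockerfile_copied_roots(dockerfile_text: str) -> Set[str]:
    """First path component of every COPY source (flags and destination skipped)."""
    roots = set()
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].upper() != "COPY":
            continue
        args = [t for t in tokens[1:] if not t.startswith("--")]
        for src in args[:-1]:  # the last argument is the destination
            roots.add(src.rstrip("/").split("/")[0])
    return roots


def local_packages(repo_root: Path) -> Set[str]:
    """Repo-root directories that contain an __init__.py."""
    return {init.parent.name for init in repo_root.glob("*/__init__.py")}


def module_scope_imports(source: str) -> Set[str]:
    """Top-level package names imported by module-scope statements only."""
    names = set()
    for node in ast.parse(source).body:  # module scope; function bodies are not visited
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def uncopied_local_imports(repo_root: Path, deployed_files: Iterable[Path]) -> Set[str]:
    """Local packages imported at module scope but neither COPY'd nor allowlisted."""
    copied = dockerfile_copied_roots((repo_root / "Dockerfile").read_text())
    local = local_packages(repo_root)
    missing = set()
    for path in deployed_files:
        for name in module_scope_imports(path.read_text()):
            if name in local and name not in copied and name not in NON_DEPLOYED_ALLOWLIST:
                missing.add(name)
    return missing
```

In a pytest test, the guard would reduce to `assert not uncopied_local_imports(repo_root, deployed_files)`. Before this fix it would have reported `{"validators"}`.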
Completes the data-repo side of Wave-3 PR3 reader migration (ROADMAP L1401). PR3-wave-1 (#272) shipped the read-side helper `price_cache_read_prefixes` and the easy `sf_preflight.py` consumer; this PR is the wave-2 follow-up that the L1401 entry explicitly carved out.

What's in this PR
Three sites named as "invasive" follow-ups in L1401's PR3-wave-1 banner now go through the Wave-3 read fallback chain:
- `builders/backfill._load_full_cache`: swaps its hand-rolled paginator for the new `list_price_cache_keys` helper. The new prefix is consulted first, the legacy fallback fills gaps during the soak, and results are deduped by `{ticker}.parquet` basename so each ticker is fetched once.
- `builders/daily_append._load_parquet_warmup`: iterates `price_cache_read_prefixes` for the single-key parquet lookup. `NoSuchKey` on the new prefix falls through to legacy; a key absent from BOTH prefixes still degrades to `None` (the PR #78 graceful path). Non-404 errors still hard-fail on the first prefix that raises, so NoSilentFails is preserved.
- `collectors/slim_cache.collect`: switches its `_list_parquets` paginator to the same chokepoint. The now-unused private helper is deleted.

New chokepoint
`list_price_cache_keys(s3, bucket, primary)` is added to `builders/_price_cache_writeboth.py` so that every aggregate-listing site goes through the same Wave-3 helper as the write-side path. Single-key callers continue to inline `price_cache_read_prefixes` (matching the predictor #181 pattern; the data repo's only single-key reader is `_load_parquet_warmup` above).

Tests
6 new behavioural tests following the established 3-test discipline:
- `list_price_cache_keys`: prefers-new / falls-back-legacy / explicit-custom-opt-out
- `_load_parquet_warmup`: prefers-new / falls-back-legacy / absent-in-both-degrades-to-None

Backfill is covered transitively through the helper's tests. The pre-existing `_load_parquet_warmup` error-path tests (404 / NoSuchKey / AccessDenied / empty-frame) still pass, because they monkeypatch `load_parquet_from_s3`, which is the loop's per-prefix call.

Suite: 1387 → 1399 green (+12 net).
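To illustrate why monkeypatching the per-prefix loader is enough to exercise the fallback, here is a minimal sketch of the `_load_parquet_warmup` loop. It is not the repo's implementation. The function name, its explicit `load` and `prefixes` parameters, and the set of "absent" error codes are assumptions. Errors are assumed to follow botocore's `ClientError.response["Error"]["Code"]` shape, and the check reads that attribute directly without importing botocore.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Error codes treated as "key absent" (assumed; mirrors S3's NoSuchKey / 404).
_NOT_FOUND_CODES = {"NoSuchKey", "404"}


def _is_not_found(exc: Exception) -> bool:
    """True for a botocore-style error whose response code means 'absent'."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") in _NOT_FOUND_CODES


def load_parquet_warmup_sketch(
    load: Callable[[str, str], T],
    bucket: str,
    ticker: str,
    prefixes: Sequence[str],
) -> Optional[T]:
    """Try each read prefix in order; return None only if the key is absent everywhere."""
    for prefix in prefixes:
        key = f"{prefix}{ticker}.parquet"
        try:
            return load(bucket, key)
        except Exception as exc:
            if _is_not_found(exc):
                continue  # absent under this prefix: fall through to the next one
            raise  # non-404 (e.g. AccessDenied) hard-fails on the first prefix
    return None  # absent from every prefix: graceful degrade
```

The three behavioural cases map directly onto this loop. A hit on the first prefix returns early (prefers-new). `NoSuchKey` on the first prefix continues to the legacy prefix (falls-back-legacy). Exhausting the list returns `None` (absent-in-both).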
PR4 cutover
Post-soak (earliest ~2026-05-26, i.e. 1 week after data #270's 5/19 producer write-both went live), the PR4 cutover drops the legacy entry from BOTH `price_cache_write_prefixes` and `price_cache_read_prefixes` in one edit; the `PRICE_CACHE_PREFIX = PRICE_CACHE_LEGACY_PREFIX` alias in `daily_append.py` is deleted in the same edit. The tests added here flip to single-prefix expectations at that time.

Composes with
- … `price_cache_read_prefixes` helper + `sf_preflight.py`)
- `[[reference-wave3-read-new-first-with-legacy-fallback-pattern]]` memory

🤖 Generated with Claude Code