chore(canonical-key): retire substrate-reader legacy fallback branches (ROADMAP L178) #206
Merged
Companion to alpha-engine-data #271. Wave-1 canonical-shape gate fired
2026-05-18: `latest.json` sidecars produced 2026-05-16 ~08:17–08:20 UTC for all
three substrates; zero `legacy` log emissions in the research-runner
Lambda over the last 7 days. Safe to drop the fallback branches.
Removed
- `read_news_aggregates` + `read_analyst_revisions` — legacy
`{as_of_date}.parquet` fallback branch + now-unused `as_of_date`
positional arg.
- `read_insider_transactions_window` — legacy per-filed_date walk-back
loop (`window_days+1` GETs against `{date}.parquet` keys); now a
single canonical sidecar read + window filter on `filed_date`.
- Stale docstring note on `_read_via_latest` ("logged at INFO so
legacy-fallback callers can degrade").
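A minimal sketch of the insider-window change, for illustration only. It contrasts the keys the retired walk-back loop would have probed with the in-memory filter that replaces it. The function names, the ISO date format in the key, and the list-of-dicts row shape are assumptions made for this sketch, not the code in `reader.py`; the inclusive window bounds are inferred from the `window_days+1` GET count.

```python
from datetime import date, timedelta


def legacy_walkback_keys(as_of_date, window_days):
    """Keys the retired loop would have probed: one GET per filed date.

    Hypothetical key format; the real prefix and date format may differ.
    """
    return [
        f"{(as_of_date - timedelta(days=offset)).isoformat()}.parquet"
        for offset in range(window_days + 1)
    ]


def filter_window(rows, as_of_date, window_days):
    """Keep rows whose filed_date falls in [as_of_date - window_days, as_of_date].

    `rows` stands in for the frame loaded by the single canonical read.
    """
    start = as_of_date - timedelta(days=window_days)
    return [
        row
        for row in rows
        if start <= date.fromisoformat(row["filed_date"]) <= as_of_date
    ]
```

Under these assumptions the read cost no longer scales with `window_days`: the window is applied to data that has already been loaded once.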
Updated
- `SubstrateReader.snapshot_for_ticker` + `read_substrate_for_population`
internal call sites — drop `as_of_date` arg from the two news/analyst
readers (it was only for the legacy fallback).
- Package docstring — canonical-shape S3 layout is the only documented
shape now.
Tests
- 4 round-trip / rollup tests migrated from legacy-keyed plants to a new
`_put_canonical` helper (mirrors producer's run_id + latest.json shape).
- Renamed `test_falls_back_to_legacy_date_key_when_no_sidecar` →
`test_legacy_date_key_is_ignored_when_no_sidecar` and flipped its
assertion (legacy key alone → empty result). Net regression guard.
- Added `test_insider_legacy_per_date_keys_are_ignored` — planting two
legacy per-filed_date parquets with no canonical sidecar must yield
empty (no walk-back).
- Tightened the read-budget pin from "~95 calls" to "exactly 3" in
`test_only_one_s3_read_per_parquet_for_whole_population`.
Suite: **1373 passed** (+1 regression guard), 45.6s.
Public API of `data.substrate` (`SubstrateReader`, `SubstrateSnapshot`,
`read_substrate_for_population`) unchanged in shape — gitignored
`fetch_data` callers don't need a coordinated edit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ROADMAP: L178 — Retire canonical-key backward-compat shims (Wave-1 cleanup, P2)
Companion to alpha-engine-data #271.
Gate fired 2026-05-18. Canonical `latest.json` sidecars produced for all three substrates on 2026-05-16 ~08:17–08:20 UTC; zero `legacy` log emissions in the research-runner Lambda over the last 7 days.
Removed
- `data/substrate/reader.py::read_news_aggregates` + `read_analyst_revisions`: legacy `{as_of_date}.parquet` fallback + now-unused `as_of_date` positional arg.
- `read_insider_transactions_window`: legacy per-filed_date walk-back loop (was `window_days+1` GETs against `{date}.parquet` keys); now a single canonical sidecar read + window filter on `filed_date`.
- Stale docstring note on `_read_via_latest` ("so legacy-fallback callers can degrade").
Internal call sites updated
- `SubstrateReader.snapshot_for_ticker` + `read_substrate_for_population`: drop `as_of_date` arg from the two news/analyst readers (it was only for the retired legacy fallback).
- `as_of_date` continues to flow to `read_insider_transactions_window` for window filtering (a legitimate use).
Tests
Existing tests were planting legacy `{date}.parquet` files and expecting them to be read; that was the legacy contract this PR retires. Re-shaped to canonical fixtures:
- New `_put_canonical(s3, prefix, df, run_id)` helper that mirrors the producer's run_id + latest.json shape.
- 4 round-trip / rollup tests migrated to `_put_canonical`.
- Renamed `test_falls_back_to_legacy_date_key_when_no_sidecar` → `test_legacy_date_key_is_ignored_when_no_sidecar` and flipped its assertion (legacy key alone → empty). Net regression guard.
- Added `test_insider_legacy_per_date_keys_are_ignored`: two legacy per-filed_date parquets with no canonical sidecar must yield empty (no walk-back).
- Tightened the read-budget pin in `test_only_one_s3_read_per_parquet_for_whole_population` from "≤100 calls" to "== 3".
Suite: 1373 passed (+1 regression guard), 45.6s.
Public API unchanged in shape
`data.substrate` exports (`SubstrateReader`, `SubstrateSnapshot`, `read_substrate_for_population`) keep identical signatures. Gitignored `fetch_data` callers don't need a coordinated edit.
Composes with
🤖 Generated with Claude Code