
chore(canonical-key): retire substrate-reader legacy fallback branches (ROADMAP L178)#206

Merged
cipher813 merged 1 commit into
mainfrom
chore/retire-substrate-reader-fallback
May 19, 2026

@cipher813 (Owner)

ROADMAP: L178 — Retire canonical-key backward-compat shims (Wave-1 cleanup, P2)

Companion to alpha-engine-data #271.

Gate fired 2026-05-18. Canonical `latest.json` sidecars were produced for all three substrates on 2026-05-16 ~08:17–08:20 UTC, and there were zero `legacy` log emissions in the research-runner Lambda over the last 7 days.

Removed

  • `data/substrate/reader.py::read_news_aggregates` + `read_analyst_revisions` — legacy `{as_of_date}.parquet` fallback + now-unused `as_of_date` positional arg.
  • `read_insider_transactions_window` — legacy per-`filed_date` walk-back loop (was `window_days+1` GETs against `{date}.parquet` keys); now a single canonical sidecar read + window filter on `filed_date`.
  • Stale docstring note on `_read_via_latest` ("so legacy-fallback callers can degrade").
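A minimal sketch of the canonical-only read path that remains, assuming a hypothetical layout in which `{prefix}/latest.json` holds `{"run_id": ...}` and the rows live at `{prefix}/{run_id}.json`. The real reader pulls parquet from S3; here a plain dict stands in for the bucket and JSON row lists stand in for parquet so the example stays stdlib-only. The `substrate/news` prefix is illustrative, not the production key.

```python
import json
from typing import Dict, List

# Hypothetical stand-in for an S3 bucket: object key -> raw bytes.
Bucket = Dict[str, bytes]


def read_via_latest(bucket: Bucket, prefix: str) -> List[dict]:
    """Resolve {prefix}/latest.json -> run_id -> {prefix}/{run_id}.json.

    There is no legacy fallback: if the sidecar is absent the result is
    empty, even when an old {as_of_date}-keyed object still exists.
    """
    sidecar = bucket.get(f"{prefix}/latest.json")
    if sidecar is None:
        return []
    run_id = json.loads(sidecar)["run_id"]
    # A sidecar pointing at a missing object raises KeyError in this sketch.
    return json.loads(bucket[f"{prefix}/{run_id}.json"])


def read_news_aggregates(bucket: Bucket) -> List[dict]:
    # No as_of_date parameter: it existed only to build the legacy key.
    return read_via_latest(bucket, "substrate/news")
```

With only a legacy `substrate/news/2026-05-16.json` object in the bucket, `read_news_aggregates` returns `[]`; once a sidecar and its run-keyed object are written, it returns their rows.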

Internal call sites updated

  • `SubstrateReader.snapshot_for_ticker` + `read_substrate_for_population` — drop the `as_of_date` arg from the two news/analyst readers (it existed only for the retired legacy fallback). `as_of_date` continues to flow to `read_insider_transactions_window` for window filtering (a legitimate use).
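The updated call-site shape can be sketched as below. The reader names mirror the PR, but their bodies, the `substrate/*` prefixes, the `window_days` default, and the dict-backed `CountingBucket` are hypothetical, with JSON rows standing in for parquet. The point is the shape: each substrate is read once for the whole population, the news and analyst readers take no date, and `as_of_date` reaches only the insider window reader. The counter tallies payload GETs only (sidecar GETs are excluded), on the assumption that this is what a per-parquet read budget measures.

```python
import json
from datetime import date
from typing import Dict, List


class CountingBucket:
    """Hypothetical dict-backed S3 stand-in that counts payload (non-sidecar) GETs."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = objects
        self.payload_gets = 0

    def get(self, key: str):
        if not key.endswith("/latest.json"):
            self.payload_gets += 1
        return self.objects.get(key)


def _read_via_latest(bucket: CountingBucket, prefix: str) -> List[dict]:
    sidecar = bucket.get(f"{prefix}/latest.json")
    if sidecar is None:
        return []
    run_id = json.loads(sidecar)["run_id"]
    return json.loads(bucket.get(f"{prefix}/{run_id}.json"))


def read_news_aggregates(bucket: CountingBucket) -> List[dict]:
    return _read_via_latest(bucket, "substrate/news")  # no as_of_date


def read_analyst_revisions(bucket: CountingBucket) -> List[dict]:
    return _read_via_latest(bucket, "substrate/analyst")  # no as_of_date


def read_insider_transactions_window(
    bucket: CountingBucket, as_of_date: date, window_days: int
) -> List[dict]:
    # Window filtering is the one remaining, legitimate use of as_of_date.
    rows = _read_via_latest(bucket, "substrate/insider")
    return [
        r for r in rows
        if 0 <= (as_of_date - date.fromisoformat(r["filed_date"])).days <= window_days
    ]


def read_substrate_for_population(
    bucket: CountingBucket, tickers: List[str], as_of_date: date, window_days: int = 30
) -> Dict[str, Dict[str, List[dict]]]:
    # Each substrate is read once for the whole population, then split per ticker.
    by_substrate = {
        "news": read_news_aggregates(bucket),
        "analyst": read_analyst_revisions(bucket),
        "insider": read_insider_transactions_window(bucket, as_of_date, window_days),
    }
    return {
        t: {name: [r for r in rows if r["ticker"] == t] for name, rows in by_substrate.items()}
        for t in tickers
    }
```

Adding tickers changes only the in-memory split, not the number of payload GETs, which appears to be the property an exact read-budget pin guards.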

Tests

Existing tests were planting legacy `{date}.parquet` files and expecting them to be read; that was the legacy contract this PR retires. They are re-shaped to canonical fixtures:

  • Added a `_put_canonical(s3, prefix, df, run_id)` helper that mirrors the producer's `run_id` + `latest.json` shape.
  • Migrated 4 round-trip / rollup tests to `_put_canonical`.
  • Renamed `test_falls_back_to_legacy_date_key_when_no_sidecar` → `test_legacy_date_key_is_ignored_when_no_sidecar` and flipped its assertion (legacy key alone → empty). Net regression guard.
  • Added `test_insider_legacy_per_date_keys_are_ignored` — two legacy per-`filed_date` parquets with no canonical sidecar must yield empty (no walk-back).
  • Tightened the read-budget pin in `test_only_one_s3_read_per_parquet_for_whole_population` from "≤100 calls" to "== 3".

Suite: 1373 passed (+1 regression guard), 45.6s.

Public API unchanged in shape

`data.substrate` exports (`SubstrateReader`, `SubstrateSnapshot`, `read_substrate_for_population`) keep identical signatures, so gitignored `fetch_data` callers don't need a coordinated edit.

Composes with

🤖 Generated with Claude Code

…s (ROADMAP L178)

Companion to alpha-engine-data #271. Wave-1 canonical-shape gate fired
2026-05-18: `latest.json` sidecars produced 5/16 ~08:17–08:20 UTC for all
three substrates; zero `legacy` log emissions in the research-runner
Lambda over the last 7 days. Safe to drop the fallback branches.

Removed
- `read_news_aggregates` + `read_analyst_revisions` — legacy
  `{as_of_date}.parquet` fallback branch + now-unused `as_of_date`
  positional arg.
- `read_insider_transactions_window` — legacy per-filed_date walk-back
  loop (`window_days+1` GETs against `{date}.parquet` keys); now a
  single canonical sidecar read + window filter on `filed_date`.
- Stale docstring note on `_read_via_latest` ("logged at INFO so
  legacy-fallback callers can degrade").
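The difference in S3 traffic for the insider reader can be sketched side by side. `legacy_walk_back` is an illustrative reconstruction of the retired loop from its description (one GET per calendar day against `{date}` keys), not the removed code; the canonical reader, the JSON-for-parquet swap, the key layout, and `CountingBucket` are likewise assumptions.

```python
import json
from datetime import date, timedelta
from typing import Dict, List


class CountingBucket:
    """Hypothetical dict-backed S3 stand-in that counts every GET."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = objects
        self.gets = 0

    def get(self, key: str):
        self.gets += 1
        return self.objects.get(key)


def legacy_walk_back(bucket: CountingBucket, prefix: str, as_of_date: date, window_days: int) -> List[dict]:
    # Retired shape: window_days + 1 GETs, one per {date}-keyed object, hit or miss.
    rows: List[dict] = []
    for offset in range(window_days + 1):
        day = as_of_date - timedelta(days=offset)
        blob = bucket.get(f"{prefix}/{day.isoformat()}.json")
        if blob is not None:
            rows.extend(json.loads(blob))
    return rows


def read_insider_transactions_window(
    bucket: CountingBucket, prefix: str, as_of_date: date, window_days: int
) -> List[dict]:
    # Canonical shape: sidecar GET + one payload GET, then filter on filed_date.
    sidecar = bucket.get(f"{prefix}/latest.json")
    if sidecar is None:
        return []
    run_id = json.loads(sidecar)["run_id"]
    rows = json.loads(bucket.get(f"{prefix}/{run_id}.json"))
    start = as_of_date - timedelta(days=window_days)
    return [r for r in rows if start <= date.fromisoformat(r["filed_date"]) <= as_of_date]
```

Both versions cover the same inclusive window of `window_days + 1` calendar days; the canonical one does it in two GETs regardless of window length.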

Updated
- `SubstrateReader.snapshot_for_ticker` + `read_substrate_for_population`
  internal call sites — drop `as_of_date` arg from the two news/analyst
  readers (it was only for the legacy fallback).
- Package docstring — canonical-shape S3 layout is the only documented
  shape now.

Tests
- 4 round-trip / rollup tests migrated from legacy-keyed plants to a new
  `_put_canonical` helper (mirrors producer's run_id + latest.json shape).
- Renamed `test_falls_back_to_legacy_date_key_when_no_sidecar` →
  `test_legacy_date_key_is_ignored_when_no_sidecar` and flipped its
  assertion (legacy key alone → empty result). Net regression guard.
- Added `test_insider_legacy_per_date_keys_are_ignored` — planting two
  legacy per-filed_date parquets with no canonical sidecar must yield
  empty (no walk-back).
- Tightened the read-budget pin from "~95 calls" to "exactly 3" in
  `test_only_one_s3_read_per_parquet_for_whole_population`.

Suite: **1373 passed** (+1 regression guard), 45.6s.

Public API of `data.substrate` (`SubstrateReader`, `SubstrateSnapshot`,
`read_substrate_for_population`) unchanged in shape — gitignored
`fetch_data` callers don't need a coordinated edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 merged commit 527cca8 into main May 19, 2026
1 check passed
@cipher813 deleted the chore/retire-substrate-reader-fallback branch May 19, 2026 23:51