fix(daily_closes): source-priority coalesce — never regress a cell to a less-informative value#354

Merged
cipher813 merged 1 commit into main from fix/daily-closes-source-priority-coalesce-merge
Jun 1, 2026
Conversation

@cipher813
Owner

Why

The 2026-06-01 weekday Step Function failed at MorningEnrich. Root cause:

  • A FRED 429 rate-limit storm meant TNX (DGS10, the 10Y yield) was never collected for the 5/29 target date.
  • polygon_only mode overwrote the existing daily-closes parquet wholesale (canonical_existing_rows empty by construction). Polygon never serves ^TNX and yfinance was refused, so the rewrite blanked the TNX value the prior (Friday EOD) parquet already held.
  • daily_append correctly hard-failed on the missing critical macro key → pipeline halt.

This violated the 2026-05-10 decision ("a cell is only updated if the data exists in the authoritative source, else the prior datapoint is retained"). The retain mechanism existed (canonical_existing_rows) but was gated to yfinance_only/auto; polygon_only opted out and did a destructive overwrite.

SOTA / institutional alignment

This implements the source-of-record waterfall that institutional market-data masters use (provenance per cell + priority-ranked COALESCE upsert + bitemporal revisions-not-blanking + loud degradation events). Verified against that pattern before implementing.

What changed (collectors/daily_closes.py)

  • _coalesce_by_source_priority — for each ticker, keep the row from the highest-priority source across {prior parquet, this run}. _SOURCE_PRIORITY = {polygon:3, fred:3, yfinance:1} (polygon/fred are co-primary over disjoint domains — equities vs ^indices). Subsumes:
    • retain-on-empty — a ticker the live pass can't refresh keeps its prior row (fixes the TNX blank).
    • restatement wins — same-or-higher source overwrites (ties → fresh), so polygon's corporate-action-adjusted closes still land.
    • no source-downgrade — a yfinance backstop can't clobber a prior polygon close + true VWAP (prevents the 2026-04-17 VWAP=None contamination class).
    • null/NaN Close treated as missing (never wins, never written as an empty cell).
  • polygon_only now coalesces before write instead of destructive overwrite. Runs after the coverage gate (computed on the fresh fetch), so a genuine polygon outage still hard-fails and is not masked by retained rows.
  • Macro yfinance backstop — the equity-specific no-silent-fails refusal no longer blanket-blocks the FRED-index macro tickers (^TNX/^VIX/^IRX/^VIX3M). Polygon never serves them and they carry no VWAP, so FRED → yfinance is their legitimate chain. Loud WARN on fallback (records the FRED degradation; satisfies feedback_no_silent_fails). Equities still refuse yfinance.
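
The coalesce rules above can be sketched as follows. This is a minimal illustration, not the code in collectors/daily_closes.py: the priority values come from this PR, but the row layout (a dict per ticker with `Close` and `source` keys), the function signature, the stats keys, and the choice to rank an unknown source as 0 are all assumptions made for the example.

```python
import math
from typing import Dict, Optional, Tuple

# Priority values as stated in the PR description.
SOURCE_PRIORITY = {"polygon": 3, "fred": 3, "yfinance": 1}


def _has_close(row: Optional[dict]) -> bool:
    """A row is usable only if Close is a real number (not None, not NaN)."""
    if row is None:
        return False
    close = row.get("Close")
    if close is None:
        return False
    return not (isinstance(close, float) and math.isnan(close))


def coalesce_by_source_priority(
    prior: Dict[str, dict], fresh: Dict[str, dict]
) -> Tuple[Dict[str, dict], Dict[str, int]]:
    """Merge prior and fresh rows per ticker, never downgrading the source."""
    merged: Dict[str, dict] = {}
    stats = {"retained": 0, "refreshed": 0, "downgrade_blocked": 0}
    for ticker in sorted(set(prior) | set(fresh)):
        old, new = prior.get(ticker), fresh.get(ticker)
        if not _has_close(new):
            # retain-on-empty: keep the prior row if it is usable
            if _has_close(old):
                merged[ticker] = old
                stats["retained"] += 1
            continue
        if not _has_close(old):
            merged[ticker] = new
            stats["refreshed"] += 1
            continue
        # Unknown sources rank 0 here (an assumption of this sketch).
        old_rank = SOURCE_PRIORITY.get(old.get("source"), 0)
        new_rank = SOURCE_PRIORITY.get(new.get("source"), 0)
        if new_rank >= old_rank:
            # restatement wins; ties go to the fresh row
            merged[ticker] = new
            stats["refreshed"] += 1
        else:
            # no source-downgrade
            merged[ticker] = old
            stats["downgrade_blocked"] += 1
    return merged, stats
```

With a prior `^TNX` row from FRED and a fresh row whose `Close` is NaN, the prior row is kept and counted as retained; a fresh yfinance row for a ticker whose prior row came from polygon is counted as downgrade-blocked.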

no_silent_fails deviation rationale (per CLAUDE.md)

The macro yfinance backstop is a loud fallback, not a silent one: (a) failure mode = FRED rate-limit/outage on a FRED-only macro ticker; (b) primary deliverable (the parquet + macro keys) survives via the documented FRED→yfinance chain; (c) recording surface = explicit WARN log naming the ticker(s) + the polygon_only coalesce WARN counting retained/downgrade-blocked cells. Equity universe semantics are unchanged.
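
A rough sketch of the loud-fallback gate described above. The macro ticker set is taken from this PR; the function name, logger name, and log wording are hypothetical, and the real code may raise on refused equities rather than only logging.

```python
import logging
from typing import List

logger = logging.getLogger("daily_closes_sketch")  # hypothetical logger name

# FRED-index macro tickers named in the PR description.
MACRO_TICKERS = frozenset({"^TNX", "^VIX", "^IRX", "^VIX3M"})


def yfinance_backstop_candidates(missing: List[str]) -> List[str]:
    """Return the missing tickers that may fall back to yfinance.

    Only macro tickers are allowed through; equities are refused.
    Both outcomes are logged at WARNING so the degradation is recorded.
    """
    allowed = sorted(t for t in missing if t in MACRO_TICKERS)
    refused = sorted(t for t in missing if t not in MACRO_TICKERS)
    if allowed:
        logger.warning(
            "FRED degraded; yfinance backstop for macro tickers: %s",
            ", ".join(allowed),
        )
    if refused:
        logger.warning(
            "refusing yfinance backstop for equity tickers: %s",
            ", ".join(refused),
        )
    return allowed
```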

Tests

  • New tests/test_daily_closes_source_priority_merge.py — 9 tests on the coalesce invariant (retain-on-empty, restatement, downgrade-block, null handling, unknown-source prior, mixed-scenario stats).
  • tests/test_daily_closes_source_modes.py — added macro-backstop wiring test + end-to-end TNX-retain regression; tightened 2 docstrings that over-claimed the old destructive-overwrite contract.
  • Full suite: 1742 passed, 1 skipped.

Out-of-scope follow-ups (filing to ROADMAP)

  1. FRED rate-limit hardening — this PR makes the system resilient to FRED 429s but does not stop them. The windowed backfill fires ~N×2 FRED calls in a tight burst with no backoff/jitter. A FRED retry/backoff-with-jitter (SOTA retry primitive) is separate hardening.
  2. Waterfall consolidation (P3): polygon_only (coalesce) and yfinance_only/auto (canonical_existing_rows) are parallel preservation mechanisms; consider unifying on the priority-waterfall primitive.
  3. Operator note — today's 6/1 weekday SF did not trade (halted pre-MorningEnrich-completion); no recovery re-run fired (pre-open window had closed).

🤖 Generated with Claude Code

… a less-informative value

Root cause of the 2026-06-01 weekday SF failure: MorningEnrich's
`polygon_only` collect overwrote the existing daily-closes parquet
WHOLESALE (canonical_existing_rows empty by construction). A transient
FRED 429 storm meant `TNX` (DGS10) was never collected for the 5/29
target; polygon never serves ^TNX and yfinance was refused, so the
rewrite BLANKED the TNX value the prior (Friday EOD) parquet already
held. daily_append then hard-failed on the missing critical macro key
and halted the pipeline.

This violated Brian's 2026-05-10 decision ("a cell is only updated if
the data exists in the authoritative source, else the prior datapoint
is retained"). The mechanism existed (canonical_existing_rows) but was
gated to yfinance_only/auto; polygon_only opted out and did a
destructive overwrite.

Fix — institutional source-of-record waterfall at the collect() seam:

- `_coalesce_by_source_priority`: a cell is replaced only by an
  equal-or-higher-priority source (`_SOURCE_PRIORITY` polygon=fred=3,
  yfinance=1). Subsumes:
  * retain-on-empty — a ticker the live pass can't refresh keeps its
    prior row (fixes the TNX blank);
  * restatement wins — same/higher source overwrites (polygon
    corporate-action closes still land);
  * no source-downgrade — a yfinance backstop can't clobber a prior
    polygon close+VWAP (prevents the 2026-04-17 VWAP=None class).
- polygon_only now loads full prior rows and coalesces before write.
  Runs AFTER the coverage gate (computed on the FRESH fetch), so a real
  polygon outage still hard-fails and is not masked by retained rows.
- Macro yfinance backstop: the equity-specific no-silent-fails refusal
  no longer blanket-blocks the FRED-index macro tickers
  (^TNX/^VIX/^IRX/^VIX3M) — polygon never serves them and they carry no
  VWAP, so FRED -> yfinance is their legitimate chain. Loud WARN on
  fallback (records the FRED degradation; satisfies no_silent_fails).
  Equities still refuse yfinance.
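
The ordering of the coverage gate relative to the merge can be illustrated as below. This is a sketch only: the function names are hypothetical, the 95% threshold is borrowed from the follow-up commit on this page, and a plain dict update stands in for the priority coalesce.

```python
from typing import Dict, List


def check_coverage(
    fresh: Dict[str, float], expected: List[str], min_coverage: float = 0.95
) -> float:
    """Fail hard if the FRESH fetch covers too few of the expected tickers."""
    if not expected:
        raise ValueError("expected ticker list is empty")
    covered = sum(1 for t in expected if t in fresh)
    coverage = covered / len(expected)
    if coverage < min_coverage:
        raise RuntimeError(
            f"coverage {coverage:.1%} is below the {min_coverage:.0%} gate"
        )
    return coverage


def collect_polygon_only(
    prior: Dict[str, float], fresh: Dict[str, float], expected: List[str]
) -> Dict[str, float]:
    # Gate first, on fresh rows only, so retained rows cannot hide an outage.
    check_coverage(fresh, expected)
    # Stand-in for the priority coalesce: fresh values win, prior rows survive.
    merged = dict(prior)
    merged.update(fresh)
    return merged
```

If the gate ran on the merged rows instead, a complete prior parquet would make an empty fresh fetch look fully covered.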

Tests: new test_daily_closes_source_priority_merge.py (9, the coalesce
invariant) + 2 source_modes tests (macro backstop wiring + end-to-end
TNX retain regression). Tightened two docstrings that over-claimed the
old destructive-overwrite contract. Full suite 1742 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 569c089 into main Jun 1, 2026
3 checks passed
@cipher813 cipher813 deleted the fix/daily-closes-source-priority-coalesce-merge branch June 1, 2026 13:23
cipher813 added a commit that referenced this pull request Jun 2, 2026
…ckoff/weekday-pull/restatement-severity (L4482/L4480/L4483/L4486) (#363)

Follow-ups to the 2026-06-01 FRED-429 / polygon-timeout incident (#354
made the system resilient to a missed value; these close the surrounding
gaps the incident exposed).

L4482 — macro backstop survives a transient polygon-fetch exception.
collect() Step 1 now catches the TRANSIENT network class (requests
Timeout/ConnectionError + PolygonRateLimitError) in polygon_only mode,
logs loudly, and falls through to FRED (Step 2) + the macro yfinance
backstop (Step 3) — the exact gap that failed recovery re-run #1 (a
polygon read-timeout aborted collect() before the FRED-index macro keys
could fill). Narrow by design: PolygonForbiddenError (403) and the
deliberate "0 tickers" empty-data RuntimeError still propagate with their
own messages; a real equity outage still hard-fails at the coverage gate
(0 equity records -> < 95%), so the catch cannot mask equity data loss.
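
A minimal sketch of the control flow described for L4482. The exception class names come from the text, but they are defined locally here as stand-ins; the built-in TimeoutError and ConnectionError stand in for the requests equivalents so the example has no dependencies. The function signature is hypothetical, and what happens to the filled macro keys when the gate then fails is not specified in the text, so this only demonstrates ordering.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("daily_closes_sketch")  # hypothetical logger name


class PolygonRateLimitError(Exception):
    """Stand-in for the client's rate-limit error."""


class PolygonForbiddenError(Exception):
    """Stand-in for the client's 403 error; deliberately NOT transient."""


# Built-in TimeoutError/ConnectionError stand in for the requests classes.
TRANSIENT = (TimeoutError, ConnectionError, PolygonRateLimitError)


def collect(
    fetch_polygon: Callable[[], Dict[str, float]],
    fetch_fred: Callable[[], Dict[str, float]],
    expected_equities: List[str],
    min_coverage: float = 0.95,
) -> Dict[str, float]:
    # Step 1: polygon. Only the transient class is caught, and loudly.
    try:
        equities = fetch_polygon()
    except TRANSIENT as exc:
        logger.warning("transient polygon failure (%r); continuing to FRED", exc)
        equities = {}
    # Step 2: FRED still runs, so the macro keys get a chance to fill.
    macro = fetch_fred()
    # Coverage gate on equities: zero equity rows still hard-fails here.
    covered = sum(1 for t in expected_equities if t in equities)
    if covered / len(expected_equities) < min_coverage:
        raise RuntimeError("equity coverage below gate")
    return {**equities, **macro}
```

A 403 is outside the caught tuple, so it propagates before the FRED step is reached.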

L4480 — FRED backoff + jitter. New _fred_get_with_retry: bounded
exponential backoff + full jitter on the transient class (429 / 5xx /
timeout / connection), honors a server Retry-After when present, and
raises immediately on a deterministic 4xx (no point retrying a bad
series_id). Stops the 429 storm at the source rather than only tolerating
a missed value. Mirrors the in-repo polygon_client retry idiom; no new dep.
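
One way the retry policy described for L4480 might look. This is a sketch, not `_fred_get_with_retry` itself: the signature, the injected `send` callable, the `FredHTTPError` class, the default delays, and capping Retry-After at `max_delay` are all assumptions. The response object only needs `status_code` and `headers`, and built-in TimeoutError/ConnectionError stand in for the HTTP client's exceptions.

```python
import random
import time
from typing import Any, Callable


class FredHTTPError(Exception):
    """Hypothetical error for non-retryable or exhausted requests."""


def get_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Any:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        last = attempt == max_attempts - 1
        retry_after = None
        try:
            resp = send()
        except (TimeoutError, ConnectionError):
            if last:
                raise
        else:
            status = resp.status_code
            if status < 400:
                return resp
            if not (status == 429 or 500 <= status < 600):
                # Deterministic 4xx: retrying cannot help.
                raise FredHTTPError(f"non-retryable HTTP {status}")
            if last:
                raise FredHTTPError(f"gave up after {max_attempts} attempts")
            retry_after = resp.headers.get("Retry-After")
        # Full jitter: uniform in [0, min(max_delay, base * 2^attempt)).
        delay = rng() * min(max_delay, base_delay * 2 ** attempt)
        if retry_after is not None:
            try:
                # Seconds form only; an HTTP-date value falls back to jitter.
                delay = min(max_delay, float(retry_after))
            except ValueError:
                pass
        sleep(delay)
```

Injecting `sleep` and `rng` keeps the backoff schedule testable without real waiting.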

L4483 — weekday SF MorningEnrich now git-pulls. step_function_daily.json
MorningEnrich adds the same `git -C ... pull --ff-only origin main` for
alpha-engine-data + alpha-engine-config that the Saturday SF already runs,
so a same-day recovery re-run on a still-running instance no longer
executes stale code (in this incident, deploying #354 required a manual
SSM pull).

L4486 — discrepancy ERROR severity scoped. _log_close_discrepancies now
emits FRED-index restatements toward the authoritative value at WARN
(`fred_restatement`, excluded from the flow-doctor ERROR filter) while
keeping ERROR for genuine cross-source EQUITY drift. The reconciliation
predictably restates VIX/TNX >5% when healing a 429-clobbered or stale
T-1 edge cell (5/14 VIX, 6/2) — desirable self-heals, not anomalies. The
recording surface stays (per feedback_no_silent_fails) at the right level.
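
The severity split for L4486 could be sketched like this. The `fred_restatement` tag and the 5% figure appear in the text; the function name, its signature, the `close_discrepancy` tag, and the rule used to decide that a change moves "toward the authoritative value" (the new value comes from FRED) are assumptions for illustration.

```python
import logging
from typing import Optional, Tuple

FRED_INDEX_TICKERS = frozenset({"^TNX", "^VIX", "^IRX", "^VIX3M"})


def classify_discrepancy(
    ticker: str,
    old_close: float,
    new_close: float,
    new_source: str,
    threshold: float = 0.05,
) -> Optional[Tuple[int, str]]:
    """Return (log level, tag) for a close change, or None if within threshold."""
    if old_close == 0:
        return None  # relative change undefined; this sketch simply skips it
    change = abs(new_close - old_close) / abs(old_close)
    if change <= threshold:
        return None
    if ticker in FRED_INDEX_TICKERS and new_source == "fred":
        # Self-heal toward the authoritative source: recorded, but at WARN.
        return (logging.WARNING, "fred_restatement")
    # Anything else is treated as genuine cross-source drift.
    return (logging.ERROR, "close_discrepancy")
```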

Tests: new tests/test_daily_closes_coalesce_hardening.py (8) for the retry
class, transient-non-fatal control flow + 403-still-propagates, and
restatement-vs-equity severity. Fixed two affected fixtures to set
status_code (the helper now inspects it). Full suite: 1785 passed.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>