fix(daily_closes): coalesce-arc hardening — polygon-transient / FRED-backoff / weekday-pull / restatement-severity (L4482/L4480/L4483/L4486)#363
Merged
…ckoff/weekday-pull/restatement-severity (L4482/L4480/L4483/L4486)

Follow-ups to the 2026-06-01 FRED-429 / polygon-timeout incident (#354 made the system resilient to a missed value; these close the surrounding gaps the incident exposed).

L4482: macro backstop survives a transient polygon-fetch exception. collect() Step 1 now catches the TRANSIENT network class (requests Timeout/ConnectionError + PolygonRateLimitError) in polygon_only mode, logs loudly, and falls through to FRED (Step 2) + the macro yfinance backstop (Step 3). This is the exact gap that failed recovery re-run #1 (a polygon read-timeout aborted collect() before the FRED-index macro keys could fill). Narrow by design: PolygonForbiddenError (403) and the deliberate "0 tickers" empty-data RuntimeError still propagate with their own messages; a real equity outage still hard-fails at the coverage gate (0 equity records -> < 95%), so the catch cannot mask equity data loss.

L4480: FRED backoff + jitter. New _fred_get_with_retry: bounded exponential backoff + full jitter on the transient class (429 / 5xx / timeout / connection), honors a server Retry-After when present, and raises immediately on a deterministic 4xx (no point retrying a bad series_id). Stops the 429 storm at the source rather than only tolerating a missed value. Mirrors the in-repo polygon_client retry idiom; no new dep.

L4483: weekday SF MorningEnrich now git-pulls. step_function_daily.json MorningEnrich adds the same `git -C ... pull --ff-only origin main` for alpha-engine-data + alpha-engine-config that the Saturday SF already runs, so a same-day recovery re-run on a still-running instance no longer executes stale code (this incident cost a manual SSM pull to deploy #354).

L4486: discrepancy ERROR severity scoped. _log_close_discrepancies now emits FRED-index restatements toward the authoritative value at WARN (`fred_restatement`, excluded from the flow-doctor ERROR filter) while keeping ERROR for genuine cross-source EQUITY drift. The reconciliation predictably restates VIX/TNX by >5% when healing a 429-clobbered or stale T-1 edge cell (5/14 VIX, 6/2); these are desirable self-heals, not anomalies. The recording surface stays (per feedback_no_silent_fails) at the right level.

Tests: new tests/test_daily_closes_coalesce_hardening.py (8) for the retry class, transient-non-fatal control flow + 403-still-propagates, and restatement-vs-equity severity. Fixed two affected fixtures to set status_code (the helper now inspects it). Full suite: 1785 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Four follow-ups to the 2026-06-01 FRED-429 / polygon-timeout incident. #354 made the system resilient to a missed value; these close the surrounding gaps the incident exposed. All in `collectors/daily_closes.py` + the weekday Step Function.

L4482 (P1): macro backstop survives a transient polygon-fetch exception
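A minimal sketch of the Step 1 fall-through, assuming a three-step `collect()` with injectable fetchers. `collect_sketch`, its fetcher arguments, and the local exception classes are hypothetical stand-ins, not the repo's code; built-in `TimeoutError`/`ConnectionError` stand in for `requests.Timeout`/`requests.ConnectionError` so the sketch runs without `requests`. The coverage gate that still hard-fails an equity outage is not shown.

```python
import logging

log = logging.getLogger("daily_closes.sketch")


class PolygonRateLimitError(Exception):
    """Stand-in for the repo's polygon rate-limit error (hypothetical here)."""


class PolygonForbiddenError(Exception):
    """Stand-in for the repo's polygon 403 error; it must NOT be swallowed."""


# Built-ins stand in for requests.Timeout / requests.ConnectionError in this sketch.
TRANSIENT_POLYGON_ERRORS = (TimeoutError, ConnectionError, PolygonRateLimitError)


def collect_sketch(fetch_polygon, fetch_fred, fetch_yf_backstop, macro_keys):
    """Hypothetical three-step collect(): polygon -> FRED -> yfinance macro backstop."""
    closes = {}
    # Step 1: polygon. Only the transient network class falls through;
    # PolygonForbiddenError and the "0 tickers" RuntimeError propagate untouched.
    try:
        closes.update(fetch_polygon())
    except TRANSIENT_POLYGON_ERRORS as exc:
        log.error("polygon fetch failed transiently (%r); falling through to FRED + backstop", exc)
    # Step 2: FRED-index macro keys, which never come from polygon.
    closes.update(fetch_fred())
    # Step 3: the yfinance backstop fills only macro keys that are still missing.
    missing = [k for k in macro_keys if k not in closes]
    if missing:
        closes.update(fetch_yf_backstop(missing))
    return closes
```

Because the `except` clause names only the transient tuple, a 403 or the empty-data `RuntimeError` still aborts the run with its own message.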
`collect()` Step 1 now catches the transient network class (`requests.Timeout` / `ConnectionError` / `PolygonRateLimitError`) in `polygon_only` mode, logs loudly, and falls through to FRED (Step 2) + the macro yfinance backstop (Step 3). This is the exact gap that failed recovery re-run #1: a polygon read-timeout aborted `collect()` before the FRED-index macro keys (`^TNX`/`^VIX`/`^IRX`/`^VIX3M`, which never come from polygon) could fill.

Narrow by design: `PolygonForbiddenError` (403) and the deliberate `"0 tickers"` empty-data `RuntimeError` still propagate with their own clear messages; a real equity outage still hard-fails at the coverage gate (0 equity records → <95%), so the catch cannot mask equity data loss.

L4480 (P1): FRED backoff + jitter
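A hedged sketch of the retry behavior described for `_fred_get_with_retry`. The names `fred_get_with_retry`, `TransientHTTPError`, `PermanentHTTPError`, the defaults (`max_attempts=5`, `base=1.0`, `cap=30.0`, `timeout=10`), and the injected `get`/`sleep`/`rng` parameters are assumptions for illustration, not the repo's signature. Built-in exceptions stand in for the transport errors; a version wrapping `requests.get` would catch `requests.Timeout` / `requests.ConnectionError` instead.

```python
import random
import time


class TransientHTTPError(Exception):
    """Retries exhausted on a transient status (429 / 5xx). Hypothetical name."""


class PermanentHTTPError(Exception):
    """Deterministic 4xx (e.g. a bad series_id); never retried. Hypothetical name."""


def _is_transient(status):
    return status == 429 or 500 <= status < 600


def _retry_after_seconds(headers):
    # Only the delay-seconds form of Retry-After is parsed; an HTTP-date falls back to jitter.
    value = headers.get("Retry-After")
    if value is not None and str(value).strip().isdigit():
        return float(value)
    return None


def fred_get_with_retry(get, url, params, *, max_attempts=5, base=1.0, cap=30.0,
                        sleep=time.sleep, rng=random.random):
    """GET with bounded exponential backoff + full jitter on the transient class (sketch)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        last_try = attempt == max_attempts - 1
        backoff = min(cap, base * 2 ** attempt)  # exponential ceiling, bounded by cap
        try:
            resp = get(url, params=params, timeout=10)
        except (TimeoutError, ConnectionError):
            # Timeout / connection failures are transient; re-raise once attempts run out.
            if last_try:
                raise
            sleep(rng() * backoff)  # full jitter: uniform in [0, backoff)
            continue
        status = resp.status_code
        if status < 400:
            return resp
        if not _is_transient(status):
            raise PermanentHTTPError(f"HTTP {status} for {params!r}")  # no point retrying
        if last_try:
            raise TransientHTTPError(f"HTTP {status} after {max_attempts} attempts")
        delay = _retry_after_seconds(resp.headers)
        # A server-supplied Retry-After wins over the jittered delay.
        sleep(delay if delay is not None else rng() * backoff)
    raise RuntimeError("unreachable")
```

Full jitter draws each sleep uniformly from `[0, min(cap, base * 2**attempt))`, which spreads concurrent retries apart instead of synchronizing them into another 429 burst.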
New `_fred_get_with_retry`: bounded exponential backoff + full jitter on the transient class (429 / 5xx / timeout / connection), honors a server `Retry-After` when present, and raises immediately on a deterministic 4xx (no point retrying a bad `series_id`). Stops the 429 storm at the source rather than only tolerating a missed value. Mirrors the in-repo `polygon_client` retry idiom; no new dependency.

L4483 (P2): weekday SF MorningEnrich now git-pulls
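A sketch of the commands the MorningEnrich pull amounts to, assuming the step runs a shell command list on the instance (for example through SSM's `AWS-RunShellScript` document). The checkout root `/opt/alpha-engine` and the helper name are hypothetical; the PR elides the real path.

```python
import shlex

# Hypothetical checkout root; the real path is elided ("…") in the PR text.
REPO_ROOT = "/opt/alpha-engine"
REPOS = ("alpha-engine-data", "alpha-engine-config")


def morning_enrich_pull_commands(root=REPO_ROOT, repos=REPOS):
    """Build the fast-forward-only pulls to run before MorningEnrich (sketch)."""
    # set -e makes a failed pull stop the script instead of continuing on stale code.
    cmds = ["set -e"]
    for repo in repos:
        path = shlex.quote(f"{root}/{repo}")
        # --ff-only refuses to create a merge commit, so a diverged checkout fails loudly.
        cmds.append(f"git -C {path} pull --ff-only origin main")
    return cmds
```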
`infrastructure/step_function_daily.json` MorningEnrich adds the same `git -C … pull --ff-only origin main` for `alpha-engine-data` + `alpha-engine-config` that the Saturday SF already runs. A same-day recovery re-run on a still-running instance no longer executes stale code (this incident cost a manual SSM `git pull` to deploy #354).

L4486 (P2): discrepancy ERROR severity scoped
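A minimal sketch of the severity split, assuming a 5% threshold and that the four FRED-index macro keys named in this PR are the only restatement candidates. `discrepancy_level`, `log_close_discrepancy`, and the `close_discrepancy` event name are hypothetical; only `fred_restatement` comes from this PR.

```python
import logging

log = logging.getLogger("daily_closes.discrepancies")

# FRED-index macro keys named in this PR (assumed to be the only restatement candidates).
FRED_INDEX_KEYS = frozenset({"^TNX", "^VIX", "^IRX", "^VIX3M"})


def discrepancy_level(ticker, pct_diff, threshold=0.05):
    """Log level for a cross-source close discrepancy, or None when within tolerance."""
    if abs(pct_diff) <= threshold:
        return None
    # FRED-index cell restated toward the authoritative value: an expected self-heal.
    if ticker in FRED_INDEX_KEYS:
        return logging.WARNING
    # Genuine cross-source equity drift stays at ERROR, where an ERROR-level filter sees it.
    return logging.ERROR


def log_close_discrepancy(ticker, old, new, threshold=0.05):
    """Record the discrepancy at the scoped level and return that level (sketch)."""
    if old == 0:
        raise ValueError("previous close must be non-zero")
    pct = (new - old) / old
    level = discrepancy_level(ticker, pct, threshold)
    if level == logging.WARNING:
        log.warning("fred_restatement %s %.4f -> %.4f (%+.1f%%)", ticker, old, new, 100 * pct)
    elif level == logging.ERROR:
        log.error("close_discrepancy %s %.4f -> %.4f (%+.1f%%)", ticker, old, new, 100 * pct)
    return level
```

The restatement is still recorded, just one level below the ERROR filter, so nothing is silently dropped.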
`_log_close_discrepancies` now emits FRED-index restatements toward the authoritative value at WARN (`fred_restatement`, excluded from the flow-doctor ERROR filter) while keeping ERROR for genuine cross-source equity drift. The reconciliation predictably restates VIX/TNX by >5% when healing a 429-clobbered cell (5/14 VIX) or a stale T-1 edge cell (6/2); these are desirable self-heals, not anomalies. The recording surface stays (per `feedback_no_silent_fails`) at the right level. Pattern observed twice (5/12, 6/2).

Tests
`tests/test_daily_closes_coalesce_hardening.py` (8): retry class (429→success, persistent-429 raises, 4xx not retried, timeout reraised), transient-non-fatal control flow + 403-still-propagates, restatement-vs-equity severity. Fixed two affected fixtures to set `status_code` (the helper now inspects it before `raise_for_status`).

Deploy note
L4483 changes the weekday Step Function: the SF definition needs to be re-applied (CFN/Terraform/`aws stepfunctions update-state-machine`) for the git-pull to take effect; the daily_closes code changes deploy via the new MorningEnrich pull (or boot-pull). No S3 schema/path changes.

🤖 Generated with Claude Code