fix(daily_closes): coalesce-arc hardening — polygon-transient / FRED-backoff / weekday-pull / restatement-severity (L4482/L4480/L4483/L4486)#363

Merged
cipher813 merged 1 commit into main from fix/coalesce-arc-hardening on Jun 2, 2026

@cipher813
Owner

What

Four follow-ups to the 2026-06-01 FRED-429 / polygon-timeout incident. #354 made the system resilient to a missed value; these close the surrounding gaps the incident exposed. All in collectors/daily_closes.py + the weekday Step Function.

L4482 (P1) — macro backstop survives a transient polygon-fetch exception

collect() Step 1 now catches the transient network class (requests.Timeout / ConnectionError / PolygonRateLimitError) in polygon_only mode, logs loudly, and falls through to FRED (Step 2) + the macro yfinance backstop (Step 3). This is the exact gap that failed recovery re-run #1 — a polygon read-timeout aborted collect() before the FRED-index macro keys (^TNX/^VIX/^IRX/^VIX3M, which never come from polygon) could fill.
Narrow by design: PolygonForbiddenError (403) and the deliberate "0 tickers" empty-data RuntimeError still propagate with their own clear messages; a real equity outage still hard-fails at the coverage gate (0 equity records → <95%), so the catch cannot mask equity data loss.
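A minimal sketch of the control flow described above, not the code in collectors/daily_closes.py. The exception class bodies, the `fetch_*` callables, and `check_equity_coverage` are hypothetical stand-ins, and the builtin `TimeoutError` / `ConnectionError` stand in for the `requests` exceptions so the sketch runs without third-party packages. Filling only missing keys from the backstop is also an assumption.

```python
import logging

logger = logging.getLogger("daily_closes_sketch")


class PolygonRateLimitError(Exception):
    """Stand-in for the repo's rate-limit exception (definition assumed)."""


class PolygonForbiddenError(Exception):
    """Stand-in for the repo's 403 exception (definition assumed)."""


# Only the transient network class is caught. Builtin TimeoutError and
# ConnectionError stand in for requests.Timeout / requests.ConnectionError.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError, PolygonRateLimitError)


def collect(fetch_polygon, fetch_fred, fetch_macro_backstop):
    """Run the three steps; a transient polygon failure does not abort."""
    closes = {}
    # Step 1: polygon equities. Transient failures are logged loudly and
    # skipped; anything else (403, empty-data RuntimeError) propagates.
    try:
        closes.update(fetch_polygon())
    except TRANSIENT_ERRORS as exc:
        logger.error("polygon fetch failed transiently, continuing: %r", exc)
    # Step 2: FRED-index macro keys, which never come from polygon.
    closes.update(fetch_fred())
    # Step 3: macro backstop fills only keys that are still missing (assumed).
    for key, value in fetch_macro_backstop().items():
        closes.setdefault(key, value)
    return closes


def check_equity_coverage(closes, expected_equities, threshold=0.95):
    """Hard-fail when equity coverage is below the threshold."""
    have = sum(1 for ticker in expected_equities if ticker in closes)
    coverage = have / len(expected_equities) if expected_equities else 1.0
    if coverage < threshold:
        raise RuntimeError(
            f"equity coverage {coverage:.0%} below {threshold:.0%}"
        )
```

Because the coverage gate runs after the fall-through, a polygon outage that leaves zero equity records still fails the run; the catch only keeps the macro keys from being lost along with it.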

L4480 (P1) — FRED backoff + jitter

New _fred_get_with_retry: bounded exponential backoff + full jitter on the transient class (429 / 5xx / timeout / connection), honors a server Retry-After when present, raises immediately on a deterministic 4xx (no point retrying a bad series_id). Stops the 429 storm at the source rather than only tolerating a missed value. Mirrors the in-repo polygon_client retry idiom; no new dependency.
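The retry policy can be sketched roughly as follows. This is an illustration of bounded exponential backoff with full jitter, not the body of `_fred_get_with_retry`: the name `get_with_retry`, its parameters and defaults, and `HTTPStatusError` are all hypothetical, `send` is any callable returning an object with `status_code` and `headers`, and only the numeric-seconds form of `Retry-After` is handled.

```python
import random
import time


class HTTPStatusError(Exception):
    """Hypothetical error carrying the final HTTP status."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def _parse_retry_after(value):
    """Return Retry-After as seconds, or None if absent or not numeric."""
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return None


def get_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                   sleep=time.sleep, rng=random.uniform):
    """Call send() until success, a deterministic 4xx, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        last = attempt == max_attempts - 1
        retry_after = None
        try:
            response = send()
        except (TimeoutError, ConnectionError):
            if last:
                raise  # persistent network failure: surface the original error
        else:
            status = response.status_code
            if status < 400:
                return response
            transient = status == 429 or 500 <= status < 600
            if not transient or last:
                # Deterministic 4xx is never retried; transient errors give
                # up once the attempt budget is spent.
                raise HTTPStatusError(status)
            retry_after = _parse_retry_after(response.headers.get("Retry-After"))
        # Full jitter: sleep a uniform random time in [0, capped exponential].
        cap = min(max_delay, base_delay * 2 ** attempt)
        delay = min(retry_after, max_delay) if retry_after is not None else rng(0.0, cap)
        sleep(delay)
```

Injecting `sleep` and `rng` keeps the sketch testable without real waits; a server-supplied `Retry-After` takes precedence over the jittered delay but is still capped at `max_delay`.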

L4483 (P2) — weekday SF MorningEnrich now git-pulls

infrastructure/step_function_daily.json MorningEnrich adds the same git -C … pull --ff-only origin main for alpha-engine-data + alpha-engine-config that the Saturday SF already runs. A same-day recovery re-run on a still-running instance no longer executes stale code (this incident cost a manual SSM git pull to deploy #354).

L4486 (P2) — discrepancy ERROR severity scoped

_log_close_discrepancies now emits FRED-index restatements toward the authoritative value at WARN (fred_restatement, excluded from the flow-doctor ERROR filter) while keeping ERROR for genuine cross-source equity drift. The reconciliation predictably restates VIX/TNX >5% when healing a 429-clobbered cell (5/14 VIX) or a stale T-1 edge cell (6/2). These are desirable self-heals, not anomalies. The recording surface stays (per feedback_no_silent_fails), now at the right level. Pattern observed twice (5/14, 6/2).
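One way to picture the severity split is the classifier below. It is a sketch under assumptions, not the logic of `_log_close_discrepancies`: the function name, its parameters, the `close_discrepancy` tag, and the use of a relative-change test against the stored value are hypothetical; the FRED-index keys, the 5% threshold, and the `fred_restatement` tag come from the description above.

```python
import logging

# FRED-index macro keys named in this PR.
FRED_INDEX_KEYS = frozenset({"^TNX", "^VIX", "^IRX", "^VIX3M"})


def classify_discrepancy(key, stored, incoming, incoming_is_authoritative,
                         threshold=0.05):
    """Return (log level, tag) for a close discrepancy, or None if within tolerance."""
    if stored == 0:
        relative_change = 0.0 if incoming == 0 else float("inf")
    else:
        relative_change = abs(incoming - stored) / abs(stored)
    if relative_change <= threshold:
        return None
    if key in FRED_INDEX_KEYS and incoming_is_authoritative:
        # A FRED index moving toward the authoritative value is a self-heal:
        # still recorded, but at WARN so it stays out of the ERROR filter.
        return (logging.WARNING, "fred_restatement")
    # Everything else, including cross-source equity drift, stays at ERROR.
    return (logging.ERROR, "close_discrepancy")
```

For example, `classify_discrepancy("^VIX", 20.0, 18.0, True)` is a 10% restatement toward the authoritative source and maps to WARN, while the same move on an equity ticker maps to ERROR.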

Tests

  • New tests/test_daily_closes_coalesce_hardening.py (8): retry class (429→success, persistent-429 raises, 4xx not retried, timeout reraised), transient-non-fatal control flow + 403-still-propagates, restatement-vs-equity severity.
  • Fixed two affected fixtures to set status_code (the helper now inspects it before raise_for_status).
  • Full suite: 1785 passed, 1 skipped.

Deploy note

L4483 changes the weekday Step Function — needs the SF definition re-applied (CFN/Terraform/aws stepfunctions update-state-machine) for the git-pull to take effect; the daily_closes code changes deploy via the new MorningEnrich pull (or boot-pull). No S3 schema/path changes.

🤖 Generated with Claude Code

…ckoff/weekday-pull/restatement-severity (L4482/L4480/L4483/L4486)

Follow-ups to the 2026-06-01 FRED-429 / polygon-timeout incident (#354
made the system resilient to a missed value; these close the surrounding
gaps the incident exposed).

L4482 — macro backstop survives a transient polygon-fetch exception.
collect() Step 1 now catches the TRANSIENT network class (requests
Timeout/ConnectionError + PolygonRateLimitError) in polygon_only mode,
logs loudly, and falls through to FRED (Step 2) + the macro yfinance
backstop (Step 3) — the exact gap that failed recovery re-run #1 (a
polygon read-timeout aborted collect() before the FRED-index macro keys
could fill). Narrow by design: PolygonForbiddenError (403) and the
deliberate "0 tickers" empty-data RuntimeError still propagate with their
own messages; a real equity outage still hard-fails at the coverage gate
(0 equity records -> < 95%), so the catch cannot mask equity data loss.

L4480 — FRED backoff + jitter. New _fred_get_with_retry: bounded
exponential backoff + full jitter on the transient class (429 / 5xx /
timeout / connection), honors a server Retry-After when present, and
raises immediately on a deterministic 4xx (no point retrying a bad
series_id). Stops the 429 storm at the source rather than only tolerating
a missed value. Mirrors the in-repo polygon_client retry idiom; no new dep.

L4483 — weekday SF MorningEnrich now git-pulls. step_function_daily.json
MorningEnrich adds the same `git -C ... pull --ff-only origin main` for
alpha-engine-data + alpha-engine-config that the Saturday SF already runs,
so a same-day recovery re-run on a still-running instance no longer
executes stale code (this incident cost a manual SSM pull to deploy #354).

L4486 — discrepancy ERROR severity scoped. _log_close_discrepancies now
emits FRED-index restatements toward the authoritative value at WARN
(`fred_restatement`, excluded from the flow-doctor ERROR filter) while
keeping ERROR for genuine cross-source EQUITY drift. The reconciliation
predictably restates VIX/TNX >5% when healing a 429-clobbered or stale
T-1 edge cell (5/14 VIX, 6/2) — desirable self-heals, not anomalies. The
recording surface stays (per feedback_no_silent_fails) at the right level.

Tests: new tests/test_daily_closes_coalesce_hardening.py (8) for the retry
class, transient-non-fatal control flow + 403-still-propagates, and
restatement-vs-equity severity. Fixed two affected fixtures to set
status_code (the helper now inspects it). Full suite: 1785 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit e0454ec into main Jun 2, 2026
3 checks passed
@cipher813 cipher813 deleted the fix/coalesce-arc-hardening branch June 2, 2026 21:29