feat(artifact_freshness): v0.40.0 — absence-driven S3 artifact monitoring substrate#83
Merged
Lib-side piece of the artifact-freshness-monitor arc (plan doc at
~/Development/alpha-engine-docs/private/artifact-freshness-monitor-260527.md).
Closes the silent absence-of-artifact bug class — the 2026-05-17→27
pit_parity.json incident (load-bearing artifact silently absent for 11
days), the 2026-05-18 factor-profiles orphan, and the 2026-05-23
missing-signals.json incident are the sibling triggers. SF Catch /
flow-doctor / substrate-health-check are all event-driven; this is the
absence-driven complement.
Surface:
- ArtifactSpec — registry-row dataclass with cadence symbol
(saturday_sf / weekday_sf / eod_sf / continuous), SLA, severity,
grace period, optional recovery key template. __post_init__ is the
producer-side chokepoint — bad rows fail at registry-load time.
- CheckResult — per-probe outcome (fresh / stale / missing /
probe_failed / grace_period + last_modified + sla_violated_by_minutes).
- check_freshness(s3_client, spec, now) — pure ((s3, spec, now) → result)
with the five-branch routing (grace → calendar-holiday → HEAD
canonical → stale check → recovery substitution). No side effects
beyond the injected head_object call. The Lambda (Phase 3, PR 3) is
what consumes the result and routes to alpha_engine_lib.alerts.publish.
- resolve_dedup_key(spec, now) — stable per-cycle key for the
alerts.publish dedup chokepoint. Same label across all probes within
a cycle collapses 4×/hour retries to one alert per cycle per artifact.
- resolve_current_cycle(spec, now) — pure helper exposed for testability.
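The five-branch routing described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `Spec`, `Result`, `check`, `FakeS3`, the `cycles_seen` argument, and the injected `is_trading_day` callable are hypothetical stand-ins, and the freshness rule (age within SLA) is an assumption. Only the branch order, the state names, and the 404-versus-other-error split come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Spec:  # hypothetical stand-in for ArtifactSpec
    bucket: str
    key: str
    sla_minutes: int
    grace_period_cycles: int = 0
    calendar_aware: bool = True
    recovery_key: Optional[str] = None


@dataclass(frozen=True)
class Result:  # hypothetical stand-in for CheckResult
    state: str
    last_modified: Optional[datetime] = None
    sla_violated_by_minutes: Optional[int] = None


def _head(s3, bucket, key):
    """Return ("ok", LastModified), ("missing", None) or ("probe_failed", None)."""
    try:
        return "ok", s3.head_object(Bucket=bucket, Key=key)["LastModified"]
    except Exception as exc:
        # botocore's ClientError carries the code in exc.response["Error"]["Code"].
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        return ("missing" if code in ("404", "NoSuchKey") else "probe_failed"), None


def check(s3, spec, now, cycles_seen, is_trading_day):
    # 1. Grace period: newly onboarded producers are not alarmed on.
    if cycles_seen < spec.grace_period_cycles:
        return Result("grace_period")
    # 2. Calendar holiday: the cron did not fire, so absence is correct.
    if spec.calendar_aware and not is_trading_day(now.date()):
        return Result("fresh")
    # 3. HEAD on the canonical key; a broken probe bypasses recovery.
    status, modified = _head(s3, spec.bucket, spec.key)
    if status == "probe_failed":
        return Result("probe_failed")
    # 4. Stale check (assumed rule: object age must be within the SLA).
    sla = timedelta(minutes=spec.sla_minutes)
    if status == "ok" and now - modified <= sla:
        return Result("fresh", modified)
    # 5. Recovery substitution: a fresh recovery artifact counts as fresh.
    if spec.recovery_key is not None:
        r_status, r_modified = _head(s3, spec.bucket, spec.recovery_key)
        if r_status == "ok" and now - r_modified <= sla:
            return Result("fresh", r_modified)
    if status == "missing":
        return Result("missing")
    late = int((now - modified - sla).total_seconds() // 60)
    return Result("stale", modified, late)


class FakeS3:
    """In-memory stand-in exposing only head_object, for illustration."""

    def __init__(self, objects):
        self.objects = objects  # key -> LastModified

    def head_object(self, Bucket, Key):
        if Key not in self.objects:
            err = Exception("Not Found")
            err.response = {"Error": {"Code": "404"}}
            raise err
        return {"LastModified": self.objects[Key]}


now = datetime(2026, 5, 27, 22, 0, tzinfo=timezone.utc)
spec = Spec("bucket", "signals.json", sla_minutes=60,
            recovery_key="recovery/signals.json")
s3 = FakeS3({"recovery/signals.json": now - timedelta(minutes=10)})
result = check(s3, spec, now, cycles_seen=5, is_trading_day=lambda d: True)
print(result.state)  # canonical key is missing, recovery key is fresh
```

Because the client, the clock, and the calendar are all passed in, every branch can be exercised with an in-memory fake and no network access.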
37 unit tests covering spec validation, cycle resolution per cadence,
dedup-key stability, grace-period gate, NYSE-holiday short-circuit,
canonical HEAD fresh/missing/stale/probe_failed paths, recovery
substitution (canonical missing/stale + recovery fresh ⇒ fresh).
Suite: 851 → 888 passing.
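To illustrate the "bad rows fail at registry-load time" behaviour that the spec-validation tests exercise, here is a minimal sketch of a dataclass whose `__post_init__` rejects invalid rows. The class name, field names, severity values, key templates, and `load_registry` helper are assumptions made for illustration. Only the four cadence symbols and the validated categories (cadence, severity, SLA, grace, continuous-requires-interval) come from the description.

```python
from dataclasses import dataclass
from typing import Optional

CADENCES = {"saturday_sf", "weekday_sf", "eod_sf", "continuous"}  # from the PR text
SEVERITIES = {"info", "warning", "critical"}  # assumed; the real set is not shown


@dataclass(frozen=True)
class ArtifactSpecSketch:
    artifact_id: str
    key_template: str
    cadence: str
    sla_minutes: int
    severity: str = "warning"
    grace_period_cycles: int = 0
    interval_minutes: Optional[int] = None  # only meaningful for "continuous"
    recovery_key_template: Optional[str] = None

    def __post_init__(self):
        # Validation runs on construction, so a bad row fails when loaded.
        if self.cadence not in CADENCES:
            raise ValueError(f"{self.artifact_id}: unknown cadence {self.cadence!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"{self.artifact_id}: unknown severity {self.severity!r}")
        if self.sla_minutes <= 0:
            raise ValueError(f"{self.artifact_id}: SLA must be positive")
        if self.grace_period_cycles < 0:
            raise ValueError(f"{self.artifact_id}: grace period cannot be negative")
        if self.cadence == "continuous" and not self.interval_minutes:
            raise ValueError(f"{self.artifact_id}: continuous cadence needs an interval")


def load_registry(rows):
    """Build specs from plain dict rows; the first invalid row raises ValueError."""
    return [ArtifactSpecSketch(**row) for row in rows]


rows = [
    {"artifact_id": "pit_parity", "key_template": "parity/{date}/pit_parity.json",
     "cadence": "saturday_sf", "sla_minutes": 120},
    # Invalid on purpose: continuous cadence without interval_minutes.
    {"artifact_id": "heartbeat", "key_template": "hb/latest.json",
     "cadence": "continuous", "sla_minutes": 30},
]

try:
    load_registry(rows)
except ValueError as exc:
    print(f"registry rejected: {exc}")
```

Validating in the constructor means a consumer can never hold a half-valid spec, which is the point of treating it as a chokepoint.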
Composes with alpha_engine_lib.alerts.publish (dedup chokepoint) +
alpha_engine_lib.trading_calendar (NYSE-holiday substrate). Phase 2-6
ship in follow-up PRs across alpha-engine-config / alpha-engine-data /
alpha-engine-{research,predictor,backtester} / alpha-engine-dashboard.
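The way a stable per-cycle key collapses repeated probes into a single alert can be sketched as below. Everything here is illustrative: the key format, `eod_cycle_label`, `dedup_key`, and `DedupPublisher` are hypothetical, weekends and holidays are ignored, and naive datetimes are assumed to be UTC. Only the 21:00 UTC EOD anchor and the one-alert-per-cycle-per-artifact behaviour are taken from the description.

```python
from datetime import datetime, time, timedelta, timezone

EOD_ANCHOR = time(21, 0)  # EOD anchor at 21:00 UTC, per the test list


def eod_cycle_label(now):
    """Date of the most recent 21:00 UTC anchor at or before `now`."""
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    now = now.astimezone(timezone.utc)
    if now.time() >= EOD_ANCHOR:
        return now.date().isoformat()
    return (now.date() - timedelta(days=1)).isoformat()


def dedup_key(artifact_id, now):
    # Assumed key shape: "<artifact>:<cycle label>".
    return f"{artifact_id}:{eod_cycle_label(now)}"


class DedupPublisher:
    """Hypothetical stand-in for a dedup chokepoint: first alert per key wins."""

    def __init__(self):
        self.sent = []
        self._seen = set()

    def publish(self, message, dedup_key):
        if dedup_key in self._seen:
            return False  # suppressed: already alerted for this cycle
        self._seen.add(dedup_key)
        self.sent.append(message)
        return True


pub = DedupPublisher()
first = datetime(2026, 5, 27, 21, 15, tzinfo=timezone.utc)

# Four probes inside the same cycle, 15 minutes apart.
for i in range(4):
    probe_time = first + timedelta(minutes=15 * i)
    pub.publish("signals.json missing", dedup_key("signals", probe_time))

# A probe just before the next anchor still belongs to the same cycle.
pub.publish("signals.json missing",
            dedup_key("signals", datetime(2026, 5, 28, 20, 59, tzinfo=timezone.utc)))

# The first probe of the next cycle produces a new key and a new alert.
pub.publish("signals.json missing",
            dedup_key("signals", first + timedelta(days=1)))

print(len(pub.sent))
```

Keeping the holiday gate out of the cycle resolver, as the PR does, fits this shape: the key depends only on the artifact and the clock, so it stays distinct per calendar day.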
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Lib-side piece of the artifact-freshness-monitor arc (plan doc at ~/Development/alpha-engine-docs/private/artifact-freshness-monitor-260527.md). Closes the silent absence-of-artifact bug class: the 2026-05-17→27 pit_parity.json incident (load-bearing artifact silently absent for 11 days), the 2026-05-18 factor-profiles orphan, and the 2026-05-23 missing-signals.json incident are the sibling triggers. SF Catch / flow-doctor / substrate-health-check are all event-driven (failure → alert); this module is the substrate for the absence-driven complement (silence → alert).

PR 1 of ~7 in the arc. Phase 2-6 ship in follow-up PRs across alpha-engine-config (registry SoT), alpha-engine-data (Lambda + CI guard), alpha-engine-{research,predictor,backtester} (CI guards), and alpha-engine-dashboard (operator surface).

Surface
- ArtifactSpec: registry-row dataclass with cadence symbol (saturday_sf / weekday_sf / eod_sf / continuous), SLA, severity, grace period, optional recovery key template. __post_init__ is the producer-side chokepoint; bad rows fail at registry-load time.
- CheckResult: per-probe outcome (fresh / stale / missing / probe_failed / grace_period + last_modified + sla_violated_by_minutes).
- check_freshness(s3_client, spec, now): pure ((s3, spec, now) → CheckResult) with the five-branch routing (grace → calendar-holiday → HEAD canonical → stale check → recovery substitution). No side effects beyond the injected head_object call. The Lambda (Phase 3) is what consumes the result and routes to alpha_engine_lib.alerts.publish.
- resolve_dedup_key(spec, now): stable per-cycle key for the alerts.publish dedup chokepoint. Same label across all probes within a cycle collapses 4×/hour retries to one alert per cycle per artifact.
- resolve_current_cycle(spec, now): pure helper exposed for testability.

Design invariants
- s3_client is injected. No datetime.now(); the caller passes now for testability.
- weekday_sf / eod_sf cycles short-circuit to state="fresh" on NYSE holidays (the cron didn't fire, so absence is correct). The holiday gate is in check_freshness, not resolve_current_cycle, so dedup keys stay distinct per calendar day.
- When spec.recovery_key_template is set, a 404/stale on the canonical key falls through to a HEAD on the recovery key. Either fresh ⇒ overall fresh. Implements the recovery-SF-substitution semantic from plan §3 invariant 3.
- 403 / InvalidBucketName / EndpointConnectionError ⇒ probe_failed (the monitor itself is broken; severity routes to operator). 404 ⇒ missing (the canonical missing-artifact path, severity per spec).
- The first grace_period_cycles cycles suppress to state="grace_period", so newly-onboarded producers don't false-alarm on their first emissions.

Tests
37 unit tests covering:
- ArtifactSpec validation (cadence / severity / SLA / grace / continuous-requires-interval).
- resolve_current_cycle per cadence symbol: Saturday before/after cron, weekday before/after cron, Monday-morning step-back to Friday, NYSE-holiday non-snap, EOD anchor at 21:00 UTC, continuous bucketing, naive datetime coercion.
- resolve_dedup_key: shape, same-cycle stability, different-cycle uniqueness.
- check_freshness: grace period gate (active + zero), NYSE-holiday short-circuit (calendar_aware=True/False), canonical HEAD fresh / missing / stale / 403 / network error, recovery substitution (canonical missing + recovery fresh / canonical stale + recovery fresh / recovery too old / probe_failed bypasses recovery).

Suite delta: 851 → 888 passing, all warnings pre-existing (deprecated BasePreflight.check_arcticdb_universe_fresh).

Test plan
- pytest tests/test_artifact_freshness.py: 37 passing
- pytest full suite: 888 passing
- pytest tests/test_version_pin.py tests/test_version_bump_workflow.py: 7 passing (lib v0.40.0 description under PyPI 512-char cap; __init__.__version__ matches pyproject.toml::version)
- v0.40.0, consumers pin via git+https://github.com/cipher813/alpha-engine-lib@v0.40.0 in Phase 3 PR

Composes with
- alpha_engine_lib.alerts.publish: the dedup chokepoint the Phase 3 Lambda calls with dedup_key=resolve_dedup_key(spec, now).
- alpha_engine_lib.trading_calendar: NYSE-holiday substrate (is_trading_day).
- [[feedback_no_silent_fails]]: the rule this proposal operationalizes for the absence-of-artifact case.
- [[feedback_sota_institutional_default_no_shortcuts]]: registry-as-config + chokepoint substrate is the SOTA pattern over ad-hoc per-producer "ping me if I'm late" hooks.
- [[feedback_lift_invariants_to_chokepoint_after_second_recurrence]]: this IS the lift for the absence-detection chokepoint; Phase 4 PRs cascade the CI-guard mirror.

🤖 Generated with Claude Code