Skip to content

feat(artifact_freshness): v0.40.0 — absence-driven S3 artifact monitoring substrate#83

Merged
cipher813 merged 1 commit into
mainfrom
feat/artifact-freshness-substrate
May 27, 2026
Merged

feat(artifact_freshness): v0.40.0 — absence-driven S3 artifact monitoring substrate#83
cipher813 merged 1 commit into
mainfrom
feat/artifact-freshness-substrate

Conversation

@cipher813
Copy link
Copy Markdown
Owner

Summary

Lib-side piece of the artifact-freshness-monitor arc (plan doc at ~/Development/alpha-engine-docs/private/artifact-freshness-monitor-260527.md). Closes the silent absence-of-artifact bug class — the 2026-05-17→27 pit_parity.json incident (load-bearing artifact silently absent for 11 days), the 2026-05-18 factor-profiles orphan, and the 2026-05-23 missing-signals.json incident are the sibling triggers. SF Catch / flow-doctor / substrate-health-check are all event-driven (failure → alert); this module is the substrate for the absence-driven complement (silence → alert).

PR 1 of ~7 in the arc. Phase 2-6 ship in follow-up PRs across alpha-engine-config (registry SoT), alpha-engine-data (Lambda + CI guard), alpha-engine-{research,predictor,backtester} (CI guards), and alpha-engine-dashboard (operator surface).

Surface

  • ArtifactSpec — registry-row dataclass with cadence symbol (saturday_sf / weekday_sf / eod_sf / continuous), SLA, severity, grace period, optional recovery key template. __post_init__ is the producer-side chokepoint — bad rows fail at registry-load time.
  • CheckResult — per-probe outcome (fresh / stale / missing / probe_failed / grace_period + last_modified + sla_violated_by_minutes).
  • check_freshness(s3_client, spec, now) — pure ((s3, spec, now) → CheckResult) with the five-branch routing (grace → calendar-holiday → HEAD canonical → stale check → recovery substitution). No side effects beyond the injected head_object call. The Lambda (Phase 3) is what consumes the result and routes to alpha_engine_lib.alerts.publish.
  • resolve_dedup_key(spec, now) — stable per-cycle key for the alerts.publish dedup chokepoint. Same label across all probes within a cycle collapses 4×/hour retries to one alert per cycle per artifact.
  • resolve_current_cycle(spec, now) — pure helper exposed for testability.

Design invariants

  1. Pure. No S3 calls except via the injected s3_client. No datetime.now() — caller passes now for testability.
  2. Calendar-aware. weekday_sf / eod_sf cycles short-circuit to state="fresh" on NYSE holidays (the cron didn't fire, so absence is correct). The holiday gate is in check_freshness, not resolve_current_cycle — so dedup keys stay distinct per calendar day.
  3. Recovery-aware. When spec.recovery_key_template is set, a 404/stale on the canonical key falls through to a HEAD on the recovery key. Either fresh ⇒ overall fresh. Implements the recovery-SF-substitution semantic from plan §3 invariant 3.
  4. Probe-failure separated. 403 / InvalidBucketName / EndpointConnectionErrorprobe_failed (the monitor itself is broken; severity routes to operator). 404missing (the canonical missing-artifact path, severity per spec).
  5. Grace-period. Specs younger than grace_period_cycles cycles suppress to state="grace_period" — newly-onboarded producers don't false-alarm on their first emissions.

Tests

37 unit tests covering:

  • ArtifactSpec validation (cadence / severity / SLA / grace / continuous-requires-interval).
  • resolve_current_cycle per cadence symbol — Saturday before/after cron, weekday before/after cron, Monday-morning step-back to Friday, NYSE-holiday non-snap, EOD anchor at 21:00 UTC, continuous bucketing, naive datetime coercion.
  • resolve_dedup_key — shape, same-cycle stability, different-cycle uniqueness.
  • check_freshness — grace period gate (active + zero), NYSE-holiday short-circuit (calendar_aware=True/False), canonical HEAD fresh / missing / stale / 403 / network error, recovery substitution (canonical missing+recovery fresh / canonical stale+recovery fresh / recovery too old / probe_failed bypasses recovery).

Suite delta: 851 → 888 passing, all warnings pre-existing (deprecated BasePreflight.check_arcticdb_universe_fresh).

Test plan

  • pytest tests/test_artifact_freshness.py — 37 passing
  • pytest full suite — 888 passing
  • pytest tests/test_version_pin.py tests/test_version_bump_workflow.py — 7 passing (lib v0.40.0 description under PyPI 512-char cap; __init__.__version__ matches pyproject.toml::version)
  • Post-merge: auto-tag workflow cuts v0.40.0, consumers pin via git+https://github.com/cipher813/alpha-engine-lib@v0.40.0 in Phase 3 PR

Composes with

  • alpha_engine_lib.alerts.publish — the dedup chokepoint the Phase 3 Lambda calls with dedup_key=resolve_dedup_key(spec, now).
  • alpha_engine_lib.trading_calendar — NYSE-holiday substrate (is_trading_day).
  • [[feedback_no_silent_fails]] — the rule this proposal operationalizes for the absence-of-artifact case.
  • [[feedback_sota_institutional_default_no_shortcuts]] — registry-as-config + chokepoint substrate is the SOTA pattern over ad-hoc per-producer "ping me if I'm late" hooks.
  • [[feedback_lift_invariants_to_chokepoint_after_second_recurrence]] — this IS the lift for the absence-detection chokepoint; Phase 4 PRs cascade the CI-guard mirror.

🤖 Generated with Claude Code

…ring substrate

Lib-side piece of the artifact-freshness-monitor arc (plan doc at
~/Development/alpha-engine-docs/private/artifact-freshness-monitor-260527.md).
Closes the silent absence-of-artifact bug class — the 2026-05-17→27
pit_parity.json incident (load-bearing artifact silently absent for 11
days), the 2026-05-18 factor-profiles orphan, and the 2026-05-23
missing-signals.json incident are the sibling triggers. SF Catch /
flow-doctor / substrate-health-check are all event-driven; this is the
absence-driven complement.

Surface:

- ArtifactSpec — registry-row dataclass with cadence symbol
  (saturday_sf / weekday_sf / eod_sf / continuous), SLA, severity,
  grace period, optional recovery key template. __post_init__ is the
  producer-side chokepoint — bad rows fail at registry-load time.
- CheckResult — per-probe outcome (fresh / stale / missing /
  probe_failed / grace_period + last_modified + sla_violated_by_minutes).
- check_freshness(s3_client, spec, now) — pure ((s3, spec, now) → result)
  with the five-branch routing (grace → calendar-holiday → HEAD
  canonical → stale check → recovery substitution). No side effects
  beyond the injected head_object call. The Lambda (Phase 3, PR 3) is
  what consumes the result and routes to alpha_engine_lib.alerts.publish.
- resolve_dedup_key(spec, now) — stable per-cycle key for the
  alerts.publish dedup chokepoint. Same label across all probes within
  a cycle collapses 4×/hour retries to one alert per cycle per artifact.
- resolve_current_cycle(spec, now) — pure helper exposed for testability.

37 unit tests covering spec validation, cycle resolution per cadence,
dedup-key stability, grace-period gate, NYSE-holiday short-circuit,
canonical HEAD fresh/missing/stale/probe_failed paths, recovery
substitution (canonical missing/stale + recovery fresh ⇒ fresh).
Suite: 851 → 888 passing.

Composes with alpha_engine_lib.alerts.publish (dedup chokepoint) +
alpha_engine_lib.trading_calendar (NYSE-holiday substrate). Phase 2-6
ship in follow-up PRs across alpha-engine-config / alpha-engine-data /
alpha-engine-{research,predictor,backtester} / alpha-engine-dashboard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant