feat(freshness-monitor): wire cycle_completion rollup into the monitor (L249) #347
Merged
…r (L249)

Closes the loop on alpha-engine-lib #88: cycle_completion() existed but nothing consumed it. The freshness-monitor Lambda already builds the (ArtifactSpec, CheckResult) pairs the rollup needs, so it's the natural home.

- _serialize_cycle_verdicts(): groups the probe results by (cadence, cycle_label) and rolls each group up via cycle_completion. The registry walk covers EVERY cadence in one 15-min pass, so per-cadence grouping is MANDATORY: a single rollup over the mixed-cadence pairs would conflate the Saturday/weekday/EOD/continuous cycles into one meaningless verdict. weekday_sf and eod_sf share a date-shaped label, so cadence is part of the group key.
- Writes _freshness_monitor/cycle_verdict.json (one verdict per cadence-cycle, with missing/stale/probe_failed/grace_period localization lists).
- Emits CloudWatch ArtifactFreshnessCycleComplete (1.0/0.0) per cadence in the AlphaEngine/Substrate namespace. Dimensioned by Cadence ONLY, a stable, low-cardinality set; the per-cycle label is in the artifact, NOT a metric dimension (a label would be high-cardinality, unalarmable, costly).
- iam-policy.json: add cloudwatch:PutMetricData scoped to the AlphaEngine/Substrate namespace via condition (applied by deploy.sh's put-role-policy). The monitor emitted no CW metrics before this.

The entire rollup block is wrapped (try/except WARN): it is SECONDARY observability hung off the primary probe pass, ordered AFTER the primary check_results + heartbeat writes, so a CW-throttle / IAM-not-yet-applied / cycle-resolution failure can never sink the monitor's primary deliverables. Recording surface = WARN log + staleness of cycle_verdict.json. Per CLAUDE.md no-silent-fails secondary-observability carve-out.

4 new tests (per-cadence verdict; complete-when-fresh; CW metric shape; non-fatal rollup failure); suite 27 → 31.

Observational only: no new alert (the monitor is still in OBSERVE soak). ROADMAP L249 consumer follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
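
The grouping and metric shape described in the commit message can be sketched roughly as follows. This is an illustration, not the monitor's code: `Probe` is a hypothetical flattened stand-in for an `(ArtifactSpec, CheckResult)` pair, `rollup` is a stand-in for `cycle_completion` (whose real signature and completeness rule are not shown in this PR), and the artifact names, dates, and the `fresh` status value are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

NAMESPACE = "AlphaEngine/Substrate"
METRIC_NAME = "ArtifactFreshnessCycleComplete"
LOCALIZATION = ("missing", "stale", "probe_failed", "grace_period")


@dataclass(frozen=True)
class Probe:
    """Hypothetical flattened stand-in for an (ArtifactSpec, CheckResult) pair."""
    artifact: str
    cadence: str      # e.g. "weekday_sf", "eod_sf"
    cycle_label: str  # date-shaped for weekday_sf / eod_sf
    status: str       # assumed vocabulary: "fresh" or one of LOCALIZATION


def group_by_cycle(probes):
    # cadence is part of the key because weekday_sf and eod_sf can share
    # the same date-shaped cycle_label.
    groups = defaultdict(list)
    for probe in probes:
        groups[(probe.cadence, probe.cycle_label)].append(probe)
    return dict(groups)


def rollup(cadence, cycle_label, group):
    # Stand-in for cycle_completion. Assumed rule: a cycle is complete
    # only when every artifact in the group is fresh.
    verdict = {"cadence": cadence, "cycle_label": cycle_label}
    for name in LOCALIZATION:
        verdict[name] = sorted(p.artifact for p in group if p.status == name)
    verdict["complete"] = all(p.status == "fresh" for p in group)
    return verdict


def serialize_cycle_verdicts(probes):
    groups = group_by_cycle(probes)
    return [rollup(c, label, groups[(c, label)]) for (c, label) in sorted(groups)]


def metric_data(verdicts):
    # Dimensioned by Cadence only; cycle_label stays out of the metric.
    # The list has the shape boto3's put_metric_data accepts as MetricData.
    return [
        {
            "MetricName": METRIC_NAME,
            "Dimensions": [{"Name": "Cadence", "Value": v["cadence"]}],
            "Value": 1.0 if v["complete"] else 0.0,
        }
        for v in verdicts
    ]


probes = [
    Probe("signals.parquet", "weekday_sf", "2024-01-02", "fresh"),
    Probe("positions.json", "weekday_sf", "2024-01-02", "fresh"),
    Probe("eod_report.json", "eod_sf", "2024-01-02", "stale"),
]
verdicts = serialize_cycle_verdicts(probes)
data = metric_data(verdicts)
```

Both cadences here carry the label `2024-01-02`, so grouping on the label alone would merge them and the stale EOD artifact would drag the weekday cycle down with it. Keying on `(cadence, cycle_label)` keeps the two verdicts separate.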
What
Wires the `cycle_completion()` rollup (shipped dormant in alpha-engine-lib #88, v0.44.0) into the freshness-monitor Lambda. This is the consumer follow-up filed against ROADMAP L249. The rollup existed but nothing called it; the monitor already builds the `(ArtifactSpec, CheckResult)` pairs it needs, so this is the natural home.

Why the freshness-monitor (not transparency)
The two substrate consumers are independent: `transparency` validates `transparency_inventory.yaml` rows; the freshness-monitor probes `ARTIFACT_REGISTRY.yaml` artifacts via `check_freshness`. `cycle_completion` aggregates `check_freshness` results over `severity:critical` rows, so it belongs here, where those results already exist.

Changes
- `_serialize_cycle_verdicts()`: groups the probe results by `(cadence, cycle_label)` and rolls each group up. The registry walk covers every cadence in one 15-min pass, so per-cadence grouping is mandatory: a single rollup over the mixed-cadence pairs would conflate the Saturday / weekday / EOD / continuous cycles into one meaningless verdict. (`weekday_sf` and `eod_sf` share a date-shaped label, so `cadence` is part of the group key.)
- `_freshness_monitor/cycle_verdict.json`: one verdict per cadence-cycle, with `missing` / `stale` / `probe_failed` / `grace_period` localization lists.
- `ArtifactFreshnessCycleComplete` (1.0/0.0) per cadence in `AlphaEngine/Substrate`. Dimensioned by `Cadence` only, a stable, low-cardinality set (`{saturday_sf, weekday_sf, eod_sf, continuous}`) a CW alarm can bind to. The per-cycle `cycle_label` lives in the S3 artifact, not a metric dimension (a label would be high-cardinality, so unalarmable and costly).
- `iam-policy.json`: adds `cloudwatch:PutMetricData` scoped to the `AlphaEngine/Substrate` namespace via condition. The monitor emitted no CW metrics before; `deploy.sh:148` (`aws iam put-role-policy`) applies the grant on next deploy.

Fail-loud posture
The whole rollup block is wrapped (`try/except` → WARN), ordered after the primary `check_results` + `heartbeat` writes. It is secondary observability hung off the primary probe pass: a CW throttle, the IAM grant not yet being applied, or a cycle-resolution edge case can never sink the monitor's primary deliverables. Recording surface = WARN log + staleness of `cycle_verdict.json`. Per the CLAUDE.md no-silent-fails secondary-observability carve-out (inline comment names the swallowed mode + recording surface).

This is observational only: no new alert fires (the monitor is still in OBSERVE soak, `MNEMON_FRESHNESS_MONITOR_ENABLED`). Alerting on cycle incompleteness is a future step gated on the monitor exiting OBSERVE.

Verification
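
A check for the non-fatal rollup failure property might look like the sketch below. It assumes a hypothetical `run_pass` helper that takes the primary writes and the rollup as callables. The real handler's structure is not shown in this PR, and falling back to an empty `cycle_verdicts` dict is an assumption of the sketch.

```python
import logging

logger = logging.getLogger("freshness_monitor_sketch")


def run_pass(write_primary, rollup_cycles):
    """Hypothetical handler shape: primary writes first, secondary rollup wrapped."""
    # Primary deliverables (check_results + heartbeat). Failures here propagate.
    written = write_primary()

    # Secondary observability: any failure is swallowed and surfaced as a WARN log.
    cycle_verdicts = {}
    try:
        cycle_verdicts = rollup_cycles()
    except Exception as exc:
        logger.warning("cycle rollup failed (secondary, swallowed): %r", exc)

    return {"primary": written, "cycle_verdicts": cycle_verdicts}


def failing_rollup():
    # Simulates e.g. a CloudWatch throttle or a missing IAM grant.
    raise RuntimeError("simulated CloudWatch throttle")


result = run_pass(lambda: ["check_results", "heartbeat"], failing_rollup)
```

The ordering is what makes the failure harmless: the primary writes have already returned before the wrapped block runs, so the `except` branch cannot lose them.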
cycle_verdicts={}).

Deploy note
Needs a freshness-monitor Lambda redeploy (`deploy.sh`) for the IAM grant to take effect. Until then the CW emit WARN-logs and `cycle_verdict.json` (S3, already permitted) still lands, so the degradation is graceful.

🤖 Generated with Claude Code