
feat(freshness): Phase 5 PR 8 — Artifact Freshness console page + System Health KPI strip#134

Merged
cipher813 merged 1 commit into main from feat/artifact-freshness-page on May 27, 2026

@cipher813
Owner

Summary

Phase 5 PR 8 (final code PR in the arc; Phase 6 is operator-driven soak + env-var cutover) of the artifact-freshness-monitor arc (plan doc at ~/Development/alpha-engine-docs/private/artifact-freshness-monitor-260527.md). Closes the operator-surface dimension — Phase 3's freshness-monitor Lambda writes the artifacts; this PR is the console consumer.

Changes

  • pages/26_Artifact_Freshness.py (new) — dedicated console page. Reads:

    • s3://alpha-engine-research/_freshness_monitor/heartbeat.json (Lambda self-heartbeat: last run, aggregate counts, alerts_enabled)
    • s3://alpha-engine-research/_freshness_monitor/check_results.json (per-spec rows: state, last-modified, SLA-breach minutes, reason)

    Surface:

    • Top KPI strip — last-run age + per-state counts (fresh / grace / stale / missing / probe_failed) + mode badge (OBSERVE-only vs alerts-live)
    • OBSERVE-mode banner when alerts_enabled=false, with the cutover command spelled out
    • Filters by owner_repo / cadence / severity / state
    • Sortable table with color-coded state badges (probe_failed > missing > stale > grace > fresh; severity-bumped within state)
    • Operator runbook in an expander — common causes for probe_failed / missing, plus the force-invoke recipe
  • pages/4_System_Health.py — new "Artifact Freshness Monitor" section at top of page. Same heartbeat-derived KPI strip + a link to /Artifact_Freshness for drill-down. Gracefully no-ops when heartbeat is absent (Lambda hasn't deployed yet, or registry is empty).
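
The heartbeat-derived KPI strip and its graceful no-op can be sketched roughly as follows. This is a minimal illustration, not the code in either page: the function name `summarize_heartbeat` and the JSON keys `last_run` and `counts` are assumptions (the description only says the heartbeat carries the last run, aggregate counts, and `alerts_enabled`), and the S3 read is left out so the logic stays self-contained.

```python
from datetime import datetime, timezone
from typing import Optional

# State names as listed in the KPI strip description.
STATES = ("fresh", "grace", "stale", "missing", "probe_failed")


def summarize_heartbeat(heartbeat: Optional[dict],
                        now: Optional[datetime] = None) -> Optional[dict]:
    """Return KPI-strip values, or None when there is nothing to render."""
    if not heartbeat:
        # Heartbeat absent or empty: caller skips the section (graceful no-op).
        return None

    now = now or datetime.now(timezone.utc)
    age_minutes = None
    last_run_raw = heartbeat.get("last_run")  # hypothetical key name
    if isinstance(last_run_raw, str):
        try:
            last_run = datetime.fromisoformat(last_run_raw)
            if last_run.tzinfo is None:
                # Assumption: naive timestamps are UTC.
                last_run = last_run.replace(tzinfo=timezone.utc)
            age_minutes = int((now - last_run).total_seconds() // 60)
        except ValueError:
            age_minutes = None

    counts = heartbeat.get("counts") or {}  # hypothetical key name
    return {
        "last_run_age_minutes": age_minutes,
        "counts": {state: int(counts.get(state, 0)) for state in STATES},
        "mode": "alerts-live" if heartbeat.get("alerts_enabled") else "OBSERVE-only",
    }
```

A page would call this with the parsed `heartbeat.json` (or `None` when the object is missing) and render nothing when the result is `None`.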

Arc-wide status (post-merge of this PR)

| PR | Repo | Title | Status |
| --- | --- | --- | --- |
| 1 | alpha-engine-lib #83 | artifact_freshness substrate v0.40.0 | ✅ merged |
| 2 | alpha-engine-config #344 | ARTIFACT_REGISTRY.yaml + PR-time validator | ✅ merged |
| 3 | alpha-engine-data #335 | freshness-monitor Lambda + EB cron | ✅ merged |
| 4 | alpha-engine-data #336 | producer-side CI guard | ✅ merged |
| 5 | alpha-engine-research #243 | producer-side CI guard | open |
| 6 | alpha-engine-predictor #204 | producer-side CI guard | open |
| 7 | alpha-engine-backtester #256 | producer-side CI guard | open |
| 8 | alpha-engine-dashboard (this PR) | operator console surface | open |

Phase 6 cutover (operator-driven, no PR): ≥2 weekly cycles in OBSERVE mode (earliest cutover ~2026-06-13; more realistically ~2026-06-20). Cutover via:

aws lambda update-function-configuration \
  --function-name alpha-engine-freshness-monitor \
  --environment 'Variables={LOG_LEVEL=INFO,MNEMON_FRESHNESS_MONITOR_ENABLED=true}'

Mirrors the mnemon 0.7.0rc4 pattern from 2026-05-24 — env-var flip without redeploy.
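
One detail worth keeping in mind for the flip: Lambda's update-function-configuration replaces the whole environment-variable map, which is why the command above restates `LOG_LEVEL` alongside the new flag. A minimal sketch of building the merged map from whatever is currently set, assuming a hypothetical helper name `cutover_environment`:

```python
def cutover_environment(current_vars: dict) -> dict:
    """Build the Environment payload for the cutover without dropping existing variables.

    `current_vars` is the variable map currently configured on the function
    (for example, the Environment.Variables field of the function configuration).
    """
    merged = dict(current_vars)  # copy so the caller's dict is not mutated
    merged["MNEMON_FRESHNESS_MONITOR_ENABLED"] = "true"
    return {"Variables": merged}


# Example: the variables named in the cutover command above.
payload = cutover_environment({"LOG_LEVEL": "INFO"})
```

The resulting dict has the shape that boto3's Lambda client accepts as the `Environment` argument of `update_function_configuration`; the helper itself is illustrative and not part of this PR.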

Test plan

  • python3 -c "import ast; ast.parse(open('pages/26_Artifact_Freshness.py').read())" — syntax OK
  • python3 -c "import ast; ast.parse(open('pages/4_System_Health.py').read())" — syntax OK
  • Post-merge + post-Lambda-deploy: visit /Artifact_Freshness after first cron firing; verify KPI strip + table render with real data
  • Post-merge + post-Lambda-deploy: visit /System_Health; verify new Artifact Freshness Monitor section appears at top
  • Verify graceful no-op when heartbeat is absent (e.g., on a fresh dashboard deploy before the Lambda has run)

Deploy

Manual code-only deploy after merge:

ae-dashboard "sudo systemctl start boot-pull && sudo systemctl restart dashboard && sudo systemctl restart nous-ergon-public"

Per CLAUDE.md ## Dashboard section: boot-pull.sh only auto-restarts services whose .service unit file changed, so code-only PRs need explicit restarts on both dashboard (console) and nous-ergon-public services.

🤖 Generated with Claude Code

…tem Health KPI strip

Phase 5 PR 8 (final PR in the arc, modulo Phase 6 soak + cutover) of
the artifact-freshness-monitor arc (plan doc:
~/Development/alpha-engine-docs/private/artifact-freshness-monitor-260527.md).
Closes the operator-surface dimension of the arc — Phase 3's
freshness-monitor Lambda writes the artifacts; this PR is the
consumer.

Changes:

- pages/26_Artifact_Freshness.py (new) — dedicated console page for
  per-artifact red/yellow/green at a glance. Reads:
  * s3://alpha-engine-research/_freshness_monitor/heartbeat.json
    (Lambda self-heartbeat — last run, aggregate counts, alerts_enabled)
  * s3://alpha-engine-research/_freshness_monitor/check_results.json
    (per-spec rows: state, last-modified, SLA-breach minutes, reason)

  Surface:
  * Top KPI strip — last-run age + per-state counts + mode
    (OBSERVE-only vs alerts-live).
  * OBSERVE-mode banner when alerts_enabled=false, with the
    cutover command spelled out.
  * Filters by owner_repo / cadence / severity / state.
  * Sortable table with color-coded state badges
    (probe_failed > missing > stale > grace > fresh; severity-bumped
    within state).
  * Operator runbook in an expander — common causes for probe_failed
    / missing, plus the force-invoke recipe.

- pages/4_System_Health.py — new "Artifact Freshness Monitor" section
  at top of page (right under page caption). Same heartbeat-derived
  KPI strip + a link to the dedicated /Artifact_Freshness page for
  drill-down. Gracefully no-ops when heartbeat is absent (Lambda
  hasn't deployed yet, or live registry is empty).
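
The table ordering described above (worst state first, severity as the tie-breaker within a state) could be expressed as a sort key along these lines. The state order comes from the description; the severity labels and the `artifact_id` field are placeholders, since neither is spelled out here.

```python
# Worst state first, as listed in the description.
STATE_RANK = {"probe_failed": 0, "missing": 1, "stale": 2, "grace": 3, "fresh": 4}

# Hypothetical severity labels; the registry's real values may differ.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def sort_key(row: dict) -> tuple:
    """Order rows by state, then severity, then id for a stable display."""
    return (
        STATE_RANK.get(row.get("state"), len(STATE_RANK)),           # unknown states sort last
        SEVERITY_RANK.get(row.get("severity"), len(SEVERITY_RANK)),  # unknown severities sort last
        str(row.get("artifact_id", "")),                             # hypothetical id field
    )


rows = [
    {"artifact_id": "a", "state": "fresh", "severity": "critical"},
    {"artifact_id": "b", "state": "stale", "severity": "low"},
    {"artifact_id": "c", "state": "stale", "severity": "critical"},
    {"artifact_id": "d", "state": "probe_failed", "severity": "low"},
]
ordered = sorted(rows, key=sort_key)
```

Using a tuple key keeps the state dominant: a low-severity `probe_failed` row still sorts above a critical `fresh` row.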

Companion to:
- alpha-engine-lib #83 (merged) — substrate (ArtifactSpec, check_freshness,
  resolve_dedup_key) at v0.40.0
- alpha-engine-config #344 (merged) — registry SoT (48 entries, 27
  grandfathered prefixes) + PR-time validator
- alpha-engine-data #335 (merged) — freshness-monitor Lambda + EB cron
- alpha-engine-data #336 (merged) — producer-side CI guard
- alpha-engine-research #243, alpha-engine-predictor #204,
  alpha-engine-backtester #256 (open) — producer-side CI guards
  (Phase 4 cascade complete)

Phase 6 cutover (operator-driven, no PR): ≥2 weekly cycles in OBSERVE
mode → env-var flip via
`aws lambda update-function-configuration --environment
'Variables={MNEMON_FRESHNESS_MONITOR_ENABLED=true,...}'` —
mirrors the mnemon 0.7.0rc4 pattern from 2026-05-24.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying nousergon-marketing with Cloudflare Pages

Latest commit: f63cd4b
Status: ✅  Deploy successful!
Preview URL: https://29d1fe8c.nousergon-marketing.pages.dev
Branch Preview URL: https://feat-artifact-freshness-page.nousergon-marketing.pages.dev


@cipher813 cipher813 merged commit 6d5bfce into main May 27, 2026
2 checks passed
@cipher813 cipher813 deleted the feat/artifact-freshness-page branch May 27, 2026 22:07
cipher813 added a commit that referenced this pull request May 28, 2026
… PyArrow-backed null columns (#136)

Page 26 crashed with `TypeError: fromisoformat: argument must be str`
once the freshness-monitor Phase 6 bootstrap landed live data this
afternoon. Stack pointed at
`filtered["last_modified"].apply(_format_age)` (line 208).

Root cause: pandas reads `check_results.json`'s mixed null/string
`last_modified` column with the PyArrow backend, which represents JSON
nulls as `pd.NA` rather than Python `None`. The function's
`if not iso_ts:` truthiness check doesn't reliably bail on `pd.NA`
for every dtype path, AND `datetime.fromisoformat(pd.NA)` raises
`TypeError` rather than `ValueError`, so the existing
`except ValueError` doesn't catch it and the page crashes.

This was a latent bug since page 26 shipped (alpha-engine-dashboard
PR #134, 2026-05-27 follow-on session) — but it didn't surface until
this afternoon's Phase 6 bootstrap landed live `check_results.json`
with 49/51 null `last_modified` values (grace_period entries that
hadn't been probed yet on the cold-start cycle).

Fix: explicit `isinstance(iso_ts, str)` type-check at function entry
+ broaden the except clause to `(ValueError, TypeError)`. Tested
against all input shapes: pd.NA / None / empty str / valid ISO /
garbage str / datetime object — all behave correctly.
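
A rough sketch of the guard described in the fix, for illustration only. The real `_format_age` body, its placeholder text, and its age formatting are not shown in this commit message, so the name `format_age`, the `"n/a"` placeholder, and the `m/h/d ago` output are assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Optional

PLACEHOLDER = "n/a"  # hypothetical; the real page may render something else


def format_age(iso_ts: Any, now: Optional[datetime] = None) -> str:
    """Render an ISO-8601 timestamp as a coarse age string."""
    # Type-check first: pd.NA, None, and datetime objects all bail out here,
    # and pd.NA is never evaluated for truthiness.
    if not isinstance(iso_ts, str) or not iso_ts:
        return PLACEHOLDER
    try:
        ts = datetime.fromisoformat(iso_ts)
    except (ValueError, TypeError):  # broadened from ValueError alone
        return PLACEHOLDER
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    now = now or datetime.now(timezone.utc)
    minutes = max(0, int((now - ts).total_seconds() // 60))
    if minutes < 60:
        return f"{minutes}m ago"
    hours = minutes // 60
    if hours < 48:
        return f"{hours}h ago"
    return f"{hours // 24}d ago"
```

Putting the `isinstance` check ahead of the truthiness test matters because the `or` short-circuits, so a value whose boolean conversion raises never reaches `not iso_ts`.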

Per [[feedback_observe_mode_unconditional_gates_govern_cutover]] this
is exactly the bug class the freshness-monitor arc exists to catch
structurally — surfaced at first real load, not at deploy time.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 28, 2026
Surfaces the per-cycle history for each artifact, reading the new
_freshness_monitor/history.json written daily at 04:00 UTC by the
freshness-monitor Lambda's historical mode (alpha-engine-data PR #339).

Closes the gap surfaced 2026-05-28: 'are there gaps in the producer's
history?' — operators want to know not just current-cycle state but
whether last weekend / last month had silent absences.

Changes:

- _load_history loader (TTL 300s — refreshes once/day, not 15min)
- New History (12wk) column on the main table:
    ✅ N/N continuous   — clean history
    ⚠️ G/N gaps        — gappy producer
    ✅ exists (latest)  — latest-pointer present
    ❌ absent (latest)  — latest-pointer missing
    —                   — historical probe hasn't covered this id yet
      (continuous-cadence artifacts skip historical mode)
- New 'Per-artifact history drill-down' section below the main
  table. Each artifact in the filtered view gets an expander
  showing the per-cycle sequence (date / present / size /
  last_modified / error_code). Sort: gappy first, continuous last;
  latest-pointer absent at top, latest-pointer present at bottom.
  First 3 worst-offender entries auto-expand.
- Graceful-degrade: if history.json doesn't exist yet, page shows
  a single info box explaining the daily cron + manual-invoke
  instructions.
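
The History (12wk) cell values listed above could be derived along these lines. The input shape is an assumption: the layout of `history.json` is not given here, so the sketch takes a plain list of per-cycle presence flags and an optional latest-pointer flag, and the name `history_label` is hypothetical.

```python
from typing import Optional, Sequence

NOT_COVERED = "\u2014"  # dash cell: historical probe has not covered this id yet


def history_label(cycles: Optional[Sequence[bool]] = None,
                  latest_present: Optional[bool] = None) -> str:
    """Map per-cycle presence flags (or a latest-pointer flag) to a cell label."""
    if latest_present is not None:
        # Latest-pointer artifacts have a single present/absent answer.
        return "✅ exists (latest)" if latest_present else "❌ absent (latest)"
    if not cycles:
        return NOT_COVERED
    total = len(cycles)
    gaps = sum(1 for present in cycles if not present)
    if gaps == 0:
        return f"✅ {total}/{total} continuous"
    return f"⚠️ {gaps}/{total} gaps"
```

For example, twelve present cycles yield the `12/12 continuous` label, while one missing cycle out of three yields `1/3 gaps`.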

Operator caveat: calendar-naive. NYSE holidays may render as
false-positive ❌ absent cells. Calendar-aware probe is a future
enhancement (P3 in the Lambda PR).

Composes with alpha-engine-data PR #339 (historical-mode Lambda)
+ the prior page 26/27 work in #134/#135/#136/#137.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>