fix(sdlc): repoint opus route-authority re-mint off the stale worktree + un-suppress its failure alert #3841

Merged
ryanklee merged 1 commit into main from zeta/reform-opus-receipt-remint-repoint-20260601 on Jun 1, 2026

Conversation

@ryanklee (Collaborator) commented Jun 1, 2026

What

hapax-opus-route-authority-receipt.service keeps the opus route-authority receipt fresh so opus is reachable by default for SDLC dispatch. Its ExecStart hardcoded the PRIMARY worktree (~/projects/hapax-council), which parks on feature branches (e.g. alpha/screwm-*) where the minter does not exist — so the timer-backed re-mint failed status=2 ("No such file") every 6h and the receipt SILENTLY FROZE at its 02:30Z issuance. The notify-failure coalescer suppresses timer-backed failures ("Timer-backed ... failed (suppressed — timer retries)"), so the operator got ZERO notification across 3+ consecutive failures. When the 24h window closes, opus dispatch loses route authority with no automated refresh and no observable warning.

Changes

  • Repoint off the stale worktree: ExecStart/WorkingDirectory/PYTHONPATH now resolve from the stable source-activation active-deploy symlink (~/.cache/hapax/source-activation/worktree) — always the activated origin/main release (script + synced .venv present), decoupled from dev-worktree branch churn. Same antipattern as the INV checker (fix(sdlc): activate inert INV-1..5 trace-checker + unify auto-mint onto the live escape-grant contract #3839).
  • Un-suppress the failure alert: in --ensure-fresh upkeep, a FAILED re-mint self-escalates an un-suppressed ntfy (independent of the coalescer) on N consecutive failures (--alert-after-failures, default 3) OR within T of expiry (--alert-within, default 2h). Failure state lives in a sidecar under route-authority-upkeep/, OUTSIDE the scanned route-authority/ dir (the read-path globs *.json there and raises on non-receipts).
  • Wired NTFY_URL/NTFY_TOPIC on the unit (mirrors hapax-sdlc-invariants.service).
  • Backfilled the live receipt (issued_at -> 2026-06-01T17:04:52Z) so it does not lapse before this deploys.
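
The escalation rule and the sidecar placement described above can be sketched roughly as follows. This is a minimal illustration, not the minter's actual code: the function names, the sidecar filename, and the JSON field are hypothetical. Only the directory names (route-authority/, route-authority-upkeep/) and the defaults (3 failures, 2h) come from the description.

```python
import json
import tempfile
from datetime import timedelta
from pathlib import Path

# Hypothetical names; only the directory names and defaults come from the PR text.
UPKEEP_DIRNAME = "route-authority-upkeep"


def sidecar_path(state_root: Path, route_id: str) -> Path:
    """Sidecar lives beside, not inside, the scanned route-authority/ directory."""
    return state_root / UPKEEP_DIRNAME / f"{route_id}.json"


def record_failure(state_root: Path, route_id: str) -> int:
    """Increment and persist the consecutive-failure counter; return the new count."""
    path = sidecar_path(state_root, route_id)
    count = 0
    if path.is_file():
        try:
            state = json.loads(path.read_text(encoding="utf-8"))
            count = int(state.get("consecutive_failures", 0))
        except (OSError, ValueError, TypeError, AttributeError):
            count = 0  # unreadable sidecar: start counting again
    count += 1
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"consecutive_failures": count}), encoding="utf-8")
    return count


def record_success(state_root: Path, route_id: str) -> None:
    """A successful re-mint (or a kept fresh receipt) resets the counter."""
    path = sidecar_path(state_root, route_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"consecutive_failures": 0}), encoding="utf-8")


def should_escalate(
    consecutive_failures: int,
    remaining: timedelta,
    *,
    alert_after_failures: int = 3,
    alert_within: timedelta = timedelta(hours=2),
) -> bool:
    """Alert on N consecutive failures OR when the receipt is within T of expiry."""
    return consecutive_failures >= alert_after_failures or remaining <= alert_within


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    (root / "route-authority").mkdir()
    failures = record_failure(root, "claude.headless.opus")
    print(failures, should_escalate(failures, timedelta(hours=7)))
```

Keeping the counter in a sibling directory means a read path that globs *.json under route-authority/ never sees it, which is the isolation property the description calls out.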

Acceptance criteria

  • Receipt refreshed now (backfilled to 2026-06-01T17:04:52Z, fresh 24h window).
  • ntfy fires on N consecutive failures OR within T (default 2h) of expiry; verified by forcing failures in tests.
  • Ruff check + format clean; pyright 0 errors; 14 minter tests pass.
  • Service runs rc=0 from the new ExecStart — the new path and the exact --ensure-fresh ... --alert-within 2h --alert-after-failures 3 command line were confirmed rc=0 in-session. Receipt mtime advancing across a full timer interval is observed POST-DEPLOY: this is deploy-coupled — the new unit and the new-flag minter activate together via source-activation on merge (same activation dependency as the sibling INV checker).
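
For orientation, the windows implied by the backfilled receipt and the unit's flags can be worked out directly. The sketch below assumes the semantics suggested elsewhere on this page: remaining freshness is stale_after minus the receipt's age, a re-mint is attempted once at most --refresh-within remains, and the expiry alert applies once at most --alert-within remains. The variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the PR text and the unit's ExecStart line.
issued_at = datetime(2026, 6, 1, 17, 4, 52, tzinfo=timezone.utc)
stale_after = timedelta(hours=24)
refresh_within = timedelta(hours=8)
alert_within = timedelta(hours=2)


def remaining(now: datetime) -> timedelta:
    """Freshness left on the receipt: stale_after minus the receipt's age."""
    return stale_after - (now - issued_at)


expires_at = issued_at + stale_after
refresh_opens_at = expires_at - refresh_within  # first moment a re-mint is attempted
alert_opens_at = expires_at - alert_within      # a failure from here on alerts at once

if __name__ == "__main__":
    print("expires:        ", expires_at.isoformat())
    print("refresh window: ", refresh_opens_at.isoformat())
    print("alert window:   ", alert_opens_at.isoformat())
```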

Test evidence

uv run pytest tests/scripts/test_hapax_mint_route_authority_receipt.py -q -> 14 passed. New tests: alert within T (2h) of expiry, alert after N consecutive failures, success resets the counter, sidecar never pollutes the scanned dir, ntfy best-effort on network error, unit-points-at-deploy-worktree regression.

Task: reform-opus-receipt-remint-repoint-20260601 · AuthorityCase: CASE-CROSS-RUNTIME-COMMS-001
Parent spec: coordination-reform-master-design-2026-05-30.md

🤖 Generated with Claude Code

…e + un-suppress its failure alert

hapax-opus-route-authority-receipt.service hardcoded the primary worktree
(~/projects/hapax-council), which parks on feature branches where the minter
does not exist, so upkeep failed status=2 every 6h and the receipt silently
froze at its 02:30Z issuance while the notify-failure coalescer suppressed
every alert ("Timer-backed ... failed (suppressed - timer retries)"). When the
24h window closes, opus dispatch loses route authority with no automated
refresh and no operator-visible warning. Same antipattern as the INV checker.

- Repoint ExecStart/WorkingDirectory/PYTHONPATH at the stable source-activation
  active-deploy symlink (always the activated origin/main release; script +
  synced .venv present), decoupling upkeep from dev-worktree branch churn.
- The minter self-escalates an un-suppressed ntfy on N consecutive re-mint
  failures (default 3) or within --alert-within (default 2h) of expiry, tracked
  in a sidecar OUTSIDE the scanned route-authority/ dir (the read-path globs
  *.json there and raises on non-receipts).
- Backfilled the live receipt (issued_at -> now) so it does not lapse before
  the fix deploys.

Task: reform-opus-receipt-remint-repoint-20260601
AuthorityCase: CASE-CROSS-RUNTIME-COMMS-001

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 1, 2026

Walkthrough

The PR extends hapax-mint-route-authority-receipt with ntfy-based failure escalation notifications during --ensure-fresh re-mint operations. It adds upkeep-state sidecar tracking to monitor consecutive failures, new CLI alert-threshold options, systemd unit deployment with ntfy environment variables, and comprehensive integration tests validating notification triggers and sidecar isolation.

Changes

Receipt Failure Escalation Notifications

| Layer / File(s) | Summary |
| --- | --- |
| Escalation notification infrastructure (scripts/hapax-mint-route-authority-receipt) | Adds constants (ROUTE_AUTHORITY_UPKEEP_DIRNAME, DEFAULT_NTFY_URL, DEFAULT_NTFY_TOPIC), an ntfy POST helper that never raises on network errors, functions to manage upkeep-state sidecar JSON files tracking consecutive failures, compute receipt freshness, format alert timestamps, and escalate failures when expiry/consecutive-failure thresholds are met. Imports os module. |
| CLI and main() integration for --ensure-fresh escalation (scripts/hapax-mint-route-authority-receipt) | Extends CLI with --alert-within (default 2h) and --alert-after-failures (default 3) options for --ensure-fresh mode. Refactors main() to precompute a single now_dt and initialize tracking variables. Updates --ensure-fresh freshness check to reuse now_dt, record upkeep success when receipt is kept, and conditionally escalate via the upkeep-state mechanism on mint errors (ValueError, ValidationError, OSError) before returning exit code 2; records upkeep success after successful write. |
| Systemd unit deployment (systemd/units/hapax-opus-route-authority-receipt.service) | Switches Documentation=, WorkingDirectory, and PYTHONPATH from projects/hapax-council to source-activation/worktree deploy symlink. Extends ExecStart with --refresh-within, --alert-within, and --alert-after-failures parameters. Adds NTFY_URL and NTFY_TOPIC environment variables to enable failure notifications. |
| Comprehensive test coverage for escalation behavior (tests/scripts/test_hapax_mint_route_authority_receipt.py) | Adds helpers to simulate write failures and record ntfy calls. Six tests verify: re-mint failure within expiry window fires ntfy with high priority, failures outside window suppress notifications until consecutive-failure threshold (N=3), successful re-mint resets the persisted failure counter, upkeep sidecar files stay outside route-authority/ directory, _post_ntfy returns False gracefully on network error, and systemd unit runs from active-deploy symlink with --ensure-fresh and NTFY_TOPIC= configuration present. |
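
A best-effort ntfy POST helper of the kind summarized in the table might look roughly like this. It is a sketch under stated assumptions, not the script's _post_ntfy: the function name, the injectable opener parameter, the timeout, and the fallback base URL are hypothetical. The hapax-ops default topic and the empty-topic opt-out are taken from the review discussion on this page, and the Title and Priority headers are standard ntfy publish headers.

```python
import os
from urllib.request import Request, urlopen


def post_ntfy(title, body, *, priority="default", opener=urlopen, timeout=5.0):
    """Publish one notification; return True on success and never raise."""
    # Hypothetical fallback URL; the real DEFAULT_NTFY_URL is not shown in the PR.
    base_url = os.environ.get("NTFY_URL", "http://localhost:8080").rstrip("/")
    topic = os.environ.get("NTFY_TOPIC", "hapax-ops").strip()
    if not topic:
        return False  # NTFY_TOPIC= (empty) is treated as an opt-out
    request = Request(
        f"{base_url}/{topic}",
        data=body.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; a malformed URL or an
        # unencodable header raises ValueError. Alerting must not crash upkeep.
        return False


if __name__ == "__main__":
    def _unreachable(*_args, **_kwargs):
        raise OSError("ntfy unreachable")

    print(post_ntfy("receipt re-mint failed", "status=2", priority="high", opener=_unreachable))
```

Passing the opener as a parameter is only a convenience for testing the swallow path without monkeypatching; the PR's own tests patch urlopen on the module instead.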

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A receipt that's fresh will always stay,
But if it fails—don't dismay!
Ntfy rings with urgent cheer,
Upkeep sidecars whisper near,
Failures tracked till all is well.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately captures the main changes: repointing the systemd unit away from the stale worktree and enabling failure alerts for the route-authority receipt refresh mechanism. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | PR description comprehensively covers What/Why/Changes/Acceptance criteria with technical context, case reference, and test evidence. |

@ryanklee ryanklee added this pull request to the merge queue Jun 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ef942d05c

# upkeep (status=2 "No such file") until the receipt lapsed. The symlink always
# resolves to the activated origin/main release (script + synced .venv present)
# and its PATH is stable across deploys — only its target advances.
ExecStart=%h/.cache/hapax/source-activation/worktree/.venv/bin/python %h/.cache/hapax/source-activation/worktree/scripts/hapax-mint-route-authority-receipt --ensure-fresh --receipt-type opus_model_entitlement --route-id claude.headless.opus --stale-after 24h --refresh-within 8h --alert-within 2h --alert-after-failures 3

P2: Make the failure threshold reachable before expiry

With the paired timer checked in systemd/units/hapax-opus-route-authority-receipt.timer (OnUnitActiveSec=6h), this --stale-after 24h --refresh-within 8h --alert-within 2h --alert-after-failures 3 combination cannot actually trip the N-consecutive-failure alert before the receipt lapses in normal timer operation: upkeep does not attempt a re-mint until the receipt has at most 8h left, so a 6h cadence gives at most two attempts before expiry, with the second already at or near the 2h expiry window. If the goal is an earlier unsuppressed warning after repeated failures, the threshold, refresh window, or timer cadence needs to be adjusted so three failed runs are possible before the receipt is about to expire.
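
The arithmetic behind this comment can be checked with a small simulation. It is a sketch under the reviewer's stated assumptions: the timer fires exactly every 6h, every re-mint attempt fails, and an attempt is only made once at most --refresh-within remains. The function name and the phase sweep are illustrative, not part of the minter.

```python
from datetime import timedelta


def attempts_before_expiry(
    first_run_after: timedelta,
    *,
    stale_after: timedelta = timedelta(hours=24),
    refresh_within: timedelta = timedelta(hours=8),
    cadence: timedelta = timedelta(hours=6),
) -> list[timedelta]:
    """Remaining freshness at each failed re-mint attempt made before the receipt lapses.

    first_run_after is the delay between issuance and the first timer run,
    i.e. the phase of the timer relative to the receipt.
    """
    attempts = []
    elapsed = first_run_after
    while elapsed < stale_after:
        remaining = stale_after - elapsed
        if remaining <= refresh_within:  # upkeep only tries inside the refresh window
            attempts.append(remaining)
        elapsed += cadence
    return attempts


if __name__ == "__main__":
    # Sweep every timer phase within one cadence, at one-minute resolution.
    worst_case = max(
        (attempts_before_expiry(timedelta(minutes=m)) for m in range(360)),
        key=len,
    )
    print(len(worst_case), [str(r) for r in worst_case])
```

Under these assumptions no phase yields more than two failed attempts before expiry, and whenever there is a second attempt it already falls inside the 2h alert window, so with these settings the expiry-proximity rule fires before the three-failure rule can.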

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (3)
scripts/hapax-mint-route-authority-receipt (1)

196-206: ⚡ Quick win

Consider sharing the load+validate logic with _fresh_enough_to_keep.

_existing_receipt_remaining repeats the exact is_file → model_validate → except-set guard and the _parse_duration_spec(...) - (now - _coerce_utc(...)) computation already in _fresh_enough_to_keep (Lines 93-107). If the accepted exception set or freshness math ever changes, the two can silently diverge.

♻️ Extract a shared loader
+def _load_receipt(target: Path) -> RouteAuthorityReceipt | None:
+    if not target.is_file():
+        return None
+    try:
+        return RouteAuthorityReceipt.model_validate(
+            json.loads(target.read_text(encoding="utf-8"))
+        )
+    except (OSError, json.JSONDecodeError, ValidationError, ValueError):
+        return None
+
+
 def _existing_receipt_remaining(target: Path, *, now: datetime) -> timedelta | None:
     """Freshness remaining on the live receipt, or ``None`` if absent/unreadable."""
-    if not target.is_file():
-        return None
-    try:
-        existing = RouteAuthorityReceipt.model_validate(
-            json.loads(target.read_text(encoding="utf-8"))
-        )
-    except (OSError, json.JSONDecodeError, ValidationError, ValueError):
-        return None
-    return _parse_duration_spec(existing.stale_after) - (now - _coerce_utc(existing.issued_at))
+    existing = _load_receipt(target)
+    if existing is None:
+        return None
+    return _parse_duration_spec(existing.stale_after) - (now - _coerce_utc(existing.issued_at))

_fresh_enough_to_keep can then call _load_receipt(target) for its load step.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/hapax-mint-route-authority-receipt` around lines 196 - 206, Extract a
shared loader function (e.g. _load_receipt(target: Path) ->
RouteAuthorityReceipt | None) that encapsulates the repeated "if not
target.is_file() -> read_text -> json.loads ->
RouteAuthorityReceipt.model_validate" flow and the common except set (OSError,
json.JSONDecodeError, ValidationError, ValueError) returning None on failure;
then have both _existing_receipt_remaining and _fresh_enough_to_keep call this
new _load_receipt to obtain the validated receipt (or None) and keep the
existing freshness math (_parse_duration_spec(...) - (now - _coerce_utc(...)))
in each caller so the error handling and validation logic is centralized and
cannot diverge.
tests/scripts/test_hapax_mint_route_authority_receipt.py (1)

264-270: ⚡ Quick win

Pin NTFY_TOPIC so this test always reaches urlopen.

_post_ntfy returns False early when NTFY_TOPIC resolves to an empty string (Lines 154-156). It defaults to hapax-ops when the var is unset, so this test usually works — but if the runner environment exports NTFY_TOPIC= (a valid opt-out config), the call short-circuits before urlopen and the test passes without exercising the network-error swallow path it's meant to verify.

💚 Make the test environment-independent
 def test_post_ntfy_is_best_effort_on_network_error(monkeypatch) -> None:
+    monkeypatch.setenv("NTFY_TOPIC", "hapax-ops-test")
     def _explode(*_args, **_kwargs):
         raise OSError("ntfy unreachable")

     monkeypatch.setattr(MINT, "urlopen", _explode, raising=False)
     # Never raises into the caller — upkeep alerting must not crash the minter.
     assert MINT._post_ntfy("title", "body", priority="high") is False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/scripts/test_hapax_mint_route_authority_receipt.py` around lines 264 -
270, The test test_post_ntfy_is_best_effort_on_network_error currently can
short-circuit because NTFY_TOPIC may be set to an empty string in the runner;
before calling MINT._post_ntfy, pin NTFY_TOPIC to a non-empty value (e.g.,
"hapax-ops") using monkeypatch.setenv("NTFY_TOPIC", "hapax-ops") so the function
proceeds to call urlopen and exercises the network-error path; keep the existing
monkeypatch that replaces MINT.urlopen with _explode and then assert
MINT._post_ntfy("title", "body", priority="high") is False.
systemd/units/hapax-opus-route-authority-receipt.service (1)

22-22: ⚡ Quick win

Add an ExecStartPre source check for hapax-mint-route-authority-receipt

hapax-compositor-runtime-source-check supports --require-file <relative-path> and fails with a clear “required runtime source file missing” error when $SOURCE_ROOT/<rel> is absent; this unit already sets WorkingDirectory=%h/.cache/hapax/source-activation/worktree, matching the tool’s default SOURCE_ROOT.

🛡️ Proposed ExecStartPre
+ExecStartPre=%h/.cache/hapax/source-activation/worktree/scripts/hapax-compositor-runtime-source-check --require-file scripts/hapax-mint-route-authority-receipt
 ExecStart=%h/.cache/hapax/source-activation/worktree/.venv/bin/python %h/.cache/hapax/source-activation/worktree/scripts/hapax-mint-route-authority-receipt --ensure-fresh --receipt-type opus_model_entitlement --route-id claude.headless.opus --stale-after 24h --refresh-within 8h --alert-within 2h --alert-after-failures 3
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@systemd/units/hapax-opus-route-authority-receipt.service` at line 22, Add an
ExecStartPre that runs hapax-compositor-runtime-source-check with --require-file
pointing to the relative runtime script to ensure the source exists before
ExecStart; since the unit already sets
WorkingDirectory=%h/.cache/hapax/source-activation/worktree, invoke
hapax-compositor-runtime-source-check --require-file
scripts/hapax-mint-route-authority-receipt (or the correct relative path to the
script) and place this ExecStartPre immediately before the existing ExecStart
line so the service fails fast with the “required runtime source file missing”
error if the file is absent.

📥 Commits

Reviewing files that changed from the base of the PR and between d46c933 and 4ef942d.

📒 Files selected for processing (3)
  • scripts/hapax-mint-route-authority-receipt
  • systemd/units/hapax-opus-route-authority-receipt.service
  • tests/scripts/test_hapax_mint_route_authority_receipt.py

Merged via the queue into main with commit a599341 Jun 1, 2026
39 of 40 checks passed
@ryanklee ryanklee deleted the zeta/reform-opus-receipt-remint-repoint-20260601 branch June 1, 2026 17:55