fix(sdlc): repoint opus route-authority re-mint off the stale worktree + un-suppress its failure alert #3841
hapax-opus-route-authority-receipt.service hardcoded the primary worktree
(~/projects/hapax-council), which parks on feature branches where the minter
does not exist, so upkeep failed status=2 every 6h and the receipt silently
froze at its 02:30Z issuance while the notify-failure coalescer suppressed
every alert ("Timer-backed ... failed (suppressed - timer retries)"). When the
24h window closes, opus dispatch loses route authority with no automated
refresh and no operator-visible warning. Same antipattern as the INV checker.
- Repoint ExecStart/WorkingDirectory/PYTHONPATH at the stable source-activation
active-deploy symlink (always the activated origin/main release; script +
synced .venv present), decoupling upkeep from dev-worktree branch churn.
- The minter self-escalates an un-suppressed ntfy on N consecutive re-mint
failures (default 3) or within --alert-within (default 2h) of expiry, tracked
in a sidecar OUTSIDE the scanned route-authority/ dir (the read-path globs
*.json there and raises on non-receipts).
- Backfilled the live receipt (issued_at -> now) so it does not lapse before
the fix deploys.
Task: reform-opus-receipt-remint-repoint-20260601
AuthorityCase: CASE-CROSS-RUNTIME-COMMS-001
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
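The escalation rule in the commit message above can be sketched as follows. This is a minimal illustration, not the minter's code. Only the two trip conditions, their defaults (3 failures, 2h), and the split between the scanned `route-authority/` directory and the `route-authority-upkeep/` sidecar directory come from the PR. The function names, the sidecar's JSON key, and the sidecar filename are hypothetical.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path


def remaining_freshness(issued_at: datetime, stale_after: timedelta, now: datetime) -> timedelta:
    """Freshness left on a receipt: its lifetime minus its current age."""
    return stale_after - (now - issued_at)


def record_attempt(sidecar: Path, *, succeeded: bool) -> int:
    """Update and return the consecutive-failure count; a success resets it to 0.

    The sidecar sits in its own directory (e.g. route-authority-upkeep/) so a
    reader that globs *.json in the receipt directory never sees it.
    """
    try:
        count = int(json.loads(sidecar.read_text(encoding="utf-8"))["consecutive_failures"])
    except (OSError, ValueError, KeyError, TypeError):
        count = 0  # missing, unreadable, or malformed sidecar: no prior failures
    count = 0 if succeeded else count + 1
    sidecar.parent.mkdir(parents=True, exist_ok=True)
    sidecar.write_text(json.dumps({"consecutive_failures": count}), encoding="utf-8")
    return count


def should_alert(consecutive_failures: int, remaining: timedelta, *,
                 alert_after_failures: int = 3,
                 alert_within: timedelta = timedelta(hours=2)) -> bool:
    """Escalate only after a failed re-mint: on N consecutive failures OR near expiry."""
    if consecutive_failures == 0:  # last attempt succeeded: nothing to report
        return False
    return consecutive_failures >= alert_after_failures or remaining <= alert_within
```

Keeping the counter outside the receipt directory means a corrupt or unexpected sidecar can never trip the read path's "raises on non-receipts" behaviour.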
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ef942d05c
# upkeep (status=2 "No such file") until the receipt lapsed. The symlink always
# resolves to the activated origin/main release (script + synced .venv present)
# and its PATH is stable across deploys — only its target advances.
ExecStart=%h/.cache/hapax/source-activation/worktree/.venv/bin/python %h/.cache/hapax/source-activation/worktree/scripts/hapax-mint-route-authority-receipt --ensure-fresh --receipt-type opus_model_entitlement --route-id claude.headless.opus --stale-after 24h --refresh-within 8h --alert-within 2h --alert-after-failures 3
Make the failure threshold reachable before expiry
With the paired timer checked in systemd/units/hapax-opus-route-authority-receipt.timer (OnUnitActiveSec=6h), this --stale-after 24h --refresh-within 8h --alert-within 2h --alert-after-failures 3 combination cannot trip the N-consecutive-failure alert before the receipt lapses in normal timer operation. Upkeep does not attempt a re-mint until the receipt has at most 8h left, so a 6h cadence allows at most two attempts before expiry, and the second necessarily lands with at most 2h left, i.e. already inside the --alert-within window. If the goal is an earlier un-suppressed warning after repeated failures, the threshold, the refresh window, or the timer cadence needs adjusting so that three failed runs can occur before the receipt is about to expire.
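The arithmetic in this review comment can be checked with a small simulation. The model is an assumption, not the minter's code: the timer fires at a fixed 6h cadence with an unknown phase offset from issuance (the receipt was backfilled by hand), every re-mint fails so the receipt is never refreshed, and an attempt happens only once remaining freshness is at most the refresh window. Times are in whole minutes to avoid float edge cases.

```python
def attempt_remainders(stale_after: int, refresh_within: int, period: int, phase: int) -> list[int]:
    """Freshness left (minutes) at each failed re-mint attempt before expiry.

    Assumed model: the timer fires at phase, phase + period, ... minutes after
    issuance; every attempt fails; an attempt is made only when the remaining
    freshness is <= refresh_within.
    """
    remainders = []
    t = phase
    while t < stale_after:  # stop once the receipt has expired
        remaining = stale_after - t
        if remaining <= refresh_within:
            remainders.append(remaining)
        t += period
    return remainders


H = 60  # minutes per hour
# --stale-after 24h --refresh-within 8h, OnUnitActiveSec=6h, every phase offset
runs = [attempt_remainders(24 * H, 8 * H, 6 * H, p) for p in range(6 * H)]
max_attempts = max(len(r) for r in runs)
print("max failed attempts before expiry:", max_attempts)
print("second attempt never above 2h left:", all(r[1] <= 2 * H for r in runs if len(r) > 1))
```

Under this model at most two failures land before expiry, so `--alert-after-failures 3` is unreachable. The same model also shows that for phase offsets under 4h only one attempt falls before expiry, with more than 2h left, so neither trip condition fires before the receipt lapses.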
🧹 Nitpick comments (3)
scripts/hapax-mint-route-authority-receipt (1)
196-206: ⚡ Quick win: Consider sharing the load+validate logic with _fresh_enough_to_keep.
_existing_receipt_remaining repeats the exact is_file → model_validate → except-set guard and the _parse_duration_spec(...) - (now - _coerce_utc(...)) computation already in _fresh_enough_to_keep (Lines 93-107). If the accepted exception set or the freshness math ever changes, the two can silently diverge.
♻️ Extract a shared loader
+def _load_receipt(target: Path) -> RouteAuthorityReceipt | None:
+    if not target.is_file():
+        return None
+    try:
+        return RouteAuthorityReceipt.model_validate(
+            json.loads(target.read_text(encoding="utf-8"))
+        )
+    except (OSError, json.JSONDecodeError, ValidationError, ValueError):
+        return None
+
+
 def _existing_receipt_remaining(target: Path, *, now: datetime) -> timedelta | None:
     """Freshness remaining on the live receipt, or ``None`` if absent/unreadable."""
-    if not target.is_file():
-        return None
-    try:
-        existing = RouteAuthorityReceipt.model_validate(
-            json.loads(target.read_text(encoding="utf-8"))
-        )
-    except (OSError, json.JSONDecodeError, ValidationError, ValueError):
-        return None
-    return _parse_duration_spec(existing.stale_after) - (now - _coerce_utc(existing.issued_at))
+    existing = _load_receipt(target)
+    if existing is None:
+        return None
+    return _parse_duration_spec(existing.stale_after) - (now - _coerce_utc(existing.issued_at))
_fresh_enough_to_keep can then call _load_receipt(target) for its load step.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/hapax-mint-route-authority-receipt` around lines 196 - 206, Extract a shared loader function (e.g. _load_receipt(target: Path) -> RouteAuthorityReceipt | None) that encapsulates the repeated "if not target.is_file() -> read_text -> json.loads -> RouteAuthorityReceipt.model_validate" flow and the common except set (OSError, json.JSONDecodeError, ValidationError, ValueError) returning None on failure; then have both _existing_receipt_remaining and _fresh_enough_to_keep call this new _load_receipt to obtain the validated receipt (or None) and keep the existing freshness math (_parse_duration_spec(...) - (now - _coerce_utc(...))) in each caller so the error handling and validation logic is centralized and cannot diverge.
tests/scripts/test_hapax_mint_route_authority_receipt.py (1)
264-270: ⚡ Quick win: Pin NTFY_TOPIC so this test always reaches urlopen.
_post_ntfy returns False early when NTFY_TOPIC resolves to an empty string (Lines 154-156). It defaults to hapax-ops when the variable is unset, so this test usually works, but if the runner environment exports NTFY_TOPIC= (a valid opt-out config), the call short-circuits before urlopen and the test passes without exercising the network-error swallow path it is meant to verify.
💚 Make the test environment-independent
 def test_post_ntfy_is_best_effort_on_network_error(monkeypatch) -> None:
+    monkeypatch.setenv("NTFY_TOPIC", "hapax-ops-test")
     def _explode(*_args, **_kwargs):
         raise OSError("ntfy unreachable")
     monkeypatch.setattr(MINT, "urlopen", _explode, raising=False)
     # Never raises into the caller — upkeep alerting must not crash the minter.
     assert MINT._post_ntfy("title", "body", priority="high") is False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/scripts/test_hapax_mint_route_authority_receipt.py` around lines 264 - 270, The test test_post_ntfy_is_best_effort_on_network_error currently can short-circuit because NTFY_TOPIC may be set to an empty string in the runner; before calling MINT._post_ntfy, pin NTFY_TOPIC to a non-empty value (e.g., "hapax-ops") using monkeypatch.setenv("NTFY_TOPIC", "hapax-ops") so the function proceeds to call urlopen and exercises the network-error path; keep the existing monkeypatch that replaces MINT.urlopen with _explode and then assert MINT._post_ntfy("title", "body", priority="high") is False.
systemd/units/hapax-opus-route-authority-receipt.service (1)
22-22: ⚡ Quick win: Add an ExecStartPre source check for hapax-mint-route-authority-receipt
hapax-compositor-runtime-source-check supports --require-file <relative-path> and fails with a clear “required runtime source file missing” error when $SOURCE_ROOT/<rel> is absent; this unit already sets WorkingDirectory=%h/.cache/hapax/source-activation/worktree, matching the tool’s default SOURCE_ROOT.
🛡️ Proposed ExecStartPre
+ExecStartPre=%h/.cache/hapax/source-activation/worktree/scripts/hapax-compositor-runtime-source-check --require-file scripts/hapax-mint-route-authority-receipt
 ExecStart=%h/.cache/hapax/source-activation/worktree/.venv/bin/python %h/.cache/hapax/source-activation/worktree/scripts/hapax-mint-route-authority-receipt --ensure-fresh --receipt-type opus_model_entitlement --route-id claude.headless.opus --stale-after 24h --refresh-within 8h --alert-within 2h --alert-after-failures 3
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@systemd/units/hapax-opus-route-authority-receipt.service` at line 22, Add an ExecStartPre that runs hapax-compositor-runtime-source-check with --require-file pointing to the relative runtime script to ensure the source exists before ExecStart; since the unit already sets WorkingDirectory=%h/.cache/hapax/source-activation/worktree, invoke hapax-compositor-runtime-source-check --require-file scripts/hapax-mint-route-authority-receipt (or the correct relative path to the script) and place this ExecStartPre immediately before the existing ExecStart line so the service fails fast with the “required runtime source file missing” error if the file is absent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 226a9374-a7bf-4954-88f7-23eeb457ee06
📒 Files selected for processing (3)
scripts/hapax-mint-route-authority-receipt
systemd/units/hapax-opus-route-authority-receipt.service
tests/scripts/test_hapax_mint_route_authority_receipt.py
What
hapax-opus-route-authority-receipt.service keeps the opus route-authority receipt fresh so opus is reachable by default for SDLC dispatch. Its ExecStart hardcoded the PRIMARY worktree (~/projects/hapax-council), which parks on feature branches (e.g. alpha/screwm-*) where the minter does not exist, so the timer-backed re-mint failed with status=2 ("No such file") every 6h and the receipt SILENTLY FROZE at its 02:30Z issuance. The notify-failure coalescer suppresses timer-backed failures ("Timer-backed ... failed (suppressed — timer retries)"), so the operator got ZERO notifications across 3+ consecutive failures. When the 24h window closes, opus dispatch loses route authority with no automated refresh and no observable warning.
Changes
- ExecStart/WorkingDirectory/PYTHONPATH now resolve from the stable source-activation active-deploy symlink (~/.cache/hapax/source-activation/worktree), which always points at the activated origin/main release (script + synced .venv present) and is decoupled from dev-worktree branch churn. This is the same antipattern as the INV checker (fix(sdlc): activate inert INV-1..5 trace-checker + unify auto-mint onto the live escape-grant contract #3839).
- During --ensure-fresh upkeep, a FAILED re-mint self-escalates an un-suppressed ntfy (independent of the coalescer) on N consecutive failures (--alert-after-failures, default 3) OR within T of expiry (--alert-within, default 2h). Failure state lives in a sidecar under route-authority-upkeep/, OUTSIDE the scanned route-authority/ dir (the read-path globs *.json there and raises on non-receipts).
- NTFY_URL/NTFY_TOPIC are configured on the unit (mirrors hapax-sdlc-invariants.service).
- Backfilled the live receipt (issued_at -> 2026-06-01T17:04:52Z) so it does not lapse before this deploys.
Acceptance criteria
--ensure-fresh ... --alert-within 2h --alert-after-failures 3 command line were confirmed rc=0 in-session. Receipt mtime advancing across a full timer interval can only be observed POST-DEPLOY: the change is deploy-coupled, because the new unit and the new-flag minter activate together via source-activation on merge (the same activation dependency as the sibling INV checker).
Test evidence
uv run pytest tests/scripts/test_hapax_mint_route_authority_receipt.py -q -> 14 passed. New tests: alert within T-2h, alert after N consecutive failures, success resets the counter, sidecar never pollutes the scanned dir, ntfy best-effort on network error, unit-points-at-deploy-worktree regression.
Task: reform-opus-receipt-remint-repoint-20260601 · AuthorityCase: CASE-CROSS-RUNTIME-COMMS-001
Parent spec: coordination-reform-master-design-2026-05-30.md
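The unit-points-at-deploy-worktree regression test listed under Test evidence is not shown in this PR; one way such a check could look is sketched below. The helper names and the exact assertions are hypothetical, and the tiny parser assumes one directive per line (it ignores systemd's backslash continuations and quoting).

```python
DEPLOY_ROOT = "%h/.cache/hapax/source-activation/worktree"
STALE_ROOT = "projects/hapax-council"


def unit_directives(unit_text: str) -> dict[str, list[str]]:
    """Collect Key=Value directives from a systemd unit, keyed by name."""
    directives: dict[str, list[str]] = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue  # skip blanks, comments, and [Section] headers
        key, sep, value = line.partition("=")  # split on the FIRST '=' only
        if sep:
            directives.setdefault(key.strip(), []).append(value.strip())
    return directives


def deploy_worktree_violations(unit_text: str) -> list[str]:
    """Return human-readable problems; an empty list means the unit is OK."""
    d = unit_directives(unit_text)
    problems = []
    for key in ("ExecStart", "WorkingDirectory"):
        values = d.get(key, [])
        if not values:
            problems.append(f"{key} missing")
        problems += [f"{key} not under deploy root: {v}" for v in values
                     if not v.startswith(DEPLOY_ROOT)]
    for env in d.get("Environment", []):
        if env.startswith("PYTHONPATH=") and DEPLOY_ROOT not in env:
            problems.append(f"PYTHONPATH not under deploy root: {env}")
    problems += [f"stale worktree reference: {k}={v}"
                 for k, vs in d.items() for v in vs if STALE_ROOT in v]
    return problems
```

In a pytest test, the unit file's text would be read from systemd/units/hapax-opus-route-authority-receipt.service and the returned list asserted to be empty.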
🤖 Generated with Claude Code