fix(sdlc): bootstrap helper fails OPEN on infra error + atomic canonical hooks deploy #3848
The unclaimed governance-intake bootstrap (cc-task-gate.impl.sh section 3b) is
the roleless session's ONLY sanctioned write path, yet it mapped EVERY non-{0,10}
helper exit to a hard BLOCK — so python3's own "can't open file" rc==2 (an
unreadable / mid-atomic-swap helper) was indistinguishable from a genuine BLOCKED
verdict, and it failed closed even for a properly CLAIMED session during a hooks-doctor
redeploy (the S2 incident). Mirror the shim's INV-5 fail-OPEN posture (master
design section 2.2 / FM-15 / NEW-2):
- A pre-invocation readability guard + an rc map where ONLY rc==12 blocks; every
other non-{0,10} code (rc==2 infra, 1, 127, ...) is an infra signal, never a deny.
- On infra error a bootstrap CANDIDATE write (a Write of a .md note under
hapax-requests/active or hapax-cc-tasks/active) FAILS OPEN with a loud ledger
line; any other mutation falls through to the normal claim/authority gate, so
the fail-open never widens what a non-bootstrap mutation may do.
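A minimal Python sketch of the decision table above, assuming nothing beyond what the message states. The real logic is bash in cc-task-gate.impl.sh; Decision and map_helper_result are hypothetical names, and because the meanings of rc 0 and 10 are not spelled out here, the sketch passes them through as HELPER_VERDICT without modeling them.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    HELPER_VERDICT = "helper-verdict"  # rc 0 or 10: existing handling, not modeled here
    BLOCK = "block"                    # rc 12: the only blocking verdict
    FAIL_OPEN = "fail-open"            # infra error on a bootstrap candidate write
    FALL_THROUGH = "fall-through"      # infra error on any other mutation: normal gate decides


NORMAL_RCS = {0, 10}
BLOCKED_RC = 12


def map_helper_result(helper_readable: bool, rc: Optional[int], is_candidate: bool) -> Decision:
    """Map one bootstrap-helper outcome to a gate decision (hypothetical sketch).

    helper_readable=False models the pre-invocation readability guard: the
    helper is never run, and the situation counts as an infra error.
    """
    if helper_readable:
        if rc is None:
            raise ValueError("rc is required when the helper was invoked")
        if rc in NORMAL_RCS:
            return Decision.HELPER_VERDICT
        if rc == BLOCKED_RC:
            return Decision.BLOCK
    # Unreadable helper, or any other rc (2, 1, 127, ...): an infra signal, never a deny.
    return Decision.FAIL_OPEN if is_candidate else Decision.FALL_THROUGH
```

The property the commit relies on is visible in the last line: an infra error can only yield FAIL_OPEN for a candidate write, so every other mutation still reaches the normal claim/authority gate. In this sketch, emitting the loud ledger line on FAIL_OPEN is left to the caller.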
hooks-doctor --deploy-canonical was non-atomic: it installed each closure file
(unlinkat+create) with the impl FIRST, so during a redeploy a sibling was briefly
absent and the new impl could go live ahead of its closure. Stage the whole
closure + MANIFEST into a temp dir on the same filesystem and rename(2) each into
place, publishing the impl LAST; refuse an incomplete closure up front so a
refused deploy is a clean no-op (was a half-swapped closure).
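The staging-and-rename sequence can be sketched in Python (the hook itself is bash). Assumptions, all hypothetical: the closure is a list of sibling file names plus one impl name, an optional MANIFEST.sha256 sits beside them, and the stage directory is created inside the live directory so every os.replace (rename(2) on POSIX) stays on one filesystem.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Sequence


class IncompleteClosure(Exception):
    """Raised before any live file is touched when the source closure is incomplete."""


def deploy_canonical(src: Path, live: Path, siblings: Sequence[str], impl: str,
                     manifest: str = "MANIFEST.sha256") -> None:
    # 1. Refuse an incomplete closure up front, so a refused deploy is a no-op.
    missing = [n for n in [*siblings, impl] if not (src / n).is_file()]
    if missing:
        raise IncompleteClosure(f"refusing deploy, missing: {missing}")

    # 2. Stage everything inside `live`, keeping each later rename on one filesystem.
    stage = Path(tempfile.mkdtemp(prefix=".stage-", dir=live))
    try:
        names = list(siblings)
        if (src / manifest).is_file():
            names.append(manifest)
        for name in [*names, impl]:
            shutil.copy2(src / name, stage / name)

        # 3. Publish siblings and the manifest first; os.replace raises on failure,
        #    which stops the function before the impl is moved.
        for name in names:
            os.replace(stage / name, live / name)
        # 4. Publish the impl last, so it never goes live ahead of its closure.
        os.replace(stage / impl, live / impl)
    finally:
        shutil.rmtree(stage, ignore_errors=True)
```

Placing the stage inside the live directory is just one way to guarantee a same-filesystem rename; a sibling directory on the same mount would serve equally well.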
Regression tests: tests/hooks/test_cc_task_gate_bootstrap_failopen.py (the
fail-open matrix incl. the claimed-session unblock) and three atomic-deploy tests
in tests/hooks/test_hooks_doctor.py (incomplete-source no-op, strace
rename/impl-last ordering, concurrent-redeploy no-missing-sibling).
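The property behind the concurrent-redeploy test can be shown with a small, hypothetical stress loop (not the repository's test): a reader thread keeps opening a sibling file while the writer repeatedly swaps in a fully written replacement with os.replace. Because rename(2) swaps the directory entry atomically on POSIX, each open() resolves to either the old or the new file, so the reader should never see it missing or empty; an unlink-then-create loop offers no such guarantee.

```python
import os
import tempfile
import threading
from pathlib import Path


def redeploy_is_never_missing(rounds: int = 300) -> bool:
    """Swap a file in `rounds` times via rename while a reader polls it.

    Returns True if the reader never observed the file missing or empty.
    """
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "sibling.sh"
        live.write_text("v0\n")
        done = threading.Event()
        problems = []

        def reader() -> None:
            while not done.is_set():
                try:
                    if live.read_text() == "":
                        problems.append("empty")
                except FileNotFoundError:
                    problems.append("missing")

        thread = threading.Thread(target=reader)
        thread.start()
        try:
            for i in range(1, rounds + 1):
                staged = Path(tmp) / f".stage-{i}"
                staged.write_text(f"v{i}\n")  # fully written before it is published
                os.replace(staged, live)      # atomic swap: old or new, never absent
        finally:
            done.set()
            thread.join()
        return not problems
```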
Task: reform-bootstrap-failopen-atomic-swap-20260601
AuthorityCase: CASE-SDLC-REFORM-001
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Caution: Review failed. The pull request was closed or merged during review.
Walkthrough
This PR refactors bootstrap helper integration and canonical gate deployment. The bootstrap phase now fails open (not closed) for candidate governance-intake markdown writes when the helper is unreadable or returns infra errors, and exit code 12 becomes the sole blocking verdict. Canonical deployment is made atomic by validating closure sibling presence up front, staging files under a temp directory, and publishing siblings before the impl via atomic rename.
Changes
Bootstrap Fail-Open for Unclaimed Sessions
Atomic Canonical Gate Deployment
Estimated code review effort: 4 (Complex) | ~45 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e57e52d136
    local p="${edit_path/#\~/$HOME}"
    [[ -n "$p" && "$p" == *.md ]] || return 1
    case "$p" in
      "$HOME"/Documents/Personal/20-projects/hapax-requests/active/*) return 0 ;;
      "$HOME"/Documents/Personal/20-projects/hapax-cc-tasks/active/*) return 0 ;;
Require canonical bootstrap fail-open targets
When the helper is unavailable, this string-prefix check treats paths like $HOME/Documents/Personal/20-projects/hapax-requests/active/../../outside.md as bootstrap candidates and _bootstrap_infra_failopen exits 0, so an unclaimed Write can bypass the normal gate for markdown outside the intake roots; it also allows overwriting existing active notes because it does not mirror the helper's resolve(...).is_relative_to(...) and target.exists() checks. In the helper-infra-error cases introduced here, please canonicalize the target and reject existing files before failing open so the fallback stays limited to new governance-intake notes.
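One way to follow this suggestion, sketched in Python because the comment notes that the helper already relies on resolve(...).is_relative_to(...) and target.exists(). The function name and the INTAKE_ROOTS constant are hypothetical; the roots are copied from the bash excerpt above. Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path

# Intake roots copied from the bash excerpt, relative to $HOME.
INTAKE_ROOTS = (
    "Documents/Personal/20-projects/hapax-requests/active",
    "Documents/Personal/20-projects/hapax-cc-tasks/active",
)


def is_failopen_candidate(edit_path: str, home: Path) -> bool:
    """True only for a not-yet-existing .md note that canonically lies under an intake root."""
    if not edit_path:
        return False
    # Mirror the bash ${edit_path/#\~/$HOME}: replace one leading "~" with $HOME.
    if edit_path.startswith("~"):
        edit_path = str(home) + edit_path[1:]
    target = Path(edit_path).resolve()  # collapses ".." and follows symlinks (non-strict)
    if target.suffix != ".md" or target.exists():
        return False  # not a note, or would overwrite an existing one
    roots = [(home / root).resolve() for root in INTAKE_ROOTS]
    return any(target.is_relative_to(root) for root in roots)
```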
    elif helper == "unreadable":
        shutil.copy2(HOOKS_SRC / _HELPER, helper_path)
        helper_path.chmod(0o000)
Simulate unreadable helpers without chmod
When these tests run as root, chmod(0o000) does not make the helper unreadable: Bash [[ -r ... ]] still succeeds and python3 can read the file, so the helper="unreadable" parametrizations execute the real validator instead of exercising the new infra-fail-open path and can fail for the invalid fixture content. Use a simulation that is unreadable even to root (or a fake helper that exits with the desired infra code) so the regression tests are reliable in containerized CI.
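Root ignores file permission bits when reading (on Linux, via CAP_DAC_OVERRIDE), so a mode-000 file cannot reliably make the helper unreadable in CI. A hypothetical stand-in helper that simply exits with the desired code sidesteps privileges entirely, and pointing python3 at a missing path reproduces its own "can't open file" rc==2. The names below are illustrative, and sys.executable stands in for the python3 the gate invokes.

```python
import subprocess
import sys
from pathlib import Path


def make_fake_helper(directory: Path, rc: int) -> Path:
    """Write a stand-in helper that exits with `rc`, whatever the caller's privileges."""
    helper = directory / "fake_helper.py"
    helper.write_text(f"import sys\nsys.exit({rc})\n")
    return helper


def run_helper(helper: Path) -> int:
    # Invoke the helper the way the gate does (python3 <path>) and return its exit code.
    return subprocess.run([sys.executable, str(helper)], capture_output=True).returncode
```

Alternatively, the chmod-based parametrization could be skipped when os.geteuid() == 0, at the cost of losing that coverage under root.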
    if [[ -f "$stage/MANIFEST.sha256" ]]; then
      mv -f "$stage/MANIFEST.sha256" "$CANONICAL_DIR/MANIFEST.sha256" || rc=1
    fi
    mv -f "$stage/cc-task-gate.sh" "$CANONICAL_DIR/cc-task-gate.sh" || rc=1
Don't publish the impl after a failed sibling rename
If any earlier publish step sets rc=1 (for example because one canonical sibling cannot be replaced), this line still renames the new cc-task-gate.sh into place and only reports failure afterward. That leaves the live canonical with a new impl and a mixed or stale closure, which is exactly the half-deployed state this atomic deploy path is meant to avoid; abort before publishing the impl once any sibling or manifest rename has failed.
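A hypothetical Python sketch of the control flow this comment asks for: publish siblings one at a time, stop at the first failed rename, and only then move the impl. The rename parameter exists purely so a failure can be injected in a test; in the bash hook the rough equivalent would be checking rc before the final mv. Aborting keeps the new impl out of the live tree, but any siblings renamed before the failure remain published.

```python
import os
from pathlib import Path
from typing import Callable, Sequence


def publish(stage: Path, live: Path, siblings: Sequence[str], impl: str,
            rename: Callable[[Path, Path], None] = os.replace) -> bool:
    """Move staged siblings into `live`, then the impl; return False on any failure."""
    for name in siblings:
        try:
            rename(stage / name, live / name)
        except OSError:
            # Abort before the impl: the new impl must never go live over a
            # stale or partially replaced closure.
            return False
    try:
        rename(stage / impl, live / impl)
    except OSError:
        return False
    return True
```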
Summary
Reform hardening (CASE-SDLC-REFORM-001, task reform-bootstrap-failopen-atomic-swap-20260601). Two asymmetries where the gate's own infrastructure could fail CLOSED:
1. Bootstrap helper invocation fails OPEN on infra error.
cc-task-gate.impl.sh §3b is the roleless session's only sanctioned write path, yet it mapped every non-{0,10} helper exit to a hard BLOCK. python3's own "can't open file" rc==2 (an unreadable or mid-atomic-swap helper) was indistinguishable from a genuine BLOCKED=12 verdict, so a hooks-doctor redeploy that briefly unlinked the helper made the gate fail closed even for a properly claimed session (the S2 incident). Now it mirrors the shim's INV-5 fail-OPEN posture (master design §2.2 / FM-15 / NEW-2):
- Only rc==12 blocks; every other non-{0,10} code (rc==2 infra, 1, 127, …) is an infra signal, never a deny.
- On infra error, a bootstrap candidate write (a Write of a .md note under hapax-requests/active or hapax-cc-tasks/active) fails OPEN with a loud ledger line; any other mutation falls through to the normal claim/authority gate, so the fail-open never widens what a non-bootstrap mutation may do.
2. Atomic canonical hooks deploy.
hooks-doctor --deploy-canonical installed each closure file (unlinkat+create) with the impl first, so during a redeploy a sibling was briefly absent and the new impl could go live ahead of its closure. Now it stages the whole closure + MANIFEST into a temp dir on the same filesystem and rename(2)s each file into place, publishing the impl last; an incomplete closure is refused up front, so a refused deploy is a clean no-op (previously it left a half-swapped closure).
Test evidence
- tests/hooks/test_cc_task_gate_bootstrap_failopen.py: fail-open matrix. Helper absent/unreadable → candidate fails open; only rc==12 blocks; rc==2/1/3/127 fall open; a non-candidate unclaimed mutation still blocks; a claimed in-scope edit is no longer blocked by a bad helper.
- tests/hooks/test_hooks_doctor.py: an incomplete-source deploy is a no-op on the live canonical; strace shows every file published via rename(2) from a staging dir with the impl last; a concurrent-redeploy stress run never exposes a missing or empty sibling.
- test_cc_task_gate*, test_blocking_exit_codes, test_gate_manifest_check, test_methodology_ledger_digest; ruff check/format + shellcheck clean.
AuthorityCase: CASE-SDLC-REFORM-001
Parent spec: coordination-reform-master-design-2026-05-30
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests