fix(sdlc): bootstrap helper fails OPEN on infra error + atomic canonical hooks deploy #3848

Merged
ryanklee merged 1 commit into main from delta/reform-bootstrap-failopen-atomic-swap-20260601
Jun 1, 2026

ryanklee commented Jun 1, 2026

Summary

Reform hardening (CASE-SDLC-REFORM-001, task reform-bootstrap-failopen-atomic-swap-20260601). Two asymmetries where the gate's own infrastructure could fail CLOSED:

1. Bootstrap helper invocation fails OPEN on infra error. cc-task-gate.impl.sh §3b is the roleless session's only sanctioned write path, yet it mapped every non-{0,10} helper exit to a hard BLOCK. python3's own "can't open file" rc==2 (an unreadable / mid-atomic-swap helper) was indistinguishable from a genuine BLOCKED=12 verdict, so a hooks-doctor redeploy that briefly unlinked the helper fail-closed even a properly claimed session (the S2 incident). Now it mirrors the shim's INV-5 fail-OPEN posture (master design §2.2 / FM-15 / NEW-2):

  • A pre-invocation readability guard, and an rc map where only rc==12 blocks; every other non-{0,10} code (rc==2 infra, 1, 127, …) is an infra signal, never a deny.
  • On infra error a bootstrap candidate write (a Write of a .md note under hapax-requests/active or hapax-cc-tasks/active) fails OPEN with a loud ledger line; any other mutation falls through to the normal claim/authority gate, so the fail-open never widens what a non-bootstrap mutation may do.

2. Atomic canonical hooks deploy. hooks-doctor --deploy-canonical installed each closure file (unlinkat+create) with the impl first, so during a redeploy a sibling was briefly absent and the new impl could go live ahead of its closure. Now it stages the whole closure + MANIFEST into a temp dir on the same filesystem and rename(2)s each into place, publishing the impl last; an incomplete closure is refused up front so a refused deploy is a clean no-op (was a half-swapped closure).
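
The stage-then-rename sequence in item 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the shell code in hooks-doctor: `deploy_closure` and its parameters are hypothetical names, the impl file name is taken from the review excerpt further down, and MANIFEST generation is omitted.

```python
import os
import shutil
import tempfile
from pathlib import Path

IMPL = "cc-task-gate.sh"  # assumed impl file name, taken from the review excerpt


def deploy_closure(source: Path, canonical: Path, closure: list[str]) -> None:
    """Stage every closure file, then rename each into place with the impl last."""
    # Refuse an incomplete closure up front so a refused deploy touches nothing.
    missing = [name for name in closure if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"incomplete closure, refusing deploy: {missing}")

    # Stage inside the canonical dir so staging and target share a filesystem;
    # os.replace() is then a same-filesystem rename rather than a copy.
    stage = Path(tempfile.mkdtemp(prefix=".stage-", dir=canonical))
    try:
        for name in closure:
            shutil.copy2(source / name, stage / name)
        # Siblings first, impl last: the new impl never goes live ahead of its closure.
        ordered = [n for n in closure if n != IMPL] + [n for n in closure if n == IMPL]
        for name in ordered:
            os.replace(stage / name, canonical / name)
    finally:
        # Remove the staging dir whether or not the publish completed.
        shutil.rmtree(stage, ignore_errors=True)
```

Because `os.replace` raises on failure, the loop stops at the first failed rename, so the impl (last in `ordered`) is not published after a sibling error in this sketch.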

Test evidence

  • tests/hooks/test_cc_task_gate_bootstrap_failopen.py — fail-open matrix: helper absent/unreadable → candidate fails open; only rc==12 blocks; rc==2/1/3/127 fall open; non-candidate unclaimed still blocks; claimed in-scope edit no longer blocked by a bad helper.
  • tests/hooks/test_hooks_doctor.py — incomplete-source deploy is a no-op on the live canonical; strace shows every file published via rename(2) with the impl last from a staging dir; concurrent-redeploy stress never exposes a missing/empty sibling.
  • Full gate suite green (248 tests across test_cc_task_gate*, test_blocking_exit_codes, test_gate_manifest_check, test_methodology_ledger_digest); ruff check/format + shellcheck clean.

AuthorityCase: CASE-SDLC-REFORM-001
Parent spec: coordination-reform-master-design-2026-05-30

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Governance intake bootstrap now gracefully fails open for specific candidate writes when validator helper infrastructure is unavailable, while maintaining strict enforcement for other operations.
    • Gate deployment process now uses atomic operations to prevent exposure of incomplete closure states during redeploys.
  • Tests

    • Added comprehensive regression test suite for bootstrap failure scenarios and atomic deployment behavior.

…cal hooks deploy

The unclaimed governance-intake bootstrap (cc-task-gate.impl.sh section 3b) is
the roleless session's ONLY sanctioned write path, yet it mapped EVERY non-{0,10}
helper exit to a hard BLOCK — so python3's own "can't open file" rc==2 (an
unreadable / mid-atomic-swap helper) was indistinguishable from a genuine BLOCKED
verdict and fail-closed even a properly CLAIMED session during a hooks-doctor
redeploy (the S2 incident). Mirror the shim's INV-5 fail-OPEN posture (master
design section 2.2 / FM-15 / NEW-2):
- A pre-invocation readability guard + an rc map where ONLY rc==12 blocks; every
  other non-{0,10} code (rc==2 infra, 1, 127, ...) is an infra signal, never a deny.
- On infra error a bootstrap CANDIDATE write (a Write of a .md note under
  hapax-requests/active or hapax-cc-tasks/active) FAILS OPEN with a loud ledger
  line; any other mutation falls through to the normal claim/authority gate, so
  the fail-open never widens what a non-bootstrap mutation may do.

hooks-doctor --deploy-canonical was non-atomic: it installed each closure file
(unlinkat+create) with the impl FIRST, so during a redeploy a sibling was briefly
absent and the new impl could go live ahead of its closure. Stage the whole
closure + MANIFEST into a temp dir on the same filesystem and rename(2) each into
place, publishing the impl LAST; refuse an incomplete closure up front so a
refused deploy is a clean no-op (was a half-swapped closure).

Regression tests: tests/hooks/test_cc_task_gate_bootstrap_failopen.py (the
fail-open matrix incl. the claimed-session unblock) and three atomic-deploy tests
in tests/hooks/test_hooks_doctor.py (incomplete-source no-op, strace
rename/impl-last ordering, concurrent-redeploy no-missing-sibling).

Task: reform-bootstrap-failopen-atomic-swap-20260601
AuthorityCase: CASE-SDLC-REFORM-001

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 1, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR refactors bootstrap helper integration and canonical gate deployment. The bootstrap phase now fails open (not closed) for candidate governance-intake markdown writes when the helper is unreadable or returns infra errors, while introducing exit code 12 as the sole blocking verdict. Canonical deployment is made atomic by validating closure sibling presence upfront, staging files under a temp directory, and publishing siblings before the impl via atomic rename.

Changes

Bootstrap Fail-Open for Unclaimed Sessions

| Layer | File(s) | Summary |
|---|---|---|
| Bootstrap helper fail-open logic | hooks/scripts/cc-task-gate.impl.sh | Helper readability check gates execution; exit code 0 allows, 10 falls through, 12 blocks; infra errors trigger candidate-only fail-open (Write tool + .md under hapax-requests/active/* or hapax-cc-tasks/active/*). |
| Bootstrap failopen test infrastructure | tests/hooks/test_cc_task_gate_bootstrap_failopen.py | Staging, execution harness, and helpers: _stage_gate creates temp gate with controlled helper behavior, _run executes JSON payloads with isolated environment, ledger/task/claim helpers manage test state. |
| Bootstrap failopen test assertions | tests/hooks/test_cc_task_gate_bootstrap_failopen.py | Regression tests: candidate fail-open (absent/unreadable helper), exit code dispatch (12 blocks, others fail-open), narrowness (non-candidate unclaimed blocked), coordinator bypass (claimed authorized allowed), sanity (real helper unchanged). |
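
The exit-code dispatch summarized in the table above can be modelled as a small decision function. This is a hedged Python model of the described behaviour, not the shell logic in cc-task-gate.impl.sh; `Verdict`, `dispatch_helper_rc`, and `is_bootstrap_candidate` are hypothetical names introduced for illustration.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"                # helper approved the bootstrap write
    FALL_THROUGH = "fall_through"  # defer to the normal claim/authority gate
    BLOCK = "block"                # genuine BLOCKED verdict from the helper
    FAIL_OPEN = "fail_open"        # infra error on a bootstrap candidate write


def dispatch_helper_rc(rc: int, is_bootstrap_candidate: bool) -> Verdict:
    """Map a helper exit code to a gate decision, following the rc map above."""
    if rc == 0:
        return Verdict.ALLOW
    if rc == 10:
        return Verdict.FALL_THROUGH
    if rc == 12:
        return Verdict.BLOCK
    # Any other code (2, 1, 127, ...) is an infra signal, never a deny.
    if is_bootstrap_candidate:
        return Verdict.FAIL_OPEN
    # Non-candidate mutations still face the normal gate, so nothing is widened.
    return Verdict.FALL_THROUGH
```

The last branch is the narrowness property: an infra error on a non-candidate mutation is handed to the normal gate rather than being allowed or denied here.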

Atomic Canonical Gate Deployment

| Layer | File(s) | Summary |
|---|---|---|
| Atomic canonical deployment refactoring | hooks/scripts/hooks-doctor.sh | Upfront validation of closure sibling presence, staging to temp dir under $CANONICAL_DIR, staged installation with manifest generation, atomic publish (siblings renamed first, impl last), symlink preservation. |
| Atomic deployment test setup | tests/hooks/test_hooks_doctor.py | Constants for closure sibling filenames; helper to seed incomplete source (impl diverges, sibling omitted) for validation. |
| Atomic deployment test assertions | tests/hooks/test_hooks_doctor.py | Incomplete source refusal (canonical untouched), atomic rename ordering via strace (impl renamed last from staging), concurrent redeploy safety (canonical never exposes impl without full closure). |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • hapax-systems/hapax-council#3835: Ensures cc-task-gate-bootstrap.py is included in the canonical closure to avoid the "helper missing" path that this PR now handles gracefully.
  • hapax-systems/hapax-council#3832: Introduces earlier canonical deployment and shim refactoring that this PR's atomic publish sequence builds upon.
  • hapax-systems/hapax-council#3390: Upstream change to cc-task-gate-bootstrap.py establishing the exit-code contract (10, 12) that this PR's bootstrap helper handling interprets.

Poem

🐰 A shell script ballet, now safe and complete—
Helpers fail graceful, deployments atomic and neat.
Candidates pass through when infra goes wrong,
While blocking verdicts still keep malice at bay all along.
Staging, then renaming the impl with care—
Never incomplete, never caught mid-air. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the two main changes: bootstrap helper fail-open behavior on infra errors and atomic canonical hooks deployment, matching the changeset. |
| Description check | ✅ Passed | The description fully satisfies the template with comprehensive Summary, AuthorityCase/Slice details, and proper CLAUDE.md hygiene checkbox completion. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

ryanklee enabled auto-merge June 1, 2026 22:08
ryanklee added this pull request to the merge queue Jun 1, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e57e52d136

Comment on lines +444 to +448
local p="${edit_path/#\~/$HOME}"
[[ -n "$p" && "$p" == *.md ]] || return 1
case "$p" in
  "$HOME"/Documents/Personal/20-projects/hapax-requests/active/*) return 0 ;;
  "$HOME"/Documents/Personal/20-projects/hapax-cc-tasks/active/*) return 0 ;;

P1: Require canonical bootstrap fail-open targets

When the helper is unavailable, this string-prefix check treats paths like $HOME/Documents/Personal/20-projects/hapax-requests/active/../../outside.md as bootstrap candidates and _bootstrap_infra_failopen exits 0, so an unclaimed Write can bypass the normal gate for markdown outside the intake roots; it also allows overwriting existing active notes because it does not mirror the helper's resolve(...).is_relative_to(...) and target.exists() checks. In the helper-infra-error cases introduced here, please canonicalize the target and reject existing files before failing open so the fallback stays limited to new governance-intake notes.
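
The canonicalize-and-reject-existing check requested here could look roughly like the sketch below. It is written in Python for illustration only (the gate itself is a shell script); `is_new_intake_note` and `INTAKE_ROOTS` are hypothetical names, and the root paths are copied from the excerpt under review.

```python
from pathlib import Path

# Intake roots relative to the home directory, as they appear in the excerpt.
INTAKE_ROOTS = (
    "Documents/Personal/20-projects/hapax-requests/active",
    "Documents/Personal/20-projects/hapax-cc-tasks/active",
)


def is_new_intake_note(edit_path: str, home: Path) -> bool:
    """True only for a not-yet-existing .md file that resolves inside an intake root."""
    if edit_path.startswith("~/"):
        edit_path = str(home / edit_path[2:])
    # resolve() collapses ".." segments and follows symlinks before any comparison.
    target = Path(edit_path).resolve()
    if target.suffix != ".md" or target.exists():
        return False  # wrong type, or would overwrite an existing note
    return any(target.is_relative_to((home / root).resolve()) for root in INTAKE_ROOTS)
```

With this shape, a path such as `.../hapax-requests/active/../../outside.md` resolves outside both roots and is rejected, and an existing active note is rejected by the `exists()` test. `Path.is_relative_to` requires Python 3.9 or later.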

Comment on lines +73 to +75
elif helper == "unreadable":
    shutil.copy2(HOOKS_SRC / _HELPER, helper_path)
    helper_path.chmod(0o000)

P2: Simulate unreadable helpers without chmod

When these tests run as root, chmod(0o000) does not make the helper unreadable: Bash [[ -r ... ]] still succeeds and python3 can read the file, so the helper="unreadable" parametrizations execute the real validator instead of exercising the new infra-fail-open path and can fail for the invalid fixture content. Use a simulation that is unreadable even to root (or a fake helper that exits with the desired infra code) so the regression tests are reliable in containerized CI.
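
One way to follow the second suggestion (a fake helper that exits with the desired infra code) is sketched below. The function names `write_stub_helper` and `run_helper` are hypothetical and not part of the test file under review. A readable stub exercises the rc map only; it does not exercise the readability guard, which would need its own root-proof simulation.

```python
import subprocess
import sys
from pathlib import Path


def write_stub_helper(helper_path: Path, exit_code: int) -> None:
    """Write a fake helper that ignores its input and exits with exit_code."""
    helper_path.write_text(f"import sys\nsys.exit({exit_code})\n")


def run_helper(helper_path: Path) -> int:
    """Run the helper as `python <helper>` and return its exit code."""
    return subprocess.run([sys.executable, str(helper_path)], check=False).returncode
```

Since the stub does not depend on file permissions, it returns the same exit code whether the test process runs as root or as an unprivileged user.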

if [[ -f "$stage/MANIFEST.sha256" ]]; then
  mv -f "$stage/MANIFEST.sha256" "$CANONICAL_DIR/MANIFEST.sha256" || rc=1
fi
mv -f "$stage/cc-task-gate.sh" "$CANONICAL_DIR/cc-task-gate.sh" || rc=1

P2: Don't publish the impl after a failed sibling rename

If any earlier publish step sets rc=1 (for example because one canonical sibling cannot be replaced), this line still renames the new cc-task-gate.sh into place and only reports failure afterward. That leaves the live canonical with a new impl and a mixed or stale closure, which is exactly the half-deployed state this atomic deploy path is meant to avoid; abort before publishing the impl once any sibling or manifest rename has failed.
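
The abort-before-impl behaviour requested here can be illustrated with a small Python sketch. It is not the hooks-doctor shell code: `publish` and its injectable `rename` parameter are hypothetical, the latter added only so a sibling failure can be simulated.

```python
import os
from pathlib import Path
from typing import Callable


def publish(stage: Path, canonical: Path, siblings: list[str], impl: str,
            rename: Callable[[Path, Path], None] = os.replace) -> bool:
    """Rename staged files into place; the impl is reached only if every sibling succeeded."""
    try:
        for name in [*siblings, impl]:  # impl is always last
            rename(stage / name, canonical / name)
    except OSError:
        # Stop at the first failure so the live impl is never advanced past it.
        return False
    return True
```

Siblings renamed before the failure stay in place, so this sketch only guarantees that the impl is not advanced; it does not roll back earlier renames.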

Merged via the queue into main with commit 671b815 Jun 1, 2026
34 of 35 checks passed
ryanklee deleted the delta/reform-bootstrap-failopen-atomic-swap-20260601 branch June 1, 2026 22:18