Skip to content

feat(gate): strict pre-push verification before commit_and_push#35

Merged
wally-tribute-labs merged 2 commits into
mainfrom
feat/strict-pre-push-verification
May 18, 2026
Merged

feat(gate): strict pre-push verification before commit_and_push#35
wally-tribute-labs merged 2 commits into
mainfrom
feat/strict-pre-push-verification

Conversation

@openlawbot
Copy link
Copy Markdown
Collaborator

Summary

  • Adds a strict pass/fail gate between re-review pass and _commit_and_finish that mirrors what a remote pre-push hook (e.g. Husky) will face at git push time. Fixes the class of bug PR #399 hit: build_verify accepted pre-existing test failures under its tolerance, then Husky's vitest run rejected the push because hooks don't have that tolerance.
  • Hybrid classification: regressions feed back into the same iteration with the failing-test context; pre-existing failures fail fast without consuming iterations. Worktree is preserved across iterations on regression so Codex can build on the partial fix.
  • Zero-cost on the happy path: build_verify now exposes strict_pass alongside pass, and pre_push_verify reuses it by default. Subprocess only fires when pre_push_verify_cmds is explicitly set (e.g. to add contract-drift checks beyond test_cmds).

Motivation

PR #399 activity log shows the exact divergence:

gate.fixer INFO Pre-existing failures accepted
gate.fixer INFO Re-review: pass
gate.github ERROR Commit/push failed: ... vitest run

Gate already had the data to detect this earlier — build_result["overall_pass"] is computed strictly before tolerance is applied at fixer.py:296-302. It was then discarded.

Design

Classification rules

condition kind
strict_pass == True pass
strict_pass == False AND (original_build missing OR parse failure OR project_type uses _parse_generic_test) regression (uncertain → safer side)
strict_pass == False AND after.failed > baseline.failed regression
strict_pass == False AND after.failed <= baseline.failed (reliable parser) pre_existing

Iteration handling

  • regression → resume Codex with failing-test context, re-run pre_push_verify. If still failing, fall through to next iteration without reverting (partial fix preserved). Mirrors the existing build_verify same-iteration retry pattern.
  • pre_existing → fail fast with a clear message. Gate does NOT auto-fix tests it didn't break.
  • Iteration budget bumped from 2 → 3 (config-driven) to absorb a worst-case third iteration.

Configuration

New keys under [repos.build] (all optional, documented in gate.toml.example):

  • pre_push_strict (default: true)
  • pre_push_disable (default: false) — escape hatch
  • pre_push_verify_cmds (default: []) — when set, replaces test_cmds for the gate. Used for hooks that do more than tests (e.g. contract drift).
  • pre_push_timeout_s (default: 600)

New keys under [fix_pipeline]:

  • max_iterations (default: 3, with per-repo override [[repos]].max_fix_iterations)

Open-source posture

Strict-by-default activates only when test_cmds (or pre_push_verify_cmds) is configured. Repos without tests are unaffected. Repos with tests can opt out via pre_push_disable = true. Replacement-vs-additive semantics for pre_push_verify_cmds are documented explicitly so users don't silently drop coverage when overriding.

Test plan

  • Unit tests: 25 new tests covering classification (regression vs pre-existing vs uncertain), default cached path, explicit cmds path, disable flag, prompt builder, fail-fast path. (tests/test_fixer.py, tests/test_fix_pipeline_config.py)
  • Integration tests: 6 new tests covering happy path, regression-retry success, regression-retry failure (no-revert), pre-existing fail-fast, disable flag, and 3-iteration budget. (tests/test_fix_pipeline_pre_push.py)
  • Full suite passes: 1314 passed in 184s
  • Ruff clean on all changed files
  • Verified resolved config for adin-chat returns expected pre-push cmds and max_iter=3
  • Post-merge manual: trigger Gate on a PR with known pre-existing test failure → expect _fail_pre_existing_tests path
  • Post-merge manual: trigger Gate on a PR where the fix introduces a test regression → expect same-iteration retry

Rollback

  • Per-repo: set pre_push_disable = true in [repos.build]. Zero-code revert.
  • Global: revert this commit. The strict_pass field on build_verify is additive and harmless to leave in.

Out of scope (follow-up)

  • Symmetric typecheck/lint strict gate. compare_builds only checks pass booleans for tc/lint (not counts) — same class of bug exists there but is caught earlier by the build_verify retry loop, so user-visible impact is smaller. Separate plan.
  • Per-test-name regression detection (current logic is count-based). Punt until evidence shows count-deltas mis-classify.

Made with Cursor

build_verify already runs test_cmds, but compare_builds accepts
test failures whose count did not increase (pre-existing tolerance).
Husky-style pre-push hooks run strict and reject any failure, so
fixes that Gate's tolerant view approved were being rejected at
git push time (PR #399 motivation).

This change adds a strict pass/fail gate between re-review pass
and _commit_and_finish that mirrors what the remote hook will face,
with hybrid classification:

- Pass: short-circuit to commit (zero extra cost — reuses cached
  build_verify result).
- Regression (count increased, missing baseline, parse failure, or
  generic-test project_type): resume Codex with the failing-test
  context and retry once within the same iteration; on continued
  failure fall through to the next iteration WITHOUT reverting so
  the partial fix is preserved.
- Pre-existing (count unchanged on a reliable parser): fail fast
  with a clear "fix manually" message — Gate does not auto-fix
  tests it did not break.

Wiring:

- build_verify gains a strict_pass field (and the raw build_result)
  alongside its tolerance-applied pass field.
- pre_push_verify reuses strict_pass when pre_push_verify_cmds is
  unset, and runs the explicit list otherwise (replacement of
  test_cmds — covers repo hooks that do more than tests, e.g.
  contract drift checks).
- MAX_ITERATIONS bumped 2 → 3 (config-driven via new
  get_fix_max_iterations) to absorb a worst-case third iteration
  when the pre-push gate trips.

Configuration (under [repos.build]):

- pre_push_strict       (default: true)
- pre_push_disable      (default: false — escape hatch)
- pre_push_verify_cmds  (default: [] — falls back to test_cmds)
- pre_push_timeout_s    (default: 600)

Strict-by-default activates only when test_cmds (or
pre_push_verify_cmds) is configured, so OSS users without tests
are unaffected. Documented in gate.toml.example.

Co-authored-by: Cursor <cursoragent@cursor.com>
@openlawbot
Copy link
Copy Markdown
Collaborator Author

Gate Review ❌

Changes requested — The PR introduces a well-structured pre-push verification gate with solid test coverage (1308/1308 passing, 0 failures), clean lint/typecheck, and no security issues. However, the Logic stage confirmed a test-validated bug on the explicit-cmds path: pre_push_verify hardcodes has_parse_failure=False when pre_push_verify_cmds is configured, causing test-runner crashes (segfault, OOM, syntax error in test setup) with unparseable output to be misclassified as pre_existing and trigger fail-fast instead of a regression retry. This is a behavioral defect introduced by this PR in new code, confirmed by a discriminating gate test with passing mutation check. Two architecture warnings flag that the new [fix_pipeline] section and [repo.build] pre-push keys are absent from format_full_config and _REPO_KEY_ORDER, making them invisible to gate init. Two informational notes cover accepted-risk design decisions: unreviewed code can be pushed after the regression retry path, and the iter-N+1 resume prompt is misleading when entered after a pre-push regression continue.

Errors

  • gate/fixer.py:494 (test confirmed) — Explicit-cmds path in pre_push_verify hardcodes has_parse_failure=False, causing test-runner crashes with unparseable output (exit_code != 0 but failed=0) to be misclassified as pre_existing → fail-fast instead of regression → retry. The default-path branch at line 425 correctly derives parse_failure from test_block.get('parse_failure', False); the bug is asymmetric between the two paths. Gate test confirmed: explicit-cmds path returns kind='pre_existing'; default path returns kind='regression' for the same scenario. Mutation check (flipping the assertion) caused the test to fail, confirming the test is discriminating.

    Fix: On the explicit-cmds path, detect parse failure and pass it to _classify_pre_push: parse_failure = (overall_exit != 0 and parsed.get('failed', 0) == 0 and parsed.get('total', 0) == 0). Alternatively, always route to regression when overall_exit != 0 and the parser yields zero failures — the simpler option that matches the safer-side bias already encoded in _classify_pre_push.

Warnings

  • gate/config.py:265 (pattern match) — New config keys '[fix_pipeline].max_iterations' and '[repo].max_fix_iterations' are read by get_fix_max_iterations but neither the keys nor the '[fix_pipeline]' section appear in setup.py's format_full_config. A freshly generated gate.toml via 'gate init' will not surface these keys, making them discoverable only from the example file.

    Fix: Add a '[fix_pipeline]' section to format_full_config emitting 'max_iterations = 3' (alongside [models]/[timeouts]/[retry]/[limits]). Add 'max_fix_iterations' to _REPO_KEY_ORDER so per-repo overrides appear in generated TOML.

  • gate/config.py:301 (pattern match) — Four new config keys under '[repo][build]' (pre_push_strict, pre_push_disable, pre_push_verify_cmds, pre_push_timeout_s) are read by get_pre_push_config but none appear in setup.py's format_full_config or format_repo_toml's _REPO_KEY_ORDER. The escape hatches for opting out of pre-push verification are invisible to users running 'gate init'.

    Fix: Extend format_repo_toml or format_full_config to emit a '[repo.build]' subsection with commented-out pre_push_strict/pre_push_disable/pre_push_verify_cmds/pre_push_timeout_s defaults alongside the other build config keys.

Notes

  • gate/fixer.py:1593 — After a pre-push regression retry succeeds, code added by Codex during _resume_fix_session is committed and pushed without re-running _run_rereview. The comment at lines 1576-1578 documents this as accepted risk. For a narrowly scoped retry this is fine; a runaway Codex retry could push unreviewed refactors.
  • gate/fixer.py:1408 — When iteration 1 passes rereview but loops to iter 2 via the pre-push regression 'continue' branch, the iter-2 _resume_fix_session prompt is built from the iter-1 fix-rereview.json — which contains pass: true and an empty issues list — producing a prompt that tells Codex 'An independent reviewer found issues with your fix. Fix them.' followed by '[]'. Codex may proceed correctly because the session carries the prior pre-push prompt in context, but operators reading the resume prompt artifact will see a misleading empty issues list.

Build Results

  • ruff: ✅ (0 warnings)
  • pytest: ✅ (1308/1314 passed)

1 error, 2 warnings, 2 notes across 6 stages (1096s, confidence: high)

@openlawbot
Copy link
Copy Markdown
Collaborator Author

Gate Review Summary

Verdict:Changes requested  ·  confidence high  ·  1096s

The PR introduces a well-structured pre-push verification gate with solid test coverage (1308/1308 passing, 0 failures), clean lint/typecheck, and no security issues. However, the Logic stage confirmed a test-validated bug on the explicit-cmds path: pre_push_verify hardcodes has_parse_failure=False when pre_push_verify_cmds is configured, causing test-runner crashes (segfault, OOM, syntax error in test setup) with unparseable output to be misclassified as pre_existing and trigger fail-fast instead of a regression retry. This is a behavioral defect introduced by this PR in new code, confirmed by a discriminating gate test with passing mutation check. Two architecture warnings flag that the new [fix_pipeline] section and [repo.build] pre-push keys are absent from format_full_config and _REPO_KEY_ORDER, making them invisible to gate init. Two informational notes cover accepted-risk design decisions: unreviewed code can be pushed after the regression retry path, and the iter-N+1 resume prompt is misleading when entered after a pre-push regression continue.

Blocking findings (1)

  • ERROR  gate/fixer.py:L494  ·  5108424334
    Explicit-cmds path in pre_push_verify hardcodes has_parse_failure=False, causing test-runner crashes with unparseable output (exit_code != 0 but failed=0) to be misclassified as pre_existing → fail-fast instead of regression → retry. The default-path branch at line 425 correctly derives parse_failure from test_block.get('parse_failure', False); the bug is asymmetric between the two paths. Gate test confirmed: explicit-cmds path returns kind='pre_existing'; default path returns kind='regression' for the same scenario. Mutation check (flipping the assertion) caused the test to fail, confirming the test is discriminating.
Non-blocking notes (4)
  • WARNING  gate/config.py:L265  ·  7e788e32e0
    New config keys '[fix_pipeline].max_iterations' and '[repo].max_fix_iterations' are read by get_fix_max_iterations but neither the keys nor the '[fix_pipeline]' section appear in setup.py's format_full_config. A freshly generated gate.toml via 'gate init' will not surface these keys, making them discoverable only from the example file.
  • WARNING  gate/config.py:L301  ·  d7fd677966
    Four new config keys under '[repo][build]' (pre_push_strict, pre_push_disable, pre_push_verify_cmds, pre_push_timeout_s) are read by get_pre_push_config but none appear in setup.py's format_full_config or format_repo_toml's _REPO_KEY_ORDER. The escape hatches for opting out of pre-push verification are invisible to users running 'gate init'.
  • INFO  gate/fixer.py:L1593  ·  1e7d9dd80e
    After a pre-push regression retry succeeds, code added by Codex during _resume_fix_session is committed and pushed without re-running _run_rereview. The comment at lines 1576-1578 documents this as accepted risk. For a narrowly scoped retry this is fine; a runaway Codex retry could push unreviewed refactors.
  • INFO  gate/fixer.py:L1408  ·  0e0ce0da1c
    When iteration 1 passes rereview but loops to iter 2 via the pre-push regression 'continue' branch, the iter-2 _resume_fix_session prompt is built from the iter-1 fix-rereview.json — which contains pass: true and an empty issues list — producing a prompt that tells Codex 'An independent reviewer found issues with your fix. Fix them.' followed by '[]'. Codex may proceed correctly because the session carries the prior pre-push prompt in context, but operators reading the resume prompt artifact will see a misleading empty issues list.

Build

typecheck ✅  ·  lint ✅  ·  tests

Stats

1 blocking  ·  4 other  ·  6 stages run


Next steps

  • Push fixes for the blocking findings above; Gate re-runs automatically on every push.
  • If a finding is wrong, comment on the PR with /gate explain <finding-id> to see Gate's reasoning, or add the gate-emergency-bypass label with a linked incident if you must merge anyway.

How to override

You want to… Add this label, or comment
Skip Gate entirely gate-skip  or  /gate skip <reason>
Re-run the review gate-rerun  or  /gate rerun
Bypass blocking findings (audited) gate-emergency-bypass  or  /gate bypass <incident-link>
Disable auto-fix for this PR gate-no-fix

Updated 2026-05-18 16:56 UTC  ·  sticky comment, replaced in-place on each review cycle.

…h path

Address Gate review of PR #35. The default path of pre_push_verify
correctly derives has_parse_failure from build_result["tests"]
["parse_failure"] (set by compile_build's synthesized-block at
builder.py:161-189). The explicit-cmds path bypasses compile_build
and called _parse_test / _parse_pytest / _parse_generic_test
directly, then hardcoded has_parse_failure=False.

The asymmetry meant a crashed test runner (segfault, OOM, syntax
error in setup) that exited non-zero with no parseable failure
count would compare as ``after.failed (0) <= baseline.failed (0)``
and route to pre_existing → fail fast. Correct behavior is to
route to regression (safer side) since we genuinely don't know
whether the original broke or Gate's edit did.

Fix: synthesize the parse_failure signal from the parser output
on the explicit-cmds path using the same heuristic compile_build
uses — non-zero exit AND zero parseable failures AND zero
parseable total. This matches the default path's behavior exactly
without coupling either to compile_build's internal block.

Adds two regression tests:

- test_unparseable_crash_routes_to_regression: crash scenario,
  baseline=0, after=parseable-zero → must be regression.
- test_explicit_cmds_pre_existing_when_counts_match: counterfactual,
  parser extracted real counts that match baseline → pre_existing
  (confirms the heuristic is precise, not over-broad).

Co-authored-by: Cursor <cursoragent@cursor.com>
@openlawbot
Copy link
Copy Markdown
Collaborator Author

Review response — commit 48dff54

Error (parse_failure asymmetry) — FIXED

Real defect, test-confirmed by Gate. The default path of pre_push_verify derives has_parse_failure from build_result["tests"]["parse_failure"] (set by compile_build's synthesized block in builder.py:161-189). The explicit-cmds path bypasses compile_build and called the parsers directly with has_parse_failure=False.

Fix synthesizes the same signal from the parser output on the explicit-cmds path: parse_failure = (exit != 0 AND failed == 0 AND total == 0). This is the heuristic Gate's review recommended and matches what compile_build already does internally, so both paths now agree without coupling either to the other's implementation.

Added two regression tests:

  • test_unparseable_crash_routes_to_regression — the discriminating case from the review (segfault, baseline=0, parser yields zero failures → must be regression, not pre_existing).
  • test_explicit_cmds_pre_existing_when_counts_match — counterfactual to confirm the heuristic is precise, not over-broad.

Warnings (gate init discoverability) — DEFERRED

Both [fix_pipeline] keys and [repo.build] sub-keys missing from format_full_config / _REPO_KEY_ORDER is a real gap, but it's pre-existing:

  • [fix_pipeline].max_wall_clock_s, senior_session_timeout_s, max_subscope_iterations are also absent from format_full_config (line 357-369 only emits [models]/[timeouts]/[retry]/[limits]).
  • [repo.build] sub-keys (typecheck_cmd, lint_cmd, test_cmd, dep_install_cmd, etc.) are similarly absent from _REPO_KEY_ORDER.

This PR doesn't make those gaps worse — the new keys are documented in gate.toml.example alongside the other build keys. Fixing this holistically (emit the whole [fix_pipeline] section + the whole [repo.build] subsection) deserves its own focused PR rather than a partial fix here that would only emit two of the four [fix_pipeline] keys and four of the dozen [repo.build] keys.

Will open a follow-up issue for gate init config completeness.

Note 1 (push without re-rereview) — ACCEPTED RISK, documented

The plan's design decision was explicit: same-iteration retry only, no second re-review on the retry path. Rationale in the plan: re-running re-review on every retry doubles quota cost on the common case for a narrow class of risk (a Codex retry going broadly off-script when given a targeted test-fix prompt). The plan calls this out as a deliberate trade-off.

Note 2 (misleading iter N+1 resume prompt) — DEFERRED

Real artifact issue. Fixing it cleanly requires cross-iteration state (e.g., persisting the pre-push error context to a file the iter 2+ prompt selector reads), which the plan explicitly avoided to keep the design simple. Codex behaves correctly in practice because session context carries the prior pre-push prompt forward; only the operator-facing fix-resume-prompt.md artifact looks misleading.

Will track as a follow-up — the right fix is probably writing a pre-push-pending.json marker on the regression-continue path that the iter 2+ feedback selector consults before falling back to fix-rereview.json.


Verification on the fix commit:

  • pytest tests/test_fixer.py tests/test_fix_pipeline_pre_push.py tests/test_fix_pipeline_config.py: 194 passed
  • ruff check: clean

@wally-tribute-labs wally-tribute-labs merged commit 8810c9b into main May 18, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants