feat(gate): strict pre-push verification before commit_and_push by openlawbot · Pull Request #35 · openlawteam/gate

openlawbot · 2026-05-18T16:37:16Z

Summary

Adds a strict pass/fail gate between re-review pass and _commit_and_finish that mirrors what a remote pre-push hook (e.g. Husky) will face at git push time. Fixes the class of bug PR #399 hit: build_verify accepted pre-existing test failures under its tolerance, then Husky's vitest run rejected the push because hooks don't have that tolerance.
Hybrid classification: regressions feed back into the same iteration with the failing-test context; pre-existing failures fail fast without consuming iterations. Worktree is preserved across iterations on regression so Codex can build on the partial fix.
Zero-cost on the happy path: build_verify now exposes strict_pass alongside pass, and pre_push_verify reuses it by default. Subprocess only fires when pre_push_verify_cmds is explicitly set (e.g. to add contract-drift checks beyond test_cmds).

Motivation

PR #399 activity log shows the exact divergence:

gate.fixer INFO Pre-existing failures accepted
gate.fixer INFO Re-review: pass
gate.github ERROR Commit/push failed: ... vitest run

Gate already had the data to detect this earlier — build_result["overall_pass"] is computed strictly before tolerance is applied at fixer.py:296-302. It was then discarded.

Design

Classification rules

condition	kind
`strict_pass == True`	`pass`
`strict_pass == False` AND (`original_build` missing OR parse failure OR project_type uses `_parse_generic_test`)	`regression` (uncertain → safer side)
`strict_pass == False` AND `after.failed > baseline.failed`	`regression`
`strict_pass == False` AND `after.failed <= baseline.failed` (reliable parser)	`pre_existing`

Iteration handling

regression → resume Codex with failing-test context, re-run pre_push_verify. If still failing, fall through to next iteration without reverting (partial fix preserved). Mirrors the existing build_verify same-iteration retry pattern.
pre_existing → fail fast with a clear message. Gate does NOT auto-fix tests it didn't break.
Iteration budget bumped from 2 → 3 (config-driven) to absorb a worst-case third iteration.

Configuration

New keys under [repos.build] (all optional, documented in gate.toml.example):

pre_push_strict (default: true)
pre_push_disable (default: false) — escape hatch
pre_push_verify_cmds (default: []) — when set, replaces test_cmds for the gate. Used for hooks that do more than tests (e.g. contract drift).
pre_push_timeout_s (default: 600)

New keys under [fix_pipeline]:

max_iterations (default: 3, with per-repo override [[repos]].max_fix_iterations)

Open-source posture

Strict-by-default activates only when test_cmds (or pre_push_verify_cmds) is configured. Repos without tests are unaffected. Repos with tests can opt out via pre_push_disable = true. Replacement-vs-additive semantics for pre_push_verify_cmds are documented explicitly so users don't silently drop coverage when overriding.

Test plan

Unit tests: 25 new tests covering classification (regression vs pre-existing vs uncertain), default cached path, explicit cmds path, disable flag, prompt builder, fail-fast path. (tests/test_fixer.py, tests/test_fix_pipeline_config.py)
Integration tests: 6 new tests covering happy path, regression-retry success, regression-retry failure (no-revert), pre-existing fail-fast, disable flag, and 3-iteration budget. (tests/test_fix_pipeline_pre_push.py)
Full suite passes: 1314 passed in 184s
Ruff clean on all changed files
Verified resolved config for adin-chat returns expected pre-push cmds and max_iter=3
Post-merge manual: trigger Gate on a PR with known pre-existing test failure → expect _fail_pre_existing_tests path
Post-merge manual: trigger Gate on a PR where the fix introduces a test regression → expect same-iteration retry

Rollback

Per-repo: set pre_push_disable = true in [repos.build]. Zero-code revert.
Global: revert this commit. The strict_pass field on build_verify is additive and harmless to leave in.

Out of scope (follow-up)

Symmetric typecheck/lint strict gate. compare_builds only checks pass booleans for tc/lint (not counts) — same class of bug exists there but is caught earlier by the build_verify retry loop, so user-visible impact is smaller. Separate plan.
Per-test-name regression detection (current logic is count-based). Punt until evidence shows count-deltas mis-classify.

Made with Cursor

build_verify already runs test_cmds, but compare_builds accepts test failures whose count did not increase (pre-existing tolerance). Husky-style pre-push hooks run strict and reject any failure, so fixes that Gate's tolerant view approved were being rejected at git push time (PR #399 motivation). This change adds a strict pass/fail gate between re-review pass and _commit_and_finish that mirrors what the remote hook will face, with hybrid classification: - Pass: short-circuit to commit (zero extra cost — reuses cached build_verify result). - Regression (count increased, missing baseline, parse failure, or generic-test project_type): resume Codex with the failing-test context and retry once within the same iteration; on continued failure fall through to the next iteration WITHOUT reverting so the partial fix is preserved. - Pre-existing (count unchanged on a reliable parser): fail fast with a clear "fix manually" message — Gate does not auto-fix tests it did not break. Wiring: - build_verify gains a strict_pass field (and the raw build_result) alongside its tolerance-applied pass field. - pre_push_verify reuses strict_pass when pre_push_verify_cmds is unset, and runs the explicit list otherwise (replacement of test_cmds — covers repo hooks that do more than tests, e.g. contract drift checks). - MAX_ITERATIONS bumped 2 → 3 (config-driven via new get_fix_max_iterations) to absorb a worst-case third iteration when the pre-push gate trips. Configuration (under [repos.build]): - pre_push_strict (default: true) - pre_push_disable (default: false — escape hatch) - pre_push_verify_cmds (default: [] — falls back to test_cmds) - pre_push_timeout_s (default: 600) Strict-by-default activates only when test_cmds (or pre_push_verify_cmds) is configured, so OSS users without tests are unaffected. Documented in gate.toml.example. Co-authored-by: Cursor <cursoragent@cursor.com>

openlawbot · 2026-05-18T16:56:41Z

Gate Review ❌

Changes requested — The PR introduces a well-structured pre-push verification gate with solid test coverage (1308/1308 passing, 0 failures), clean lint/typecheck, and no security issues. However, the Logic stage confirmed a test-validated bug on the explicit-cmds path: pre_push_verify hardcodes has_parse_failure=False when pre_push_verify_cmds is configured, causing test-runner crashes (segfault, OOM, syntax error in test setup) with unparseable output to be misclassified as pre_existing and trigger fail-fast instead of a regression retry. This is a behavioral defect introduced by this PR in new code, confirmed by a discriminating gate test with passing mutation check. Two architecture warnings flag that the new [fix_pipeline] section and [repo.build] pre-push keys are absent from format_full_config and _REPO_KEY_ORDER, making them invisible to gate init. Two informational notes cover accepted-risk design decisions: unreviewed code can be pushed after the regression retry path, and the iter-N+1 resume prompt is misleading when entered after a pre-push regression continue.

Errors

gate/fixer.py:494 (test confirmed) — Explicit-cmds path in pre_push_verify hardcodes has_parse_failure=False, causing test-runner crashes with unparseable output (exit_code != 0 but failed=0) to be misclassified as pre_existing → fail-fast instead of regression → retry. The default-path branch at line 425 correctly derives parse_failure from test_block.get('parse_failure', False); the bug is asymmetric between the two paths. Gate test confirmed: explicit-cmds path returns kind='pre_existing'; default path returns kind='regression' for the same scenario. Mutation check (flipping the assertion) caused the test to fail, confirming the test is discriminating.

Fix: On the explicit-cmds path, detect parse failure and pass it to _classify_pre_push: parse_failure = (overall_exit != 0 and parsed.get('failed', 0) == 0 and parsed.get('total', 0) == 0). Alternatively, always route to regression when overall_exit != 0 and the parser yields zero failures — the simpler option that matches the safer-side bias already encoded in _classify_pre_push.

Warnings

gate/config.py:265 (pattern match) — New config keys '[fix_pipeline].max_iterations' and '[repo].max_fix_iterations' are read by get_fix_max_iterations but neither the keys nor the '[fix_pipeline]' section appear in setup.py's format_full_config. A freshly generated gate.toml via 'gate init' will not surface these keys, making them discoverable only from the example file.

Fix: Add a '[fix_pipeline]' section to format_full_config emitting 'max_iterations = 3' (alongside [models]/[timeouts]/[retry]/[limits]). Add 'max_fix_iterations' to _REPO_KEY_ORDER so per-repo overrides appear in generated TOML.
gate/config.py:301 (pattern match) — Four new config keys under '[repo][build]' (pre_push_strict, pre_push_disable, pre_push_verify_cmds, pre_push_timeout_s) are read by get_pre_push_config but none appear in setup.py's format_full_config or format_repo_toml's _REPO_KEY_ORDER. The escape hatches for opting out of pre-push verification are invisible to users running 'gate init'.

Fix: Extend format_repo_toml or format_full_config to emit a '[repo.build]' subsection with commented-out pre_push_strict/pre_push_disable/pre_push_verify_cmds/pre_push_timeout_s defaults alongside the other build config keys.

Notes

gate/fixer.py:1593 — After a pre-push regression retry succeeds, code added by Codex during _resume_fix_session is committed and pushed without re-running _run_rereview. The comment at lines 1576-1578 documents this as accepted risk. For a narrowly scoped retry this is fine; a runaway Codex retry could push unreviewed refactors.
gate/fixer.py:1408 — When iteration 1 passes rereview but loops to iter 2 via the pre-push regression 'continue' branch, the iter-2 _resume_fix_session prompt is built from the iter-1 fix-rereview.json — which contains pass: true and an empty issues list — producing a prompt that tells Codex 'An independent reviewer found issues with your fix. Fix them.' followed by '[]'. Codex may proceed correctly because the session carries the prior pre-push prompt in context, but operators reading the resume prompt artifact will see a misleading empty issues list.

Build Results

ruff: ✅ (0 warnings)
pytest: ✅ (1308/1314 passed)

1 error, 2 warnings, 2 notes across 6 stages (1096s, confidence: high)

openlawbot · 2026-05-18T16:56:47Z

Gate Review Summary

Verdict: ❌ Changes requested · confidence high · 1096s

The PR introduces a well-structured pre-push verification gate with solid test coverage (1308/1308 passing, 0 failures), clean lint/typecheck, and no security issues. However, the Logic stage confirmed a test-validated bug on the explicit-cmds path: pre_push_verify hardcodes has_parse_failure=False when pre_push_verify_cmds is configured, causing test-runner crashes (segfault, OOM, syntax error in test setup) with unparseable output to be misclassified as pre_existing and trigger fail-fast instead of a regression retry. This is a behavioral defect introduced by this PR in new code, confirmed by a discriminating gate test with passing mutation check. Two architecture warnings flag that the new [fix_pipeline] section and [repo.build] pre-push keys are absent from format_full_config and _REPO_KEY_ORDER, making them invisible to gate init. Two informational notes cover accepted-risk design decisions: unreviewed code can be pushed after the regression retry path, and the iter-N+1 resume prompt is misleading when entered after a pre-push regression continue.

Blocking findings (1)

ERROR gate/fixer.py:L494 · 5108424334
Explicit-cmds path in pre_push_verify hardcodes has_parse_failure=False, causing test-runner crashes with unparseable output (exit_code != 0 but failed=0) to be misclassified as pre_existing → fail-fast instead of regression → retry. The default-path branch at line 425 correctly derives parse_failure from test_block.get('parse_failure', False); the bug is asymmetric between the two paths. Gate test confirmed: explicit-cmds path returns kind='pre_existing'; default path returns kind='regression' for the same scenario. Mutation check (flipping the assertion) caused the test to fail, confirming the test is discriminating.

Non-blocking notes (4)

WARNING gate/config.py:L265 · 7e788e32e0
New config keys '[fix_pipeline].max_iterations' and '[repo].max_fix_iterations' are read by get_fix_max_iterations but neither the keys nor the '[fix_pipeline]' section appear in setup.py's format_full_config. A freshly generated gate.toml via 'gate init' will not surface these keys, making them discoverable only from the example file.
WARNING gate/config.py:L301 · d7fd677966
Four new config keys under '[repo][build]' (pre_push_strict, pre_push_disable, pre_push_verify_cmds, pre_push_timeout_s) are read by get_pre_push_config but none appear in setup.py's format_full_config or format_repo_toml's _REPO_KEY_ORDER. The escape hatches for opting out of pre-push verification are invisible to users running 'gate init'.
INFO gate/fixer.py:L1593 · 1e7d9dd80e
After a pre-push regression retry succeeds, code added by Codex during _resume_fix_session is committed and pushed without re-running _run_rereview. The comment at lines 1576-1578 documents this as accepted risk. For a narrowly scoped retry this is fine; a runaway Codex retry could push unreviewed refactors.
INFO gate/fixer.py:L1408 · 0e0ce0da1c
When iteration 1 passes rereview but loops to iter 2 via the pre-push regression 'continue' branch, the iter-2 _resume_fix_session prompt is built from the iter-1 fix-rereview.json — which contains pass: true and an empty issues list — producing a prompt that tells Codex 'An independent reviewer found issues with your fix. Fix them.' followed by '[]'. Codex may proceed correctly because the session carries the prior pre-push prompt in context, but operators reading the resume prompt artifact will see a misleading empty issues list.

Build

typecheck ✅ · lint ✅ · tests ✅

Stats

1 blocking · 4 other · 6 stages run

Next steps

Push fixes for the blocking findings above; Gate re-runs automatically on every push.
If a finding is wrong, comment on the PR with /gate explain <finding-id> to see Gate's reasoning, or add the gate-emergency-bypass label with a linked incident if you must merge anyway.

How to override

You want to…	Add this label, or comment
Skip Gate entirely	`gate-skip` or `/gate skip <reason>`
Re-run the review	`gate-rerun` or `/gate rerun`
Bypass blocking findings (audited)	`gate-emergency-bypass` or `/gate bypass <incident-link>`
Disable auto-fix for this PR	`gate-no-fix`

_{Updated 2026-05-18 16:56 UTC · sticky comment, replaced in-place on each review cycle.}

…h path Address Gate review of PR #35. The default path of pre_push_verify correctly derives has_parse_failure from build_result["tests"] ["parse_failure"] (set by compile_build's synthesized-block at builder.py:161-189). The explicit-cmds path bypasses compile_build and called _parse_test / _parse_pytest / _parse_generic_test directly, then hardcoded has_parse_failure=False. The asymmetry meant a crashed test runner (segfault, OOM, syntax error in setup) that exited non-zero with no parseable failure count would compare as ``after.failed (0) <= baseline.failed (0)`` and route to pre_existing → fail fast. Correct behavior is to route to regression (safer side) since we genuinely don't know whether the original broke or Gate's edit did. Fix: synthesize the parse_failure signal from the parser output on the explicit-cmds path using the same heuristic compile_build uses — non-zero exit AND zero parseable failures AND zero parseable total. This matches the default path's behavior exactly without coupling either to compile_build's internal block. Adds two regression tests: - test_unparseable_crash_routes_to_regression: crash scenario, baseline=0, after=parseable-zero → must be regression. - test_explicit_cmds_pre_existing_when_counts_match: counterfactual, parser extracted real counts that match baseline → pre_existing (confirms the heuristic is precise, not over-broad). Co-authored-by: Cursor <cursoragent@cursor.com>

openlawbot · 2026-05-18T17:03:48Z

Review response — commit `48dff54`

Error (parse_failure asymmetry) — FIXED

Real defect, test-confirmed by Gate. The default path of pre_push_verify derives has_parse_failure from build_result["tests"]["parse_failure"] (set by compile_build's synthesized block in builder.py:161-189). The explicit-cmds path bypasses compile_build and called the parsers directly with has_parse_failure=False.

Fix synthesizes the same signal from the parser output on the explicit-cmds path: parse_failure = (exit != 0 AND failed == 0 AND total == 0). This is the heuristic Gate's review recommended and matches what compile_build already does internally, so both paths now agree without coupling either to the other's implementation.

Added two regression tests:

test_unparseable_crash_routes_to_regression — the discriminating case from the review (segfault, baseline=0, parser yields zero failures → must be regression, not pre_existing).
test_explicit_cmds_pre_existing_when_counts_match — counterfactual to confirm the heuristic is precise, not over-broad.

Warnings (gate init discoverability) — DEFERRED

Both [fix_pipeline] keys and [repo.build] sub-keys missing from format_full_config / _REPO_KEY_ORDER is a real gap, but it's pre-existing:

[fix_pipeline].max_wall_clock_s, senior_session_timeout_s, max_subscope_iterations are also absent from format_full_config (line 357-369 only emits [models]/[timeouts]/[retry]/[limits]).
[repo.build] sub-keys (typecheck_cmd, lint_cmd, test_cmd, dep_install_cmd, etc.) are similarly absent from _REPO_KEY_ORDER.

This PR doesn't make those gaps worse — the new keys are documented in gate.toml.example alongside the other build keys. Fixing this holistically (emit the whole [fix_pipeline] section + the whole [repo.build] subsection) deserves its own focused PR rather than a partial fix here that would only emit two of the four [fix_pipeline] keys and four of the dozen [repo.build] keys.

Will open a follow-up issue for gate init config completeness.

Note 1 (push without re-rereview) — ACCEPTED RISK, documented

The plan's design decision was explicit: same-iteration retry only, no second re-review on the retry path. Rationale in the plan: re-running re-review on every retry doubles quota cost on the common case for a narrow class of risk (a Codex retry going broadly off-script when given a targeted test-fix prompt). The plan calls this out as a deliberate trade-off.

Note 2 (misleading iter N+1 resume prompt) — DEFERRED

Real artifact issue. Fixing it cleanly requires cross-iteration state (e.g., persisting the pre-push error context to a file the iter 2+ prompt selector reads), which the plan explicitly avoided to keep the design simple. Codex behaves correctly in practice because session context carries the prior pre-push prompt forward; only the operator-facing fix-resume-prompt.md artifact looks misleading.

Will track as a follow-up — the right fix is probably writing a pre-push-pending.json marker on the regression-continue path that the iter 2+ feedback selector consults before falling back to fix-rereview.json.

Verification on the fix commit:

pytest tests/test_fixer.py tests/test_fix_pipeline_pre_push.py tests/test_fix_pipeline_config.py: 194 passed
ruff check: clean

wally-tribute-labs merged commit 8810c9b into main May 18, 2026
4 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gate): strict pre-push verification before commit_and_push#35

feat(gate): strict pre-push verification before commit_and_push#35
wally-tribute-labs merged 2 commits into
mainfrom
feat/strict-pre-push-verification

openlawbot commented May 18, 2026

Uh oh!

openlawbot commented May 18, 2026

Uh oh!

openlawbot commented May 18, 2026

Uh oh!

openlawbot commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

openlawbot commented May 18, 2026

Summary

Motivation

Design

Classification rules

Iteration handling

Configuration

Open-source posture

Test plan

Rollback

Out of scope (follow-up)

Uh oh!

openlawbot commented May 18, 2026

Gate Review ❌

Errors

Warnings

Notes

Build Results

Uh oh!

openlawbot commented May 18, 2026

Gate Review Summary

Blocking findings (1)

Build

Stats

Next steps

How to override

Uh oh!

openlawbot commented May 18, 2026

Review response — commit 48dff54

Error (parse_failure asymmetry) — FIXED

Warnings (gate init discoverability) — DEFERRED

Note 1 (push without re-rereview) — ACCEPTED RISK, documented

Note 2 (misleading iter N+1 resume prompt) — DEFERRED

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Review response — commit `48dff54`