feat(autonomous-review): verdict artifact channel — agents write schema-validated verdict files; wrapper resolves artifact-first; comments become render/fallback only (INV-77, closes #233)#262
Open
kane-coding-agent[bot] wants to merge 3 commits into
Conversation
…ma-validated verdict files; wrapper resolves artifact-first; comments become a logged fallback (INV-77, closes #233) The review wrapper now resolves each fan-out agent's PASS/FAIL verdict from a typed verdict-artifact FILE (the #229 verdict-artifact.schema.json) BEFORE any comment scraping. Comment detection (INV-20/40/53 actor+window+trailer polling) survives only as an explicitly-logged fallback (verdict-source=comment-fallback) for an agent that produced no artifact — measurable by #228 metrics. - New lib-review-artifact.sh: _verdict_artifact_path (XDG-state per-run dir keyed on the session UUID) + _classify_verdict_artifact (valid/malformed/absent, §4.3) with the dual python3-jsonschema / jq-fallback backend; _verdict_from_artifact_json + _artifact_schema_error. Atomic-rename land detection (read final path once; ignore post-read writes, Clause VA5). - autonomous-review.sh: provision the per-agent artifact path (mkdir 0700) + export VERDICT_ARTIFACT_PATH + inject atomic-write (tmp+rename) instructions into build_review_prompt; artifact-first resolution loop seeds AGENT_VERDICTS from valid artifacts (poll loop then skips them → ZERO comment polls when all agents produce artifacts); malformed → loud #231 error envelope naming the agent + schema error, treated as absent for the vote (Clause V1, never a silent PASS) and routed failed-non-substantive (re-dispatchable) via the durable AGENT_VERDICT_SOURCES=artifact-malformed scan. - lib-review-poll.sh: the verdict poll loop skips already-resolved + malformed agents (artifact > comment precedence). - post-verdict.sh stays the ONLY comment poster (INV-56); the wrapper's rendered aggregate comment + the INV-35 <!-- review-verdict: … --> trailer are format-unchanged → dispatcher INV-03/06/07 + dev-resume Review-findings machine consumers keep working (pinned by tests). - Docs same PR: new INV-77; review-agent-flow.md artifact-first section; INV-20/40/53 annotated as demoted-to-fallback. - Tests: tests/unit/test-verdict-artifact.sh; conformance fixtures asserting verdictState from artifact files (claude valid-FAIL, codex valid-PASS, claude malformed); stub-fleet E2E (valid/malformed/absent-with-comment). CI: new lib + e2e in shellcheck list + the e2e step. Review side only (dev-side adoption + comment-fallback removal deferred per Out of Scope). Moves the verdict CHANNEL, not the absence model. Renumbered INV-76→77 at rebase (PR #258 landed INV-76 first). ## Post-install / upgrade This PR adds scripts/lib-review-artifact.sh; after merge + `npx skills update -g`, re-run install-project-hooks.sh on every onboarded project (see CLAUDE.local.md → Post-merge Step 2) or their review wrappers will crash on the missing `source`.
…'s INV-77 3rd arg (verdict-artifact path) The poll-budget test pinned build_review_prompt as 2-arg (post-#182). INV-77 (#233) adds a 3rd arg — the per-agent VERDICT_ARTIFACT_PATH (not an E2E signal) — so the call site is now '<agent> <session-id> <artifact-path>'. Update the assertion to the new contract; the test's intent (no E2E signal arg) is preserved.
15 tasks
There was a problem hiding this comment.
Review reached a blocking FAIL verdict — see the Review findings: comment on issue #233 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).
…1]s — true read-once, identity binding, full-schema jq fallback (INV-77, #233) Address the 3 [P1] findings on PR #262: 1. [P1] TRUE read-once: _classify_verdict_artifact cat'd the bytes into _bytes but then validated $_path (a SECOND disk read), so a rename landing between the two reads could flip valid↔malformed. Now the captured snapshot is written to a private temp file and BOTH schema validation and the identity check run against that snapshot — never a re-read of $_path (Clause VA5). +TC-038. 2. [P1] Identity binding: the artifact-first path accepted any schema-valid JSON without checking it belongs to THIS slot. _classify_verdict_artifact / _artifact_schema_error now take optional <expected-run-id> <expected-agent>; the wrapper passes AGENT_SESSION_IDS[i] + AGENT_NAMES[i], and a mismatched .runId/.agent → malformed (loud envelope, dropped) so a buggy adapter copying example JSON or another agent's identifiers cannot cast a vote. +TC-039, +stub-fleet agent-D (foreign identity). 3. [P1] jq fallback full schema: the packaged-install default (jq, since the JSON Schema lives outside the skill tree) only checked a few top-level fields, so a PASS with blockingFindings as an object, a finding without title, an unknown key, etc. were accepted as valid — suppressing the malformed envelope. _validate_verdict_artifact_jq now mirrors the schema's full shape (additionalProperties at every level, finding object shape, typed evidence sub-objects, FAIL⇔≥1-blocking). +TC-040. Docs same PR: INV-77 (Atomic+true-read-once / Identity binding / Validation backend paragraphs), design doc, test-case doc. Unit 71/0, conformance 19/19, stub-fleet E2E 11/0, ShellCheck clean. code-simplifier passed.
There was a problem hiding this comment.
Review reached a blocking FAIL verdict — see the Review findings: comment on issue #233 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).
12 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Switches the review wrapper's verdict transport from comment-scraping to a wrapper-owned artifact file. Each review agent writes its verdict as schema-validated JSON (the #229
verdict-artifact.schema.json) to a per-agent path the wrapper provisions; the wrapper validates + resolves the artifact first and only falls back to today's comment detection (with a loggedverdict-source=comment-fallbackmarker) for an agent that produced no artifact. A malformed artifact is a loud, distinct state (an operator error envelope), never a silent absent. New invariant INV-77.Closes #233.
What changed
lib-review-artifact.sh—_verdict_artifact_path(XDG-state per-run dir keyed on the session UUID),_classify_verdict_artifact(→valid/malformed/absent, adapter-spec §4.3) with the dual python3-jsonschema / jq-fallback backend,_verdict_from_artifact_json,_artifact_schema_error. Atomic-rename land detection: reads the final path once (a half-written.tmpis never the read target; a post-read write is ignored — Clause VA5).autonomous-review.sh— provision the per-agent artifact path (mkdir 0700) + exportVERDICT_ARTIFACT_PATH+ inject the atomic-write (tmp+rename) instruction intobuild_review_prompt; an artifact-first resolution loop seedsAGENT_VERDICTSfrom valid artifacts (the poll loop then skips them → zero comment-list API calls when all agents produce artifacts);malformed→ loud feat(autonomous-dispatcher): operator error envelope — config-class failures surface on the issue, never log-only #231 error envelope naming the agent + the schema error, treated as absent for the vote (Clause V1, never a silent PASS) and routedfailed-non-substantive(re-dispatchable) via the durableAGENT_VERDICT_SOURCES=artifact-malformedscan. Per-agent verdict source is tagged for every agent (artifact/artifact-malformed/comment-fallback/none) for feat(autonomous-dispatcher): baseline metrics collector — incidents, cost-per-merged-PR, quota-failure rate, TTHW (stop-rule input) #228 metrics.lib-review-poll.sh— the verdict poll loop AND the codex stdout fallback both skip anartifact-malformedagent (Clause V1 — its machine output is untrustworthy; the loud envelope + terminal sweep is the contract).post-verdict.shunchanged — stays the ONLY comment poster (INV-56). The wrapper's rendered aggregate comment + the<!-- review-verdict: … -->trailer (INV-35) are format-unchanged.Machine consumers of the comment format (load-bearing — all preserved + pinned)
<!-- review-verdict: passed / failed-substantive / failed-non-substantive cause=… -->(emit_verdict_trailerunchanged)Review findings:Review PASSED/Review findings:+Review Session:/Review Agent:trailersTest Plan
docs/designs/verdict-artifact-channel.md)docs/test-cases/verdict-artifact-channel.md)tests/unit/test-verdict-artifact.sh(56/0) — valid/malformed/absent classification, atomic-rename/dup/late-write, path provisioning, aggregation precedence (artifact > comment; conflict logged), the zero-comment-poll AC, malformed→loud-envelope + treated-absent, comment-fallback parity, timeout-veto regression pin, and the codex-fallback malformed-skip (W14)verdict.statefrom artifact files: claude valid-FAIL (artifact wins over a PASS-classifying stdout), codex valid-PASS, claude + codex malformed → drop (19/19)tests/e2e/run-verdict-artifact-fleet-e2e.sh(9/0): 3 stub agents (valid / malformed / absent-with-comment) → aggregate, per-agent sources, loud malformed envelope, zero-comment-poll all assertedcli-adapters44/0;adapter-spec-schemas82/0Docs (same PR — Pipeline Documentation Authority)
INV-77ininvariants.md;review-agent-flow.mdartifact-first section;INV-20/40/53annotated as demoted-to-fallback.Scope
Review side only (dev-side artifact adoption + comment-fallback removal deferred per the issue's Out of Scope). Moves the verdict CHANNEL, not the absence model.
Post-install / upgrade
This PR adds
scripts/lib-review-artifact.sh; after merge +npx skills update -g, re-runinstall-project-hooks.shon every onboarded project (see CLAUDE.local.md → Post-merge Step 2) or their review wrappers will crash on the missingsource.