feat(autonomous-review): verdict artifact channel — agents write schema-validated verdict files; wrapper resolves artifact-first; comments become render/fallback only (INV-77, closes #233) by kane-coding-agent[bot] · Pull Request #262 · zxkane/autonomous-dev-team

kane-coding-agent · 2026-06-17T14:21:00Z

Summary

Switches the review wrapper's verdict transport from comment-scraping to a wrapper-owned artifact file. Each review agent writes its verdict as schema-validated JSON (the #229 verdict-artifact.schema.json) to a per-agent path the wrapper provisions; the wrapper validates + resolves the artifact first and only falls back to today's comment detection (with a logged verdict-source=comment-fallback marker) for an agent that produced no artifact. A malformed artifact is a loud, distinct state (an operator error envelope), never a silent absent. New invariant INV-77.

Closes #233.

What changed

New lib-review-artifact.sh — _verdict_artifact_path (XDG-state per-run dir keyed on the session UUID), _classify_verdict_artifact (→ valid / malformed / absent, adapter-spec §4.3) with the dual python3-jsonschema / jq-fallback backend, _verdict_from_artifact_json, _artifact_schema_error. Atomic-rename land detection: reads the final path once (a half-written .tmp is never the read target; a post-read write is ignored — Clause VA5).
autonomous-review.sh — provision the per-agent artifact path (mkdir 0700) + export VERDICT_ARTIFACT_PATH + inject the atomic-write (tmp+rename) instruction into build_review_prompt; an artifact-first resolution loop seeds AGENT_VERDICTS from valid artifacts (the poll loop then skips them → zero comment-list API calls when all agents produce artifacts); malformed → loud feat(autonomous-dispatcher): operator error envelope — config-class failures surface on the issue, never log-only #231 error envelope naming the agent + the schema error, treated as absent for the vote (Clause V1, never a silent PASS) and routed failed-non-substantive (re-dispatchable) via the durable AGENT_VERDICT_SOURCES=artifact-malformed scan. Per-agent verdict source is tagged for every agent (artifact / artifact-malformed / comment-fallback / none) for feat(autonomous-dispatcher): baseline metrics collector — incidents, cost-per-merged-PR, quota-failure rate, TTHW (stop-rule input) #228 metrics.
lib-review-poll.sh — the verdict poll loop AND the codex stdout fallback both skip an artifact-malformed agent (Clause V1 — its machine output is untrustworthy; the loud envelope + terminal sweep is the contract).
post-verdict.sh unchanged — stays the ONLY comment poster (INV-56). The wrapper's rendered aggregate comment + the  trailer (INV-35) are format-unchanged.

Machine consumers of the comment format (load-bearing — all preserved + pinned)

Consumer	What it parses	Pinned by
dispatcher INV-03/06/07	`<!-- review-verdict: passed / failed-substantive / failed-non-substantive cause=… -->` (`emit_verdict_trailer` unchanged)	TC-VERDICT-ARTIFACT-032 / W10
dev-resume parser	wrapper-rendered FAIL aggregate still starts `Review findings:`	TC-033 / W11
comment poller (fallback)	`Review PASSED`/`Review findings:` + `Review Session:`/`Review Agent:` trailers	existing tests + comment-fallback parity (TC-031)
post-verdict.sh	the agent verdict trailer is byte-for-byte unchanged	existing test-…-verdict-via-helper (20/0)

Test Plan

Design canvas (docs/designs/verdict-artifact-channel.md)
Test cases (docs/test-cases/verdict-artifact-channel.md)
Unit: tests/unit/test-verdict-artifact.sh (56/0) — valid/malformed/absent classification, atomic-rename/dup/late-write, path provisioning, aggregation precedence (artifact > comment; conflict logged), the zero-comment-poll AC, malformed→loud-envelope + treated-absent, comment-fallback parity, timeout-veto regression pin, and the codex-fallback malformed-skip (W14)
Conformance fixtures asserting verdict.state from artifact files: claude valid-FAIL (artifact wins over a PASS-classifying stdout), codex valid-PASS, claude + codex malformed → drop (19/19)
Stub-fleet E2E tests/e2e/run-verdict-artifact-fleet-e2e.sh (9/0): 3 stub agents (valid / malformed / absent-with-comment) → aggregate, per-agent sources, loud malformed envelope, zero-comment-poll all asserted
ShellCheck clean; cli-adapters 44/0; adapter-spec-schemas 82/0
code-simplifier review passed (found + fixed a dead-assignment bug)
pr-review agent passed (found + fixed H1: codex stdout fallback now honors Clause V1; M1: documented packaged-skill jq fallback; L1: post-sweep source tagging)
CI checks pass

Docs (same PR — Pipeline Documentation Authority)

New INV-77 in invariants.md; review-agent-flow.md artifact-first section; INV-20/40/53 annotated as demoted-to-fallback.

Scope

Review side only (dev-side artifact adoption + comment-fallback removal deferred per the issue's Out of Scope). Moves the verdict CHANNEL, not the absence model.

Post-install / upgrade

This PR adds scripts/lib-review-artifact.sh; after merge + npx skills update -g, re-run install-project-hooks.sh on every onboarded project (see CLAUDE.local.md → Post-merge Step 2) or their review wrappers will crash on the missing source.

…ma-validated verdict files; wrapper resolves artifact-first; comments become a logged fallback (INV-77, closes #233) The review wrapper now resolves each fan-out agent's PASS/FAIL verdict from a typed verdict-artifact FILE (the #229 verdict-artifact.schema.json) BEFORE any comment scraping. Comment detection (INV-20/40/53 actor+window+trailer polling) survives only as an explicitly-logged fallback (verdict-source=comment-fallback) for an agent that produced no artifact — measurable by #228 metrics. - New lib-review-artifact.sh: _verdict_artifact_path (XDG-state per-run dir keyed on the session UUID) + _classify_verdict_artifact (valid/malformed/absent, §4.3) with the dual python3-jsonschema / jq-fallback backend; _verdict_from_artifact_json + _artifact_schema_error. Atomic-rename land detection (read final path once; ignore post-read writes, Clause VA5). - autonomous-review.sh: provision the per-agent artifact path (mkdir 0700) + export VERDICT_ARTIFACT_PATH + inject atomic-write (tmp+rename) instructions into build_review_prompt; artifact-first resolution loop seeds AGENT_VERDICTS from valid artifacts (poll loop then skips them → ZERO comment polls when all agents produce artifacts); malformed → loud #231 error envelope naming the agent + schema error, treated as absent for the vote (Clause V1, never a silent PASS) and routed failed-non-substantive (re-dispatchable) via the durable AGENT_VERDICT_SOURCES=artifact-malformed scan. - lib-review-poll.sh: the verdict poll loop skips already-resolved + malformed agents (artifact > comment precedence). - post-verdict.sh stays the ONLY comment poster (INV-56); the wrapper's rendered aggregate comment + the INV-35  trailer are format-unchanged → dispatcher INV-03/06/07 + dev-resume Review-findings machine consumers keep working (pinned by tests). - Docs same PR: new INV-77; review-agent-flow.md artifact-first section; INV-20/40/53 annotated as demoted-to-fallback. - Tests: tests/unit/test-verdict-artifact.sh; conformance fixtures asserting verdictState from artifact files (claude valid-FAIL, codex valid-PASS, claude malformed); stub-fleet E2E (valid/malformed/absent-with-comment). CI: new lib + e2e in shellcheck list + the e2e step. Review side only (dev-side adoption + comment-fallback removal deferred per Out of Scope). Moves the verdict CHANNEL, not the absence model. Renumbered INV-76→77 at rebase (PR #258 landed INV-76 first). ## Post-install / upgrade This PR adds scripts/lib-review-artifact.sh; after merge + `npx skills update -g`, re-run install-project-hooks.sh on every onboarded project (see CLAUDE.local.md → Post-merge Step 2) or their review wrappers will crash on the missing `source`.

…'s INV-77 3rd arg (verdict-artifact path) The poll-budget test pinned build_review_prompt as 2-arg (post-#182). INV-77 (#233) adds a 3rd arg — the per-agent VERDICT_ARTIFACT_PATH (not an E2E signal) — so the call site is now '<agent> <session-id> <artifact-path>'. Update the assertion to the new contract; the test's intent (no E2E signal arg) is preserved.

kane-review-agent

Review reached a blocking FAIL verdict — see the Review findings: comment on issue #233 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).

…1]s — true read-once, identity binding, full-schema jq fallback (INV-77, #233) Address the 3 [P1] findings on PR #262: 1. [P1] TRUE read-once: _classify_verdict_artifact cat'd the bytes into _bytes but then validated $_path (a SECOND disk read), so a rename landing between the two reads could flip valid↔malformed. Now the captured snapshot is written to a private temp file and BOTH schema validation and the identity check run against that snapshot — never a re-read of $_path (Clause VA5). +TC-038. 2. [P1] Identity binding: the artifact-first path accepted any schema-valid JSON without checking it belongs to THIS slot. _classify_verdict_artifact / _artifact_schema_error now take optional <expected-run-id> <expected-agent>; the wrapper passes AGENT_SESSION_IDS[i] + AGENT_NAMES[i], and a mismatched .runId/.agent → malformed (loud envelope, dropped) so a buggy adapter copying example JSON or another agent's identifiers cannot cast a vote. +TC-039, +stub-fleet agent-D (foreign identity). 3. [P1] jq fallback full schema: the packaged-install default (jq, since the JSON Schema lives outside the skill tree) only checked a few top-level fields, so a PASS with blockingFindings as an object, a finding without title, an unknown key, etc. were accepted as valid — suppressing the malformed envelope. _validate_verdict_artifact_jq now mirrors the schema's full shape (additionalProperties at every level, finding object shape, typed evidence sub-objects, FAIL⇔≥1-blocking). +TC-040. Docs same PR: INV-77 (Atomic+true-read-once / Identity binding / Validation backend paragraphs), design doc, test-case doc. Unit 71/0, conformance 19/19, stub-fleet E2E 11/0, ShellCheck clean. code-simplifier passed.

kane-review-agent

Review reached a blocking FAIL verdict — see the Review findings: comment on issue #233 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).

zxkane added 2 commits June 17, 2026 22:18

kane-coding-agent Bot mentioned this pull request Jun 17, 2026

feat(autonomous-review): verdict artifact channel — agents write schema-validated verdict files; wrapper owns parsing; comments become render-only #233

Open

15 tasks

kane-review-agent Bot requested changes Jun 17, 2026

View reviewed changes

kane-coding-agent Bot mentioned this pull request Jun 17, 2026

ci: two-tier test lanes — hermetic conformance always-on for forks; live agent smoke label-gated on self-hosted #238

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(autonomous-review): verdict artifact channel — agents write schema-validated verdict files; wrapper resolves artifact-first; comments become render/fallback only (INV-77, closes #233)#262

feat(autonomous-review): verdict artifact channel — agents write schema-validated verdict files; wrapper resolves artifact-first; comments become render/fallback only (INV-77, closes #233)#262
kane-coding-agent[bot] wants to merge 3 commits into
mainfrom
feat/verdict-artifact-channel-233

kane-coding-agent Bot commented Jun 17, 2026

Uh oh!

kane-review-agent Bot left a comment

Uh oh!

kane-review-agent Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kane-coding-agent Bot commented Jun 17, 2026

Summary

What changed

Machine consumers of the comment format (load-bearing — all preserved + pinned)

Test Plan

Docs (same PR — Pipeline Documentation Authority)

Scope

Post-install / upgrade

Uh oh!

kane-review-agent Bot left a comment

Choose a reason for hiding this comment

Uh oh!

kane-review-agent Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant