Skip to content

feat(autonomous-review): verdict artifact channel — agents write schema-validated verdict files; wrapper resolves artifact-first; comments become render/fallback only (INV-77, closes #233)#262

Open
kane-coding-agent[bot] wants to merge 3 commits into
mainfrom
feat/verdict-artifact-channel-233
Open

Conversation

@kane-coding-agent

Copy link
Copy Markdown
Contributor

Summary

Switches the review wrapper's verdict transport from comment-scraping to a wrapper-owned artifact file. Each review agent writes its verdict as schema-validated JSON (the #229 verdict-artifact.schema.json) to a per-agent path the wrapper provisions; the wrapper validates + resolves the artifact first and only falls back to today's comment detection (with a logged verdict-source=comment-fallback marker) for an agent that produced no artifact. A malformed artifact is a loud, distinct state (an operator error envelope), never a silent absent. New invariant INV-77.

Closes #233.

What changed

  • New lib-review-artifact.sh_verdict_artifact_path (XDG-state per-run dir keyed on the session UUID), _classify_verdict_artifact (→ valid / malformed / absent, adapter-spec §4.3) with the dual python3-jsonschema / jq-fallback backend, _verdict_from_artifact_json, _artifact_schema_error. Atomic-rename land detection: reads the final path once (a half-written .tmp is never the read target; a post-read write is ignored — Clause VA5).
  • autonomous-review.sh — provision the per-agent artifact path (mkdir 0700) + export VERDICT_ARTIFACT_PATH + inject the atomic-write (tmp+rename) instruction into build_review_prompt; an artifact-first resolution loop seeds AGENT_VERDICTS from valid artifacts (the poll loop then skips them → zero comment-list API calls when all agents produce artifacts); malformed → loud feat(autonomous-dispatcher): operator error envelope — config-class failures surface on the issue, never log-only #231 error envelope naming the agent + the schema error, treated as absent for the vote (Clause V1, never a silent PASS) and routed failed-non-substantive (re-dispatchable) via the durable AGENT_VERDICT_SOURCES=artifact-malformed scan. Per-agent verdict source is tagged for every agent (artifact / artifact-malformed / comment-fallback / none) for feat(autonomous-dispatcher): baseline metrics collector — incidents, cost-per-merged-PR, quota-failure rate, TTHW (stop-rule input) #228 metrics.
  • lib-review-poll.sh — the verdict poll loop AND the codex stdout fallback both skip an artifact-malformed agent (Clause V1 — its machine output is untrustworthy; the loud envelope + terminal sweep is the contract).
  • post-verdict.sh unchanged — stays the ONLY comment poster (INV-56). The wrapper's rendered aggregate comment + the <!-- review-verdict: … --> trailer (INV-35) are format-unchanged.

Machine consumers of the comment format (load-bearing — all preserved + pinned)

Consumer What it parses Pinned by
dispatcher INV-03/06/07 <!-- review-verdict: passed / failed-substantive / failed-non-substantive cause=… --> (emit_verdict_trailer unchanged) TC-VERDICT-ARTIFACT-032 / W10
dev-resume parser wrapper-rendered FAIL aggregate still starts Review findings: TC-033 / W11
comment poller (fallback) Review PASSED/Review findings: + Review Session:/Review Agent: trailers existing tests + comment-fallback parity (TC-031)
post-verdict.sh the agent verdict trailer is byte-for-byte unchanged existing test-…-verdict-via-helper (20/0)

Test Plan

  • Design canvas (docs/designs/verdict-artifact-channel.md)
  • Test cases (docs/test-cases/verdict-artifact-channel.md)
  • Unit: tests/unit/test-verdict-artifact.sh (56/0) — valid/malformed/absent classification, atomic-rename/dup/late-write, path provisioning, aggregation precedence (artifact > comment; conflict logged), the zero-comment-poll AC, malformed→loud-envelope + treated-absent, comment-fallback parity, timeout-veto regression pin, and the codex-fallback malformed-skip (W14)
  • Conformance fixtures asserting verdict.state from artifact files: claude valid-FAIL (artifact wins over a PASS-classifying stdout), codex valid-PASS, claude + codex malformed → drop (19/19)
  • Stub-fleet E2E tests/e2e/run-verdict-artifact-fleet-e2e.sh (9/0): 3 stub agents (valid / malformed / absent-with-comment) → aggregate, per-agent sources, loud malformed envelope, zero-comment-poll all asserted
  • ShellCheck clean; cli-adapters 44/0; adapter-spec-schemas 82/0
  • code-simplifier review passed (found + fixed a dead-assignment bug)
  • pr-review agent passed (found + fixed H1: codex stdout fallback now honors Clause V1; M1: documented packaged-skill jq fallback; L1: post-sweep source tagging)
  • CI checks pass

Docs (same PR — Pipeline Documentation Authority)

  • New INV-77 in invariants.md; review-agent-flow.md artifact-first section; INV-20/40/53 annotated as demoted-to-fallback.

Scope

Review side only (dev-side artifact adoption + comment-fallback removal deferred per the issue's Out of Scope). Moves the verdict CHANNEL, not the absence model.

Post-install / upgrade

This PR adds scripts/lib-review-artifact.sh; after merge + npx skills update -g, re-run install-project-hooks.sh on every onboarded project (see CLAUDE.local.md → Post-merge Step 2) or their review wrappers will crash on the missing source.

zxkane added 2 commits June 17, 2026 22:18
…ma-validated verdict files; wrapper resolves artifact-first; comments become a logged fallback (INV-77, closes #233)

The review wrapper now resolves each fan-out agent's PASS/FAIL verdict from a
typed verdict-artifact FILE (the #229 verdict-artifact.schema.json) BEFORE any
comment scraping. Comment detection (INV-20/40/53 actor+window+trailer polling)
survives only as an explicitly-logged fallback (verdict-source=comment-fallback)
for an agent that produced no artifact — measurable by #228 metrics.

- New lib-review-artifact.sh: _verdict_artifact_path (XDG-state per-run dir keyed
  on the session UUID) + _classify_verdict_artifact (valid/malformed/absent, §4.3)
  with the dual python3-jsonschema / jq-fallback backend; _verdict_from_artifact_json
  + _artifact_schema_error. Atomic-rename land detection (read final path once;
  ignore post-read writes, Clause VA5).
- autonomous-review.sh: provision the per-agent artifact path (mkdir 0700) +
  export VERDICT_ARTIFACT_PATH + inject atomic-write (tmp+rename) instructions
  into build_review_prompt; artifact-first resolution loop seeds AGENT_VERDICTS
  from valid artifacts (poll loop then skips them → ZERO comment polls when all
  agents produce artifacts); malformed → loud #231 error envelope naming the
  agent + schema error, treated as absent for the vote (Clause V1, never a silent
  PASS) and routed failed-non-substantive (re-dispatchable) via the durable
  AGENT_VERDICT_SOURCES=artifact-malformed scan.
- lib-review-poll.sh: the verdict poll loop skips already-resolved + malformed
  agents (artifact > comment precedence).
- post-verdict.sh stays the ONLY comment poster (INV-56); the wrapper's rendered
  aggregate comment + the INV-35 <!-- review-verdict: … --> trailer are
  format-unchanged → dispatcher INV-03/06/07 + dev-resume Review-findings machine
  consumers keep working (pinned by tests).
- Docs same PR: new INV-77; review-agent-flow.md artifact-first section;
  INV-20/40/53 annotated as demoted-to-fallback.
- Tests: tests/unit/test-verdict-artifact.sh; conformance fixtures asserting
  verdictState from artifact files (claude valid-FAIL, codex valid-PASS, claude
  malformed); stub-fleet E2E (valid/malformed/absent-with-comment). CI: new lib +
  e2e in shellcheck list + the e2e step.

Review side only (dev-side adoption + comment-fallback removal deferred per Out
of Scope). Moves the verdict CHANNEL, not the absence model. Renumbered INV-76→77
at rebase (PR #258 landed INV-76 first).

## Post-install / upgrade
This PR adds scripts/lib-review-artifact.sh; after merge + `npx skills update -g`,
re-run install-project-hooks.sh on every onboarded project (see CLAUDE.local.md →
Post-merge Step 2) or their review wrappers will crash on the missing `source`.
…'s INV-77 3rd arg (verdict-artifact path)

The poll-budget test pinned build_review_prompt as 2-arg (post-#182). INV-77
(#233) adds a 3rd arg — the per-agent VERDICT_ARTIFACT_PATH (not an E2E signal) —
so the call site is now '<agent> <session-id> <artifact-path>'. Update the
assertion to the new contract; the test's intent (no E2E signal arg) is preserved.

@kane-review-agent kane-review-agent Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review reached a blocking FAIL verdict — see the Review findings: comment on issue #233 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).

…1]s — true read-once, identity binding, full-schema jq fallback (INV-77, #233)

Address the 3 [P1] findings on PR #262:

1. [P1] TRUE read-once: _classify_verdict_artifact cat'd the bytes into _bytes
   but then validated $_path (a SECOND disk read), so a rename landing between
   the two reads could flip valid↔malformed. Now the captured snapshot is written
   to a private temp file and BOTH schema validation and the identity check run
   against that snapshot — never a re-read of $_path (Clause VA5). +TC-038.

2. [P1] Identity binding: the artifact-first path accepted any schema-valid JSON
   without checking it belongs to THIS slot. _classify_verdict_artifact /
   _artifact_schema_error now take optional <expected-run-id> <expected-agent>;
   the wrapper passes AGENT_SESSION_IDS[i] + AGENT_NAMES[i], and a mismatched
   .runId/.agent → malformed (loud envelope, dropped) so a buggy adapter copying
   example JSON or another agent's identifiers cannot cast a vote. +TC-039,
   +stub-fleet agent-D (foreign identity).

3. [P1] jq fallback full schema: the packaged-install default (jq, since the
   JSON Schema lives outside the skill tree) only checked a few top-level fields,
   so a PASS with blockingFindings as an object, a finding without title, an
   unknown key, etc. were accepted as valid — suppressing the malformed envelope.
   _validate_verdict_artifact_jq now mirrors the schema's full shape
   (additionalProperties at every level, finding object shape, typed evidence
   sub-objects, FAIL⇔≥1-blocking). +TC-040.

Docs same PR: INV-77 (Atomic+true-read-once / Identity binding / Validation
backend paragraphs), design doc, test-case doc. Unit 71/0, conformance 19/19,
stub-fleet E2E 11/0, ShellCheck clean. code-simplifier passed.

@kane-review-agent kane-review-agent Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review reached a blocking FAIL verdict — see the Review findings: comment on issue #233 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(autonomous-review): verdict artifact channel — agents write schema-validated verdict files; wrapper owns parsing; comments become render-only

1 participant