Skip to content

v0.0.7.0 feat: decision inbox v1 — trigger-based prompt + claude --resume fix#131

Merged
Fullstop000 merged 3 commits intomainfrom
decision-inbox-trace
May 1, 2026
Merged

v0.0.7.0 feat: decision inbox v1 — trigger-based prompt + claude --resume fix#131
Fullstop000 merged 3 commits intomainfrom
decision-inbox-trace

Conversation

@Fullstop000
Copy link
Copy Markdown
Owner

@Fullstop000 Fullstop000 commented May 1, 2026

Summary

Reland of decision-inbox after the dogfood revert. Two distinct bugs surfaced during the postmortem; both are fixed here.

  • Decision Inbox v1. Agents call dispatch_decision (renamed from the brand-prefixed name) when an incoming request asks for a verdict — PR review, A-vs-B implementation, config flag. Human picks in a sidebar inbox; the agent's session resumes with the picked option's body inlined and acts on it. Server-side stores rows with CAS-protected resolve and reverts on delivery failure so picks aren't silently lost. Bridge MCP tool with structural validator at the boundary.
  • Trigger-based prompt rewrite. Replaces the v0 "when you need" permissive language with mandatory triggers for verdict-shaped requests. Splits the conversational channel (send_message) from the verdict channel (dispatch_decision) instead of conflicting "your only output channel" + buried exception. Drops the "things you can act on unilaterally" loophole.
  • Claude --resume guard. Pre-existing bug: chorus passed stale session_id to claude --resume, claude hard-errored with error_during_execution and zero downstream events, surfacing in chorus as immediate reason=Natural turn end. Driver now verifies ~/.claude/projects/<encoded-cwd>/<session_id>.jsonl exists before passing --resume; falls back to a fresh session with a warn! on miss. Regression test included.
  • AGENTS.md principles. Added YAGNI and "no cheating for the goal" (hard constraint). Both surfaced as load-bearing during the dogfood revert.

Test Coverage

[+] src/store/decisions.rs               — 4/6 paths ★★★ TESTED (CAS, double-resolve, revert, list filters)
[+] src/server/handlers/decisions.rs     — 5/8 paths ★★ TESTED (round-trip, 409, 400 channel, 400 picked_key)
[+] src/agent/drivers/claude.rs          — resume guard ★★★ TESTED (missing_session_file_drops_resume_flag)
[+] src/agent/drivers/prompt.rs          — 3/3 paths ★★★ TESTED (mandatory framing, anti-loophole, prefix)
[+] src/agent/manager.rs::resume_with_prompt — surface tested via MockLifecycle; real impl branches GAP
[+] src/bridge/mod.rs::validate_decision_payload — 12 branch GAPs (boundary defense, exercised through e2e)
[+] ui/src/components/decisions/DecisionsInbox.tsx — 0/N (consistent with project's UI test convention)

Coverage: ~58%. Backend store + handlers strong (~75%); bridge validator + UI ~15%.
Coverage gate: OVERRIDDEN at 58% — gaps are the predictable shape (UI + boundary
defense), not unsafe paths. Validator failures surface as INVALID_PARAMS to the
agent. Codex (gpt-5.5 medium) independently ran cargo test on the new code and
confirmed no blocking findings.

Tests: 528 cargo + 85 vitest, all pass.

Pre-Landing Review

Codex (gpt-5.5, reasoning_effort=medium) ran as the outside voice. No blocking findings. Verdict: Merge (option M of M/H/R). The agent independently:

  • Cloned a worktree and ran cargo test decision, cargo test missing_session_file_drops_resume_flag, cargo test claude_session_file_encodes_dots_and_slashes — all passed.
  • Verified the claude_session_file encoding matches actual files in ~/.claude/projects/.
  • Confirmed the CAS via UPDATE ... WHERE status = 'open' and the revert path in handle_resolve_decision.
  • Flagged the same follow-ups already in the PR: codex/opencode resume liveness + gemini MCP TTL.

Cross-driver verification (live, real models)

All five drivers verified emitting dispatch_decision autonomously on fresh agents:

driver model result
claude sonnet ✅ emits + full resolve→resume round-trip
kimi kimi-code/kimi-for-coding ✅ end-to-end — agent received envelope, edited members.rs
codex gpt-5.4-mini, gpt-5.5 ✅ all conventions (H2 sections, evidence prefixes, key)
gemini gemini-2.5-flash ✅ emits autonomously
opencode deepseek/deepseek-chat ✅ emits with H2 sections

Plan Completion

All r7 design items shipped: decisions table + CAS resolve, resume_with_prompt lifecycle method, channel-inference contract, dispatch_decision MCP tool + bridge validator, real handlers (create/list/resolve), envelope builder, UI inbox with sidebar toggle, claude --resume guard, prompt rewrite, e2e tests. Plus the post-revert tool rename (chorus_create_decisiondispatch_decision) per pre-merge feedback.

Known follow-ups (queued in TODOS.md, not gating ship)

  1. Codex / opencode resume liveness guard — same shape as the claude fix; both runtimes silently exit Natural on stale thread/session id rather than erroring loudly.
  2. rmcp StreamableHttpService session TTL — long agent turns (>20 min) cause MCP session expiry; tool calls return Unauthorized: Session not found even with valid payloads.
  3. Coverage gaps — bridge validator branches, real resume_with_prompt Active/Asleep paths, resume-failure → revert e2e, UI component tests.

Test plan

  • cargo test — 528 tests pass (339 lib + 80 e2e + 22 store + 61 store_tests + 26 others)
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --check clean
  • cd ui && npx tsc --noEmit clean, 85/85 vitest tests pass
  • Live cross-driver: 5/5 drivers emit dispatch_decision autonomously with valid payloads
  • Live full round-trip: kimi received resolution envelope and acted on it (edited members.rs per the picked option's body)
  • Codex (gpt-5.5 medium) outside-voice review: no blocking findings, recommends merge

Lineage: chorus-design-reviews — 2026-04-30 vertical-slice design (commit 3a38b22).

🤖 Generated with Claude Code

Fullstop000 and others added 3 commits May 1, 2026 12:29
Reland after the May-2026 dogfood revert. Two distinct bugs surfaced
during that postmortem; both are fixed here, then verified live.

## What this ships

- Storage: `decisions` table (CAS-protected resolve), Store methods,
  4 unit tests.
- Lifecycle: `AgentLifecycle::resume_with_prompt` + `run_channel_id`.
  Routes to `handle.prompt(...)` for live agents; falls back to
  `start_agent(init_directive=envelope)` for asleep ones. Reverts
  the row to open if delivery fails so the human's pick isn't lost.
- Bridge: `chorus_create_decision` MCP tool with structural validator
  at the boundary. Backend forwards to `/internal/agent/{id}/decisions`.
- Handlers: 3 routes — internal create with channel-inference contract
  (400 if no active-run channel), public list with status filter,
  public resolve doing CAS + envelope build + resume_with_prompt
  + revert-on-failure.
- Prompt rewrite (the core change vs v0): trigger-based mandatory
  framing instead of permissive "when you need". Critical-rules
  splits "conversation channel" (send_message) from "verdict channel"
  (chorus_create_decision) instead of conflicting "your only output
  channel" + buried exception. Drops the "things you can act on
  unilaterally" loophole entirely.
- UI: `DecisionsInbox` component with click-to-pick, recommended-
  option highlight, optional human note, collapsible context, 5s
  polling. Sidebar inbox icon toggles the view.
- Tests: 339 lib + 80 e2e (4 new round-trip cases) + 89 vitest. All
  pass. clippy --all-targets -D warnings clean. cargo fmt clean.

## Pre-existing bug fixed: claude --resume on missing session file

Diagnosis from the dogfood postmortem: chorus persists `session_id`
in `agent_sessions` and passes it to `claude --resume` on every
restart. Claude hard-errors with `error_during_execution` and zero
events when the session file is missing locally — which surfaces
in chorus as an immediate `reason=Natural` turn end. Every prior
"agent did nothing" failure was actually this, not a prompt issue.

Fix: `claude.rs` verifies the session file exists at
`~/.claude/projects/<encoded-cwd>/<session_id>.jsonl` before passing
`--resume`. Falls back to a fresh session with a `warn!` on miss.
Regression test: `missing_session_file_drops_resume_flag`.

## Live cross-driver verification (real models, real runs)

| driver   | model                         | result                          |
|----------|-------------------------------|---------------------------------|
| claude   | sonnet                        | ✅ emits + full round-trip     |
| kimi     | kimi-code/kimi-for-coding     | ✅ emits + full round-trip     |
|          |                               |    (received envelope, edited  |
|          |                               |    members.rs per picked body) |
| codex    | gpt-5.4-mini (fresh agent)    | ✅ emits with all conventions  |
| gemini   | gemini-2.5-flash              | ✅ emits autonomously          |
| opencode | deepseek/deepseek-chat        | ✅ emits with H2 sections      |

All five drivers picked up the prompt section, recognized the
PR-review trigger, and emitted properly-shaped payloads (H2 sections,
[verified · source] / [inferred] evidence prefixes, recommended_key).

## Known follow-up bugs (not feature defects, file separately)

1. Codex / opencode have the same shape of resume bug as claude:
   chorus passes a stale thread/session id to runtimes that no longer
   hold it. Codex/opencode silently exit Natural with no work; claude
   errors loudly. Fresh agents work in all three. Stage-2 fix: extend
   the file-exists guard to per-driver liveness checks.

2. Gemini's MCP HTTP session expires on long-running turns (>20 min),
   surfacing as `Unauthorized: Session not found` when chorus_create_
   decision finally fires. rmcp's StreamableHttpService session TTL
   needs raising or session re-init on expiry.

Lineage: chorus-design-reviews/explorations/2026-04-30-pr-review-vertical-slice/design.md (commit 3a38b22).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the brand prefix and switch the verb from "create" to "dispatch"
so the tool name doesn't tie the abstraction to Chorus and reads as
the action the agent is taking (dispatching a decision request to
the human, who then picks).

Live-verified: a fresh claude agent received a PR-review request,
recognized the trigger from the renamed prompt section, and emitted
`mcp__chat__dispatch_decision` with a properly-shaped 3-option
payload (headline, question, options M/H/R, recommended_key=M).

No backwards-compat shim — pre-merge rename, agents only learn the
new name from the system prompt at startup.

Tests: 339 lib + 80 e2e all pass. cargo fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VERSION 0.0.6.0 → 0.0.7.0 for the decision-inbox v1 reland.

Also adds two principles to AGENTS.md: YAGNI and "no cheating for the
goal" (hard constraint). And queues 3 follow-ups in TODOS.md from the
v0.0.7.0 dogfood: codex/opencode resume liveness guard, rmcp HTTP
session TTL, and the validator/UI/revert-path coverage gaps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Fullstop000 Fullstop000 changed the title decision-inbox v1: trigger-based prompt + claude --resume fix v0.0.7.0 feat: decision inbox v1 — trigger-based prompt + claude --resume fix May 1, 2026
@Fullstop000 Fullstop000 merged commit 5f99fa3 into main May 1, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant