v0.0.7.0 feat: decision inbox v1 — trigger-based prompt + claude --resume fix by Fullstop000 · Pull Request #131 · Fullstop000/Chorus

Fullstop000 · 2026-05-01T04:30:42Z

Summary

Reland of decision-inbox after the dogfood revert. Two distinct bugs surfaced during the postmortem; both are fixed here.

Decision Inbox v1. Agents call dispatch_decision (renamed from the brand-prefixed name) when an incoming request asks for a verdict — PR review, A-vs-B implementation, config flag. Human picks in a sidebar inbox; the agent's session resumes with the picked option's body inlined and acts on it. Server-side stores rows with CAS-protected resolve and reverts on delivery failure so picks aren't silently lost. Bridge MCP tool with structural validator at the boundary.
Trigger-based prompt rewrite. Replaces the v0 "when you need" permissive language with mandatory triggers for verdict-shaped requests. Splits the conversational channel (send_message) from the verdict channel (dispatch_decision) instead of conflicting "your only output channel" + buried exception. Drops the "things you can act on unilaterally" loophole.
Claude --resume guard. Pre-existing bug: chorus passed stale session_id to claude --resume, claude hard-errored with error_during_execution and zero downstream events, surfacing in chorus as immediate reason=Natural turn end. Driver now verifies ~/.claude/projects/<encoded-cwd>/<session_id>.jsonl exists before passing --resume; falls back to a fresh session with a warn! on miss. Regression test included.
AGENTS.md principles. Added YAGNI and "no cheating for the goal" (hard constraint). Both surfaced as load-bearing during the dogfood revert.

Test Coverage

[+] src/store/decisions.rs               — 4/6 paths ★★★ TESTED (CAS, double-resolve, revert, list filters)
[+] src/server/handlers/decisions.rs     — 5/8 paths ★★ TESTED (round-trip, 409, 400 channel, 400 picked_key)
[+] src/agent/drivers/claude.rs          — resume guard ★★★ TESTED (missing_session_file_drops_resume_flag)
[+] src/agent/drivers/prompt.rs          — 3/3 paths ★★★ TESTED (mandatory framing, anti-loophole, prefix)
[+] src/agent/manager.rs::resume_with_prompt — surface tested via MockLifecycle; real impl branches GAP
[+] src/bridge/mod.rs::validate_decision_payload — 12 branch GAPs (boundary defense, exercised through e2e)
[+] ui/src/components/decisions/DecisionsInbox.tsx — 0/N (consistent with project's UI test convention)

Coverage: ~58%. Backend store + handlers strong (~75%); bridge validator + UI ~15%.
Coverage gate: OVERRIDDEN at 58% — gaps are the predictable shape (UI + boundary
defense), not unsafe paths. Validator failures surface as INVALID_PARAMS to the
agent. Codex (gpt-5.5 medium) independently ran cargo test on the new code and
confirmed no blocking findings.

Tests: 528 cargo + 85 vitest, all pass.

Pre-Landing Review

Codex (gpt-5.5, reasoning_effort=medium) ran as the outside voice. No blocking findings. Verdict: Merge (option M of M/H/R). The agent independently:

Cloned a worktree and ran cargo test decision, cargo test missing_session_file_drops_resume_flag, cargo test claude_session_file_encodes_dots_and_slashes — all passed.
Verified the claude_session_file encoding matches actual files in ~/.claude/projects/.
Confirmed the CAS via UPDATE ... WHERE status = 'open' and the revert path in handle_resolve_decision.
Flagged the same follow-ups already in the PR: codex/opencode resume liveness + gemini MCP TTL.

Cross-driver verification (live, real models)

All five drivers verified emitting dispatch_decision autonomously on fresh agents:

driver	model	result
claude	sonnet	✅ emits + full resolve→resume round-trip
kimi	kimi-code/kimi-for-coding	✅ end-to-end — agent received envelope, edited members.rs
codex	gpt-5.4-mini, gpt-5.5	✅ all conventions (H2 sections, evidence prefixes, key)
gemini	gemini-2.5-flash	✅ emits autonomously
opencode	deepseek/deepseek-chat	✅ emits with H2 sections

Plan Completion

All r7 design items shipped: decisions table + CAS resolve, resume_with_prompt lifecycle method, channel-inference contract, dispatch_decision MCP tool + bridge validator, real handlers (create/list/resolve), envelope builder, UI inbox with sidebar toggle, claude --resume guard, prompt rewrite, e2e tests. Plus the post-revert tool rename (chorus_create_decision → dispatch_decision) per pre-merge feedback.

Known follow-ups (queued in TODOS.md, not gating ship)

Codex / opencode resume liveness guard — same shape as the claude fix; both runtimes silently exit Natural on stale thread/session id rather than erroring loudly.
rmcp StreamableHttpService session TTL — long agent turns (>20 min) cause MCP session expiry; tool calls return Unauthorized: Session not found even with valid payloads.
Coverage gaps — bridge validator branches, real resume_with_prompt Active/Asleep paths, resume-failure → revert e2e, UI component tests.

Test plan

cargo test — 528 tests pass (339 lib + 80 e2e + 22 store + 61 store_tests + 26 others)
cargo clippy --all-targets -- -D warnings clean
cargo fmt --check clean
cd ui && npx tsc --noEmit clean, 85/85 vitest tests pass
Live cross-driver: 5/5 drivers emit dispatch_decision autonomously with valid payloads
Live full round-trip: kimi received resolution envelope and acted on it (edited members.rs per the picked option's body)
Codex (gpt-5.5 medium) outside-voice review: no blocking findings, recommends merge

Lineage: chorus-design-reviews — 2026-04-30 vertical-slice design (commit 3a38b22).

🤖 Generated with Claude Code

Reland after the May-2026 dogfood revert. Two distinct bugs surfaced during that postmortem; both are fixed here, then verified live. ## What this ships - Storage: `decisions` table (CAS-protected resolve), Store methods, 4 unit tests. - Lifecycle: `AgentLifecycle::resume_with_prompt` + `run_channel_id`. Routes to `handle.prompt(...)` for live agents; falls back to `start_agent(init_directive=envelope)` for asleep ones. Reverts the row to open if delivery fails so the human's pick isn't lost. - Bridge: `chorus_create_decision` MCP tool with structural validator at the boundary. Backend forwards to `/internal/agent/{id}/decisions`. - Handlers: 3 routes — internal create with channel-inference contract (400 if no active-run channel), public list with status filter, public resolve doing CAS + envelope build + resume_with_prompt + revert-on-failure. - Prompt rewrite (the core change vs v0): trigger-based mandatory framing instead of permissive "when you need". Critical-rules splits "conversation channel" (send_message) from "verdict channel" (chorus_create_decision) instead of conflicting "your only output channel" + buried exception. Drops the "things you can act on unilaterally" loophole entirely. - UI: `DecisionsInbox` component with click-to-pick, recommended- option highlight, optional human note, collapsible context, 5s polling. Sidebar inbox icon toggles the view. - Tests: 339 lib + 80 e2e (4 new round-trip cases) + 89 vitest. All pass. clippy --all-targets -D warnings clean. cargo fmt clean. ## Pre-existing bug fixed: claude --resume on missing session file Diagnosis from the dogfood postmortem: chorus persists `session_id` in `agent_sessions` and passes it to `claude --resume` on every restart. Claude hard-errors with `error_during_execution` and zero events when the session file is missing locally — which surfaces in chorus as an immediate `reason=Natural` turn end. Every prior "agent did nothing" failure was actually this, not a prompt issue. Fix: `claude.rs` verifies the session file exists at `~/.claude/projects/<encoded-cwd>/<session_id>.jsonl` before passing `--resume`. Falls back to a fresh session with a `warn!` on miss. Regression test: `missing_session_file_drops_resume_flag`. ## Live cross-driver verification (real models, real runs) | driver | model | result | |----------|-------------------------------|---------------------------------| | claude | sonnet | ✅ emits + full round-trip | | kimi | kimi-code/kimi-for-coding | ✅ emits + full round-trip | | | | (received envelope, edited | | | | members.rs per picked body) | | codex | gpt-5.4-mini (fresh agent) | ✅ emits with all conventions | | gemini | gemini-2.5-flash | ✅ emits autonomously | | opencode | deepseek/deepseek-chat | ✅ emits with H2 sections | All five drivers picked up the prompt section, recognized the PR-review trigger, and emitted properly-shaped payloads (H2 sections, [verified · source] / [inferred] evidence prefixes, recommended_key). ## Known follow-up bugs (not feature defects, file separately) 1. Codex / opencode have the same shape of resume bug as claude: chorus passes a stale thread/session id to runtimes that no longer hold it. Codex/opencode silently exit Natural with no work; claude errors loudly. Fresh agents work in all three. Stage-2 fix: extend the file-exists guard to per-driver liveness checks. 2. Gemini's MCP HTTP session expires on long-running turns (>20 min), surfacing as `Unauthorized: Session not found` when chorus_create_ decision finally fires. rmcp's StreamableHttpService session TTL needs raising or session re-init on expiry. Lineage: chorus-design-reviews/explorations/2026-04-30-pr-review-vertical-slice/design.md (commit 3a38b22). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop the brand prefix and switch the verb from "create" to "dispatch" so the tool name doesn't tie the abstraction to Chorus and reads as the action the agent is taking (dispatching a decision request to the human, who then picks). Live-verified: a fresh claude agent received a PR-review request, recognized the trigger from the renamed prompt section, and emitted `mcp__chat__dispatch_decision` with a properly-shaped 3-option payload (headline, question, options M/H/R, recommended_key=M). No backwards-compat shim — pre-merge rename, agents only learn the new name from the system prompt at startup. Tests: 339 lib + 80 e2e all pass. cargo fmt + clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

VERSION 0.0.6.0 → 0.0.7.0 for the decision-inbox v1 reland. Also adds two principles to AGENTS.md: YAGNI and "no cheating for the goal" (hard constraint). And queues 3 follow-ups in TODOS.md from the v0.0.7.0 dogfood: codex/opencode resume liveness guard, rmcp HTTP session TTL, and the validator/UI/revert-path coverage gaps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fullstop000 and others added 3 commits May 1, 2026 12:29

Fullstop000 changed the title ~~decision-inbox v1: trigger-based prompt + claude --resume fix~~ v0.0.7.0 feat: decision inbox v1 — trigger-based prompt + claude --resume fix May 1, 2026

Fullstop000 merged commit 5f99fa3 into main May 1, 2026
3 checks passed

Fullstop000 mentioned this pull request May 2, 2026

fix(codex+observability): session-file resume guard + silent-run detection #134

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.0.7.0 feat: decision inbox v1 — trigger-based prompt + claude --resume fix#131

v0.0.7.0 feat: decision inbox v1 — trigger-based prompt + claude --resume fix#131
Fullstop000 merged 3 commits intomainfrom
decision-inbox-trace

Fullstop000 commented May 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Fullstop000 commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Coverage

Pre-Landing Review

Cross-driver verification (live, real models)

Plan Completion

Known follow-ups (queued in TODOS.md, not gating ship)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Fullstop000 commented May 1, 2026 •

edited

Loading