Experimental: multi-agent foundation (Phase 0) — AgentDefinition + tier slots + TOML registry by quangdang46 · Pull Request #313 · quangdang46/jcode

quangdang46 · 2026-05-25T15:04:23Z

Summary

Multi-agent architecture foundation for jcode adapted from CodebuffAI's design.

Phase 0 — Foundation (commits 1-3)


| Component | Description |
| --- | --- |
| `AgentDefinition` | Declarative schema (id, model, tools, prompt, reasoning, output mode, etc.) |
| `ModelTier` | `Routine / Thinking` enum; maps to env vars + session-gateway routing |
| `OutputMode` | `LastMessage / AllMessages / StructuredOutput`; controls how tool results surface |
| `AgentRegistry` | TOML directory loader for `.jcode/agents/*.toml` with roundtrip validation |
| Cross-ref validation | Ensures `spawnable_agents` IDs actually exist in registry at load time |
| Skill MAS bridge | `MAS`-prefixed skill names invoke `jcode-skill-{name}` binaries |
| `sample_agents.rs` | 6 integration tests (bundled + disk-loaded TOML agents) |

Phase 1 — Agent TOML definitions


| File | Description |
| --- | --- |
| `basher.toml` | Routine + Minimal reasoning, bash-only leaf agent |
| `editor.toml` | Thinking + Medium reasoning, full edit toolkit (8 tools) |

Phase 4 — Prompt utilities


| Utility | Description |
| --- | --- |
| `prompt_placeholders.rs` | `{{FILE_TREE}}`, `{{CURRENT_DATE}}`, `{{REMAINING_STEPS}}`, `{{KNOWLEDGE_FILES}}`, `{{GIT_CHANGES}}` substitution engine |
| `wrap_as_system_reminder()` | Wraps harness step prompts in `<system_reminder>` tags |
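
A minimal sketch of how the two prompt utilities could fit together, using only the standard library. The `PromptContext` struct and its field names are hypothetical, only three of the five placeholders are shown, and the exact whitespace inside the `<system_reminder>` wrapper is an assumption rather than the PR's actual output.

```rust
/// Hypothetical fixed context; the real struct and field names may differ.
struct PromptContext {
    file_tree: String,
    current_date: String,
    remaining_steps: u32,
}

/// Replaces each known `{{NAME}}` token with its value from `ctx`.
/// Tokens that are not recognised are left untouched.
fn substitute_context_placeholders(template: &str, ctx: &PromptContext) -> String {
    template
        .replace("{{FILE_TREE}}", &ctx.file_tree)
        .replace("{{CURRENT_DATE}}", &ctx.current_date)
        .replace("{{REMAINING_STEPS}}", &ctx.remaining_steps.to_string())
}

/// Wraps a harness step prompt in `<system_reminder>` tags.
/// The newline layout is an assumption for illustration.
fn wrap_as_system_reminder(body: &str) -> String {
    format!("<system_reminder>\n{body}\n</system_reminder>")
}
```

One caveat of chained `replace` calls: a substituted value that itself contains a later placeholder token would be expanded again, so a single-pass scanner may be preferable if values are untrusted.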

Phase 5 — JBench evaluation framework


| File | Description |
| --- | --- |
| `evals/jbench/src/agent_runner.rs` | `run_agent_in_repo()`: spawns jcode subprocess, streams stdout, captures diff via `git diff HEAD` |
| `evals/jbench/src/judge.rs` | `judge_with_three_models()`: GPT + Gemini + Claude in parallel, median analysis + averaged scores |
| `evals/jbench/src/lessons.rs` | `extract_lessons()` + `append_lessons_to_file()`: lessons accumulation per agent |
| `evals/jbench/src/bin/jbench.rs` | CLI with `run` and `meta-analyze` implemented; `pick-commits`/`gen-evals`/`judge` as Phase stubs |

Test results

jcode-agent-runtime: 49 unit + 6 integration = 55 passed, 0 failed
jcode-jbench types:  3 roundtrip tests passed
cargo check --bin jcode: OK

🤖 Generated with Claude Code

…ase 0.1+0.2)

Lay the foundation for declarative agent definitions based on Codebuff's AgentDefinition schema, adapted to jcode's single-OAuth provider reality:

- signals.rs: existing soft-interrupt + cancellation primitives moved into a named module; root-level re-exports preserved so src/agent.rs consumers compile unchanged.
- definition.rs: AgentDefinition struct (id, model_override, prefer_tier, reasoning, tool_names, spawnable_agents, prompts, output_mode, inherit_parent_system_prompt, include_message_history) with TOML round-trip + validation for id format, system_prompt vs inherit conflict, structured_output schema requirement, self-spawn, and duplicate tool/agent ids.
- tier.rs: user-defined tier slot (routine/thinking) backed by the same JCODE_ROUTING_* env vars as model_routing.rs (#100). NOT a catalog: agents inherit the session model when no tier is configured, so subscription users (Claude Pro / ChatGPT Plus / Gemini Advanced) see no behavior change. Pay-per-token users opt in by setting two env vars.
- reasoning.rs: ReasoningEffort enum (minimal/low/medium/high).
- output.rs: OutputMode enum (last_message/all_messages/structured_output).

32 unit tests pass. Full `cargo check --bin jcode` succeeds.

This is Phase 0 of the multi-agent foundation; no runtime engine changes yet. Next: TOML loader for .jcode/agents/*.toml + builtin embedded agents (Phase 0.3).

…hase 0.3)

Discover and load AgentDefinition files from three locations with priority order:

1. `<project>/.jcode/agents/*.toml` (project-local, highest)
2. `~/.jcode/agents/*.toml` (user-global)
3. `AgentRegistry::register_builtin` (compiled-in defaults, lowest)

Project-local overrides user-global overrides builtin. Re-registering a builtin after a higher-priority entry is loaded does NOT clobber the override; the priority check is symmetric in `insert`.

Design choices:

- Filename must match `<id>.toml` so users can find agents by id without opening every file. Mismatches are surfaced as a load error rather than silently misindexing.
- Malformed/invalid files are collected as non-fatal LoadError entries so a single bad file doesn't prevent the rest of the registry from loading. `jcode doctor` (future) reads load_errors() to surface these.
- AgentRegistry intentionally does NOT cross-reference `tool_names` / `spawnable_agents`; that's done at spawn time because the tool universe may be feature-gated (Phase 0.4).

41 unit tests pass (32 prior + 9 new). `cargo check --bin jcode` succeeds.
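
The priority rule described above (project-local over user-global over builtin, with late builtin registration never clobbering an override) can be sketched as follows. This is an illustration under assumptions, not the PR's code: the `Source` enum, the `Entry` struct, and the boolean return of `insert` are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Where a definition came from. Variant order defines priority:
/// later variants outrank earlier ones via the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Source {
    Builtin,
    UserGlobal,
    ProjectLocal,
}

/// Stand-in for a loaded agent definition (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    source: Source,
    prompt: String,
}

#[derive(Default)]
struct Registry {
    agents: HashMap<String, Entry>,
}

impl Registry {
    /// Stores `entry` unless an entry with strictly higher priority already
    /// exists for `id`. Returns true when the entry was stored.
    fn insert(&mut self, id: &str, entry: Entry) -> bool {
        match self.agents.get(id) {
            Some(existing) if existing.source > entry.source => false,
            _ => {
                self.agents.insert(id.to_string(), entry);
                true
            }
        }
    }
}
```

Because the check compares priorities rather than load order, the result is the same whether builtins are registered before or after the TOML directories are scanned.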

… agents (Phase 0.4-0.6)

Phase 0.4: Cross-reference validation

- ReferenceError enum (UnknownTools, UnknownSpawnableAgents) kept separate from DefinitionError because the runtime tool/agent universe isn't known at TOML-load time.
- AgentDefinition::validate_tool_references<I, S>() and validate_spawn_references<I, S>(): caller passes the available name set, gets back a sorted, comma-joined list of unknowns.
- 5 new tests covering the happy path, unknowns, empty lists, and deterministic alphabetical ordering of the error message.

This deliberately does NOT modify src/tool/mod.rs. The whitelist check is a pure function over the agent definition + a name set; no need to refactor tool dispatch. Phase 1 will wire the actual tool registry into the spawn path.

Phase 0.5: Skill MAS (#94) bridge

- AgentRegistry::lookup_for_skill_routing(skill_agent_id): named alias of get() that documents the integration point with the SKILL.md field. Returns None for missing references; the skill activation site decides fallback policy.
- 2 tests: hit + miss.

Phase 0.6: Sample agents + integration test

- .jcode/agents/file-picker.toml: Routine tier, no message history, leaf agent. Demonstrates file-picker pattern adapted from Codebuff.
- .jcode/agents/code-reviewer.toml: Thinking tier with inherit_parent_system_prompt=true to demonstrate the prompt-cache prefix-sharing trick (~90% input-token savings on cache hits).
- tests/sample_agents.rs: integration test loads both files via the public AgentRegistry API and asserts shape + behavior. 4 tests.

Phase 0 totals: 49 unit + 4 integration = 53 tests, all passing. `cargo check --bin jcode` succeeds (full workspace, 3m13s).

Phase 0 (foundation) is now complete:

- Schema: AgentDefinition + ModelTier + OutputMode + ReasoningEffort
- Loader: registry with priority order (project > user > builtin)
- Validation: id format, internal invariants, cross-references
- Sample agents demonstrating cache-hit and tier patterns
- Skill MAS (#94) integration point established

Phase 1 (4 builtin agents + spawn_agents tool + cache benchmark) is the next track.
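
The Phase 0.4 check is described as a pure function over the definition and a caller-supplied name set. A rough sketch of that shape is below, written as a free function over a slice of tool names rather than a method on `AgentDefinition`; that signature and the `String` payload of `UnknownTools` are assumptions for illustration.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum ReferenceError {
    /// Sorted, comma-joined list of tool names that are not available.
    UnknownTools(String),
}

/// Checks every name in `tool_names` against the `available` set passed by the caller.
fn validate_tool_references<I, S>(tool_names: &[String], available: I) -> Result<(), ReferenceError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let available: BTreeSet<String> = available
        .into_iter()
        .map(|s| s.as_ref().to_string())
        .collect();

    // A BTreeSet both de-duplicates and yields names in alphabetical order,
    // which keeps the error message deterministic.
    let unknown: BTreeSet<&str> = tool_names
        .iter()
        .map(String::as_str)
        .filter(|name| !available.contains(*name))
        .collect();

    if unknown.is_empty() {
        Ok(())
    } else {
        Err(ReferenceError::UnknownTools(
            unknown.into_iter().collect::<Vec<_>>().join(", "),
        ))
    }
}
```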

…prompt utilities, sample agents

- Phase 1: two real TOML agent definitions (basher + editor) with full schema
- Phase 4: `prompt_placeholders.rs` with `{{FILE_TREE}}`, `{{CURRENT_DATE}}`, etc.
- Phase 4: `wrap_as_system_reminder()` in `src/agent/prompting.rs`
- Phase 5: `evals/jbench/` scaffold (types, judge stub, lessons stub, agent_runner stub)
- Phase 0.6: integration tests `basher_sample_has_expected_shape` + `editor_sample_has_expected_shape`

All jcode-agent-runtime tests pass (49 unit + 6 integration).

…, CLI

- Phase 5.3 (agent_runner): `run_agent_in_repo()` spawns jcode subprocess with prompt on stdin, streams stdout, captures trace + diff via `git diff HEAD`. Uses `timeout()` for per-run deadline.
- Phase 5.4 (judge): `judge_with_three_models()` runs GPT + Gemini + Claude judges in parallel via OpenAI Responses API + Anthropic Messages API. Median analysis, averaged scores. `run_single_judge()` exposes per-judge entry point.
- Phase 5.5 (lessons): `extract_lessons()` calls lessons extractor model via Responses API. `append_lessons_to_file()` accumulates lessons in per-agent JSON files with read-modify-write.
- Phase 5.6 (CLI): Full `jbench run` implemented (loads eval JSON, iterates commits, calls `run_agent_in_repo`, writes `.run.json` files). `jbench meta-analyze` aggregates results. Other subcommands print Phase stubs and exit 0.

Bug fixes:

- `JudgingResult: Default` impl added (needed for EvalRun init)
- `OnceLock` for lazy reqwest static client (fixes const-eval restrictions)
- `context` method from `anyhow::Context` imported in bin

Enhance session management, prompt navigation, and macOS shortcuts

Bugs fixed:

1. JudgingResult deserialization (jbench/types.rs). The judge prompt schema asks for camelCase fields (completionScore, codeQualityScore, overallScore) but the Rust struct used snake_case without serde rename. parse_scorecard would fail on every real judge response. Fix: add #[serde(alias = ...)] on each score field so on-disk JSON stays snake_case while LLM-returned camelCase still deserializes cleanly.
2. Anthropic judge authentication (jbench/judge.rs). run_anthropic_judge used Authorization: Bearer <key>, which always 401s on the Anthropic Messages API. Fix: switch to x-api-key header (Anthropic standard). Also split JudgeConfig::api_base / api_key from new anthropic_api_base / anthropic_api_key so the Anthropic branch can target api.anthropic.com without breaking the OpenAI-compatible path. Plumbed through run_single_judge.
3. Duplicate substitute_placeholders (src/prompt_placeholders.rs). Conflicts with the existing prompt_templates::substitute_placeholders. Different semantics (fixed context vs HashMap bindings), but the same name made grep / jump-to-def ambiguous. Fix: rename the new one to substitute_context_placeholders and document the relationship in the doc comment.
4. meta_analyze .run.json filter (jbench/bin/jbench.rs). path.extension() returns only the final extension ('json'), so matching against "run.json" never fired. meta-analyze would always report zero runs. Fix: match against file_name().ends_with(".run.json").

Plus:

- Run cargo fmt --all to clear the Format CI job that PR #313 was failing.
- Add tests parse_scorecard_accepts_camelcase_from_llm and parse_scorecard_accepts_snake_case_from_disk to lock in the wire-format contract.
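
Bug 4 is easy to reproduce with the standard library alone. The sketch below contrasts the failing extension-based check with a file-name suffix check; the function names are made up for this example and are not the ones in `jbench.rs`.

```rust
use std::path::Path;

/// The failing approach: `Path::extension` yields only the part after the
/// last dot ("json"), so it can never equal "run.json".
fn is_run_file_by_extension(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext == "run.json")
}

/// The fixed approach: compare against the full file name suffix.
fn is_run_file(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.ends_with(".run.json"))
}
```

Non-UTF-8 file names are treated as non-matching here because `to_str()` returns `None` for them; that is a simplification for the sketch.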

…review-issues fix(agent-runtime): address PR #313 review issues (4 bugs + fmt)

…ster # Conflicts: # src/lib.rs

…tions)

Synthesizes best patterns from 9 reference repos:

- AgentPath tree + mailbox (codex)
- Tool-based agent delegation (CC)
- DAG wave parallelism (oh-my-pi)
- Role-based config bundles (opencode + codex)
- Team pipeline lifecycle (oh-my-claudecode)
- Cost aggregation + ancestry tracking (codebuff)

Covers: architecture, types, pseudocode, Rust implementation, CLI commands, config wiring, test cases, benchmarks, rollout

…ut, field caps, serde strictness

- Gate agent_runner behind 'agent-runner' feature flag
- Add KNOWLEDGE_FILES_MAX_CHARS = 100_000 constant with truncation
- Add #[serde(deny_unknown_fields)] to AgentDefinition
- Per-model timeout in judge_with_three_models (join_all with individual timeouts)
- Fix integer truncation in meta_analyze_impl avg_duration
- Remove stray merge conflict marker in src/lib.rs
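
A possible shape for the `KNOWLEDGE_FILES_MAX_CHARS` cap is sketched below. It counts Unicode scalar values rather than bytes so that the cut never lands inside a multi-byte character; whether the PR counts chars or bytes is not stated, and the helper name `truncate_chars` is hypothetical.

```rust
const KNOWLEDGE_FILES_MAX_CHARS: usize = 100_000;

/// Returns the longest prefix of `text` containing at most `max_chars` characters.
/// Slicing at a byte index taken from `char_indices` is always on a char boundary.
fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // already within the cap
    }
}

/// Applies the cap to the text substituted for the knowledge-files placeholder.
fn cap_knowledge_files(text: &str) -> &str {
    truncate_chars(text, KNOWLEDGE_FILES_MAX_CHARS)
}
```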

quangdang46 · 2026-06-04T05:30:33Z

Review-swarm fixes applied in `8feff0a`

agent_runner feature gate: Gated behind agent-runner feature flag (not auto-enabled)
KNOWLEDGE_FILES cap: Added KNOWLEDGE_FILES_MAX_CHARS = 100_000 with truncation
serde strictness: #[serde(deny_unknown_fields)] on AgentDefinition
Per-model timeout: Each judge model gets its own timeout via individual join_all futures — prevents a slow model from starving the others
Integer truncation: avg_duration uses f64 division + round() instead of integer division
Merge conflict marker: Removed stray <<<<<<< HEAD from src/lib.rs

Verified: cargo check -p jcode-jbench --features agent-runner and cargo check -p jcode-agent-runtime both pass.
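
The integer-truncation fix can be illustrated with a small sketch. The function names and the choice of `u64` seconds are assumptions; only the technique (f64 division followed by `round()`) comes from the fix description.

```rust
/// Integer division drops the fractional part, biasing the average downward.
fn avg_duration_truncating(durations: &[u64]) -> u64 {
    if durations.is_empty() {
        return 0;
    }
    durations.iter().sum::<u64>() / durations.len() as u64
}

/// Float division followed by `round()` gives the nearest whole value.
fn avg_duration_rounded(durations: &[u64]) -> u64 {
    if durations.is_empty() {
        return 0;
    }
    let total: u64 = durations.iter().sum();
    (total as f64 / durations.len() as f64).round() as u64
}
```

For example, the durations `[1, 2, 2]` average to about 1.67: the truncating version reports 1 while the rounded version reports 2.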

…lock conflict)

- Revert src/lib.rs to master (remove stale 36-module list)
- Move prompt_placeholders.rs from src/ into crates/jcode-app-core/src/
- Add pub mod prompt_placeholders to jcode-app-core/src/lib.rs
- Resolve Cargo.lock merge conflict (hyper/hyper-rustls versions)

Build verified: cargo check --bin jcode passes. Tests: jcode-agent-runtime 55 pass, jcode-jbench 3 pass.

…son tables + roadmap

feature-planning skill analysis across codebuff, codex, claude-code, opencode, oh-my-pi, oh-my-openagent, oh-my-claudecode, pi-agent-rust, oh-my-codex. Includes:

- 9 per-dimension comparison tables (schema, registry, routing, lifecycle, permission, tool, eval, prompt, session)
- Top 5 gaps ranked by ROI
- Wire-up plan for SafetySystem + AgentDefinition.permissionMode
- Phase roadmap (Phase 1 → Phase 5)
- 5 actionable issues with severity and fix suggestions

Add PermissionMode enum to jcode-agent-runtime (mirrors dcg_core::Mode):

- Default: rule-based classification (legacy AUTO_ALLOWED list)
- AcceptEdits: file ops auto-allowed, network/spawn prompt
- Plan: read-only, writes denied without prompting
- DontAsk: allow-listed tools pass, never prompt
- BypassPermissions: skip all evaluation
- Auto: LLM-based classifier decides per call

Add permission_mode: Option<PermissionMode> to AgentDefinition. When None, agent inherits session-global mode.

Update sample TOML agents:

- basher: accept-edits (auto-approve bash)
- editor: accept-edits (auto-approve file ops)
- file-picker: plan (read-only)
- code-reviewer: plan (read-only)

Tests: 54 unit + 6 integration = 60 passed, 0 failed.

Wire-up plan: at spawn time, convert PermissionMode to dcg_core::Mode and pass to SubagentTool/SessionToolPolicy for per-agent override.
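
For orientation, a std-only sketch of an enum with these six variants and kebab-case parsing. The variant names come from the commit message; the `parse` body, the assumed kebab-case spellings, and the `effective_mode` helper are illustrative assumptions, and the real type derives its string form through serde.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum PermissionMode {
    #[default]
    Default,
    AcceptEdits,
    Plan,
    DontAsk,
    BypassPermissions,
    Auto,
}

impl PermissionMode {
    /// Parses the kebab-case spelling used in the sample TOML agents
    /// (for example "accept-edits" and "plan").
    fn parse(s: &str) -> Option<Self> {
        match s {
            "default" => Some(Self::Default),
            "accept-edits" => Some(Self::AcceptEdits),
            "plan" => Some(Self::Plan),
            "dont-ask" => Some(Self::DontAsk),
            "bypass-permissions" => Some(Self::BypassPermissions),
            "auto" => Some(Self::Auto),
            _ => None,
        }
    }
}

/// `None` on the agent definition means "inherit the session-global mode".
fn effective_mode(agent: Option<PermissionMode>, session: PermissionMode) -> PermissionMode {
    agent.unwrap_or(session)
}
```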

Add optional max_turns: Option<u32> field that limits the number of agentic turns an agent may execute before being stopped. Prevents runaway agents from consuming unbounded tokens/time. When None, the agent has no per-agent turn limit (session global limit still applies). Tests: 56 unit + 6 integration = 62 passed, 0 failed.
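
This commit only adds the field; enforcement in the agent turn loop arrives later in the PR (item H5). As a rough idea of what such a guard could look like, here is a sketch in which `run_turns`, `StopReason`, and the closure-based `step` are all hypothetical stand-ins for the real loop.

```rust
#[derive(Debug, PartialEq)]
enum StopReason {
    Finished,
    MaxTurnsReached,
}

/// Runs `step` once per turn until it reports completion (returns true)
/// or the optional per-agent limit is hit. `None` means no per-agent limit.
fn run_turns<F>(max_turns: Option<u32>, mut step: F) -> (u32, StopReason)
where
    F: FnMut(u32) -> bool,
{
    let mut turns = 0u32;
    loop {
        // Check the limit before starting another turn.
        if let Some(limit) = max_turns {
            if turns >= limit {
                return (turns, StopReason::MaxTurnsReached);
            }
        }
        turns += 1;
        if step(turns) {
            return (turns, StopReason::Finished);
        }
    }
}
```

With `None`, this sketch loops until `step` finishes, which mirrors the note that only the session-global limit applies in that case.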

- basher: max_turns = 10 (quick shell commands)
- file-picker: max_turns = 5 (find files, done fast)
- code-reviewer: max_turns = 15 (review needs more context)
- editor: no limit (complex edits may need many turns)

- extract_diff_from_repo: wrap sync std::process::Command in tokio::task::spawn_blocking to avoid blocking the async runtime
- todo_step: use exit code 2 (not implemented) instead of 0 (success)
- Fix unused variable warnings (max_turns, timeout_secs)
- cfg-gate unused imports behind agent-runner feature

Add permission_mode_to_dcg() conversion from PermissionMode to dcg_core::Mode (free function due to orphan rule).

Add per-session permission mode storage (SESSION_MODES) so subagents can run under a different mode than the global default:

- set_session_mode(session_id, mode)
- clear_session_mode(session_id)
- session_mode(session_id) -> Option<Mode>

Add classify_for_agent(action, agent_permission_mode) that uses the agent's mode when set, falling back to global mode otherwise.

Wire SubagentTool to propagate permission_mode from agent definition to child session via set_session_mode, and clean up on completion.

Tests: 4 new tests in dcg_bridge (conversion, classify_for_agent, session_mode lifecycle).
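
The per-session storage could be sketched with a lazily initialised global map. The three function names match the commit message; the local `Mode` enum is a stand-in for `dcg_core::Mode`, and the `Mutex<HashMap>` behind a `OnceLock` is an assumed implementation detail, as is the `mode_for` fallback helper.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Stand-in for `dcg_core::Mode`; only a few variants are shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Default,
    Plan,
    AcceptEdits,
}

/// Global map of session id -> mode override, created on first use.
fn session_modes() -> &'static Mutex<HashMap<String, Mode>> {
    static SESSION_MODES: OnceLock<Mutex<HashMap<String, Mode>>> = OnceLock::new();
    SESSION_MODES.get_or_init(|| Mutex::new(HashMap::new()))
}

fn set_session_mode(session_id: &str, mode: Mode) {
    session_modes().lock().unwrap().insert(session_id.to_string(), mode);
}

fn clear_session_mode(session_id: &str) {
    session_modes().lock().unwrap().remove(session_id);
}

fn session_mode(session_id: &str) -> Option<Mode> {
    session_modes().lock().unwrap().get(session_id).copied()
}

/// Uses the session override when present, otherwise the global mode.
fn mode_for(session_id: &str, global: Mode) -> Mode {
    session_mode(session_id).unwrap_or(global)
}
```

Forgetting to call `clear_session_mode` leaves a stale override behind, which is the kind of leak an RAII guard is meant to prevent.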

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add disallowed_tools: Vec<String> denylist to AgentDefinition. Takes precedence over tool_names, which is useful for inheriting a broad whitelist while blocking specific dangerous tools.

Fix TOML consistency:

- file-picker.toml: add explicit inherit_parent_system_prompt = false
- Add documentation comment explaining why

Tests: 58 unit + 6 integration = 64 passed, 0 failed.

… parent-child tree

## SubagentTool wiring (task.rs)

- Add AgentRegistry to SubagentTool for definition lookup
- Look up AgentDefinition by subagent_type at spawn time
- Apply tool_names whitelist from definition (intersected with available)
- Apply disallowed_tools denylist from definition
- Inject system_prompt when inherit_parent_system_prompt is false
- Wire permission_mode: params override > definition > inherit session
- Map OutputMode: LastMessage->Answer, AllMessages->Compact
- Log max_turns for future enforcement

## Parent-child tree (session.rs)

- Add children: Vec<String> to Session with serde(default)
- Add add_child() method for registering child sessions
- Wire SubagentTool to call parent.add_child() after spawn
- Children persisted in session JSON for TUI tree visualization

Backward compatible: all new fields use serde(default), AgentRegistry is Option so missing registry falls back to existing behavior.
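
The tool filtering and permission precedence listed under the SubagentTool wiring can be sketched as two small pure functions. Both names are hypothetical, and treating an empty `tool_names` list as "allow nothing" is an assumption made here for simplicity; the real code may interpret an empty whitelist differently.

```rust
use std::collections::HashSet;

/// Keeps the tools that are available AND whitelisted AND not denied.
/// The denylist wins over the whitelist; the order of `available` is preserved.
fn effective_tools(available: &[&str], tool_names: &[&str], disallowed: &[&str]) -> Vec<String> {
    let allow: HashSet<&str> = tool_names.iter().copied().collect();
    let deny: HashSet<&str> = disallowed.iter().copied().collect();
    available
        .iter()
        .copied()
        .filter(|tool| allow.contains(tool) && !deny.contains(tool))
        .map(str::to_string)
        .collect()
}

/// Precedence: per-call params override > agent definition > session default.
fn resolve_setting<T>(param_override: Option<T>, definition: Option<T>, session: T) -> T {
    param_override.or(definition).unwrap_or(session)
}
```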

….children tree

## Registry wiring

- Thread Option<Arc<AgentRegistry>> through Registry::new()
- Pass to SubagentTool for definition lookup at spawn time
- Update all Registry::new() call sites (30+ files) with None

## Session parent-child tree (already committed in Phase 2)

- children: Vec<String> on Session with serde(default)
- add_child() method for registering child sessions
- SubagentTool calls parent.add_child() after spawn
- Children persisted in session JSON + journal meta

33 files changed, +436/-105 lines.

## New tools registered in Registry

### team_create

Creates a team with name + description. Stores config as JSON at ~/.jcode/teams/<name>.json. Idempotent: re-creating returns existing.

### team_delete

Deletes a team config file by name.

### task_create

Adds a task to an existing team. Validates team exists. Uses UUID for task IDs.

### task_update

Updates task status and/or owner. Partial updates supported.

### task_list

Lists all tasks in a team with their status and owner.

## Files

- crates/jcode-app-core/src/tool/team.rs: TeamConfig, TeamCreateTool, TeamDeleteTool
- crates/jcode-app-core/src/tool/task_management.rs: TaskCreate/Update/ListTool
- crates/jcode-app-core/src/tool/mod.rs: register 5 new tools

Build: cargo check passes (2 pre-existing warnings).

Resolve Cargo.toml conflict: keep both evals/jbench (branch) and crates/jcode-render-core (master).

- Remove stale >>>>>>> conflict marker in skill.rs
- Fix clippy: derive Default on PermissionMode instead of manual impl
- Fix clippy: collapsible if-let in tier.rs
- Fix clippy: doc list indentation in output.rs
- cargo fmt --all

Security fixes:

- H1: Add validate_team_name() to prevent path traversal in TeamConfig
- H4: Reject BypassPermissions in project-local TOML agent definitions

Runtime wiring:

- H2: Wire shared AgentRegistry into production Registry::new sites
- H3: Add classify_for_session() that checks per-session mode overrides
- H5: Add max_turns enforcement in Agent turn loop
- H6: Wire agent_def.resolve_model() into SubagentTool model resolution

Code quality:

- M4: Remove deny_unknown_fields from AgentDefinition for forward compat
- M5: Align PermissionMode::parse() with serde kebab-case
- M6: Gate experimental team/task tools behind JCODE_EXPERIMENTAL_TOOLS env
- M7: Document parent session mutation race condition
- M8: Add SessionModeGuard RAII for automatic session mode cleanup

All 63 agent-runtime tests pass. cargo check clean.
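
H1 guards the `~/.jcode/teams/<name>.json` path against names such as `../../etc/passwd`. One conservative way to do that is an allow-list of characters, sketched below. The exact rules in the PR are not shown, so the 64-character cap, the permitted character set, and the `team_config_path` helper are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Accepts only short names made of ASCII letters, digits, '-' and '_'.
/// Rejecting '.', '/' and '\\' outright rules out "..", absolute paths,
/// and nested directories.
fn validate_team_name(name: &str) -> Result<(), String> {
    if name.is_empty() || name.len() > 64 {
        return Err(format!("team name must be 1-64 characters, got {}", name.len()));
    }
    if !name
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
    {
        return Err(format!("team name contains invalid characters: {name:?}"));
    }
    Ok(())
}

/// Builds the config path only after the name has been validated.
fn team_config_path(teams_dir: &Path, name: &str) -> Result<PathBuf, String> {
    validate_team_name(name)?;
    Ok(teams_dir.join(format!("{name}.json")))
}
```

An allow-list is stricter than stripping `..` segments, but it avoids having to reason about platform-specific separators and encodings.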

quangdang46 and others added 7 commits May 25, 2026 21:44

Merge pull request #327 from quangdang46/master

c98470c

Enhance session management, prompt navigation, and macOS shortcuts

quangdang46 mentioned this pull request May 28, 2026

fix(agent-runtime): address PR #313 review issues (4 bugs + fmt) #333

Merged

5 tasks

quangdang46 and others added 4 commits May 28, 2026 12:43

Merge pull request #333 from quangdang46/devin/1779932200-fix-pr-313-…

089fb6c

…review-issues fix(agent-runtime): address PR #313 review issues (4 bugs + fmt)

Merge remote-tracking branch 'origin/master' into fix/pr-313-merge-ma…

b4805d6

…ster # Conflicts: # src/lib.rs

quangdang46 and others added 17 commits June 4, 2026 22:08

Merge master into experimental/multi-agent-foundation (resolve Cargo.…

25d3f21

…lock conflict)

chore(agents): add max_turns to sample TOML agents

6d8ecbc

docs(review): update implementation status in review document

60f805b

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge origin/master into experimental/multi-agent-foundation

89d7177

fix(ci): resolve conflict markers + clippy + fmt fixes

d06a417

Merge origin/master into review/pr-313 (resolve prompting.rs conflict)

fe9c69a

quangdang46 merged commit a4f8f1f into master Jun 5, 2026
4 of 9 checks passed

quangdang46 merged 28 commits into master from experimental/multi-agent-foundation
