Experimental: multi-agent foundation (Phase 0) - AgentDefinition + tier slots + TOML registry #313
Merged
…ase 0.1+0.2) Lay the foundation for declarative agent definitions based on Codebuff's AgentDefinition schema, adapted to jcode's single-OAuth provider reality:
- signals.rs: existing soft-interrupt + cancellation primitives moved into a named module; root-level re-exports preserved so src/agent.rs consumers compile unchanged.
- definition.rs: AgentDefinition struct (id, model_override, prefer_tier, reasoning, tool_names, spawnable_agents, prompts, output_mode, inherit_parent_system_prompt, include_message_history) with TOML round-trip + validation for id format, system_prompt vs inherit conflict, structured_output schema requirement, self-spawn, and duplicate tool/agent ids.
- tier.rs: user-defined tier slot (routine/thinking) backed by the same JCODE_ROUTING_* env vars as model_routing.rs (#100). NOT a catalog: agents inherit the session model when no tier is configured, so subscription users (Claude Pro / ChatGPT Plus / Gemini Advanced) see no behavior change. Pay-per-token users opt in by setting two env vars.
- reasoning.rs: ReasoningEffort enum (minimal/low/medium/high).
- output.rs: OutputMode enum (last_message/all_messages/structured_output).
32 unit tests pass. Full `cargo check --bin jcode` succeeds.
This is Phase 0 of the multi-agent foundation; no runtime engine changes yet. Next: TOML loader for .jcode/agents/*.toml + builtin embedded agents (Phase 0.3).
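The validation rules listed for definition.rs can be sketched roughly as follows. This is a minimal, std-only illustration and not the PR's code: the struct is a trimmed-down stand-in, the id character set (lowercase ASCII, digits, `-`) is an assumption, and "system_prompt vs inherit conflict" is assumed to mean that both are set at once.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down stand-in for the PR's AgentDefinition.
#[derive(Debug, Default)]
pub struct AgentDefinition {
    pub id: String,
    pub system_prompt: Option<String>,
    pub inherit_parent_system_prompt: bool,
    pub tool_names: Vec<String>,
    pub spawnable_agents: Vec<String>,
}

/// Error names are illustrative; the real DefinitionError may differ.
#[derive(Debug, PartialEq)]
pub enum DefinitionError {
    InvalidId(String),
    PromptConflict,
    SelfSpawn,
    Duplicate(String),
}

/// Returns the first entry that appears more than once, if any.
fn first_duplicate(items: &[String]) -> Option<&str> {
    let mut seen = HashSet::new();
    for item in items {
        if !seen.insert(item.as_str()) {
            return Some(item.as_str());
        }
    }
    None
}

impl AgentDefinition {
    pub fn validate(&self) -> Result<(), DefinitionError> {
        // Assumed id format: non-empty, lowercase ASCII letters, digits, '-'.
        let id_ok = !self.id.is_empty()
            && self
                .id
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
        if !id_ok {
            return Err(DefinitionError::InvalidId(self.id.clone()));
        }
        // Assumed conflict rule: an own system prompt and inheritance together.
        if self.system_prompt.is_some() && self.inherit_parent_system_prompt {
            return Err(DefinitionError::PromptConflict);
        }
        // An agent must not list itself as spawnable.
        if self.spawnable_agents.iter().any(|a| a == &self.id) {
            return Err(DefinitionError::SelfSpawn);
        }
        let dup = first_duplicate(&self.tool_names)
            .or_else(|| first_duplicate(&self.spawnable_agents));
        if let Some(name) = dup {
            return Err(DefinitionError::Duplicate(name.to_owned()));
        }
        Ok(())
    }
}
```

Checks run in a fixed order, so a definition with several problems reports only the first one found.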
…hase 0.3) Discover and load AgentDefinition files from three locations with priority order:
1. <project>/.jcode/agents/*.toml (project-local, highest)
2. ~/.jcode/agents/*.toml (user-global)
3. AgentRegistry::register_builtin (compiled-in defaults, lowest)
Project-local overrides user-global, which overrides builtin. Re-registering a builtin after a higher-priority entry is loaded does NOT clobber the override; the priority check is symmetric in `insert`.
Design choices:
- Filename must match `<id>.toml` so users can find agents by id without opening every file. Mismatches are surfaced as a load error rather than silently misindexing.
- Malformed/invalid files are collected as non-fatal LoadError entries so a single bad file doesn't prevent the rest of the registry from loading. `jcode doctor` (future) reads load_errors() to surface these.
- AgentRegistry intentionally does NOT cross-reference `tool_names` / `spawnable_agents`; that's done at spawn time because the tool universe may be feature-gated (Phase 0.4).
41 unit tests pass (32 prior + 9 new). `cargo check --bin jcode` succeeds.
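A minimal sketch of the priority rule described above, assuming a hypothetical `Source` enum and a registry that stores the raw definition body as a string. The real AgentRegistry stores parsed definitions and its `insert` signature is not shown in this PR text.

```rust
use std::collections::HashMap;

/// Hypothetical source tag. Variant order gives the priority:
/// Builtin < UserGlobal < ProjectLocal (derived Ord follows declaration order).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Source {
    Builtin,
    UserGlobal,
    ProjectLocal,
}

/// Simplified registry: id -> (source, definition body).
#[derive(Default)]
pub struct AgentRegistry {
    entries: HashMap<String, (Source, String)>,
}

impl AgentRegistry {
    /// Inserts unless an entry with strictly higher priority already exists.
    /// Returns true when the entry was stored.
    pub fn insert(&mut self, id: &str, source: Source, body: &str) -> bool {
        if let Some((existing, _)) = self.entries.get(id) {
            if *existing > source {
                // A later, lower-priority registration must not clobber the override.
                return false;
            }
        }
        self.entries
            .insert(id.to_owned(), (source, body.to_owned()));
        true
    }

    pub fn get(&self, id: &str) -> Option<&str> {
        self.entries.get(id).map(|(_, body)| body.as_str())
    }
}
```

Because the check compares priorities rather than relying on load order, the result is the same whether builtins are registered before or after the TOML files are read.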
… agents (Phase 0.4-0.6)
Phase 0.4 — Cross-reference validation:
- ReferenceError enum (UnknownTools, UnknownSpawnableAgents) kept
separate from DefinitionError because the runtime tool/agent
universe isn't known at TOML-load time.
- AgentDefinition::validate_tool_references<I, S>() and
validate_spawn_references<I, S>() — caller passes the available
name set, gets back a sorted, comma-joined list of unknowns.
- 5 new tests covering the happy path, unknowns, empty lists,
and deterministic alphabetical ordering of the error message.
This deliberately does NOT modify src/tool/mod.rs. The whitelist
check is a pure function over the agent definition + a name set;
no need to refactor tool dispatch. Phase 1 will wire the actual
tool registry into the spawn path.
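The whitelist check described in Phase 0.4 can be sketched as a pure function over a name list and the available set. The version below is a free function rather than a method on AgentDefinition, and the error payload (a single comma-joined string) is an assumption based on the description above.

```rust
use std::collections::BTreeSet;

/// Mirrors the two variants named above; the payload type is assumed.
#[derive(Debug, PartialEq)]
pub enum ReferenceError {
    UnknownTools(String),
    UnknownSpawnableAgents(String),
}

/// Checks that every requested tool name is in `available`.
/// Unknown names are reported sorted and comma-joined, so the
/// error message is deterministic regardless of input order.
pub fn validate_tool_references<I, S>(
    tool_names: &[String],
    available: I,
) -> Result<(), ReferenceError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let available: Vec<S> = available.into_iter().collect();
    let known: BTreeSet<&str> = available.iter().map(|s| s.as_ref()).collect();
    // BTreeSet keeps the unknown names sorted and de-duplicated.
    let unknown: BTreeSet<&str> = tool_names
        .iter()
        .map(String::as_str)
        .filter(|name| !known.contains(name))
        .collect();
    if unknown.is_empty() {
        Ok(())
    } else {
        let joined = unknown.into_iter().collect::<Vec<_>>().join(", ");
        Err(ReferenceError::UnknownTools(joined))
    }
}
```

A matching `validate_spawn_references` would have the same shape and return `UnknownSpawnableAgents` instead.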
Phase 0.5 — Skill MAS (#94) bridge:
- AgentRegistry::lookup_for_skill_routing(skill_agent_id) — named
alias of get() that documents the integration point with the
SKILL.md field. Returns None for missing references; the
skill activation site decides fallback policy.
- 2 tests: hit + miss.
Phase 0.6 — Sample agents + integration test:
- .jcode/agents/file-picker.toml — Routine tier, no message history,
leaf agent. Demonstrates file-picker pattern adapted from Codebuff.
- .jcode/agents/code-reviewer.toml — Thinking tier with
inherit_parent_system_prompt=true to demonstrate the prompt-cache
prefix-sharing trick (~90% input-token savings on cache hits).
- tests/sample_agents.rs — integration test loads both files via the
public AgentRegistry API and asserts shape + behavior. 4 tests.
Phase 0 totals: 49 unit + 4 integration = 53 tests, all passing.
`cargo check --bin jcode` succeeds (full workspace, 3m13s).
Phase 0 (foundation) is now complete:
- Schema: AgentDefinition + ModelTier + OutputMode + ReasoningEffort
- Loader: registry with priority order (project > user > builtin)
- Validation: id format, internal invariants, cross-references
- Sample agents demonstrating cache-hit and tier patterns
- Skill MAS (#94) integration point established
Phase 1 (4 builtin agents + spawn_agents tool + cache benchmark) is
the next track.
…prompt utilities, sample agents
Phase 1: two real TOML agent definitions (basher + editor) with full schema
Phase 4: `prompt_placeholders.rs` — `{{FILE_TREE}}`, `{{CURRENT_DATE}}`, etc.
Phase 4: `wrap_as_system_reminder()` in `src/agent/prompting.rs`
Phase 5: `evals/jbench/` scaffold — types, judge stub, lessons stub, agent_runner stub
Phase 0.6: integration tests `basher_sample_has_expected_shape` + `editor_sample_has_expected_shape`
All jcode-agent-runtime tests pass (49 unit + 6 integration).
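As a rough illustration of the Phase 4 prompt utilities, the sketch below substitutes two of the placeholders from a fixed context and wraps text in `<system_reminder>` tags. The `PromptContext` struct and its fields are hypothetical, the function is given its later name `substitute_context_placeholders`, and the exact whitespace inside the reminder tags is an assumption.

```rust
/// Hypothetical fixed context; the real module supports more placeholders.
pub struct PromptContext {
    pub file_tree: String,
    pub current_date: String,
}

/// Replaces known `{{NAME}}` placeholders; unknown ones are left untouched.
pub fn substitute_context_placeholders(template: &str, ctx: &PromptContext) -> String {
    template
        .replace("{{FILE_TREE}}", &ctx.file_tree)
        .replace("{{CURRENT_DATE}}", &ctx.current_date)
}

/// Wraps a message body in system-reminder tags (layout assumed).
pub fn wrap_as_system_reminder(body: &str) -> String {
    format!("<system_reminder>\n{}\n</system_reminder>", body)
}
```

Chained `replace` calls are simple but run sequentially, so a file tree that itself contains the text `{{CURRENT_DATE}}` would be substituted a second time; a single-pass scanner avoids that at the cost of more code.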
…, CLI
Phase 5.3 (agent_runner): `run_agent_in_repo()` spawns jcode subprocess with prompt on stdin, streams stdout, captures trace + diff via `git diff HEAD`. Uses `timeout()` for per-run deadline.
Phase 5.4 (judge): `judge_with_three_models()` runs GPT + Gemini + Claude judges in parallel via OpenAI Responses API + Anthropic Messages API. Median analysis, averaged scores. `run_single_judge()` exposes per-judge entry point.
Phase 5.5 (lessons): `extract_lessons()` calls lessons extractor model via Responses API. `append_lessons_to_file()` accumulates lessons in per-agent JSON files with read-modify-write.
Phase 5.6 (CLI): Full `jbench run` implemented (loads eval JSON, iterates commits, calls `run_agent_in_repo`, writes `.run.json` files). `jbench meta-analyze` aggregates results. Other subcommands print Phase stubs and exit 0.
Bug fixes:
- `JudgingResult: Default` impl added (needed for EvalRun init)
- `OnceLock` for lazy reqwest static client (fixes const-eval restrictions)
- `context` method from `anyhow::Context` imported in bin
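One possible reading of "median analysis, averaged scores" is: average the numeric scores across judges, and keep the written analysis from the judge whose score is the median. The sketch below implements that reading only; the `Judgement` type and `combine_judgements` name are hypothetical, and the real `judge_with_three_models()` may combine results differently.

```rust
/// Hypothetical per-judge result with a single overall score.
#[derive(Debug, Clone, PartialEq)]
pub struct Judgement {
    pub overall_score: f64,
    pub analysis: String,
}

/// Averages the scores and keeps the analysis text of the median judge.
/// Returns None when no judge produced a result.
pub fn combine_judgements(mut results: Vec<Judgement>) -> Option<Judgement> {
    if results.is_empty() {
        return None;
    }
    let average =
        results.iter().map(|r| r.overall_score).sum::<f64>() / results.len() as f64;
    // total_cmp gives a total order on f64, so sorting cannot panic on NaN.
    results.sort_by(|a, b| a.overall_score.total_cmp(&b.overall_score));
    let median = results.swap_remove(results.len() / 2);
    Some(Judgement {
        overall_score: average,
        analysis: median.analysis,
    })
}
```

Taking the text from the median judge avoids surfacing the most extreme opinion while still reporting a score that reflects all judges.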
Enhance session management, prompt navigation, and macOS shortcuts
Bugs fixed:
1. JudgingResult deserialization (jbench/types.rs)
The judge prompt schema asks for camelCase fields
(completionScore, codeQualityScore, overallScore) but
the Rust struct used snake_case without serde rename.
parse_scorecard would fail on every real judge response.
Fix: add #[serde(alias = ...)] on each score field so
on-disk JSON stays snake_case while LLM-returned
camelCase still deserializes cleanly.
2. Anthropic judge authentication (jbench/judge.rs)
run_anthropic_judge used Authorization: Bearer <key>
which always 401s on the Anthropic Messages API.
Fix: switch to x-api-key header (Anthropic standard).
Also split JudgeConfig::api_base / api_key from new
anthropic_api_base / anthropic_api_key so the Anthropic
branch can target api.anthropic.com without breaking
the OpenAI-compatible path. Plumbed through
run_single_judge.
3. Duplicate substitute_placeholders (src/prompt_placeholders.rs)
Conflicts with the existing
prompt_templates::substitute_placeholders. Different
semantics (fixed context vs HashMap bindings) but same
name made grep / jump-to-def ambiguous.
Fix: rename the new one to
substitute_context_placeholders and document the
relationship in the doc comment.
4. meta_analyze .run.json filter (jbench/bin/jbench.rs)
path.extension() returns only the final extension
('json'), so matching against "run.json" never fired.
meta-analyze would always report zero runs.
Fix: match against file_name().ends_with(".run.json").
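Bug 4 is easy to reproduce with the standard library alone. The helper name `is_run_file` is hypothetical; the `Path` methods are standard.

```rust
use std::path::Path;

/// True when the file name ends with the compound suffix ".run.json".
/// `Path::extension()` cannot be used for this because it only returns
/// the part after the last dot ("json").
pub fn is_run_file(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.ends_with(".run.json"))
}
```

For `results/abc123.run.json`, `extension()` yields `json`, so a comparison against `"run.json"` can never match, while the file-name suffix check does.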
Plus:
- Run cargo fmt --all to clear the Format CI job that PR
#313 was failing.
- Add tests parse_scorecard_accepts_camelcase_from_llm and
parse_scorecard_accepts_snake_case_from_disk to lock in
the wire-format contract.
…review-issues fix(agent-runtime): address PR #313 review issues (4 bugs + fmt)
…ster # Conflicts: # src/lib.rs
…tions) Synthesizes best patterns from 9 reference repos:
- AgentPath tree + mailbox (codex)
- Tool-based agent delegation (CC)
- DAG wave parallelism (oh-my-pi)
- Role-based config bundles (opencode + codex)
- Team pipeline lifecycle (oh-my-claudecode)
- Cost aggregation + ancestry tracking (codebuff)
Covers: architecture, types, pseudocode, Rust implementation, CLI commands, config wiring, test cases, benchmarks, rollout
…ut, field caps, serde strictness
- Gate agent_runner behind 'agent-runner' feature flag
- Add KNOWLEDGE_FILES_MAX_CHARS = 100_000 constant with truncation
- Add #[serde(deny_unknown_fields)] to AgentDefinition
- Per-model timeout in judge_with_three_models (join_all with individual timeouts)
- Fix integer truncation in meta_analyze_impl avg_duration
- Remove stray merge conflict marker in src/lib.rs
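Two of the fixes above lend themselves to a short sketch: capping text at a character count without splitting a UTF-8 sequence, and averaging integers without integer division. Only the constant name and value come from the commit message; `truncate_chars` and `average` are hypothetical helpers, not the PR's functions.

```rust
pub const KNOWLEDGE_FILES_MAX_CHARS: usize = 100_000;

/// Returns at most `max_chars` characters of `text`.
/// Slicing at a byte index taken from `char_indices` always lands on a
/// character boundary, so this cannot panic on multi-byte input.
pub fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_index, _)) => &text[..byte_index],
        None => text,
    }
}

/// Averages in floating point. Integer division (`total / count`) would
/// silently drop the fractional part, e.g. 3 / 2 == 1.
pub fn average(values: &[u64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let total: u64 = values.iter().sum();
    total as f64 / values.len() as f64
}
```

A call such as `truncate_chars(&knowledge, KNOWLEDGE_FILES_MAX_CHARS)` would apply the cap before the text is substituted into a prompt.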
Review-swarm fixes applied in 8feff0a
Verified: |
- Revert src/lib.rs to master (remove stale 36-module list)
- Move prompt_placeholders.rs from src/ into crates/jcode-app-core/src/
- Add pub mod prompt_placeholders to jcode-app-core/src/lib.rs
- Resolve Cargo.lock merge conflict (hyper/hyper-rustls versions)
Build verified: cargo check --bin jcode passes. Tests: jcode-agent-runtime 55 pass, jcode-jbench 3 pass.
…son tables + roadmap
feature-planning skill analysis across codebuff, codex, claude-code, opencode, oh-my-pi, oh-my-openagent, oh-my-claudecode, pi-agent-rust, oh-my-codex. Includes:
- 9 per-dimension comparison tables (schema, registry, routing, lifecycle, permission, tool, eval, prompt, session)
- Top 5 gaps ranked by ROI
- Wire-up plan for SafetySystem + AgentDefinition.permissionMode
- Phase roadmap (Phase 1 → Phase 5)
- 5 actionable issues with severity and fix suggestions
Add PermissionMode enum to jcode-agent-runtime (mirrors dcg_core::Mode):
- Default: rule-based classification (legacy AUTO_ALLOWED list)
- AcceptEdits: file ops auto-allowed, network/spawn prompt
- Plan: read-only, writes denied without prompting
- DontAsk: allow-listed tools pass, never prompt
- BypassPermissions: skip all evaluation
- Auto: LLM-based classifier decides per call
Add permission_mode: Option<PermissionMode> to AgentDefinition. When None, agent inherits session-global mode.
Update sample TOML agents:
- basher: accept-edits (auto-approve bash)
- editor: accept-edits (auto-approve file ops)
- file-picker: plan (read-only)
- code-reviewer: plan (read-only)
Tests: 54 unit + 6 integration = 60 passed, 0 failed.
Wire-up plan: at spawn time, convert PermissionMode to dcg_core::Mode and pass to SubagentTool/SessionToolPolicy for per-agent override.
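A std-only sketch of the enum and a kebab-case parser is shown below. The six variant names come from the commit message; the string spellings are what serde's `rename_all = "kebab-case"` would produce for those names, and the `parse` signature (returning `Option`) is an assumption.

```rust
/// Variants as listed in the commit message above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum PermissionMode {
    #[default]
    Default,
    AcceptEdits,
    Plan,
    DontAsk,
    BypassPermissions,
    Auto,
}

impl PermissionMode {
    /// Parses the kebab-case spelling used in the sample TOML files
    /// (for example `accept-edits`). Unknown strings yield None.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "default" => Some(Self::Default),
            "accept-edits" => Some(Self::AcceptEdits),
            "plan" => Some(Self::Plan),
            "dont-ask" => Some(Self::DontAsk),
            "bypass-permissions" => Some(Self::BypassPermissions),
            "auto" => Some(Self::Auto),
            _ => None,
        }
    }
}
```

Keeping the hand-written parser in the same spelling as the serde representation means a value accepted in TOML is also accepted wherever the mode is parsed from a plain string.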
Add optional max_turns: Option<u32> field that limits the number of agentic turns an agent may execute before being stopped. Prevents runaway agents from consuming unbounded tokens/time. When None, the agent has no per-agent turn limit (session global limit still applies). Tests: 56 unit + 6 integration = 62 passed, 0 failed.
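The turn cap can be pictured as a guard at the top of the agent loop. This is a sketch under assumptions: `run_turns`, `StopReason`, and the closure-based `step` are hypothetical, and the session-global limit mentioned above is not modeled.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StopReason {
    Finished,
    MaxTurnsReached,
}

/// Runs `step` once per turn until it reports completion (returns true)
/// or the optional per-agent limit is hit. Returns the number of turns
/// executed and why the loop stopped. With `None` there is no per-agent
/// cap, so termination depends entirely on `step`.
pub fn run_turns<F>(max_turns: Option<u32>, mut step: F) -> (u32, StopReason)
where
    F: FnMut(u32) -> bool,
{
    let mut turn = 0;
    loop {
        if let Some(limit) = max_turns {
            if turn >= limit {
                return (turn, StopReason::MaxTurnsReached);
            }
        }
        turn += 1;
        if step(turn) {
            return (turn, StopReason::Finished);
        }
    }
}
```

Checking the limit before each turn means `max_turns = 0` would stop the agent before it does any work.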
- basher: max_turns = 10 (quick shell commands)
- file-picker: max_turns = 5 (find files, done fast)
- code-reviewer: max_turns = 15 (review needs more context)
- editor: no limit (complex edits may need many turns)
- extract_diff_from_repo: wrap sync std::process::Command in tokio::task::spawn_blocking to avoid blocking the async runtime
- todo_step: use exit code 2 (not implemented) instead of 0 (success)
- Fix unused variable warnings (max_turns, timeout_secs)
- cfg-gate unused imports behind agent-runner feature
Add permission_mode_to_dcg() conversion from PermissionMode to dcg_core::Mode (free function due to orphan rule).
Add per-session permission mode storage (SESSION_MODES) so subagents can run under a different mode than the global default:
- set_session_mode(session_id, mode)
- clear_session_mode(session_id)
- session_mode(session_id) -> Option<Mode>
Add classify_for_agent(action, agent_permission_mode) that uses the agent's mode when set, falling back to global mode otherwise.
Wire SubagentTool to propagate permission_mode from agent definition to child session via set_session_mode, and clean up on completion.
Tests: 4 new tests in dcg_bridge (conversion, classify_for_agent, session_mode lifecycle).
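The per-session storage can be sketched with a lazily initialized global map. The three function names follow the commit message; the `Mode` enum here is a local stand-in for `dcg_core::Mode`, and `effective_mode` is a hypothetical helper showing only the fallback rule, not the full `classify_for_agent` logic.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Local stand-in for dcg_core::Mode (the real type has more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Default,
    AcceptEdits,
    Plan,
}

/// Lazily created global map: session id -> mode override.
fn session_modes() -> &'static Mutex<HashMap<String, Mode>> {
    static SESSION_MODES: OnceLock<Mutex<HashMap<String, Mode>>> = OnceLock::new();
    SESSION_MODES.get_or_init(|| Mutex::new(HashMap::new()))
}

pub fn set_session_mode(session_id: &str, mode: Mode) {
    let mut modes = session_modes().lock().unwrap();
    modes.insert(session_id.to_owned(), mode);
}

pub fn clear_session_mode(session_id: &str) {
    let mut modes = session_modes().lock().unwrap();
    modes.remove(session_id);
}

pub fn session_mode(session_id: &str) -> Option<Mode> {
    let modes = session_modes().lock().unwrap();
    modes.get(session_id).copied()
}

/// Fallback rule: the agent's own mode wins when set, else the global mode.
pub fn effective_mode(global: Mode, agent: Option<Mode>) -> Mode {
    agent.unwrap_or(global)
}
```

Because entries must be cleared when a child session ends, pairing `set_session_mode` with a drop guard is a natural follow-up so that early returns and panics do not leave stale overrides behind.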
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add disallowed_tools: Vec<String> denylist to AgentDefinition. Takes precedence over tool_names; useful for inheriting a broad whitelist while blocking specific dangerous tools.
Fix TOML consistency:
- file-picker.toml: add explicit inherit_parent_system_prompt = false
- Add documentation comment explaining why
Tests: 58 unit + 6 integration = 64 passed, 0 failed.
… parent-child tree
## SubagentTool wiring (task.rs)
- Add AgentRegistry to SubagentTool for definition lookup
- Look up AgentDefinition by subagent_type at spawn time
- Apply tool_names whitelist from definition (intersected with available)
- Apply disallowed_tools denylist from definition
- Inject system_prompt when inherit_parent_system_prompt is false
- Wire permission_mode: params override > definition > inherit session
- Map OutputMode: LastMessage->Answer, AllMessages->Compact
- Log max_turns for future enforcement
## Parent-child tree (session.rs)
- Add children: Vec<String> to Session with serde(default)
- Add add_child() method for registering child sessions
- Wire SubagentTool to call parent.add_child() after spawn
- Children persisted in session JSON for TUI tree visualization
Backward compatible: all new fields use serde(default), AgentRegistry is Option so missing registry falls back to existing behavior.
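The whitelist and denylist steps in the SubagentTool wiring combine into one filter. The function below is a hypothetical sketch: `effective_tools` is not a name from the PR, and treating a missing whitelist (`None`) as "all available tools" is an assumption.

```rust
/// Computes the tool set a child agent may use.
/// - `whitelist`: the definition's tool_names; None is assumed to mean
///   "no restriction", i.e. every available tool is a candidate.
/// - `disallowed`: the definition's disallowed_tools; always wins.
/// The result keeps the order of `available`.
pub fn effective_tools(
    available: &[&str],
    whitelist: Option<&[&str]>,
    disallowed: &[&str],
) -> Vec<String> {
    available
        .iter()
        .copied()
        // Intersect with the whitelist: names that are not available drop out.
        .filter(|tool| whitelist.map_or(true, |allowed| allowed.contains(tool)))
        // The denylist takes precedence over the whitelist.
        .filter(|tool| !disallowed.contains(tool))
        .map(str::to_owned)
        .collect()
}
```

Iterating over `available` rather than over the whitelist gives the intersection for free: a whitelisted name that is feature-gated out of the build never reaches the child.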
….children tree
## Registry wiring
- Thread Option<Arc<AgentRegistry>> through Registry::new()
- Pass to SubagentTool for definition lookup at spawn time
- Update all Registry::new() call sites (30+ files) with None
## Session parent-child tree (already committed in Phase 2)
- children: Vec<String> on Session with serde(default)
- add_child() method for registering child sessions
- SubagentTool calls parent.add_child() after spawn
- Children persisted in session JSON + journal meta
33 files changed, +436/-105 lines.
## New tools registered in Registry
### team_create
Creates a team with name + description. Stores config as JSON at ~/.jcode/teams/<name>.json. Idempotent: re-creating returns existing.
### team_delete
Deletes a team config file by name.
### task_create
Adds a task to an existing team. Validates team exists. Uses UUID for task IDs.
### task_update
Updates task status and/or owner. Partial updates supported.
### task_list
Lists all tasks in a team with their status and owner.
## Files
- crates/jcode-app-core/src/tool/team.rs: TeamConfig, TeamCreateTool, TeamDeleteTool
- crates/jcode-app-core/src/tool/task_management.rs: TaskCreate/Update/ListTool
- crates/jcode-app-core/src/tool/mod.rs: register 5 new tools
Build: cargo check passes (2 pre-existing warnings).
Resolve Cargo.toml conflict: keep both evals/jbench (branch) and crates/jcode-render-core (master).
- Remove stale >>>>>>> conflict marker in skill.rs
- Fix clippy: derive Default on PermissionMode instead of manual impl
- Fix clippy: collapsible if-let in tier.rs
- Fix clippy: doc list indentation in output.rs
- cargo fmt --all
Security fixes:
- H1: Add validate_team_name() to prevent path traversal in TeamConfig
- H4: Reject BypassPermissions in project-local TOML agent definitions
Runtime wiring:
- H2: Wire shared AgentRegistry into production Registry::new sites
- H3: Add classify_for_session() that checks per-session mode overrides
- H5: Add max_turns enforcement in Agent turn loop
- H6: Wire agent_def.resolve_model() into SubagentTool model resolution
Code quality:
- M4: Remove deny_unknown_fields from AgentDefinition for forward compat
- M5: Align PermissionMode::parse() with serde kebab-case
- M6: Gate experimental team/task tools behind JCODE_EXPERIMENTAL_TOOLS env
- M7: Document parent session mutation race condition
- M8: Add SessionModeGuard RAII for automatic session mode cleanup
All 63 agent-runtime tests pass. cargo check clean.
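Fix H1 guards the step where a team name becomes part of a file path. The sketch below shows one conservative way to do that; the accepted character set, the 64-character cap, the `String` error type, and the `team_config_path` helper are all assumptions, since the PR text names only `validate_team_name()`.

```rust
use std::path::{Path, PathBuf};

/// Accepts only short names made of ASCII letters, digits, '-' and '_'.
/// Rejecting '/', '\\' and '.' outright rules out "../" traversal and
/// absolute paths without any path normalization.
pub fn validate_team_name(name: &str) -> Result<(), String> {
    if name.is_empty() || name.len() > 64 {
        return Err("team name must be 1 to 64 characters".to_owned());
    }
    let allowed = |c: char| c.is_ascii_alphanumeric() || c == '-' || c == '_';
    if !name.chars().all(allowed) {
        return Err(format!("invalid team name: {:?}", name));
    }
    Ok(())
}

/// Builds `<teams_dir>/<name>.json` only after the name has been validated.
pub fn team_config_path(teams_dir: &Path, name: &str) -> Result<PathBuf, String> {
    validate_team_name(name)?;
    Ok(teams_dir.join(format!("{}.json", name)))
}
```

An allow-list of characters is stricter than strictly necessary, but it is easier to reason about than trying to enumerate every dangerous path component.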
Summary
Multi-agent architecture foundation for jcode adapted from CodebuffAI's design.
Phase 0: Foundation (commits 1-3)
- AgentDefinition
- ModelTier: Routine / Thinking enum; maps to env vars + session-gateway routing
- OutputMode: LastMessage / AllMessages / StructuredOutput; controls how tool results surface
- AgentRegistry: .jcode/agents/*.toml with roundtrip validation
- spawnable_agents IDs actually exist in registry at load time
- MAS-prefixed skill names invoke jcode-skill-{name} binaries
- sample_agents.rs
Phase 1: Agent TOML definitions
- basher.toml
- editor.toml
Phase 4: Prompt utilities
- prompt_placeholders.rs: {{FILE_TREE}}, {{CURRENT_DATE}}, {{REMAINING_STEPS}}, {{KNOWLEDGE_FILES}}, {{GIT_CHANGES}} substitution engine
- wrap_as_system_reminder(): <system_reminder> tags
Phase 5: JBench evaluation framework
- evals/jbench/src/agent_runner.rs: run_agent_in_repo(); spawns jcode subprocess, streams stdout, captures diff via git diff HEAD
- evals/jbench/src/judge.rs: judge_with_three_models(); GPT + Gemini + Claude in parallel, median analysis + averaged scores
- evals/jbench/src/lessons.rs: extract_lessons() + append_lessons_to_file(); lessons accumulation per agent
- evals/jbench/src/bin/jbench.rs: run and meta-analyze implemented; pick-commits / gen-evals / judge as Phase stubs
Test results
🤖 Generated with Claude Code