Merged
129 changes: 129 additions & 0 deletions .claude/skills/spawn-agent/SKILL.md
@@ -255,6 +255,135 @@ git -C "${GIT_ROOT}" worktree remove --force "${AGENTS_HOME}/${BRANCH}"
rm -rf "${AGENTS_HOME}/${BRANCH}"
```

## PI agents (local mlx_lm backend)

PI agents are a **separate class of agent** that use the pi.dev SDK with a
LOCAL mlx_lm.server (managed via `/iac`) as their OpenAI-compatible backend,
instead of the Anthropic cloud API. They are useful when you want agent work
without consuming Claude API credits, or when the task is well-served by a
local Gemma-class model.

### When to use a PI agent (detection)

Use a PI agent when the user says any of:
- "spawn a PI agent" / "lanza un agente PI"
- "use the local model" / "local LLM"
- "use mlx_lm" / "use the local server"
- "no Claude credits" / "without using the API"

Otherwise, default to a regular Claude agent.
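
The detection rule above could be sketched as a simple phrase match. This is only an illustration: `PI_TRIGGER_PATTERNS` and `wants_pi_agent` are hypothetical names, and the patterns are transcribed from the phrase list rather than taken from any real implementation.

```python
import re

# Hypothetical patterns, one per trigger phrase in the list above.
# Word boundaries keep "API agent" from matching "PI agent".
PI_TRIGGER_PATTERNS = [
    r"\bpi agents?\b",
    r"\bagentes? pi\b",
    r"\blocal model\b",
    r"\blocal llm\b",
    r"\bmlx_lm\b",
    r"\blocal server\b",
    r"\bno claude credits\b",
    r"\bwithout using the api\b",
]


def wants_pi_agent(request: str) -> bool:
    """Return True when the request contains one of the PI trigger phrases."""
    text = request.lower()
    return any(re.search(pattern, text) for pattern in PI_TRIGGER_PATTERNS)


print(wants_pi_agent("Spawn a PI agent on branch pi/refactor"))
print(wants_pi_agent("Spawn an agent to fix the tests"))
```

Anything that does not match falls through to the regular Claude agent, which mirrors the default stated above.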

### Required setup (one-time)

```bash
# 1. Build the PI image
cd <git-root>/config && make build-pi

# 2. Start the local model server (from /iac)
cd <git-root>/iac && uv sync && uv run iac server start
uv run iac server status # verify it is reachable
```

### Spawning a PI agent

PI agents do NOT need `CLAUDE_CONTAINER_OAUTH_TOKEN`. They talk to the local
server at `PI_BASE_URL`, which defaults to `http://192.168.100.1:8080/v1`
(the **gateway IP** of the default bridge subnet). `host.containers.internal`
is NOT implemented in Apple Container CLI; see apple/container#346.

Preferred: use the CLI wrapper.

```bash
q pi spawn --branch pi/refactor --task "rename ambiguous helpers"
```

Equivalent Makefile invocation:

```bash
cd <git-root>/config && make spawn-pi \
BRANCH=pi/refactor TASK="rename ambiguous helpers"
```

Container name pattern: `<project>-pi-<sanitized-branch>` (note the
`-pi-` segment that distinguishes them from Claude agents).
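
As a rough illustration of the naming pattern, the sketch below builds a container name from a project and a branch. The sanitization rule (lowercase, then every run of characters outside `[a-z0-9]` becomes a single hyphen) is an assumption; the Makefile's actual rule is not shown here, and `pi_container_name` is a hypothetical helper.

```python
import re


def pi_container_name(project: str, branch: str) -> str:
    """Build `<project>-pi-<sanitized-branch>` for a PI agent container."""
    # Assumed rule: lowercase, collapse anything outside [a-z0-9] into "-".
    sanitized = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"{project}-pi-{sanitized}"


print(pi_container_name("myproj", "pi/refactor"))  # myproj-pi-pi-refactor
```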

If the user customised the bridge subnet, pass `--base-url`:

```bash
q pi spawn --branch pi/x --task "..." --base-url http://<gateway-ip>:8080/v1
```
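
For a custom subnet, the value passed to `--base-url` can be derived from the gateway IP. A minimal sketch, assuming an IPv4 gateway and the same port and `/v1` path as the default URL; `pi_base_url` is a hypothetical helper, not part of the CLI.

```python
from ipaddress import IPv4Address
from typing import Optional

# Default taken from the text above: gateway IP of the default bridge subnet.
DEFAULT_PI_BASE_URL = "http://192.168.100.1:8080/v1"


def pi_base_url(gateway_ip: Optional[str] = None, port: int = 8080) -> str:
    """Return the OpenAI-compatible base URL for the local server."""
    if gateway_ip is None:
        return DEFAULT_PI_BASE_URL
    # IPv4Address raises a ValueError subclass on a malformed address,
    # so typos are caught before a container is spawned.
    IPv4Address(gateway_ip)
    return f"http://{gateway_ip}:{port}/v1"


print(pi_base_url("10.0.0.1"))  # http://10.0.0.1:8080/v1
```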

### Memory ceiling — MAX_PI_AGENTS=1

The model + 6 GB prompt cache leaves little RAM headroom on Apple Silicon.
The Makefile enforces `MAX_PI_AGENTS=1` by default — `spawn-pi` will refuse
to launch a second PI agent while one is still running. If the user asks
for multiple PI agents in parallel, **warn them** and recommend stopping
the existing one first.
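
An orchestrator can mirror the ceiling before calling `spawn-pi`, so the warning reaches the user before the Makefile refuses. The sketch below assumes the caller already has a list of running container names and that PI containers follow the `<project>-pi-` prefix; both helper names are hypothetical and the Makefile's own check may differ.

```python
from typing import List

MAX_PI_AGENTS = 1  # default ceiling described above


def running_pi_agents(container_names: List[str], project: str) -> List[str]:
    """Keep only the names that look like PI containers for this project."""
    prefix = f"{project}-pi-"
    return [name for name in container_names if name.startswith(prefix)]


def can_spawn_pi(
    container_names: List[str], project: str, limit: int = MAX_PI_AGENTS
) -> bool:
    """Return True when another PI agent would stay within the ceiling."""
    return len(running_pi_agents(container_names, project)) < limit


names = ["proj-pi-pi-refactor", "proj-feature-x"]
if not can_spawn_pi(names, "proj"):
    print("A PI agent is already running; stop it before spawning another.")
```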

### Listing, monitoring, stopping PI agents

```bash
q pi list # only PI agents
q pi follow --branch pi/refactor # live logs
q pi status --branch pi/refactor # status.json from worktree
q pi stop --branch pi/refactor # stop the container
```

The status.json for PI agents includes `"agent_kind": "pi"`, which
`list-pi-agents` uses to tell PI worktrees apart from Claude worktrees.
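
The same filter can be reproduced outside the Makefile. A minimal sketch, assuming each worktree keeps its status at `.agent/status.json` under `AGENTS_HOME/<branch>`; `find_pi_worktrees` is a hypothetical helper, and the `"claude"` value mentioned in the comment is an assumption about non-PI agents.

```python
import json
from pathlib import Path
from typing import List


def find_pi_worktrees(agents_home: Path) -> List[Path]:
    """Return worktree directories whose status.json has agent_kind == "pi"."""
    found = []
    # rglob is used because branch names such as pi/refactor nest directories.
    for status_file in sorted(agents_home.rglob(".agent/status.json")):
        try:
            data = json.loads(status_file.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed status files are skipped
        # Anything else (for example a hypothetical "claude" kind) is ignored.
        if isinstance(data, dict) and data.get("agent_kind") == "pi":
            found.append(status_file.parent.parent)
    return found
```

Calling `find_pi_worktrees(Path(os.environ["AGENTS_HOME"]))` would then list only the PI worktrees.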

### Important — do not mix targets

- Use `spawn-pi` / `q pi spawn` for PI agents — never the regular `spawn`.
- Use `stop-pi-agent` / `q pi stop` for PI agents — never `stop-agent`.
- The two agent classes share `AGENTS_HOME` and the bridge network, but
their containers, images, and entrypoints are independent.

### Formulating tasks for PI agents

A Gemma-class local model is much more literal than Claude. Three rules
when writing the `--task` string:

1. **Use only relative paths.** `iac/main.py`, not `/workspace/iac/main.py`.
The agent already `cd`s into the worktree; absolute paths cause it to
escape the worktree and write to the main repo's working copy.
2. **Bound the scope explicitly.** End the task with
`Modify ONLY <file>. Do not create any other files.` Without this, the
model often invents extra files ("just in case" tests, READMEs, etc.).
3. **Ask for a commit verification line.** Add
`After the edit, commit and include 'git log -1 --oneline' at the end of your response.`
This gives the orchestrator a string-level handle for "did the agent
actually commit?" beyond just checking `status.json`.

`entrypoint-pi.sh` already prepends a structural preamble with rules 1–3
to **every** PI task — so even tasks crafted by hand or by a different
orchestrator inherit the discipline. Restating the rules in the user-facing
task wording still helps reinforce them with the model.
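
The three rules can also be applied mechanically when composing the `--task` string. A minimal sketch: `build_pi_task` is a hypothetical helper, and the two rule sentences are copied from the list above.

```python
from pathlib import PurePosixPath

SCOPE_RULE = "Modify ONLY {path}. Do not create any other files."
COMMIT_RULE = (
    "After the edit, commit and include 'git log -1 --oneline' "
    "at the end of your response."
)


def build_pi_task(instruction: str, target_file: str) -> str:
    """Compose a task string that follows rules 1-3."""
    path = PurePosixPath(target_file)
    # Rule 1: only relative paths that stay inside the worktree.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"use a path relative to the worktree: {target_file}")
    # Rules 2 and 3: bound the scope, then ask for a commit verification line.
    return " ".join(
        [instruction.strip(), SCOPE_RULE.format(path=target_file), COMMIT_RULE]
    )


print(build_pi_task("Rename ambiguous helpers in iac/main.py.", "iac/main.py"))
```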

### Verifying a PI agent completed successfully

`exit_code == 0` is not enough. The model can produce a confident-sounding
final response while having made no actual changes. Always check:

```bash
q pi status --branch <branch>
# → status.json must show:
# "phase": "completed"
# "commits": N where N >= 1 (or 0 only if the task was read-only)
```

Plus, before merging:

```bash
git diff --name-only main..<branch> # files actually changed
git log -1 --oneline <branch> # commit message + sha
```

If `commits == 0` but the task asked for code changes, report this to the
user as a failure regardless of `exit_code`. Do NOT merge the empty branch.
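
The success criteria above can be expressed as a small predicate over the parsed status.json. This is a sketch only: `pi_run_succeeded` and its `read_only` flag are hypothetical, while the `phase` and `commits` keys are the ones shown in the status output.

```python
from typing import Any, Dict


def pi_run_succeeded(status: Dict[str, Any], read_only: bool = False) -> bool:
    """Apply the checks above to a parsed status.json dictionary."""
    # exit_code is deliberately ignored: a zero exit code proves nothing.
    if status.get("phase") != "completed":
        return False
    commits = status.get("commits", 0)
    if not isinstance(commits, int):
        return False
    # Zero commits is acceptable only for read-only tasks.
    return read_only or commits >= 1


status = {"phase": "completed", "commits": 0, "exit_code": 0}
print(pi_run_succeeded(status))  # False: no commit, so do not merge
```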

## Apple Container CLI reference (key commands)

```
37 changes: 37 additions & 0 deletions .claude/skills/spawn-agent/evals/evals.json
@@ -102,6 +102,43 @@
"Stops and reports to user if a conflict occurs during any merge",
"Does NOT delete original agent branches unless user explicitly asks"
]
},
{
"id": 9,
"prompt": "Spawn a PI agent (local mlx_lm backend) to investigate the auth module and propose refactors. Branch pi/auth-debug.",
"expected_output": "Claude detects this is a PI agent (uses the local mlx_lm.server), runs `make spawn-pi` (or `q pi spawn`) with BRANCH=pi/auth-debug and a feature-style task prompt. Reminds the user to ensure the local server is running via `uv run iac server status`. Does NOT use the Claude `spawn` target, does NOT pass CLAUDE_CODE_OAUTH_TOKEN.",
"files": [],
"expectations": [
"Uses the PI-specific target: `spawn-pi` (Makefile) or `q pi spawn` (CLI)",
"Container name pattern includes `-pi-` (sanitized branch under PI namespace)",
"Does NOT pass CLAUDE_CODE_OAUTH_TOKEN — PI agents authenticate against the local model",
"Mentions or checks that mlx_lm.server is running (e.g. `uv run iac server status`)",
"Warns the user about MAX_PI_AGENTS=1 if they ask for more than one PI agent at once"
]
},
{
"id": 10,
"prompt": "Show me the PI agents currently running. I want to know which ones are using the local model.",
"expected_output": "Claude lists only PI agents (containers matching `*-pi-*` or worktrees with agent_kind=pi in status.json). Uses `make list-pi-agents` or `q pi list`. Does NOT include regular Claude agents in the output.",
"files": [],
"expectations": [
"Uses `list-pi-agents` target or `q pi list` command (not the generic `list-agents`)",
"Filters by PI containers (name includes `-pi-`) or PI worktrees (agent_kind=pi)",
"Does NOT spawn a new container",
"Output clearly distinguishes PI agents from Claude agents"
]
},
{
"id": 11,
"prompt": "The pi/auth-debug PI agent finished. Stop the container and tell me what it did.",
"expected_output": "Claude stops the PI container with `make stop-pi-agent BRANCH=pi/auth-debug` (or `q pi stop --branch pi/auth-debug`), then reads the persisted status.json from the worktree to summarize phase, exit code, commits. Does NOT call `stop-agent` (the Claude target).",
"files": [],
"expectations": [
"Uses `stop-pi-agent` target (not `stop-agent`)",
"Reads `.agent/status.json` from the worktree to summarize results",
"Reports phase, exit code, and commit count",
"Does NOT attempt `container logs` on a stopped container"
]
}
]
}
143 changes: 143 additions & 0 deletions app/cli/src/container_cli/commands/pi_agents.py
@@ -0,0 +1,143 @@
"""PI agent lifecycle commands.

PI agents are an extension of the agent system that use the pi.dev SDK with
a LOCAL mlx_lm.server backend (managed via /iac) instead of the Anthropic
cloud API. They run in separate containers built from Dockerfile.pi.

Open/Closed: this module is a pure extension. The existing agents.py and
build.py are not modified — pi commands live under their own subapp.
"""

from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Annotated

import typer

from container_cli.utils import find_git_root, run_make

app = typer.Typer(help="PI agent lifecycle (local mlx_lm.server backend)")


def _agents_home() -> Path:
    """Resolve AGENTS_HOME, falling back to sibling .worktrees/ directory."""
    env_val = os.environ.get("AGENTS_HOME")
    if env_val:
        return Path(env_val)
    return find_git_root().parent / ".worktrees"


@app.command()
def build(
    image: Annotated[
        str | None, typer.Option("--image", help="PI image tag")
    ] = None,
    dockerfile: Annotated[
        str | None, typer.Option("--dockerfile", help="Path to PI Dockerfile")
    ] = None,
) -> None:
    """Build the PI agent image (Ubuntu 26.04 + PI SDK)."""
    vars: dict[str, str] = {}
    if image:
        vars["PI_IMAGE"] = image
    if dockerfile:
        vars["PI_DOCKERFILE"] = dockerfile
    run_make("build-pi", vars)


@app.command()
def spawn(
    branch: Annotated[
        str, typer.Option("--branch", help="Git branch for the PI agent worktree")
    ],
    task: Annotated[
        str, typer.Option("--task", help="Task description for the PI agent")
    ],
    cpus: Annotated[int | None, typer.Option("--cpus", help="CPU count")] = None,
    memory: Annotated[
        str | None, typer.Option("--memory", help="Memory limit (e.g. 3G)")
    ] = None,
    image: Annotated[str | None, typer.Option("--image", help="PI image tag")] = None,
    base_url: Annotated[
        str | None,
        typer.Option(
            "--base-url",
            help="Override the OpenAI-compatible base URL for the local LLM",
        ),
    ] = None,
    model_id: Annotated[
        str | None,
        typer.Option(
            "--model-id",
            help="Override the model id served by mlx_lm.server",
        ),
    ] = None,
) -> None:
    """Spawn a detached headless PI agent (local mlx_lm.server backend).

    The mlx_lm.server must be running on the host. Check with:
        uv run iac server status
    """
    typer.echo(
        "[pi] reminder: ensure mlx_lm.server is running "
        "(`uv run iac server status` from /iac)"
    )
    vars: dict[str, str] = {"BRANCH": branch, "TASK": task}
    if cpus is not None:
        vars["CPUS"] = str(cpus)
    if memory:
        vars["MEMORY"] = memory
    if image:
        vars["PI_IMAGE"] = image
    if base_url:
        vars["PI_BASE_URL"] = base_url
    if model_id:
        vars["PI_MODEL_ID"] = model_id
    run_make("spawn-pi", vars)


@app.command(name="list")
def list_agents() -> None:
    """List active PI agent containers and PI worktrees."""
    run_make("list-pi-agents")


@app.command()
def logs(
    branch: Annotated[str, typer.Option("--branch", help="PI agent branch name")],
) -> None:
    """Show logs for a PI agent (live container or persisted log)."""
    run_make("logs-pi-agent", {"BRANCH": branch})


@app.command()
def follow(
    branch: Annotated[str, typer.Option("--branch", help="PI agent branch name")],
) -> None:
    """Follow live streaming logs for a PI agent."""
    run_make("follow-pi-agent", {"BRANCH": branch}, tty=True)


@app.command()
def stop(
    branch: Annotated[str, typer.Option("--branch", help="PI agent branch name")],
) -> None:
    """Stop a PI agent container."""
    run_make("stop-pi-agent", {"BRANCH": branch})


@app.command()
def status(
    branch: Annotated[str, typer.Option("--branch", help="PI agent branch name")],
) -> None:
    """Show PI agent status from persisted status.json file."""
    status_file = _agents_home() / branch / ".agent" / "status.json"
    if not status_file.exists():
        typer.echo(f"[pi-status] No status file found for branch '{branch}'.")
        typer.echo(f"[pi-status] Expected at: {status_file}")
        raise typer.Exit(1)
    data = json.loads(status_file.read_text())
    typer.echo(json.dumps(data, indent=2))
5 changes: 4 additions & 1 deletion app/cli/src/container_cli/main.py
@@ -1,6 +1,6 @@
import typer

-from container_cli.commands import agents, build, network, run
+from container_cli.commands import agents, build, network, pi_agents, run

app = typer.Typer(name="q", help="Container management CLI for Claude agent containers")
agents_app = agents.app
@@ -24,6 +24,9 @@
# Register agents sub-app
app.add_typer(agents_app, name="agents")

# Register PI agent sub-app (extension — local mlx_lm backend, no Claude token)
app.add_typer(pi_agents.app, name="pi")


if __name__ == "__main__":
app()
5 changes: 4 additions & 1 deletion app/cli/tests/acceptance/conftest.py
@@ -40,14 +40,17 @@ def invocation_context(
patch("container_cli.commands.build.run_make") as m_build, \
patch("container_cli.commands.run.run_make") as m_run, \
patch("container_cli.commands.network.run_make") as m_network, \
-patch("container_cli.commands.agents.find_git_root", return_value=repo):
+patch("container_cli.commands.pi_agents.run_make") as m_pi, \
+patch("container_cli.commands.agents.find_git_root", return_value=repo), \
+patch("container_cli.commands.pi_agents.find_git_root", return_value=repo):
ctx = InvocationContext(
runner=CliRunner(),
mocks={
"agents": m_agents,
"build": m_build,
"run": m_run,
"network": m_network,
"pi": m_pi,
},
git_root=repo,
agents_home=agents_home,