feat: PI agent (local mlx_lm backend) + task-discipline quality pass #10

Merged
deimagjas merged 4 commits into main from feat/pi-agent on May 14, 2026

@deimagjas
Owner

Summary

  • Adds a second agent class: PI agents. Spawns pi-coding-agent (the @earendil-works/pi-coding-agent npm package) inside a hardened Ubuntu 26.04 container (Dockerfile.pi), backed by a local mlx_lm.server running on the host and managed by a new iac CLI. No Anthropic API credits are consumed. Coexists with the existing Claude agents under the Open/Closed principle: new image, new entrypoint (entrypoint-pi.sh), new Makefile section, new CLI subapp (q pi), and new tests/evals/docs, without touching the existing Claude paths.
  • Task-discipline preamble (quality pass). entrypoint-pi.sh now wraps every --task with a structural preamble before invoking pi -p: enforces relative paths, narrow scope, and a mandatory git add -A && git commit && git log -1 --oneline postcondition. Verified end-to-end on pi/format-bytes-v2 (commit fe7d407, 9/9 functional cases pass, zero side-effects in the main repo).
  • Lower sampling defaults for coding tasks. iac server start defaults to temp=0.2, top_p=0.9 (was 0.9 / 0.95) since pi-coding-agent does not expose per-request sampling — the server defaults are what every agent actually uses. New --temp and --top-p flags override at start time.
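
The preamble idea from the second bullet can be sketched roughly as follows. This is an illustration in Python only: the real wrapping lives in entrypoint-pi.sh (shell) and its exact wording is not shown in this description. `PREAMBLE` and `wrap_task` are hypothetical names, and the rule text is paraphrased from the bullet above.

```python
# Hypothetical illustration: the real preamble lives in entrypoint-pi.sh,
# and its exact wording is not reproduced here.
PREAMBLE = """\
Rules for this task:
1. Use relative paths only; stay inside the current worktree.
2. Keep the change narrowly scoped to what the task asks for.
3. When finished, run: git add -A && git commit && git log -1 --oneline
"""


def wrap_task(task: str) -> str:
    """Prefix a user task with the structural preamble."""
    task = task.strip()
    if not task:
        raise ValueError("task must not be empty")
    return f"{PREAMBLE}\nTask:\n{task}\n"


if __name__ == "__main__":
    print(wrap_task("reply OK"))
```

Keeping the rules in one constant means every task gets the same constraints regardless of how the orchestrator phrases it.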

Test plan

  • cd app/cli && uv run pytest -q → 92 passed (16 new pi_agents unit tests, 10 new pi_agents acceptance scenarios)
  • uv run ruff check src tests → clean
  • cd config && make build-pi → image claude-pi:ubuntu built successfully, pi 0.74.0 installed
  • uv run iac server start → reports temp=0.2 top_p=0.9, /v1/models reachable
  • Container reachability probe → curl http://192.168.100.1:8080/v1/models works from inside the container network (host.containers.internal is NOT supported by Apple Container CLI — gateway IP workaround used and documented)
  • End-to-end smoke test → q pi spawn --branch pi/smoke --task "reply OK" returns "OK" from local Gemma-26b in 5s
  • End-to-end elaborate test (with safeguards) → q pi spawn --branch pi/format-bytes-v2 --task "Add format_bytes(n) to iac/main.py..." → commits=1, only iac/main.py touched on the branch, no unsolicited files in the main repo, 9/9 functional cases pass
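
The container reachability probe in the test plan was run with curl. An equivalent check could look roughly like the sketch below, using only the Python standard library. `models_url` and `is_reachable` are hypothetical helpers, not part of this PR, and the sketch assumes the base URL carries no /v1 suffix.

```python
import urllib.request


def models_url(base_url: str) -> str:
    """Build the /v1/models URL from a server base URL."""
    return base_url.rstrip("/") + "/v1/models"


def is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, connection refused, and timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    # Gateway IP for the default subnet, as used in the test plan above.
    print(is_reachable("http://192.168.100.1:8080"))
```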

Architecture (Open/Closed)

| Component | Claude agent (existing) | PI agent (new) |
| --- | --- | --- |
| Image | claude-agent:wolfi (Chainguard) | claude-pi:ubuntu (Ubuntu 26.04, kernel 7.x) |
| Entrypoint | config/entrypoint.sh | config/entrypoint-pi.sh |
| Backend | Anthropic API (cloud) | mlx_lm.server on host (local) |
| CLI subapp | q agents … | q pi … |
| Makefile section | spawn, list-agents, stop-agent, … | spawn-pi, list-pi-agents, stop-pi-agent, … |
| Auth | CLAUDE_CONTAINER_OAUTH_TOKEN | none (local) |
| Memory guard | scales freely | MAX_PI_AGENTS=1 (Gemma-26b + 6 GB cache leaves little headroom) |
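
The memory guard row amounts to a simple admission check before spawning. A minimal sketch, assuming the limit is read from a MAX_PI_AGENTS environment variable (where the guard is actually enforced is not stated here, and `max_pi_agents` and `can_spawn` are hypothetical names):

```python
import os


def max_pi_agents(env=None) -> int:
    """Read MAX_PI_AGENTS, defaulting to 1 as in the table above."""
    env = os.environ if env is None else env
    return int(env.get("MAX_PI_AGENTS", "1"))


def can_spawn(running: int, limit: int) -> bool:
    """Allow a new PI agent only while the running count is below the limit."""
    return running < limit


if __name__ == "__main__":
    limit = max_pi_agents()
    print(can_spawn(running=0, limit=limit))
```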

Notable design decisions verified during testing

  • Apple Container CLI does NOT implement host.containers.internal (apple/container#346). The PI container reaches the host server via the bridge gateway IP (192.168.100.1 for the default subnet). Documented and exposed as PI_BASE_URL.
  • pi-coding-agent does not accept per-request temperature / top_p (verified in its models.json schema and CLI flags). Sampling control therefore lives in iac — at server-start time.
  • The first format_bytes test exposed three real failure modes: absolute paths leaking out of the worktree, exit_code: 0 without an actual commit, and a high-temp model inventing extra files. All three are now structurally prevented by the preamble + low temp defaults, not just by hoping the orchestrator phrases the task well.
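
Two of the failure modes above (paths escaping the worktree, and exit_code: 0 without a commit) can also be checked after the fact. The sketch below is one possible way to do that in Python. It is not the mechanism used in this PR, which relies on the preamble; `escapes_worktree` and `commits_ahead` are hypothetical helpers, and `main` as the base branch is an assumption.

```python
import subprocess
from pathlib import Path


def escapes_worktree(path: str, worktree: str) -> bool:
    """True if `path`, resolved relative to `worktree`, lands outside it."""
    root = Path(worktree).resolve()
    target = (root / path).resolve()  # an absolute `path` replaces `root` here
    return not target.is_relative_to(root)  # requires Python 3.9+


def commits_ahead(worktree: str, base: str = "main") -> int:
    """Count commits on HEAD that are not on `base` (0 means nothing was committed)."""
    out = subprocess.run(
        ["git", "-C", worktree, "rev-list", "--count", f"{base}..HEAD"],
        check=True, capture_output=True, text=True,
    )
    return int(out.stdout.strip())
```

A caller could treat `commits_ahead(...) == 0` as a failed run even when the agent process exited with code 0.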

Files of interest

| Area | Files |
| --- | --- |
| Local model server | iac/main.py, iac/pyproject.toml |
| Container | config/Dockerfile.pi, config/entrypoint-pi.sh |
| Build / lifecycle | config/Makefile (new PI section) |
| Python CLI | app/cli/src/container_cli/commands/pi_agents.py, app/cli/src/container_cli/main.py |
| Skill (orchestrator guidance) | .claude/skills/spawn-agent/SKILL.md, .claude/skills/spawn-agent/evals/evals.json (evals 9-11) |
| Docs | docs/agents/pi-agent.md (new), docs/agents/cli.md, docs/agents/container-agent.md |
| Tests | app/cli/tests/test_pi_agents.py (new, 16 cases), app/cli/tests/acceptance/features/pi_agents.feature (new, 10 scenarios) |

Generated with Claude Code

- Dockerfile.pi: shorten the header to operator-facing info only;
  design rationale lives in docs/agents/pi-agent.md.
- Dockerfile.pi: drop the cargo-based builder stage. Ubuntu 26.04 ships
  ripgrep / fd-find / bat / eza in apt, so the multi-stage build was
  paying compile time for tools we did not need to build from source.
  Symlink fdfind→fd and batcat→bat for ergonomics. Trims build from
  ~3 min to ~50 s. Drops dust / procs / btm (they are not in apt and
  were nice-to-have only).
- entrypoint-pi.sh: collapse the redundant "starting" phase. The gap
  between it and "working" is empty in PI agents (no credential copy),
  so we go straight to "working" — one write_status, one emit_marker.
- entrypoint-pi.sh: add a comment explaining why su-exec is required:
  the entrypoint does root-only work first (chown of the host-mounted
  worktree, models.json under /home/agent), then drops to `agent` so
  `pi` runs unprivileged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@deimagjas merged commit c31f8eb into main on May 14, 2026
9 checks passed
@deimagjas self-assigned this on May 14, 2026