RoboCode

Agents for robot physical reasoning.

Work in progress.

Installation

git clone https://github.com/tomsilver/robocode.git
cd robocode
bash install.sh

System prerequisites

The following tools are not installed by install.sh / uv sync and must be set up separately:

| Tool | Required for | Install |
| --- | --- | --- |
| Ollama | Local model serving (Claude + Ollama, OpenCode + Ollama) | `curl -fsSL https://ollama.com/install.sh \| sh` |
| Claude Code CLI | `claude` backend (default) | `curl -fsSL https://claude.ai/install.sh \| bash` |
| OpenCode CLI | `opencode` backend (multi-provider) | `curl -fsSL https://opencode.ai/install \| bash` |
| vLLM | Serving models via OpenAI-compatible API | `pip install vllm` (in a separate env) |
| Docker | Docker sandbox (recommended for isolation) | See Docker docs |

For local model serving with Ollama, pull a model after installing:

ollama pull gemma4:31b

Agent backend setup

The agentic approach supports two backends: Claude Code CLI (default) and OpenCode (for GPT, Gemini, open-source models via vLLM/Ollama, etc.).

Claude Code CLI (default)

The Claude Code CLI (claude) is the default backend. Authenticate via one of:

  • Subscription (no separate API charges): `claude auth login`
  • API key: set `ANTHROPIC_API_KEY` in your environment

Optionally set ROBOCODE_CLAUDE_CMD to point to a specific claude binary (defaults to claude on PATH).

The model parameter in agentic.yaml accepts CLI model aliases or full model IDs. Override per-run with e.g. model=opus on the command line.

| Alias | Full model ID |
| --- | --- |
| `sonnet` | `claude-sonnet-4-6` (latest Sonnet, default) |
| `opus` | `claude-opus-4-6` (latest Opus) |
| `haiku` | `claude-haiku-4-5-20251001` (latest Haiku) |

See Anthropic models overview for the full list.

OpenCode (multi-provider)

OpenCode supports 75+ providers including OpenAI, Google, Anthropic, and local models served via Ollama or vLLM.

Install: curl -fsSL https://opencode.ai/install | bash (also pre-installed in the Docker image).

Authenticate with your provider:

# API key (set the appropriate env var for your provider)
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=...

# Or use OpenCode's interactive auth
opencode providers login

Optionally set ROBOCODE_OPENCODE_CMD to point to a specific opencode binary.

Models use the provider/model format:

| Model | Provider |
| --- | --- |
| `openai/gpt-4o` | OpenAI |
| `google/gemini-2.5-pro` | Google |
| `anthropic/claude-sonnet-4-5` | Anthropic |
| `ollama/qwen3.5:latest` | Ollama (local) |

For local models (Ollama, vLLM), create an opencode.json config with your provider:

{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": { "qwen3.5:latest": { "name": "Qwen 3.5" } }
    }
  }
}
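The same config shape works for vLLM: point baseURL at the vLLM server's OpenAI-compatible endpoint (vLLM serves on port 8000 with a /v1 prefix by default). A hypothetical example; the model ID here is illustrative and must match whatever you passed to vllm serve:

```json
{
  "provider": {
    "vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:8000/v1" },
      "models": { "Qwen/Qwen2.5-7B-Instruct": { "name": "Qwen 2.5 7B" } }
    }
  }
}
```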

Environments

All environments are available as Hydra configs via environment=<config_name>.

Maze (discrete)

| Config | Description |
| --- | --- |
| `small_maze` | Small grid maze |
| `large_maze` | Large grid maze |

2D Kinematic (continuous, kinder geom2d)

| Config | Kinder ID | Difficulty |
| --- | --- | --- |
| `motion2d_easy` | `kinder/Motion2D-p0-v0` | Easy (0 passages) |
| `motion2d_medium` | `kinder/Motion2D-p1-v0` | Medium (1 passage) |
| `motion2d_hard` | `kinder/Motion2D-p3-v0` | Hard (3 passages) |
| `obstruction2d_easy` | `kinder/Obstruction2D-o0-v0` | Easy (0 obstructions) |
| `obstruction2d_medium` | `kinder/Obstruction2D-o2-v0` | Medium (2 obstructions) |
| `obstruction2d_hard` | `kinder/Obstruction2D-o4-v0` | Hard (4 obstructions) |
| `clutteredretrieval2d_easy` | `kinder/ClutteredRetrieval2D-o1-v0` | Easy (1 obstruction) |
| `clutteredretrieval2d_medium` | `kinder/ClutteredRetrieval2D-o10-v0` | Medium (10 obstructions) |
| `clutteredretrieval2d_hard` | `kinder/ClutteredRetrieval2D-o25-v0` | Hard (25 obstructions) |
| `clutteredstorage2d_easy` | `kinder/ClutteredStorage2D-b1-v0` | Easy (1 block) |
| `clutteredstorage2d_medium` | `kinder/ClutteredStorage2D-b3-v0` | Medium (3 blocks) |
| `clutteredstorage2d_hard` | `kinder/ClutteredStorage2D-b7-v0` | Hard (7 blocks) |
| `stickbutton2d_easy` | `kinder/StickButton2D-b1-v0` | Easy (1 button) |
| `stickbutton2d_medium` | `kinder/StickButton2D-b3-v0` | Medium (3 buttons) |
| `stickbutton2d_hard` | `kinder/StickButton2D-b5-v0` | Hard (5 buttons) |
| `pushpullhook2d` | `kinder/PushPullHook2D-v0` | Single variant |

3D Kinematic (continuous, kinder geom3d)

| Config | Kinder ID | Difficulty |
| --- | --- | --- |
| `motion3d` | `kinder/Motion3D-v0` | Single variant |
| `obstruction3d_easy` | `kinder/Obstruction3D-o0-v0` | Easy (0 obstructions) |
| `obstruction3d_medium` | `kinder/Obstruction3D-o2-v0` | Medium (2 obstructions) |
| `obstruction3d_hard` | `kinder/Obstruction3D-o4-v0` | Hard (4 obstructions) |
| `shelf3d_easy` | `kinder/KinematicShelf3D-o1-v0` | Easy (1 cube) |
| `shelf3d_medium` | `kinder/KinematicShelf3D-o3-v0` | Medium (3 cubes) |
| `shelf3d_hard` | `kinder/KinematicShelf3D-o5-v0` | Hard (5 cubes) |
| `transport3d_easy` | `kinder/Transport3D-o1-v0` | Easy (1 cube) |
| `transport3d_hard` | `kinder/Transport3D-o2-v0` | Hard (2 cubes) |
| `packing3d_easy` | `kinder/Packing3D-p1-v0` | Easy (1 part) |
| `packing3d_medium` | `kinder/Packing3D-p2-v0` | Medium (2 parts) |
| `packing3d_hard` | `kinder/Packing3D-p3-v0` | Hard (3 parts) |

LIBERO-PRO (manipulation benchmark, optional extra)

LIBERO-PRO is a Franka tabletop manipulation benchmark (~80 task suites covering goal / spatial / object / 10-task mixes plus OOD and perturbation variants) built on MuJoCo via robosuite. It is vendored as a submodule under third-party/LIBERO-PRO/ and gated behind the optional libero extra — it is not installed by default because it pins old upstreams (robosuite==1.4.0, gym==0.25.2, robomimic==0.2.0, bddl==1.0.1) and drags in a CUDA-enabled torch.

Install (into the same venv as the rest of robocode):

sudo apt-get install -y libegl1 libgl1    # EGL/GL runtime for MuJoCo
uv sync --extra libero --all-extras --dev  # ~60 extra Python packages, several GB

First use of the libero package runs an interactive input() prompt asking where to store datasets; the test harness writes ~/.libero/config.yaml automatically. If you hit the prompt manually, answer N — the default paths are fine for env rollouts (pre-recorded demos are not required).

List available benchmark suites:

from libero import benchmark
print(list(benchmark.get_benchmark_dict().keys()))  # ~80 suites

Minimal rollout on libero_goal task 0:

from libero import benchmark
from libero.envs import OffScreenRenderEnv

task_suite = benchmark.get_benchmark_dict()["libero_goal"]()
bddl = task_suite.get_task_bddl_file_path(0)
env = OffScreenRenderEnv(bddl_file_name=bddl, camera_heights=128, camera_widths=128)
env.seed(0)
obs = env.reset()   # dict with agentview_image, robot state, per-object poses, ...
obs, reward, done, info = env.step([0.0] * 7)
env.close()

Smoke tests live at tests/environments/test_libero.py (benchmark dict + rollout); they skip cleanly if the extra isn't installed.

Note on OpenGL: LIBERO's MuJoCo needs to coexist in-process with kinder's pybullet. src/robocode/environments/kinder_geom2d_env.py and kinder_geom3d_env.py pin MUJOCO_GL=egl / PYOPENGL_PLATFORM=egl before kinder loads so PyOpenGL latches to the EGL platform — without this, later robosuite imports in the same process fail with 'NoneType' object has no attribute 'glGetError'. If you see that error, confirm libegl1 is installed.
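The pinning described above amounts to setting the two environment variables before any OpenGL-using module is imported. A minimal sketch (the real logic lives in the kinder env wrappers; this is illustrative only):

```python
import os

# Pin the GL backend *before* importing mujoco / robosuite / PyOpenGL:
# PyOpenGL selects its platform (EGL vs. GLX) once, at import time, and
# MuJoCo reads MUJOCO_GL when its renderer is first created.
os.environ["MUJOCO_GL"] = "egl"
os.environ["PYOPENGL_PLATFORM"] = "egl"

# Only after this point is it safe to import renderers that touch OpenGL,
# e.g. `import robosuite`.
```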

Sandbox

The agent runs inside a Docker container (robocode-sandbox) that provides full filesystem isolation, a restricted network, and a pre-built Python environment.

Security model

| Layer | Mechanism |
| --- | --- |
| Filesystem | Docker bind-mount: the agent can only write to `/sandbox` (the run's output dir) |
| Network | `init-firewall.sh` whitelists API endpoints for the configured provider (Anthropic, OpenAI, Google, etc.), GitHub IPs, and telemetry; blocks everything else via iptables. Extra domains can be passed via `ROBOCODE_FIREWALL_EXTRA_DOMAINS`. |
| Write hook | Claude backend: a PreToolUse hook in `.claude/settings.json` double-checks that Write/Edit paths stay inside `/sandbox`. OpenCode backend: `"permission": "allow"` in `opencode.json` (Docker provides the isolation). |
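For context, a PreToolUse hook of this kind is declared in Claude Code's `settings.json` hook schema roughly as follows. This is a sketch, not the repo's actual hook: the matcher and the `check_sandbox_path.py` helper are hypothetical placeholders for whatever path check the hook command runs.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "python check_sandbox_path.py" }
        ]
      }
    ]
  }
}
```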

What the agent sees

| Path | Contents |
| --- | --- |
| `/sandbox/` | Working directory: the agent writes `approach.py`, test scripts, etc. here |
| `/sandbox/primitives/` | Source files from `src/robocode/primitives/` (read-only reference) |
| `/robocode/.venv/bin/python` | Python 3.11 with all robocode dependencies pre-installed |
| `/robocode/prpl-mono/` | Third-party packages, bind-mounted read-only from the host submodule |

Start docker

macOS

Simply open the Docker Desktop application. Look for the status indicator in the bottom-left corner of the GUI; it should say "Docker Engine Running".

Linux

sudo systemctl start docker
sudo systemctl enable docker

Building the image

Build once from the repo root (rebuild when pyproject.toml / uv.lock change; not needed for prpl-mono code changes):

bash docker/build.sh

Using the OS-level sandbox (legacy)

The original macOS Seatbelt / Linux bubblewrap sandbox is still available (use_docker: false in agentic.yaml) but has a known limitation: it restricts filesystem writes but allows reads of the entire host filesystem.

Red team the sandbox:

python integration_tests/red_team_sandbox.py           # OS-level
python integration_tests/red_team_sandbox.py --docker  # Docker

Experiments

Run an experiment:

python experiments/run_experiment.py approach=random environment=small_maze seed=0

Run a sweep over multiple seeds and environments:

python experiments/run_experiment.py -m seed=0,1,2 environment=small_maze,large_maze approach=random

Analyze results from one or more runs:

python experiments/analyze_results.py multirun/

Agentic approach

The agentic approach launches a coding agent during train(). The agent reads the environment source code, figures out the state/action space and dynamics, and writes a GeneratedApproach class that is used at evaluation time. The agent can also write and run test scripts against the real environment to verify its solution before committing.
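The interface the agent is asked to produce can be sketched as follows. This skeleton is inferred from the small_maze example later in this README (constructor taking the two spaces, plus `reset` and `get_action`); the names are illustrative of that example, not a normative spec, and the body is a trivial random-action stub:

```python
class GeneratedApproach:
    """Minimal stub of the generated interface: samples random actions."""

    def __init__(self, action_space, observation_space):
        self.action_space = action_space
        self.observation_space = observation_space

    def reset(self, state, info):
        # Called once at the start of each episode; an open-loop approach
        # (like the A* solver below) would plan its full path here.
        pass

    def get_action(self, state):
        # Called every step; must return an action valid in action_space.
        return self.action_space.sample()
```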

By default the agent uses the Claude Code CLI backend and runs in the Docker sandbox (requires bash docker/build.sh once):

python experiments/run_experiment.py approach=agentic environment=motion2d_easy

To use a different backend/model, override the approach/backend config:

# GPT-4o via OpenCode
python experiments/run_experiment.py approach=agentic approach/backend=opencode_gpt4o

# Local Ollama model
python experiments/run_experiment.py approach=agentic approach/backend=opencode_ollama

# Or override individual fields
python experiments/run_experiment.py approach=agentic approach.backend.backend=opencode approach.backend.model=google/gemini-2.5-pro

Available backend presets: claude_sonnet (default), claude_opus, opencode_gpt4o, opencode_gemini, opencode_ollama.

To use the legacy OS-level sandbox instead:

python experiments/run_experiment.py approach=agentic environment=small_maze approach.use_docker=false

To skip re-generation and load a previously generated approach:

python experiments/run_experiment.py approach=agentic environment=small_maze \
    approach.load_dir=outputs/2026-02-16/16-00-41

Parallel sweeps each get their own container (named robocode-sandbox-<uuid>), so multiple runs never interfere:

python experiments/run_experiment.py -m seed=0,1,2 environment=small_maze,large_maze approach=agentic

Use the joblib launcher to run jobs in parallel locally:

python experiments/run_experiment.py -m \
    approach=agentic \
    approach.use_docker=true \
    seed=42,24,424,444,222 \
    'primitives=[]' \
    environment=motion2d_easy,obstruction2d_easy,clutteredretrieval2d_easy,clutteredstorage2d_easy,stickbutton2d_easy,pushpullhook2d \
    'hydra.sweep.dir=multirun/2026-02-23/no_primitives_5d_s42_24_424_444_222' \
    'hydra.sweep.subdir=s${seed}/${hydra:runtime.choices.environment}' \
    hydra/launcher=joblib hydra.launcher.n_jobs=4

The generated approach.py and full agent log are saved under sandbox/ in the run's output directory (e.g. outputs/2026-02-16/16-00-41/sandbox/).

Example: small_maze

On small_maze, the agent independently discovered A* pathfinding and achieved a 100% solve rate with optimal path lengths (mean 2.3 steps across 10 episodes):

{
  "mean_eval_reward": -2.3,
  "mean_eval_steps": 2.3,
  "solve_rate": 1.0,
  "num_eval_tasks": 10
}
Generated approach.py (A* pathfinding)
"""Optimal approach for MazeEnv using A* pathfinding algorithm."""

import heapq
from typing import Optional


class GeneratedApproach:
    """Optimal maze solver using A* pathfinding."""

    def __init__(self, action_space, observation_space):
        self.action_space = action_space
        self.observation_space = observation_space
        self.planned_path: Optional[list[tuple[int, int]]] = None
        self.path_index = 0

        self.UP = 0
        self.DOWN = 1
        self.LEFT = 2
        self.RIGHT = 3

        self.action_to_delta = {
            self.UP: (-1, 0),
            self.DOWN: (1, 0),
            self.LEFT: (0, -1),
            self.RIGHT: (0, 1)
        }

    def reset(self, state, info):
        self.planned_path = self._astar_search(state)
        self.path_index = 0

    def get_action(self, state):
        if self.planned_path and self.path_index < len(self.planned_path) - 1:
            next_pos = self.planned_path[self.path_index + 1]
            dr = next_pos[0] - state.agent[0]
            dc = next_pos[1] - state.agent[1]
            for action, (delta_r, delta_c) in self.action_to_delta.items():
                if (dr, dc) == (delta_r, delta_c):
                    self.path_index += 1
                    return action
        return self._greedy_action(state)

    def _astar_search(self, state) -> Optional[list[tuple[int, int]]]:
        start, goal = state.agent, state.goal
        heap = [(self._heuristic(start, goal), start, 0, [start])]
        visited = set()
        while heap:
            _, current, g_score, path = heapq.heappop(heap)
            if current in visited:
                continue
            visited.add(current)
            if current == goal:
                return path
            r, c = current
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                nr, nc = r + dr, c + dc
                neighbor = (nr, nc)
                if (0 <= nr < state.height and 0 <= nc < state.width
                    and neighbor not in state.obstacles
                    and neighbor not in visited):
                    new_g = g_score + 1
                    heapq.heappush(heap, (
                        new_g + self._heuristic(neighbor, goal),
                        neighbor, new_g, path + [neighbor]
                    ))
        return None

    def _heuristic(self, pos1, pos2) -> int:
        return abs(pos1[0] - pos2[0]) + abs(pos1[1] - pos2[1])

    def _greedy_action(self, state) -> int:
        dr = state.goal[0] - state.agent[0]
        dc = state.goal[1] - state.agent[1]
        if abs(dr) >= abs(dc):
            return self.DOWN if dr > 0 else self.UP
        return self.RIGHT if dc > 0 else self.LEFT
Agent log (19 turns, $0.36)

The agent's workflow:

  1. Explored the sandbox directory and read maze_env.py and base_env.py
  2. Analyzed the state type (_MazeState with agent, goal, obstacles, height, width), action space (4 directional moves), and reward structure (-1 per step)
  3. Identified this as a shortest-path problem and implemented A* with Manhattan distance heuristic
  4. Wrote and ran three test scripts verifying correctness on simple mazes, complex mazes with obstacles, impossible mazes (graceful fallback), and optimality of path lengths
  5. Cleaned up test files and produced the final approach.py
18:32:05 Agent: I'll analyze the environment source files and create an optimal approach.py.
18:32:06 Tool call: Glob({'pattern': '**/*.py'})
18:32:08 Agent: Let me explore the main environment files to understand the structure:
18:32:09 Tool call: Read({'.../maze_env.py'})
18:32:09 Tool call: Read({'.../base_env.py'})
18:32:11 Agent: [Analyzed state/action space, identified A* as optimal strategy]
18:32:40 Tool call: Write({'.../approach.py'})
18:32:43 Agent: Now let me create a simple test script to verify the approach works:
18:33:00 Tool call: Write({'.../test_approach.py'})
18:33:03 Tool call: Bash({'command': 'python test_approach.py'})
18:33:07 Agent: Excellent! Testing on a more complex maze...
18:33:22 Tool call: Write({'.../test_complex_maze.py'})
18:33:26 Tool call: Bash({'command': 'python test_complex_maze.py'})
18:33:31 Agent: A* correctly found optimal 12-step path around all obstacles.
18:33:39 Tool call: Write({'.../test_real_state.py'})
18:33:54 Tool call: Bash({'command': 'python test_real_state.py'})
18:33:58 Agent: Works with actual _MazeState class.
18:34:09 Tool call: Write({'.../test_optimality.py'})
18:34:11 Tool call: Bash({'command': 'python test_optimality.py'})
18:34:13 Agent: All test cases produce optimal solutions.
18:34:14 Tool call: Bash({'command': 'rm test_*.py'})
18:34:25 Session done: turns=19, cost=$0.3628055, error=False
