RoboCode

Agents for robot physical reasoning.

Work in progress.

Installation

git clone https://github.com/tomsilver/robocode.git
cd robocode
bash install.sh

System prerequisites

The following tools are not installed by install.sh / uv sync and must be set up separately:

| Tool | Required for | Install |
| --- | --- | --- |
| Ollama | Local model serving (Claude + Ollama, OpenCode + Ollama) | `curl -fsSL https://ollama.com/install.sh \| sh` |
| Claude Code CLI | `claude` backend (default) | `curl -fsSL https://claude.ai/install.sh \| bash` |
| OpenCode CLI | `opencode` backend (multi-provider) | `curl -fsSL https://opencode.ai/install \| bash` |
| vLLM | Serving models via OpenAI-compatible API | `pip install vllm` (in a separate env) |
| Docker | Docker sandbox (recommended for isolation) | See Docker docs |

For local model serving with Ollama, pull a model after installing:

ollama pull gemma4:31b

Agent backend setup

The agentic approach supports two backends: Claude Code CLI (default) and OpenCode (for GPT, Gemini, open-source models via vLLM/Ollama, etc.).

Claude Code CLI (default)

The Claude Code CLI (claude) is the default backend. Authenticate via one of:

  • Subscription (no separate API charges): `claude auth login`
  • API key: set `ANTHROPIC_API_KEY` in your environment

Optionally set ROBOCODE_CLAUDE_CMD to point to a specific claude binary (defaults to claude on PATH).

The model parameter in agentic.yaml accepts CLI model aliases or full model IDs. Override per-run with e.g. model=opus on the command line.

| Alias | Full model ID |
| --- | --- |
| `sonnet` | `claude-sonnet-4-6` (latest Sonnet, default) |
| `opus` | `claude-opus-4-6` (latest Opus) |
| `haiku` | `claude-haiku-4-5-20251001` (latest Haiku) |

See Anthropic models overview for the full list.

OpenCode (multi-provider)

OpenCode supports 75+ providers including OpenAI, Google, Anthropic, and local models served via Ollama or vLLM.

Install: curl -fsSL https://opencode.ai/install | bash (also pre-installed in the Docker image).

Authenticate with your provider:

# API key (set the appropriate env var for your provider)
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=...

# Or use OpenCode's interactive auth
opencode providers login

Optionally set ROBOCODE_OPENCODE_CMD to point to a specific opencode binary.

Models use the provider/model format:

| Model | Provider |
| --- | --- |
| `openai/gpt-4o` | OpenAI |
| `google/gemini-2.5-pro` | Google |
| `anthropic/claude-sonnet-4-5` | Anthropic |
| `ollama/qwen3.5:latest` | Ollama (local) |

For local models (Ollama, vLLM), create an opencode.json config with your provider:

{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": { "qwen3.5:latest": { "name": "Qwen 3.5" } }
    }
  }
}
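The same config shape works for vLLM: point baseURL at the vLLM server's OpenAI-compatible endpoint (vLLM serves on port 8000 with a /v1 prefix by default). A hypothetical example; the model ID here is illustrative and must match whatever you passed to vllm serve:

```json
{
  "provider": {
    "vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:8000/v1" },
      "models": { "Qwen/Qwen2.5-7B-Instruct": { "name": "Qwen 2.5 7B" } }
    }
  }
}
```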

Environments

All environments are available as Hydra configs via environment=<config_name>.

Maze (discrete)

| Config | Description |
| --- | --- |
| `small_maze` | Small grid maze |
| `large_maze` | Large grid maze |

2D Kinematic (continuous, kinder geom2d)

| Config | Kinder ID | Difficulty |
| --- | --- | --- |
| `motion2d_easy` | `kinder/Motion2D-p0-v0` | Easy (0 passages) |
| `motion2d_medium` | `kinder/Motion2D-p1-v0` | Medium (1 passage) |
| `motion2d_hard` | `kinder/Motion2D-p3-v0` | Hard (3 passages) |
| `obstruction2d_easy` | `kinder/Obstruction2D-o0-v0` | Easy (0 obstructions) |
| `obstruction2d_medium` | `kinder/Obstruction2D-o2-v0` | Medium (2 obstructions) |
| `obstruction2d_hard` | `kinder/Obstruction2D-o4-v0` | Hard (4 obstructions) |
| `clutteredretrieval2d_easy` | `kinder/ClutteredRetrieval2D-o1-v0` | Easy (1 obstruction) |
| `clutteredretrieval2d_medium` | `kinder/ClutteredRetrieval2D-o10-v0` | Medium (10 obstructions) |
| `clutteredretrieval2d_hard` | `kinder/ClutteredRetrieval2D-o25-v0` | Hard (25 obstructions) |
| `clutteredstorage2d_easy` | `kinder/ClutteredStorage2D-b1-v0` | Easy (1 block) |
| `clutteredstorage2d_medium` | `kinder/ClutteredStorage2D-b3-v0` | Medium (3 blocks) |
| `clutteredstorage2d_hard` | `kinder/ClutteredStorage2D-b7-v0` | Hard (7 blocks) |
| `stickbutton2d_easy` | `kinder/StickButton2D-b1-v0` | Easy (1 button) |
| `stickbutton2d_medium` | `kinder/StickButton2D-b3-v0` | Medium (3 buttons) |
| `stickbutton2d_hard` | `kinder/StickButton2D-b5-v0` | Hard (5 buttons) |
| `pushpullhook2d` | `kinder/PushPullHook2D-v0` | Single variant |

3D Kinematic (continuous, kinder geom3d)

| Config | Kinder ID | Difficulty |
| --- | --- | --- |
| `motion3d` | `kinder/Motion3D-v0` | Single variant |
| `obstruction3d_easy` | `kinder/Obstruction3D-o0-v0` | Easy (0 obstructions) |
| `obstruction3d_medium` | `kinder/Obstruction3D-o2-v0` | Medium (2 obstructions) |
| `obstruction3d_hard` | `kinder/Obstruction3D-o4-v0` | Hard (4 obstructions) |
| `shelf3d_easy` | `kinder/KinematicShelf3D-o1-v0` | Easy (1 cube) |
| `shelf3d_medium` | `kinder/KinematicShelf3D-o3-v0` | Medium (3 cubes) |
| `shelf3d_hard` | `kinder/KinematicShelf3D-o5-v0` | Hard (5 cubes) |
| `transport3d_easy` | `kinder/Transport3D-o1-v0` | Easy (1 cube) |
| `transport3d_hard` | `kinder/Transport3D-o2-v0` | Hard (2 cubes) |
| `packing3d_easy` | `kinder/Packing3D-p1-v0` | Easy (1 part) |
| `packing3d_medium` | `kinder/Packing3D-p2-v0` | Medium (2 parts) |
| `packing3d_hard` | `kinder/Packing3D-p3-v0` | Hard (3 parts) |

LIBERO-PRO (manipulation benchmark, optional extra)

LIBERO-PRO is a Franka tabletop manipulation benchmark (~80 task suites covering goal / spatial / object / 10-task mixes plus OOD and perturbation variants) built on MuJoCo via robosuite. It is vendored as a submodule under third-party/LIBERO-PRO/ and gated behind the optional libero extra — it is not installed by default because it pins old upstreams (robosuite==1.4.0, gym==0.25.2, robomimic==0.2.0, bddl==1.0.1) and drags in a CUDA-enabled torch.

Install (into the same venv as the rest of robocode):

sudo apt-get install -y libegl1 libgl1    # EGL/GL runtime for MuJoCo
uv sync --extra libero --all-extras --dev  # ~60 extra Python packages, several GB

First use of the libero package runs an interactive input() prompt asking where to store datasets; the test harness writes ~/.libero/config.yaml automatically. If you hit the prompt manually, answer N — the default paths are fine for env rollouts (pre-recorded demos are not required).

List available benchmark suites:

from libero import benchmark
print(list(benchmark.get_benchmark_dict().keys()))  # ~80 suites

Minimal rollout on libero_goal task 0:

from libero import benchmark
from libero.envs import OffScreenRenderEnv

task_suite = benchmark.get_benchmark_dict()["libero_goal"]()
bddl = task_suite.get_task_bddl_file_path(0)
env = OffScreenRenderEnv(bddl_file_name=bddl, camera_heights=128, camera_widths=128)
env.seed(0)
obs = env.reset()   # dict with agentview_image, robot state, per-object poses, ...
obs, reward, done, info = env.step([0.0] * 7)
env.close()

Smoke tests live at tests/environments/test_libero.py (benchmark dict + rollout); they skip cleanly if the extra isn't installed.

Note on OpenGL: LIBERO's MuJoCo needs to coexist in-process with kinder's pybullet. src/robocode/environments/kinder_geom2d_env.py and kinder_geom3d_env.py pin MUJOCO_GL=egl / PYOPENGL_PLATFORM=egl before kinder loads so PyOpenGL latches to the EGL platform — without this, later robosuite imports in the same process fail with 'NoneType' object has no attribute 'glGetError'. If you see that error, confirm libegl1 is installed.
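The pinning described above amounts to setting the two environment variables before any OpenGL-using module is imported. A minimal sketch (the real logic lives in the kinder env wrappers; this is illustrative only):

```python
import os

# Pin the GL backend *before* importing mujoco / robosuite / PyOpenGL:
# PyOpenGL selects its platform (EGL vs. GLX) once, at import time, and
# MuJoCo reads MUJOCO_GL when its renderer is first created.
os.environ["MUJOCO_GL"] = "egl"
os.environ["PYOPENGL_PLATFORM"] = "egl"

# Only after this point is it safe to import renderers that touch OpenGL,
# e.g. `import robosuite`.
```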

Sandbox

The agent runs inside a Docker container (robocode-sandbox) that provides full filesystem isolation, a restricted network, and a pre-built Python environment.

Security model

| Layer | Mechanism |
| --- | --- |
| Filesystem | Docker bind-mount: the agent can only write to `/sandbox` (the run's output dir) |
| Network | `init-firewall.sh` whitelists API endpoints for the configured provider (Anthropic, OpenAI, Google, etc.), GitHub IPs, and telemetry; blocks everything else via iptables. Extra domains can be passed via `ROBOCODE_FIREWALL_EXTRA_DOMAINS`. |
| Write hook | Claude backend: a PreToolUse hook in `.claude/settings.json` double-checks that Write/Edit paths stay inside `/sandbox`. OpenCode backend: `"permission": "allow"` in `opencode.json` (Docker provides the isolation). |
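For context, a PreToolUse hook of this kind is declared in Claude Code's `settings.json` hook schema roughly as follows. This is a sketch, not the repo's actual hook: the matcher and the `check_sandbox_path.py` helper are hypothetical placeholders for whatever path check the hook command runs.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "python check_sandbox_path.py" }
        ]
      }
    ]
  }
}
```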

What the agent sees

| Path | Contents |
| --- | --- |
| `/sandbox/` | Working directory: the agent writes `approach.py`, test scripts, etc. here |
| `/sandbox/primitives/` | Source files from `src/robocode/primitives/` (read-only reference) |
| `/robocode/.venv/bin/python` | Python 3.11 with all robocode dependencies pre-installed |
| `/robocode/prpl-mono/` | Third-party packages, bind-mounted read-only from the host submodule |

Start docker

macOS

Simply open the Docker Desktop application. Look for the status indicator in the bottom-left corner of the GUI; it should say "Docker Engine Running".

Linux

sudo systemctl start docker
sudo systemctl enable docker

Building the image

Build once from the repo root (rebuild when pyproject.toml / uv.lock change; not needed for prpl-mono code changes):

bash docker/build.sh

Using the OS-level sandbox (legacy)

The original macOS Seatbelt / Linux bubblewrap sandbox is still available (use_docker: false in agentic.yaml) but has a known limitation: it restricts filesystem writes but allows reads of the entire host filesystem.

Red team the sandbox:

python integration_tests/red_team_sandbox.py           # OS-level
python integration_tests/red_team_sandbox.py --docker  # Docker

Experiments

Run an experiment:

python experiments/run_experiment.py approach=random environment=small_maze seed=0

Run a sweep over multiple seeds and environments:

python experiments/run_experiment.py -m seed=0,1,2 environment=small_maze,large_maze approach=random

Analyze results from one or more runs:

python experiments/analyze_results.py multirun/

Agentic approach

The agentic approach launches a coding agent during train(). The agent reads the environment source code, figures out the state/action space and dynamics, and writes a GeneratedApproach class that is used at evaluation time. The agent can also write and run test scripts against the real environment to verify its solution before committing.
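The interface the agent is asked to produce can be sketched as follows. This skeleton is inferred from the small_maze example later in this README (constructor taking the two spaces, plus `reset` and `get_action`); the names are illustrative of that example, not a normative spec, and the body is a trivial random-action stub:

```python
class GeneratedApproach:
    """Minimal stub of the generated interface: samples random actions."""

    def __init__(self, action_space, observation_space):
        self.action_space = action_space
        self.observation_space = observation_space

    def reset(self, state, info):
        # Called once at the start of each episode; an open-loop approach
        # (like the A* solver below) would plan its full path here.
        pass

    def get_action(self, state):
        # Called every step; must return an action valid in action_space.
        return self.action_space.sample()
```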

By default the agent uses the Claude Code CLI backend and runs in the Docker sandbox (requires bash docker/build.sh once):

python experiments/run_experiment.py approach=agentic environment=motion2d_easy

To use a different backend/model, override the approach/backend config:

# GPT-4o via OpenCode
python experiments/run_experiment.py approach=agentic approach/backend=opencode_gpt4o

# Local Ollama model
python experiments/run_experiment.py approach=agentic approach/backend=opencode_ollama

# Or override individual fields
python experiments/run_experiment.py approach=agentic approach.backend.backend=opencode approach.backend.model=google/gemini-2.5-pro

Available backend presets: claude_sonnet (default), claude_opus, opencode_gpt4o, opencode_gemini, opencode_ollama.

To use the legacy OS-level sandbox instead:

python experiments/run_experiment.py approach=agentic environment=small_maze approach.use_docker=false

To skip re-generation and load a previously generated approach:

python experiments/run_experiment.py approach=agentic environment=small_maze \
    approach.load_dir=outputs/2026-02-16/16-00-41

Parallel sweeps each get their own container (named robocode-sandbox-<uuid>), so multiple runs never interfere:

python experiments/run_experiment.py -m seed=0,1,2 environment=small_maze,large_maze approach=agentic

Use the joblib launcher to run jobs in parallel locally:

python experiments/run_experiment.py -m \
    approach=agentic \
    approach.use_docker=true \
    seed=42,24,424,444,222 \
    'primitives=[]' \
    environment=motion2d_easy,obstruction2d_easy,clutteredretrieval2d_easy,clutteredstorage2d_easy,stickbutton2d_easy,pushpullhook2d \
    'hydra.sweep.dir=multirun/2026-02-23/no_primitives_5d_s42_24_424_444_222' \
    'hydra.sweep.subdir=s${seed}/${hydra:runtime.choices.environment}' \
    hydra/launcher=joblib hydra.launcher.n_jobs=4

The generated approach.py and full agent log are saved under sandbox/ in the run's output directory (e.g. outputs/2026-02-16/16-00-41/sandbox/).

Example: small_maze

On small_maze, the agent independently discovered A* pathfinding and achieved a 100% solve rate with optimal path lengths (mean 2.3 steps across 10 episodes):

{
  "mean_eval_reward": -2.3,
  "mean_eval_steps": 2.3,
  "solve_rate": 1.0,
  "num_eval_tasks": 10
}
Generated approach.py (A* pathfinding)
"""Optimal approach for MazeEnv using A* pathfinding algorithm."""

import heapq
from typing import Optional


class GeneratedApproach:
    """Optimal maze solver using A* pathfinding."""

    def __init__(self, action_space, observation_space):
        self.action_space = action_space
        self.observation_space = observation_space
        self.planned_path: Optional[list[tuple[int, int]]] = None
        self.path_index = 0

        self.UP = 0
        self.DOWN = 1
        self.LEFT = 2
        self.RIGHT = 3

        self.action_to_delta = {
            self.UP: (-1, 0),
            self.DOWN: (1, 0),
            self.LEFT: (0, -1),
            self.RIGHT: (0, 1)
        }

    def reset(self, state, info):
        self.planned_path = self._astar_search(state)
        self.path_index = 0

    def get_action(self, state):
        if self.planned_path and self.path_index < len(self.planned_path) - 1:
            next_pos = self.planned_path[self.path_index + 1]
            dr = next_pos[0] - state.agent[0]
            dc = next_pos[1] - state.agent[1]
            for action, (delta_r, delta_c) in self.action_to_delta.items():
                if (dr, dc) == (delta_r, delta_c):
                    self.path_index += 1
                    return action
        return self._greedy_action(state)

    def _astar_search(self, state) -> Optional[list[tuple[int, int]]]:
        start, goal = state.agent, state.goal
        heap = [(self._heuristic(start, goal), start, 0, [start])]
        visited = set()
        while heap:
            _, current, g_score, path = heapq.heappop(heap)
            if current in visited:
                continue
            visited.add(current)
            if current == goal:
                return path
            r, c = current
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                nr, nc = r + dr, c + dc
                neighbor = (nr, nc)
                if (0 <= nr < state.height and 0 <= nc < state.width
                    and neighbor not in state.obstacles
                    and neighbor not in visited):
                    new_g = g_score + 1
                    heapq.heappush(heap, (
                        new_g + self._heuristic(neighbor, goal),
                        neighbor, new_g, path + [neighbor]
                    ))
        return None

    def _heuristic(self, pos1, pos2) -> int:
        return abs(pos1[0] - pos2[0]) + abs(pos1[1] - pos2[1])

    def _greedy_action(self, state) -> int:
        dr = state.goal[0] - state.agent[0]
        dc = state.goal[1] - state.agent[1]
        if abs(dr) >= abs(dc):
            return self.DOWN if dr > 0 else self.UP
        return self.RIGHT if dc > 0 else self.LEFT
Agent log (19 turns, $0.36)

The agent's workflow:

  1. Explored the sandbox directory and read maze_env.py and base_env.py
  2. Analyzed the state type (_MazeState with agent, goal, obstacles, height, width), action space (4 directional moves), and reward structure (-1 per step)
  3. Identified this as a shortest-path problem and implemented A* with Manhattan distance heuristic
  4. Wrote and ran three test scripts verifying correctness on simple mazes, complex mazes with obstacles, impossible mazes (graceful fallback), and optimality of path lengths
  5. Cleaned up test files and produced the final approach.py
18:32:05 Agent: I'll analyze the environment source files and create an optimal approach.py.
18:32:06 Tool call: Glob({'pattern': '**/*.py'})
18:32:08 Agent: Let me explore the main environment files to understand the structure:
18:32:09 Tool call: Read({'.../maze_env.py'})
18:32:09 Tool call: Read({'.../base_env.py'})
18:32:11 Agent: [Analyzed state/action space, identified A* as optimal strategy]
18:32:40 Tool call: Write({'.../approach.py'})
18:32:43 Agent: Now let me create a simple test script to verify the approach works:
18:33:00 Tool call: Write({'.../test_approach.py'})
18:33:03 Tool call: Bash({'command': 'python test_approach.py'})
18:33:07 Agent: Excellent! Testing on a more complex maze...
18:33:22 Tool call: Write({'.../test_complex_maze.py'})
18:33:26 Tool call: Bash({'command': 'python test_complex_maze.py'})
18:33:31 Agent: A* correctly found optimal 12-step path around all obstacles.
18:33:39 Tool call: Write({'.../test_real_state.py'})
18:33:54 Tool call: Bash({'command': 'python test_real_state.py'})
18:33:58 Agent: Works with actual _MazeState class.
18:34:09 Tool call: Write({'.../test_optimality.py'})
18:34:11 Tool call: Bash({'command': 'python test_optimality.py'})
18:34:13 Agent: All test cases produce optimal solutions.
18:34:14 Tool call: Bash({'command': 'rm test_*.py'})
18:34:25 Session done: turns=19, cost=$0.3628055, error=False
