@@ -0,0 +1,45 @@
# Mini-swe-agent sidecar tool image.
#
# Contains a self-contained Python runtime (a standalone install, not a venv)
# with mini-swe-agent + litellm installed. It is built under /opt/mini-swe-agent
# and stored at the image root; when mounted into a sandbox at
# /opt/mini-swe-agent, the agent can be invoked via:
#
# /opt/mini-swe-agent/bin/python /opt/mini-swe-agent/bin/run_agent.py ...
#
# Uses python-build-standalone for maximum portability across different
# glibc versions (built against older glibc, forward-compatible).
#
# Build:
# docker build -f Dockerfile.mini-swe-agent-tool -t mini-swe-agent-tool:latest .
#

FROM debian:bullseye-slim AS builder

ARG PBS_RELEASE="20260602"
ARG PBS_PYTHON="3.12.13"
ARG PIP_INDEX_URL=""

# Download and extract python-build-standalone (stripped, 32MB)
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates wget \
    && rm -rf /var/lib/apt/lists/* \
    && wget -q \
        "https://github.com/astral-sh/python-build-standalone/releases/download/${PBS_RELEASE}/cpython-${PBS_PYTHON}%2B${PBS_RELEASE}-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz" \
        -O /tmp/python.tar.gz \
    && mkdir -p /opt/mini-swe-agent \
    && tar -xzf /tmp/python.tar.gz -C /opt/mini-swe-agent --strip-components=1 \
    && rm /tmp/python.tar.gz

# Install mini-swe-agent + litellm
RUN /opt/mini-swe-agent/bin/pip install --no-cache-dir \
    ${PIP_INDEX_URL:+-i ${PIP_INDEX_URL}} \
    "mini-swe-agent==2.2.8" \
    "litellm==1.81.7"

# Copy the in-sandbox runner script
COPY run_agent.py /opt/mini-swe-agent/bin/run_agent.py

# Final scratch image: files are at the image root level so that when
# akernel_sdk.Mount(target="/opt/mini-swe-agent") overlays this image,
# the files appear at /opt/mini-swe-agent/bin/python etc.
FROM scratch
COPY --from=builder /opt/mini-swe-agent /
119 changes: 119 additions & 0 deletions examples/blackbox_recipes/mini_swe_agent/README.md
@@ -0,0 +1,119 @@
# Mini-SWE-Agent In-Sandbox Execution

## Overview

`mini-swe-agent` runs inside the SWE-bench sandbox through a sidecar tool image.
The external runner creates the sandbox, mounts the tool image at
`/opt/mini-swe-agent`, starts the agent process, and evaluates the reward in the
same sandbox.

The agent executes commands through `LocalEnvironment` (local bash) inside the
sandbox and calls the LLM through the gateway URL passed in via stdin. The
`mini_swe` tool image uses
[python-build-standalone](https://github.com/astral-sh/python-build-standalone)
to build an isolated Python environment, then copies the result into a minimal
`FROM scratch` final stage, so the sandbox base image does not need to provide
Python for the sidecar tool runtime.

**This recipe is self-contained.** Apart from
[`../sandbox_client.py`](../sandbox_client.py), which it shares with the
claude-code recipe, everything (`dataset.py`, `reward.py`, `run_agent.py`,
`build_tool.sh`, `run_train.sh`, config) lives in this directory, and nothing
depends on `claude_code/`.

**Supported runners:**

| Runner | Description |
|--------|-------------|
| `mini_swe` | mini-swe-agent sidecar runner |

**Supported sandbox types:**

| Type | Description |
|------|-------------|
| openyuanrong | Uses `akernel_sdk.Mount` and `sandbox.commands.run()` |

## Architecture

```text
[Rollouter Host: mini_swe_agent_runner]
|
|-- SandboxClient.create(image, sidecar_image, sidecar_target="/opt/mini-swe-agent")
| `-- akernel: Sandbox(mounts=[Mount(target="/opt/mini-swe-agent", ...)])
|
|-- sandbox.run("<tool entrypoint>")
| `-- [Inside Sandbox]
| /opt/mini-swe-agent/bin/python /opt/mini-swe-agent/bin/run_agent.py
| stdin <- task config JSON (task, gateway_url, agent)
| commands run inside the SWE-bench sandbox
| stdout -> agent execution result JSON
|
|-- parse agent result
|-- SandboxEnvForReward(sandbox) -> evaluate_in_env()
`-- POST session.reward_info_url
```
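
The stdin half of this contract can be sketched in bash. This assumes the task config is a flat JSON object with the three fields named in the diagram; the contents of `agent` (a single `step_limit` here), the helper names `json_escape` and `build_agent_payload`, and the gateway URL are hypothetical, so check `run_agent.py` for the actual schema.

```bash
#!/usr/bin/env bash
# Sketch only: the payload schema below is assumed, not taken from run_agent.py.

# Escape a string for use inside a JSON string literal. Handles backslash,
# double quote, newline, carriage return and tab; other control characters
# are left as-is by this sketch.
json_escape() {
    local s=$1
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    s=${s//$'\r'/\\r}
    s=${s//$'\t'/\\t}
    printf '%s' "$s"
}

# Build the task config JSON (task, gateway_url, agent) fed to run_agent.py.
# The "agent" object with a single step_limit is a hypothetical example.
build_agent_payload() {
    local task=$1 gateway_url=$2 step_limit=$3
    printf '{"task": "%s", "gateway_url": "%s", "agent": {"step_limit": %d}}\n' \
        "$(json_escape "$task")" "$(json_escape "$gateway_url")" "$step_limit"
}

# Usage (the gateway URL is a placeholder):
#   payload=$(build_agent_payload "$problem_statement" "http://<gateway>/v1" 100)
#   printf '%s' "$payload" | /opt/mini-swe-agent/bin/python /opt/mini-swe-agent/bin/run_agent.py
```

Escaping by hand keeps the sketch dependency-free; a real runner would more likely serialize the payload with a JSON library on the host side.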

## Prerequisites

1. **AKernel**: set `AKERNEL_SERVER_ADDRESS` and `AKERNEL_TOKEN`.
2. **Tool image**: build the mini-swe-agent tool image, and push it to a remote
   registry if the sandbox service cannot pull local Docker images.
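
A missing AKernel variable is easier to diagnose before launch than from inside the sandbox client. A minimal guard, as a sketch; `require_env` is a hypothetical helper, not one of the recipe's scripts:

```bash
#!/usr/bin/env bash
# Return non-zero (and name each culprit on stderr) if any of the given
# variables is unset or empty. ${!name:-} reads a variable by its name.
require_env() {
    local name missing=0
    for name in "$@"; do
        if [[ -z "${!name:-}" ]]; then
            echo "missing required environment variable: ${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, before running run_train.sh:
#   require_env AKERNEL_SERVER_ADDRESS AKERNEL_TOKEN || exit 1
```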

## 1. Build Tool Image

`mini_swe` is injected into the SWE-bench sandbox as a sidecar tool image. Use
`build_tool.sh` to build it.

| Default tool image | Dockerfile | Sandbox mount path | Image contents |
|--------------------|------------|--------------------|----------------|
| `mini-swe-agent-tool:latest` | `Dockerfile.mini-swe-agent-tool` | `/opt/mini-swe-agent` | Standalone Python 3.12, `mini-swe-agent`, `litellm`, and `run_agent.py` |

```bash
# Use the default PyPI source.
bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh

# Use a custom PyPI mirror.
bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh --pip-index https://pypi.tuna.tsinghua.edu.cn/simple/

# Build and push to a remote registry.
bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh --registry swr.cn-east-3.myhuaweicloud.com/openyuanrong
```

The `mini_swe` Python runtime is fully isolated from the sandbox container's
Python.

### Build Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `TOOL_IMAGE` | `mini-swe-agent-tool` | Image name |
| `TOOL_TAG` | `latest` | Image tag |
| `PIP_INDEX_URL` | unset (PyPI) | pip index URL; `--pip-index` takes precedence |
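
In the Dockerfile, `PIP_INDEX_URL` reaches pip through the expansion `${PIP_INDEX_URL:+-i ${PIP_INDEX_URL}}`, which produces `-i <url>` only when the value is set and non-empty; otherwise pip stays on its default index. The same expansion in isolation, wrapped in a hypothetical `pip_index_args` function so it can be tried outside Docker:

```bash
#!/usr/bin/env bash
# Mirrors the Dockerfile's pip option expansion. ":+" substitutes the
# alternate text only when PIP_INDEX_URL is set and non-empty.
pip_index_args() {
    local PIP_INDEX_URL=${1-}
    # Left unquoted, as in the Dockerfile's RUN line, so word splitting yields
    # two separate arguments: "-i" and the URL.
    echo ${PIP_INDEX_URL:+-i ${PIP_INDEX_URL}}
}

pip_index_args ""                                           # prints an empty line
pip_index_args "https://pypi.tuna.tsinghua.edu.cn/simple/"  # prints: -i https://pypi.tuna.tsinghua.edu.cn/simple/
```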

After pushing, point training at it with `SWE_AGENT_TOOL_IMAGE`.
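
`build_tool.sh` pushes to `${REGISTRY}/${TOOL_IMAGE}:${TOOL_TAG}`, with `TOOL_IMAGE` and `TOOL_TAG` defaulting to `mini-swe-agent-tool` and `latest`. The value for `SWE_AGENT_TOOL_IMAGE` can be derived the same way; `tool_image_ref` below is a hypothetical helper that mirrors the script's naming:

```bash
#!/usr/bin/env bash
# Compute the remote image reference the same way build_tool.sh builds FULL_TAG.
tool_image_ref() {
    local registry=$1
    printf '%s/%s:%s\n' "$registry" "${TOOL_IMAGE:-mini-swe-agent-tool}" "${TOOL_TAG:-latest}"
}

# Example, using the registry from the build instructions. With TOOL_IMAGE and
# TOOL_TAG unset this yields
# swr.cn-east-3.myhuaweicloud.com/openyuanrong/mini-swe-agent-tool:latest
SWE_AGENT_TOOL_IMAGE=$(tool_image_ref swr.cn-east-3.myhuaweicloud.com/openyuanrong)
export SWE_AGENT_TOOL_IMAGE
```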

## 2. Training (Fully Async)

```bash
AKERNEL_SERVER_ADDRESS="6.2.179.37:8888" \
AKERNEL_TOKEN="<token>" \
SWE_AGENT_TOOL_IMAGE=swr.cn-east-3.myhuaweicloud.com/openyuanrong/mini-swe-agent-tool:latest \
MODEL_PATH=~/models/Qwen3.5-9B \
bash examples/blackbox_recipes/mini_swe_agent/run_train.sh
```

The training YAML registers the mini-swe-agent runner as its only agent runner,
under `agent_framework.agent_runners`:

```yaml
agent_runners:
  swe_agent:
    runner_fqn: examples.blackbox_recipes.mini_swe_agent.mini_swe_agent_runner.mini_swe_agent_runner
```

## 3. Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `AGENT_MAX_TURNS` | `100` | mini-swe-agent `step_limit` (maximum number of agent turns) |
| `SWE_AGENT_EVAL_TIMEOUT` | `600` | Reward evaluation timeout (seconds) |
| `SWE_AGENT_RUN_TIMEOUT` | `7200` | Maximum wall time for the agent process in the sandbox (seconds) |
| `SWE_AGENT_TOOL_IMAGE` | `swr.cn-east-3.myhuaweicloud.com/openyuanrong/mini-swe-agent-tool:latest` | Sidecar tool image |
| `CONDA_ENV` | `testbed` | Conda env activated inside the sandbox before running the agent |
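
Assuming the runner treats these like ordinary shell defaults (an exported value wins, anything unset falls back to the table), a wrapper around `run_train.sh` could resolve and sanity-check them up front. This is a sketch with a hypothetical `resolve_agent_config` helper; the runner's own parsing may differ, for example in how it treats non-numeric values.

```bash
#!/usr/bin/env bash
# Fill in the documented defaults and reject non-integer turn budgets or
# timeouts before anything is launched. Returns non-zero on bad input.
resolve_agent_config() {
    # ":=" assigns the default only when the variable is unset or empty.
    : "${AGENT_MAX_TURNS:=100}"
    : "${SWE_AGENT_EVAL_TIMEOUT:=600}"
    : "${SWE_AGENT_RUN_TIMEOUT:=7200}"
    : "${SWE_AGENT_TOOL_IMAGE:=swr.cn-east-3.myhuaweicloud.com/openyuanrong/mini-swe-agent-tool:latest}"
    : "${CONDA_ENV:=testbed}"

    local name
    for name in AGENT_MAX_TURNS SWE_AGENT_EVAL_TIMEOUT SWE_AGENT_RUN_TIMEOUT; do
        if [[ ! "${!name}" =~ ^[0-9]+$ ]]; then
            echo "${name} must be a non-negative integer, got '${!name}'" >&2
            return 1
        fi
    done
    export AGENT_MAX_TURNS SWE_AGENT_EVAL_TIMEOUT SWE_AGENT_RUN_TIMEOUT \
        SWE_AGENT_TOOL_IMAGE CONDA_ENV
}
```

</imports>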
Empty file.
56 changes: 56 additions & 0 deletions examples/blackbox_recipes/mini_swe_agent/build_tool.sh
@@ -0,0 +1,56 @@
#!/usr/bin/env bash
# Build the mini-swe-agent sidecar tool image.
#
# The image uses python-build-standalone to build an isolated Python runtime
# with mini-swe-agent + litellm + run_agent.py, copied into a minimal
# `FROM scratch` final stage rooted at /opt/mini-swe-agent. It is mounted into
# the SWE-bench sandbox at /opt/mini-swe-agent, so the sandbox base image does
# not need Python for the sidecar tool runtime.
#
# Usage:
# bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh
# bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh --pip-index https://pypi.tuna.tsinghua.edu.cn/simple/
# bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh --registry swr.cn-east-3.myhuaweicloud.com/openyuanrong
#
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
IMAGE_NAME="${TOOL_IMAGE:-mini-swe-agent-tool}"
IMAGE_TAG="${TOOL_TAG:-latest}"

# Parse args
REGISTRY=""
PIP_INDEX_URL="${PIP_INDEX_URL:-}"
while [[ $# -gt 0 ]]; do
    case "$1" in
        --registry) REGISTRY="${2:?--registry requires a value}"; shift 2 ;;
        --pip-index) PIP_INDEX_URL="${2:?--pip-index requires a value}"; shift 2 ;;
        *) echo "Unknown arg: $1" >&2; exit 1 ;;
    esac
done

BUILD_ARGS=()
if [[ -n "${PIP_INDEX_URL}" ]]; then
    BUILD_ARGS+=(--build-arg PIP_INDEX_URL="${PIP_INDEX_URL}")
fi

echo "==> Building mini_swe tool image: ${IMAGE_NAME}:${IMAGE_TAG}"
# ${BUILD_ARGS[@]+...} keeps an empty array from tripping `set -u` on bash < 4.4.
docker build \
    -f "${SCRIPT_DIR}/Dockerfile.mini-swe-agent-tool" \
    -t "${IMAGE_NAME}:${IMAGE_TAG}" \
    ${BUILD_ARGS[@]+"${BUILD_ARGS[@]}"} \
    "${SCRIPT_DIR}/"

if [[ -n "${REGISTRY}" ]]; then
    FULL_TAG="${REGISTRY}/${IMAGE_NAME}:${IMAGE_TAG}"
    echo "==> Tagging and pushing: ${FULL_TAG}"
    docker tag "${IMAGE_NAME}:${IMAGE_TAG}" "${FULL_TAG}"
    docker push "${FULL_TAG}"
    echo " Pushed."
fi

echo ""
echo "Tool image ready: ${IMAGE_NAME}:${IMAGE_TAG}"
if [[ -n "${REGISTRY}" ]]; then
    echo " Remote sandbox: ${FULL_TAG}"
fi
@@ -0,0 +1,168 @@
# Megatron + V1 unified trainer config for the blackbox mini-swe recipe.
#
# Entry point: python3 -m verl.trainer.main_ppo
# Default trainer mode is separate_async. On a single 8-GPU node this recipe
# uses 4 GPUs for trainer and 4 GPUs for standalone rollout.

hydra:
  searchpath:
    - pkg://verl.trainer.config

defaults:
  - ppo_megatron_trainer
  - _self_

actor_rollout_ref:
  hybrid_engine: true
  nccl_timeout: 9600

  model:
    path: ???

  rollout:
    name: vllm
    mode: async
    nnodes: 1
    n_gpus_per_node: 4
    prompt_length: 4096
    response_length: 131072
    max_model_len: 135168
    temperature: 1.0
    top_p: 1.0
    top_k: -1
    n: 8
    tensor_model_parallel_size: 4
    gpu_memory_utilization: 0.7
    calculate_log_probs: true
    enable_sleep_mode: true
    free_cache_engine: true
    enable_chunked_prefill: true
    max_num_batched_tokens: 135168
    checkpoint_engine:
      backend: nccl
      update_weights_bucket_megabytes: 2048

    multi_turn:
      enable: true
      max_parallel_calls: 1
      format: qwen3_coder

    agent:
      num_workers: 8
      agent_loop_manager_class: uni_agent.framework.entry.AgentFrameworkRolloutAdapter

    custom:
      agent_framework:
        gateway_count: 1
        agent_runners:
          swe_agent:
            runner_fqn: examples.blackbox_recipes.mini_swe_agent.mini_swe_agent_runner.mini_swe_agent_runner
            dispatch_mode: ray_task
            max_concurrent_sessions: 32
            runner_kwargs:
              tool_image: swr.cn-east-3.myhuaweicloud.com/openyuanrong/mini-swe-agent-tool:latest
              run_timeout: 3600
              conda_env: testbed

  actor:
    use_dynamic_bsz: true
    use_rollout_log_probs: true
    ppo_mini_batch_size: 16
    ppo_micro_batch_size_per_gpu: 1
    use_kl_loss: false
    kl_loss_coef: 0.0
    clip_ratio_low: 0.2
    clip_ratio_high: 0.28
    clip_ratio_c: 10.0
    loss_agg_mode: token-mean
    entropy_coeff: 0
    optim:
      lr: 1e-6
      weight_decay: 0.1
      lr_decay_style: constant
    megatron:
      param_offload: true
      grad_offload: true
      optimizer_offload: true
      tensor_model_parallel_size: 4
      pipeline_model_parallel_size: 1
      context_parallel_size: 1
      use_mbridge: true
      use_remove_padding: false

  ref:
    log_prob_micro_batch_size_per_gpu: 1
    megatron:
      param_offload: false
      tensor_model_parallel_size: 4
      pipeline_model_parallel_size: 1
      context_parallel_size: 1

data:
  train_files: ???
  val_files: ???
  prompt_key: prompt
  truncation: left
  max_prompt_length: 4096
  max_response_length: 131072
  train_batch_size: 1
  val_batch_size: 1
  gen_batch_size: 1
  return_raw_chat: true
  trust_remote_code: true
  dataloader_num_workers: 0
  custom_cls:
    path: pkg://examples.blackbox_recipes.mini_swe_agent.dataset
    name: SWEBenchDataset

algorithm:
  gamma: 1.0
  lam: 1.0
  adv_estimator: grpo
  use_kl_in_reward: false
  kl_ctrl:
    type: fixed
    kl_coef: 0.0
  rollout_correction:
    bypass_mode: true

reward:
  custom_reward_function:
    path: pkg://examples.blackbox_recipes.mini_swe_agent.reward
    name: compute_score

trainer:
  nnodes: 1
  n_gpus_per_node: 4
  total_epochs: 10
  total_training_steps: null
  project_name: swe_agent_blackbox
  experiment_name: swe_agent
  logger:
    - console
  device: cuda
  val_before_train: true
  val_only: false
  save_freq: 10
  test_freq: 10
  default_local_dir: checkpoints/swe_agent_blackbox
  resume_mode: auto
  use_v1: true
  v1:
    trainer_mode: separate_async
    colocate_async:
      num_warmup_batches: 1
    separate_async:
      num_warmup_batches: 4
      parameter_sync_step: 4

transfer_queue:
  enable: true

ray_kwargs:
  ray_init:
    runtime_env:
      env_vars:
        TRANSFER_QUEUE_ENABLE: ""
        NCCL_P2P_DISABLE: "1"
        NCCL_SHM_DISABLE: "1"