verl-project · zhaizhiqiangA · Jun 26, 2026 · Jun 26, 2026 · Jun 29, 2026
diff --git a/examples/blackbox_recipes/mini_swe_agent/Dockerfile.mini-swe-agent-tool b/examples/blackbox_recipes/mini_swe_agent/Dockerfile.mini-swe-agent-tool
@@ -0,0 +1,45 @@
+# Mini-swe-agent sidecar tool image.
+#
+# Contains a self-contained Python venv at /opt/mini-swe-agent with
+# mini-swe-agent + litellm installed.  When mounted into a sandbox at
+# /opt/mini-swe-agent, the agent can be invoked via:
+#
+#   /opt/mini-swe-agent/bin/python /opt/mini-swe-agent/bin/run_agent.py ...
+#
+# Uses python-build-standalone for maximum portability across different
+# glibc versions (built against older glibc, forward-compatible).
+#
+# Build:
+#   docker build -f Dockerfile.mini-swe-agent-tool -t mini-swe-agent-tool:latest .
+#
+
+FROM debian:bullseye-slim AS builder
+
+ARG PBS_RELEASE="20260602"
+ARG PBS_PYTHON="3.12.13"
+ARG PIP_INDEX_URL=""
+
+# Download and extract python-build-standalone (stripped, 32MB)
+RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates wget \
+    && rm -rf /var/lib/apt/lists/* \
+    && wget -q \
+        "https://github.com/astral-sh/python-build-standalone/releases/download/${PBS_RELEASE}/cpython-${PBS_PYTHON}%2B${PBS_RELEASE}-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz" \
+        -O /tmp/python.tar.gz \
+    && mkdir -p /opt/mini-swe-agent \
+    && tar -xzf /tmp/python.tar.gz -C /opt/mini-swe-agent --strip-components=1 \
+    && rm /tmp/python.tar.gz
+
+# Install mini-swe-agent + litellm
+RUN /opt/mini-swe-agent/bin/pip install --no-cache-dir \
+    ${PIP_INDEX_URL:+-i ${PIP_INDEX_URL}} \
+    "mini-swe-agent==2.2.8" \
+    "litellm==1.81.7"
+
+# Copy the in-sandbox runner script
+COPY run_agent.py /opt/mini-swe-agent/bin/run_agent.py
+
+# Final scratch image: files are at the image root level so that when
+# akernel_sdk.Mount(target="/opt/mini-swe-agent") overlays this image,
+# the files appear at /opt/mini-swe-agent/bin/python etc.
+FROM scratch
+COPY --from=builder /opt/mini-swe-agent /
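The `wget` step above assembles the download URL from `PBS_RELEASE` and `PBS_PYTHON`, with the `+` in the asset name percent-encoded as `%2B`. When bumping either build arg, a small check like the one below can confirm the URL before a rebuild. This is a minimal sketch and not part of the PR: `pbs_download_url` is a hypothetical helper, and the asset naming pattern is copied from this Dockerfile rather than from any upstream guarantee.

```python
from urllib.parse import quote

# Base URL copied from the Dockerfile's wget command.
PBS_BASE = "https://github.com/astral-sh/python-build-standalone/releases/download"


def pbs_download_url(release: str, python: str) -> str:
    """Rebuild the tarball URL the Dockerfile downloads (hypothetical helper)."""
    asset = (
        f"cpython-{python}+{release}"
        "-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz"
    )
    # quote() percent-encodes "+" as "%2B", matching the literal in the Dockerfile.
    return f"{PBS_BASE}/{release}/{quote(asset)}"


if __name__ == "__main__":
    # Defaults taken from the ARG lines in the Dockerfile.
    print(pbs_download_url("20260602", "3.12.13"))
```

Note that the URL hard-codes `x86_64-unknown-linux-gnu`, so the image as written targets x86_64 hosts only.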
diff --git a/examples/blackbox_recipes/mini_swe_agent/README.md b/examples/blackbox_recipes/mini_swe_agent/README.md
@@ -0,0 +1,119 @@
+# Mini-SWE-Agent In-Sandbox Execution
+
+## Overview
+
+`mini-swe-agent` runs inside the SWE-bench sandbox through a sidecar tool image.
+The external runner creates the sandbox, mounts the tool image at
+`/opt/mini-swe-agent`, starts the agent process, and evaluates the reward in the
+same sandbox.
+
+The agent executes commands through `LocalEnvironment` (local bash) inside the
+sandbox and calls the LLM through the gateway URL passed in via stdin. The
+`mini_swe` tool image uses
+[python-build-standalone](https://github.com/astral-sh/python-build-standalone)
+to build an isolated Python environment, then copies the result into a minimal
+`FROM scratch` final stage, so the sandbox base image does not need to provide
+Python for the sidecar tool runtime.
+
+**This recipe is self-contained.** It shares only
+[`../sandbox_client.py`](../sandbox_client.py) with the claude-code recipe;
+everything else (`dataset.py`, `reward.py`, `run_agent.py`, `build_tool.sh`,
+`run_train.sh`, config) lives in this directory and does not depend on
+`claude_code/`.
+
+**Supported runners:**
+
+| Runner | Description |
+|--------|-------------|
+| `mini_swe` | mini-swe-agent sidecar runner |
+
+**Supported sandbox types:**
+
+| Type | Description |
+|------|-------------|
+| openyuanrong | Uses `akernel_sdk.Mount` and `sandbox.commands.run()` |
+
+## Architecture
+
+```text
+[Rollouter Host: mini_swe_agent_runner]
+  |
+  |-- SandboxClient.create(image, sidecar_image, sidecar_target="/opt/mini-swe-agent")
+  |     `-- akernel: Sandbox(mounts=[Mount(target="/opt/mini-swe-agent", ...)])
+  |
+  |-- sandbox.run("<tool entrypoint>")
+  |     `-- [Inside Sandbox]
+  |           /opt/mini-swe-agent/bin/python /opt/mini-swe-agent/bin/run_agent.py
+  |           stdin <- task config JSON (task, gateway_url, agent)
+  |           commands run inside the SWE-bench sandbox
+  |           stdout -> agent execution result JSON
+  |
+  |-- parse agent result
+  |-- SandboxEnvForReward(sandbox) -> evaluate_in_env()
+  `-- POST session.reward_info_url
+```
+
+## Prerequisites
+
+1. **AKernel** — set `AKERNEL_SERVER_ADDRESS` and `AKERNEL_TOKEN`.
+2. **Tool image** — build the mini-swe-agent tool image and push it to a remote
+   registry if the sandbox service cannot access local Docker images.
+
+## 1. Build Tool Image
+
+`mini_swe` is injected into the SWE-bench sandbox as a sidecar tool image. Use
+`build_tool.sh` to build it.
+
+| Default tool image | Dockerfile | Sandbox mount path | Image contents |
+|--------------------|------------|--------------------|----------------|
+| `mini-swe-agent-tool:latest` | `Dockerfile.mini-swe-agent-tool` | `/opt/mini-swe-agent` | Standalone Python 3.12, `mini-swe-agent`, `litellm`, and `run_agent.py` |
+
+```bash
+# Use the default PyPI source.
+bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh
+
+# Use a custom PyPI mirror.
+bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh --pip-index https://pypi.tuna.tsinghua.edu.cn/simple/
+
+# Build and push to a remote registry.
+bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh --registry swr.cn-east-3.myhuaweicloud.com/openyuanrong
+```
+
+The `mini_swe` Python runtime is fully isolated from the sandbox container's
+Python.
+
+### Build Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `TOOL_IMAGE` | `mini-swe-agent-tool` | Image name |
+| `TOOL_TAG` | `latest` | Image tag |
+| `PIP_INDEX_URL` | unset, use PyPI | pip index URL (`--pip-index`) |
+
+After pushing, set `SWE_AGENT_TOOL_IMAGE` to the pushed image reference so that training uses it as the sidecar tool image.
+
+## 2. Training (Fully Async)
+
+```bash
+AKERNEL_SERVER_ADDRESS="6.2.179.37:8888" \
+AKERNEL_TOKEN="<token>" \
+SWE_AGENT_TOOL_IMAGE=swr.cn-east-3.myhuaweicloud.com/openyuanrong/mini-swe-agent-tool:latest \
+MODEL_PATH=~/models/Qwen3.5-9B \
+bash examples/blackbox_recipes/mini_swe_agent/run_train.sh
+```
+
+The training YAML registers `mini_swe_agent_runner` as the only runner, under `actor_rollout_ref.rollout.custom.agent_framework.agent_runners.swe_agent`:
+
+```yaml
+runner_fqn: examples.blackbox_recipes.mini_swe_agent.mini_swe_agent_runner.mini_swe_agent_runner
+```
+
+## 3. Configuration
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `AGENT_MAX_TURNS` | `100` | mini-swe-agent `step_limit` (the agent's turn budget); read by the runner from the `AGENT_MAX_TURNS` env var |
+| `SWE_AGENT_EVAL_TIMEOUT` | `600` | Reward evaluation timeout (seconds) |
+| `SWE_AGENT_RUN_TIMEOUT` | `7200` | Max wall time for the agent process in the sandbox (seconds) |
+| `SWE_AGENT_TOOL_IMAGE` | `swr.cn-east-3.myhuaweicloud.com/openyuanrong/mini-swe-agent-tool:latest` | Sidecar tool image |
+| `CONDA_ENV` | `testbed` | Conda env activated inside the sandbox before running the agent |
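The Architecture diagram in this README describes a simple contract: the runner writes a task config JSON (`task`, `gateway_url`, `agent`) to the agent's stdin and reads a result JSON from its stdout. The sketch below illustrates that contract with a local `subprocess` call. It is an assumption-laden stand-in, not the recipe's runner: `run_agent_process` and `FAKE_AGENT` are hypothetical, the result fields (`status`, `seen_keys`) are invented for illustration, and the real recipe launches `run_agent.py` inside the sandbox rather than on the host.

```python
import json
import subprocess
import sys

# Stand-in for run_agent.py: reads the config JSON from stdin and writes a
# result JSON to stdout. The result fields here are hypothetical.
FAKE_AGENT = (
    "import json, sys; cfg = json.load(sys.stdin); "
    "json.dump({'status': 'ok', 'seen_keys': sorted(cfg)}, sys.stdout)"
)


def run_agent_process(argv: list[str], config: dict, timeout: float) -> dict:
    """Send config as JSON on stdin and parse the JSON printed on stdout."""
    proc = subprocess.run(
        argv,
        input=json.dumps(config),
        capture_output=True,
        text=True,
        timeout=timeout,  # plays the role of SWE_AGENT_RUN_TIMEOUT
        check=True,       # a non-zero exit raises CalledProcessError
    )
    return json.loads(proc.stdout)


# Top-level keys follow the README; the values are placeholders.
config = {
    "task": "<problem statement>",
    "gateway_url": "http://gateway.invalid/v1",
    "agent": {"step_limit": 100},
}
result = run_agent_process([sys.executable, "-c", FAKE_AGENT], config, timeout=60)
print(result)
```

Keeping stdout reserved for the single result document means any agent logging has to go to stderr, otherwise `json.loads` on the captured output fails.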
diff --git a/examples/blackbox_recipes/mini_swe_agent/__init__.py b/examples/blackbox_recipes/mini_swe_agent/__init__.py
diff --git a/examples/blackbox_recipes/mini_swe_agent/build_tool.sh b/examples/blackbox_recipes/mini_swe_agent/build_tool.sh
@@ -0,0 +1,56 @@
+#!/usr/bin/env bash
+# Build the mini-swe-agent sidecar tool image.
+#
+# The image uses python-build-standalone to build an isolated Python runtime
+# with mini-swe-agent + litellm + run_agent.py, copied into a minimal
+# `FROM scratch` final stage rooted at /opt/mini-swe-agent. It is mounted into
+# the SWE-bench sandbox at /opt/mini-swe-agent, so the sandbox base image does
+# not need Python for the sidecar tool runtime.
+#
+# Usage:
+#   bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh
+#   bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh --pip-index https://pypi.tuna.tsinghua.edu.cn/simple/
+#   bash examples/blackbox_recipes/mini_swe_agent/build_tool.sh --registry swr.cn-east-3.myhuaweicloud.com/openyuanrong
+#
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+IMAGE_NAME="${TOOL_IMAGE:-mini-swe-agent-tool}"
+IMAGE_TAG="${TOOL_TAG:-latest}"
+
+# Parse args
+REGISTRY=""
+PIP_INDEX_URL="${PIP_INDEX_URL:-}"
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --registry) REGISTRY="${2:?--registry requires a value}"; shift 2 ;;
+        --pip-index) PIP_INDEX_URL="${2:?--pip-index requires a value}"; shift 2 ;;
+        *) echo "Unknown arg: $1" >&2; exit 1 ;;
+    esac
+done
+
+BUILD_ARGS=()
+if [[ -n "${PIP_INDEX_URL}" ]]; then
+    BUILD_ARGS+=(--build-arg PIP_INDEX_URL="${PIP_INDEX_URL}")
+fi
+
+echo "==> Building mini_swe tool image: ${IMAGE_NAME}:${IMAGE_TAG}"
+docker build \
+    -f "${SCRIPT_DIR}/Dockerfile.mini-swe-agent-tool" \
+    -t "${IMAGE_NAME}:${IMAGE_TAG}" \
+    ${BUILD_ARGS[@]+"${BUILD_ARGS[@]}"} \
+    "${SCRIPT_DIR}/"
+
+if [[ -n "${REGISTRY}" ]]; then
+    FULL_TAG="${REGISTRY}/${IMAGE_NAME}:${IMAGE_TAG}"
+    echo "==> Tagging and pushing: ${FULL_TAG}"
+    docker tag "${IMAGE_NAME}:${IMAGE_TAG}" "${FULL_TAG}"
+    docker push "${FULL_TAG}"
+    echo "    Pushed."
+fi
+
+echo ""
+echo "Tool image ready: ${IMAGE_NAME}:${IMAGE_TAG}"
+if [[ -n "${REGISTRY}" ]]; then
+    echo "  Remote sandbox: ${FULL_TAG}"
+fi
diff --git a/examples/blackbox_recipes/mini_swe_agent/config/swe_agent_blackbox_megatron_v1.yaml b/examples/blackbox_recipes/mini_swe_agent/config/swe_agent_blackbox_megatron_v1.yaml
@@ -0,0 +1,168 @@
+# Megatron + V1 unified trainer config for the blackbox mini-swe recipe.
+#
+# Entry point: python3 -m verl.trainer.main_ppo
+# Default trainer mode is separate_async. On a single 8-GPU node this recipe
+# uses 4 GPUs for trainer and 4 GPUs for standalone rollout.
+
+hydra:
+  searchpath:
+    - pkg://verl.trainer.config
+
+defaults:
+  - ppo_megatron_trainer
+  - _self_
+
+actor_rollout_ref:
+  hybrid_engine: true
+  nccl_timeout: 9600
+
+  model:
+    path: ???
+
+  rollout:
+    name: vllm
+    mode: async
+    nnodes: 1
+    n_gpus_per_node: 4
+    prompt_length: 4096
+    response_length: 131072
+    max_model_len: 135168
+    temperature: 1.0
+    top_p: 1.0
+    top_k: -1
+    n: 8
+    tensor_model_parallel_size: 4
+    gpu_memory_utilization: 0.7
+    calculate_log_probs: true
+    enable_sleep_mode: true
+    free_cache_engine: true
+    enable_chunked_prefill: true
+    max_num_batched_tokens: 135168
+    checkpoint_engine:
+      backend: nccl
+      update_weights_bucket_megabytes: 2048
+
+    multi_turn:
+      enable: true
+      max_parallel_calls: 1
+      format: qwen3_coder
+
+    agent:
+      num_workers: 8
+      agent_loop_manager_class: uni_agent.framework.entry.AgentFrameworkRolloutAdapter
+
+    custom:
+      agent_framework:
+        gateway_count: 1
+        agent_runners:
+          swe_agent:
+            runner_fqn: examples.blackbox_recipes.mini_swe_agent.mini_swe_agent_runner.mini_swe_agent_runner
+            dispatch_mode: ray_task
+            max_concurrent_sessions: 32
+            runner_kwargs:
+              tool_image: swr.cn-east-3.myhuaweicloud.com/openyuanrong/mini-swe-agent-tool:latest
+              run_timeout: 3600
+              conda_env: testbed
+
+  actor:
+    use_dynamic_bsz: true
+    use_rollout_log_probs: true
+    ppo_mini_batch_size: 16
+    ppo_micro_batch_size_per_gpu: 1
+    use_kl_loss: false
+    kl_loss_coef: 0.0
+    clip_ratio_low: 0.2
+    clip_ratio_high: 0.28
+    clip_ratio_c: 10.0
+    loss_agg_mode: token-mean
+    entropy_coeff: 0
+    optim:
+      lr: 1e-6
+      weight_decay: 0.1
+      lr_decay_style: constant
+    megatron:
+      param_offload: true
+      grad_offload: true
+      optimizer_offload: true
+      tensor_model_parallel_size: 4
+      pipeline_model_parallel_size: 1
+      context_parallel_size: 1
+      use_mbridge: true
+      use_remove_padding: false
+
+  ref:
+    log_prob_micro_batch_size_per_gpu: 1
+    megatron:
+      param_offload: false
+      tensor_model_parallel_size: 4
+      pipeline_model_parallel_size: 1
+      context_parallel_size: 1
+
+data:
+  train_files: ???
+  val_files: ???
+  prompt_key: prompt
+  truncation: left
+  max_prompt_length: 4096
+  max_response_length: 131072
+  train_batch_size: 1
+  val_batch_size: 1
+  gen_batch_size: 1
+  return_raw_chat: true
+  trust_remote_code: true
+  dataloader_num_workers: 0
+  custom_cls:
+    path: pkg://examples.blackbox_recipes.mini_swe_agent.dataset
+    name: SWEBenchDataset
+
+algorithm:
+  gamma: 1.0
+  lam: 1.0
+  adv_estimator: grpo
+  use_kl_in_reward: false
+  kl_ctrl:
+    type: fixed
+    kl_coef: 0.0
+  rollout_correction:
+    bypass_mode: true
+
+reward:
+  custom_reward_function:
+    path: pkg://examples.blackbox_recipes.mini_swe_agent.reward
+    name: compute_score
+
+trainer:
+  nnodes: 1
+  n_gpus_per_node: 4
+  total_epochs: 10
+  total_training_steps: null
+  project_name: swe_agent_blackbox
+  experiment_name: swe_agent
+  logger:
+    - console
+  device: cuda
+  val_before_train: true
+  val_only: false
+  save_freq: 10
+  test_freq: 10
+  default_local_dir: checkpoints/swe_agent_blackbox
+  resume_mode: auto
+  use_v1: true
+  v1:
+    trainer_mode: separate_async
+    colocate_async:
+      num_warmup_batches: 1
+    separate_async:
+      num_warmup_batches: 4
+      parameter_sync_step: 4
+
+transfer_queue:
+  enable: true
+
+ray_kwargs:
+  ray_init:
+    runtime_env:
+      env_vars:
+        TRANSFER_QUEUE_ENABLE: ""
+        NCCL_P2P_DISABLE: "1"
+        NCCL_SHM_DISABLE: "1"
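Several numbers in this config are coupled: `max_model_len` equals `prompt_length + response_length`, and the rollout and trainer GPU counts add up to the single 8-GPU node mentioned in the header comment. The sketch below recomputes those relations from values copied out of the YAML. It is illustrative only: `check_budgets` is a hypothetical helper, and treating `prompt_length + response_length <= max_model_len` as a required constraint is an assumption about how these fields relate, not something the config states.

```python
# Values copied from swe_agent_blackbox_megatron_v1.yaml.
ROLLOUT = {
    "prompt_length": 4096,
    "response_length": 131072,
    "max_model_len": 135168,
    "nnodes": 1,
    "n_gpus_per_node": 4,
    "tensor_model_parallel_size": 4,
}
TRAINER = {"nnodes": 1, "n_gpus_per_node": 4}


def check_budgets(rollout: dict, trainer: dict) -> dict:
    """Recompute the token budget and GPU split (hypothetical helper)."""
    token_budget = rollout["prompt_length"] + rollout["response_length"]
    # Assumption: a full prompt plus a full response must fit in the context window.
    if token_budget > rollout["max_model_len"]:
        raise ValueError("prompt_length + response_length exceeds max_model_len")

    rollout_gpus = rollout["nnodes"] * rollout["n_gpus_per_node"]
    tp = rollout["tensor_model_parallel_size"]
    if rollout_gpus % tp != 0:
        raise ValueError("rollout GPUs must be a multiple of the TP size")

    trainer_gpus = trainer["nnodes"] * trainer["n_gpus_per_node"]
    return {
        "token_budget": token_budget,
        "rollout_replicas": rollout_gpus // tp,
        "total_gpus": rollout_gpus + trainer_gpus,
    }


summary = check_budgets(ROLLOUT, TRAINER)
print(summary)
```

With the values above, the token budget matches `max_model_len` exactly, the four rollout GPUs form a single tensor-parallel replica, and the total is eight GPUs.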