[examples] feat: blackbox mini-swe-agent training recipe#73
zhaizhiqiangA wants to merge 3 commits into
Code Review
This pull request introduces a blackbox recipe for running mini_swe_agent inside an OpenYuanRong remote sandbox, including Dockerfiles, configurations, dataset and reward utilities, and runner scripts. Feedback focuses on improving robustness and preventing resource leaks, such as reordering task configuration before sandbox creation to avoid leaking sandboxes on error, shell-quoting file paths, wrapping blocking network calls in asyncio.to_thread, handling missing ports and preserving query parameters in gateway URLs, adding defensive type checks to prevent AttributeError or TypeError on missing/invalid dictionaries, and using ${BASH_SOURCE[0]} for robust script directory resolution.
sandbox = await YRSandbox.create(
    image=image, sidecar_image=tool_image, upstream=upstream, max_retries=int(sandbox_max_retries),
)
sandbox_id = sandbox.sandbox_id
logger.info("Sandbox created (image=%s, sandbox_id=%s)", image, sandbox_id)

# Build task config (gateway URL rewritten to sandbox-internal tunnel)
task_config = _build_task_config(
    task=task,
    gateway_url=gateway_url,
)

try:
If _build_task_config raises an exception (e.g., due to invalid environment variables or URL parsing issues), the remote sandbox created by YRSandbox.create will be leaked because the exception is raised before entering the try...finally block. To prevent resource leaks, execute _build_task_config before creating the sandbox.
Suggested change:
- sandbox = await YRSandbox.create(
-     image=image, sidecar_image=tool_image, upstream=upstream, max_retries=int(sandbox_max_retries),
- )
- sandbox_id = sandbox.sandbox_id
- logger.info("Sandbox created (image=%s, sandbox_id=%s)", image, sandbox_id)
- # Build task config (gateway URL rewritten to sandbox-internal tunnel)
- task_config = _build_task_config(
-     task=task,
-     gateway_url=gateway_url,
- )
- try:
+ # Build task config (gateway URL rewritten to sandbox-internal tunnel)
+ task_config = _build_task_config(
+     task=task,
+     gateway_url=gateway_url,
+ )
+ sandbox = await YRSandbox.create(
+     image=image, sidecar_image=tool_image, upstream=upstream, max_retries=int(sandbox_max_retries),
+ )
+ sandbox_id = sandbox.sandbox_id
+ logger.info("Sandbox created (image=%s, sandbox_id=%s)", image, sandbox_id)
+ try:
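The ordering argument above can be checked in isolation. The following is a minimal sketch, not the recipe's code: `FakeSandbox`, `build_task_config`, and `run_task` are hypothetical stand-ins for `YRSandbox`, `_build_task_config`, and the runner, and the only point is that a failure in the config step can no longer leave a sandbox behind.

```python
import asyncio


class FakeSandbox:
    """Hypothetical stand-in for a remote sandbox; not the real YRSandbox API."""

    live = 0  # number of sandboxes created but not yet closed

    @classmethod
    async def create(cls) -> "FakeSandbox":
        cls.live += 1
        return cls()

    async def close(self) -> None:
        type(self).live -= 1


def build_task_config(gateway_url: str) -> dict:
    # Hypothetical validation step that may raise, like _build_task_config.
    if not gateway_url.startswith("http"):
        raise ValueError(f"invalid gateway URL: {gateway_url!r}")
    return {"gateway_url": gateway_url}


async def run_task(gateway_url: str) -> dict:
    # 1. Do everything that can fail cheaply before acquiring the sandbox.
    task_config = build_task_config(gateway_url)
    # 2. Acquire the resource, then enter try/finally immediately.
    sandbox = await FakeSandbox.create()
    try:
        return task_config
    finally:
        await sandbox.close()


async def demo() -> int:
    try:
        await run_task("not-a-url")  # config step raises before any sandbox exists
    except ValueError:
        pass
    return FakeSandbox.live


leaked = asyncio.run(demo())
```

With this ordering, `leaked` stays at zero because the exception is raised before `create` is ever called.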
async def write_file(self, path: str | Path, content: str) -> None:
    encoded = base64.b64encode(content.encode()).decode()
    await self.communicate(f"echo {encoded} | base64 -d > {path}", check="raise", error_msg=f"write {path}")

async def read_file(self, path: str | Path, **_) -> str:
    return await self.communicate(f"cat {path}")
If the file path contains spaces or special shell characters, the commands executed via communicate will fail or behave unexpectedly because the path is not shell-quoted. Use shlex.quote to safely escape the path.
Suggested change:
- async def write_file(self, path: str | Path, content: str) -> None:
-     encoded = base64.b64encode(content.encode()).decode()
-     await self.communicate(f"echo {encoded} | base64 -d > {path}", check="raise", error_msg=f"write {path}")
- async def read_file(self, path: str | Path, **_) -> str:
-     return await self.communicate(f"cat {path}")
+ async def write_file(self, path: str | Path, content: str) -> None:
+     encoded = base64.b64encode(content.encode()).decode()
+     await self.communicate(f"echo {encoded} | base64 -d > {shlex.quote(str(path))}", check="raise", error_msg=f"write {path}")
+ async def read_file(self, path: str | Path, **_) -> str:
+     return await self.communicate(f"cat {shlex.quote(str(path))}")
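For illustration, here is what `shlex.quote` does to a hostile path. This is a standalone sketch: `build_write_command` is a hypothetical helper that only builds the command string, it does not call the recipe's `communicate`.

```python
import base64
import shlex
from pathlib import Path
from typing import Union


def build_write_command(path: Union[str, Path], content: str) -> str:
    """Return a shell command that writes `content` to `path` (hypothetical helper)."""
    encoded = base64.b64encode(content.encode()).decode()
    # base64 output only uses [A-Za-z0-9+/=], so it needs no quoting; the path does.
    return f"echo {encoded} | base64 -d > {shlex.quote(str(path))}"


# A path with a space and a shell metacharacter stays a single argument.
cmd = build_write_command("/tmp/my dir/notes; rm -rf x.txt", "hi")
```

Paths made only of safe characters are returned unchanged by `shlex.quote`, so the common case produces the same command as before.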
if self._sandbox.is_running():
    await asyncio.to_thread(self._sandbox.kill)
    logger.info("YR sandbox %s killed", sandbox_id)
else:
    logger.info("YR sandbox %s already stopped", sandbox_id)
self._sandbox.is_running() is a synchronous blocking network call to the remote sandbox SDK. Calling it directly in an async def function blocks the event loop. Wrap it in asyncio.to_thread to prevent blocking the main thread.
Suggested change:
- if self._sandbox.is_running():
-     await asyncio.to_thread(self._sandbox.kill)
-     logger.info("YR sandbox %s killed", sandbox_id)
- else:
-     logger.info("YR sandbox %s already stopped", sandbox_id)
+ is_running = await asyncio.to_thread(self._sandbox.is_running)
+ if is_running:
+     await asyncio.to_thread(self._sandbox.kill)
+     logger.info("YR sandbox %s killed", sandbox_id)
+ else:
+     logger.info("YR sandbox %s already stopped", sandbox_id)
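The effect on the event loop can be seen with a small experiment. This is a sketch under assumptions: `FakeSdkSandbox` is a hypothetical object whose `is_running` sleeps to imitate a blocking SDK round trip, and a heartbeat task records whether the loop kept running while the call was in flight.

```python
import asyncio
import time


class FakeSdkSandbox:
    """Hypothetical synchronous SDK object; stands in for the real sandbox handle."""

    def is_running(self) -> bool:
        time.sleep(0.2)  # simulates a blocking network round trip
        return True


async def heartbeat(ticks: list) -> None:
    # Runs concurrently; it only makes progress if the event loop is not blocked.
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def main() -> tuple:
    ticks: list = []
    sandbox = FakeSdkSandbox()
    hb = asyncio.create_task(heartbeat(ticks))
    start = time.monotonic()
    running = await asyncio.to_thread(sandbox.is_running)
    done = time.monotonic()
    await hb
    # Count heartbeat ticks that fired while the blocking call was in flight.
    ticks_during_call = sum(1 for t in ticks if start <= t <= done)
    return running, ticks_during_call


running, ticks_during_call = asyncio.run(main())
```

Calling `sandbox.is_running()` directly inside `main` would instead hold the loop for the full sleep, and no heartbeat tick could land inside the call window.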
def extract_upstream(gateway_url: str) -> str:
    """Extract host:port from a gateway URL for upstream tunnel config.

    Example: "http://8.92.9.155:40169/sessions/abc/v1" -> "8.92.9.155:40169"
    """
    parsed = urlparse(gateway_url)
    return f"{parsed.hostname}:{parsed.port}"
If the gateway_url does not specify an explicit port (e.g., standard http or https URLs), parsed.port will be None, resulting in an invalid upstream string like host:None. Handle missing ports by defaulting to 80 for http and 443 for https.
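To make the failure mode concrete, here is a minimal sketch. The helper names and the `DEFAULT_PORTS` table are hypothetical, and the example host is made up; only `urlparse` behaviour is taken from the standard library.

```python
from urllib.parse import urlparse

# Assumption: only http and https gateways are expected; other schemes raise KeyError.
DEFAULT_PORTS = {"http": 80, "https": 443}


def upstream_naive(url: str) -> str:
    parsed = urlparse(url)
    # parsed.port is None when the URL carries no explicit port.
    return f"{parsed.hostname}:{parsed.port}"


def upstream_with_default(url: str) -> str:
    parsed = urlparse(url)
    port = parsed.port if parsed.port is not None else DEFAULT_PORTS[parsed.scheme]
    return f"{parsed.hostname}:{port}"


bad = upstream_naive("https://gateway.example.com/sessions/abc/v1")
good = upstream_with_default("https://gateway.example.com/sessions/abc/v1")
```

Here `bad` contains the literal text `None` after the colon, which is what an upstream tunnel would be handed if the port were not defaulted.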
Suggested change:
def extract_upstream(gateway_url: str) -> str:
    """Extract host:port from a gateway URL for upstream tunnel config.

    Example: "http://8.92.9.155:40169/sessions/abc/v1" -> "8.92.9.155:40169"
    """
    parsed = urlparse(gateway_url)
    hostname = parsed.hostname or ""
    port = parsed.port
    if port is None:
        port = 443 if parsed.scheme == "https" else 80
    return f"{hostname}:{port}"

parsed = urlparse(gateway_url)
path = parsed.path.removesuffix("/v1") if strip_v1 else parsed.path
return f"http://127.0.0.1:{proxy_port}{path}"
Rebuilding the gateway URL using only parsed.path discards any query parameters or fragments present in the original URL. Preserve them by appending parsed.query and parsed.fragment to the rewritten URL.
Suggested change:
- parsed = urlparse(gateway_url)
- path = parsed.path.removesuffix("/v1") if strip_v1 else parsed.path
- return f"http://127.0.0.1:{proxy_port}{path}"
+ parsed = urlparse(gateway_url)
+ path = parsed.path.removesuffix("/v1") if strip_v1 else parsed.path
+ query = f"?{parsed.query}" if parsed.query else ""
+ fragment = f"#{parsed.fragment}" if parsed.fragment else ""
+ return f"http://127.0.0.1:{proxy_port}{path}{query}{fragment}"
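An alternative way to get the same result, shown here only as a sketch, is to let `urllib.parse` reassemble the URL instead of concatenating strings. The function name `rewrite_gateway_url` and its signature are assumptions for this example, not the recipe's actual helper.

```python
from urllib.parse import urlparse, urlunparse


def rewrite_gateway_url(gateway_url: str, proxy_port: int, strip_v1: bool = False) -> str:
    """Point a gateway URL at a local proxy, keeping path, query and fragment."""
    parsed = urlparse(gateway_url)
    path = parsed.path.removesuffix("/v1") if strip_v1 else parsed.path
    # _replace swaps only the named components; everything else is carried over.
    rewritten = parsed._replace(scheme="http", netloc=f"127.0.0.1:{proxy_port}", path=path)
    return urlunparse(rewritten)


kept = rewrite_gateway_url("http://8.92.9.155:40169/sessions/abc/v1?token=t#frag", 9000)
stripped = rewrite_gateway_url("http://8.92.9.155:40169/sessions/abc/v1?token=t", 9000, strip_v1=True)
```

Replacing components on the parsed result means any part of the URL that is not explicitly rewritten survives without extra string handling.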
image = env_config.get("image")
if image:
    return image
deployment = env_config.get("deployment")
If env_config is None or not a dictionary, calling env_config.get will raise an AttributeError. Add a type check to handle non-dictionary inputs gracefully.
Suggested change:
if not isinstance(env_config, dict):
    return ""
image = env_config.get("image")
if image:
    return image
deployment = env_config.get("deployment")

extra_info = row_dict.get("extra_info", {})
tools_kwargs = extra_info.get("tools_kwargs", {})
reward_config = tools_kwargs.get("reward", {})

row_dict.setdefault("data_source", reward_config.get("name", "unknown"))
row_dict.setdefault("reward_model", {"ground_truth": {}})
If extra_info or tools_kwargs is present but is None or otherwise not a dictionary, calling .get on it will raise an AttributeError; the {} default only applies when the key is missing altogether. Use defensive type checks to ensure robustness.
Suggested change:
- extra_info = row_dict.get("extra_info", {})
- tools_kwargs = extra_info.get("tools_kwargs", {})
- reward_config = tools_kwargs.get("reward", {})
- row_dict.setdefault("data_source", reward_config.get("name", "unknown"))
- row_dict.setdefault("reward_model", {"ground_truth": {}})
+ extra_info = row_dict.get("extra_info") or {}
+ tools_kwargs = extra_info.get("tools_kwargs") or {} if isinstance(extra_info, dict) else {}
+ reward_config = tools_kwargs.get("reward") or {} if isinstance(tools_kwargs, dict) else {}
+ row_dict.setdefault("data_source", reward_config.get("name", "unknown") if isinstance(reward_config, dict) else "unknown")
+ row_dict.setdefault("reward_model", {"ground_truth": {}})
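The repeated `isinstance` checks can also be folded into one small helper. The sketch below is one possible shape, not the recipe's code: `as_dict` and `fill_reward_defaults` are hypothetical names, and the sample rows are made up.

```python
from typing import Any


def as_dict(value: Any) -> dict:
    """Return `value` if it is a dict, otherwise an empty dict."""
    return value if isinstance(value, dict) else {}


def fill_reward_defaults(row_dict: dict) -> dict:
    # Each level is normalised to a dict before .get is called on it.
    extra_info = as_dict(row_dict.get("extra_info"))
    tools_kwargs = as_dict(extra_info.get("tools_kwargs"))
    reward_config = as_dict(tools_kwargs.get("reward"))
    row_dict.setdefault("data_source", reward_config.get("name", "unknown"))
    row_dict.setdefault("reward_model", {"ground_truth": {}})
    return row_dict


row_none = fill_reward_defaults({"extra_info": None})
row_full = fill_reward_defaults({"extra_info": {"tools_kwargs": {"reward": {"name": "swebench"}}}})
```

A row whose `extra_info` is `None` or a stray string falls back to `"unknown"` instead of raising.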
if extra_info and "reward_score" in extra_info:
    score = float(extra_info["reward_score"])
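If extra_info is truthy but not a dictionary (e.g., a non-zero number, or a string that happens to contain "reward_score"), the membership check or the indexing that follows can raise a TypeError; a None value is already short-circuited by the truthiness test. Ensure extra_info is a dictionary before performing the membership check.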
Suggested change:
- if extra_info and "reward_score" in extra_info:
-     score = float(extra_info["reward_score"])
+ if isinstance(extra_info, dict) and "reward_score" in extra_info:
+     score = float(extra_info["reward_score"])
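A short side-by-side of the two guards, as a sketch: both functions are hypothetical wrappers around the two conditions shown above, and the inputs are invented to exercise the edge cases.

```python
from typing import Any, Optional


def read_score_unguarded(extra_info: Any) -> Optional[float]:
    # Truthiness check only: non-dict truthy values reach the `in` test.
    if extra_info and "reward_score" in extra_info:
        return float(extra_info["reward_score"])
    return None


def read_score_guarded(extra_info: Any) -> Optional[float]:
    # Type check first: anything that is not a dict is treated as "no score".
    if isinstance(extra_info, dict) and "reward_score" in extra_info:
        return float(extra_info["reward_score"])
    return None


try:
    read_score_unguarded(5)  # `"reward_score" in 5` is not a valid operation
    unguarded_raised = False
except TypeError:
    unguarded_raised = True
```

The guarded version returns `None` for the same input, and both versions agree on well-formed dictionaries.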
#
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
Using dirname "$0" can be unreliable if the script is sourced or executed via certain shell interpreters. Using dirname "${BASH_SOURCE[0]}" is more robust and consistent with the other scripts in this repository (e.g., run_train.sh).
Suggested change:
- SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ded27bf to 042dc66
042dc66 to 9235bf6
What does this PR do?
This PR adds a blackbox RL training recipe for mini-swe-agent under examples/blackbox_recipes/. The agent runs entirely inside a remote sandbox via a sidecar tool-image mount: the host-side runner creates the sandbox, pipes the task config to the in-sandbox agent over stdin, parses the result from stdout, and evaluates the reward in the same sandbox. The agent reaches the LLM through the gateway via an upstream tunnel, so training is fully "blackbox": the trainer only sees prompts and responses through the gateway. Training uses the V1 unified trainer (Megatron backend, GRPO, separate_async).
Related work:
[train] feat: add blackbox agent gateway (#25) — the gateway this recipe runs against
Checklist Before Starting
gh pr list --repo verl-project/uni-agent --state open --search "mini-swe-agent"
No pull requests match your search in verl-project/uni-agent
[examples] feat: blackbox mini-swe-agent training recipe
Test
A full RL training recipe is not practical to cover in CI, so validation was manual:
API and Usage Example
This PR only adds files under examples/ plus minor internal import-path updates; there are no public API changes.
Design & Code Changes
New recipe — examples/blackbox_recipes/mini_swe_agent/
mini_swe_agent_runner.py — host-side runner. Creates a YRSandbox with the sidecar mounted at /opt/mini-swe-agent, base64-encodes the task config (task text + tunnel-rewritten gateway URL + step limit) and pipes it to run_agent.py via stdin, parses the JSON result from stdout (robust to litellm noise), then evaluates the reward in the same sandbox via SandboxEnvForReward and POSTs reward_info. Sandbox is always cleaned up in finally.
run_agent.py — in-sandbox entrypoint. Builds a LocalEnvironment + LitellmModel (pointed at the gateway tunnel) + DefaultAgent from mini-swe-agent's SWE-bench defaults, runs the task, emits a result JSON.
Dockerfile.mini-swe-agent-tool — self-contained, glibc-portable sidecar image (FROM scratch) so the sandbox base image needs no Python/Node.
dataset.py (SWEBenchDataset) injects verl-standard reward fields; reward.py reuses the uni_agent reward-spec registry to score resolved/unresolved in-env.
config/swe_agent_blackbox_megatron_v1.yaml + scripts/run_train.sh — V1 unified trainer, separate_async by default (4 GPU trainer + 4 GPU rollout on one node), vLLM async rollout, GRPO, Megatron offload.
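The stdin/stdout handshake between the runner and run_agent.py, as described above, can be sketched roughly as follows. This is an illustration under assumptions, not the recipe's implementation: the function names, the config keys (`task`, `gateway_url`, `step_limit`), the `exit_status` field, and the idea that the result is a single-line JSON object are all hypothetical.

```python
import base64
import json


def encode_task_config(config: dict) -> str:
    """Host side: serialise the task config to a single base64 line for stdin."""
    return base64.b64encode(json.dumps(config).encode()).decode()


def decode_task_config(line: str) -> dict:
    """Sandbox side: recover the task config from the stdin payload."""
    return json.loads(base64.b64decode(line.strip()).decode())


def parse_result(stdout: str) -> dict:
    """Host side: return the last stdout line that parses as a JSON object."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a log line that merely starts with "{"
        if isinstance(parsed, dict):
            return parsed
    raise ValueError("no JSON result found in stdout")


cfg = {"task": "fix bug", "gateway_url": "http://127.0.0.1:9000/sessions/abc", "step_limit": 50}
noisy = 'LiteLLM: some warning\n{not json}\n{"exit_status": "Submitted"}\ntrailing noise'
result = parse_result(noisy)
```

Base64 keeps the payload free of quoting and newline problems on the way in, and scanning stdout from the end tolerates library log lines printed before or after the result.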
Checklist Before Submitting
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always