Feat/sandbox #2072
Open
huanghuoguoguo wants to merge 131 commits into
Open
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
6007b79 to
726da24
Compare
726da24 to
1f958e8
Compare
0d18fae to
fa74c75
Compare
…uncation
- Implement head+tail output truncation (60/40 split) so LLM sees both
beginning and final results; add streaming byte-limited reads in backend
to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
- Define BoxProfile model with locked fields and max_timeout_sec clamping
- Add four built-in profiles: default, offline_readonly, network_basic,
network_extended with differentiated resource and security constraints
- Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
read_only_rootfs) and pass corresponding container CLI flags
(--cpus, --memory, --pids-limit, --read-only, --tmpfs)
- Profile loaded from config (box.profile), applied in service layer
before BoxSpec validation; locked fields cannot be overridden by
tool-call parameters
…kill cache The Box backends behave inconsistently when extra_mounts reference a missing host directory (nsjail aborts the entire sandbox start, Docker silently creates a root-owned empty dir on the host, E2B silently skips the upload). The cache in skill_mgr.skills is only refreshed on in-process mutations, so out-of-band changes — container rebuilds, manual rm in the box volume, anything the LangBot API didn't drive — leave a stale skill that later produces one of those bad mount paths. - box/service.py: build_skill_extra_mounts now filters skills whose package_root is not isdir on the LangBot-visible filesystem and logs a warning, instead of passing the bad mount through to the backend - skill/manager.py: reload_skills (Box path) drops skills whose package_root is missing on the LangBot-side filesystem before they reach the in-memory cache, with a summary warning - api/http/controller/groups/skills.py: file/CRUD handlers now also catch BoxError (RuntimeError subclass, previously slipping past ``except ValueError`` and surfacing as 500); list/get handlers gain a try/except so a transient Box RPC failure becomes a clean 400 instead of a stack trace Tests added for build_skill_extra_mounts (skip missing, skip empty, no skill manager) and SkillManager.reload_skills (drop missing on Box path). Full unit suite: 279 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make the Box sandbox runtime optional. When ``box.enabled`` is false in
config (or when an enabled Box fails to connect), every dependent feature
degrades to the same disabled-state UX rather than crashing or silently
falling back to less safe code paths.
Backend:
- config.yaml: new top-level ``box.enabled: true`` flag (default true)
- BoxService:
- Read box.enabled on construction
- initialize() short-circuits when disabled — no remote WS connect, no
stdio subprocess fork
- _on_runtime_disconnect is a no-op when disabled (no reconnect loop
on a deliberately-off service)
- get_status() now exposes ``enabled`` so the frontend can tell
"disabled in config" from "configured but failed"
- MCP stdio loader (mcp_stdio.uses_box_stdio): requires box_service to
be available, not just installed
- MCP _init_stdio_python_server: when ap.box_service exists but is
unavailable, refuse the stdio server with an actionable error instead
of silently falling through to host-stdio (which bypasses the sandbox
the operator asked for). Setups without ap.box_service installed at
all keep the legacy host-stdio fallback for pre-Box dev mode
- SkillService._require_box_for_write: refuses create/update/install/
write_skill_file when ap.box_service is installed but unavailable.
Distinguishes disabled vs failed in the error message so the UI can
surface the right hint. Legacy setups (no ap.box_service) keep the
local fallback path — that distinction is what keeps the existing
local-skills tests valid
Tests:
- Box disabled-state behavior (4 cases)
- Skill write refusal in disabled & failed states (7 cases)
- MCP stdio runtime info policy updated to match new refuse-when-down
behavior
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When Box is disabled in config (``box.enabled = false``) or fails to connect, every dependent UI surface now degrades visibly: - ``useBoxStatus`` hook: shared, polled 30s, exposes ``available``, ``disabled`` (config-off) and a single ``hint`` key so callers don't have to re-derive the three states - ``BoxUnavailableNotice`` reusable Alert banner driven by that hint - Dashboard SystemStatusCards: three-state dot + label (connected / disabled-gray / disconnected-red); disabled state shows the ``boxDisabled`` hint, failed state continues to show the connector error. Plugin block kept untouched - Skills page (create view) and SkillDetailContent (edit view): Save button disabled and banner inserted above the form when Box is unavailable — matches the backend gate added in the previous commit - PipelineExtension skill section: ``enable_all_skills`` switch, Add Skill button and Remove buttons all gate on Box availability; banner inline under the section header - PipelineFormComponent: banner above the ``local-agent`` stage card when Box is unavailable, since that stage carries the sandbox-bound ``box-session-id-template`` field - Box status payload type (``ApiRespBoxStatus.enabled``) and 8 locale files updated with ``boxDisabled`` / ``boxUnavailable`` / ``boxRequiredHint`` strings Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docker-compose: move ``langbot_box`` under compose profiles
(``box`` and ``all``) so ``docker compose up`` no longer requires
the sandbox container. Inline comment explains how to pair the
profile choice with ``box.enabled`` so the langbot service does not
thrash trying to reach a runtime that was never started
- docs/review/box-architecture.md:
- Annotate ``box.enabled`` in the config.yaml example, listing the
exact side effects (no remote/stdio connect; tools/skills/MCP
stdio off; reads still work)
- Replace the bare compose snippet with the actual profile-driven
invocation and the BOX__ENABLED pairing
- New "关闭/连接失败时的行为矩阵" section: a single table mapping
every consumer (native tools, activate/register_skill, stdio MCP,
skill list/CRUD, pipeline AI config, extensions page, dashboard)
to its disabled-state behavior, plus the legacy ``ap.box_service``
distinguisher note
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tooltip
The previous commit hard-coded a BoxUnavailableNotice banner above the
``local-agent`` stage card. That works, but it shouts at the user about
every field in that stage when in reality only one field —
``box-session-id-template`` — depends on the sandbox.
Use the dynamic-form schema's existing variable-injection mechanism
(``__system.*`` references via ``systemContext``) and add a sibling to
``show_if``: ``disable_if`` + ``disabled_tooltip``. The field stays
visible, becomes inert, and an info icon next to its label exposes the
reason on hover. The rest of the AI tab is left untouched.
- entities/form/dynamic.ts: extend IDynamicFormItemSchema with
``disable_if: IShowIfCondition`` and ``disabled_tooltip: I18nObject``
- DynamicFormComponent: evaluate disable_if with the same resolver as
show_if; OR the result into isFieldDisabled; render an Info tooltip
trigger next to the label when the condition matches
- ai.yaml metadata: attach disable_if (__system.box_available eq false)
and a localized disabled_tooltip to box-session-id-template
- PipelineFormComponent: drop the BoxUnavailableNotice import and the
per-stage banner; pass ``systemContext={ box_available: boxAvailable }``
only for the local-agent stage so other stages aren't paying the
re-render cost
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the MCP detail dialog dumped the raw RuntimeError text from
``_init_stdio_python_server`` — English-only, prefixed with "Failed
after 4 attempts", and exposing internal config names. The retry
wrapper also kept retrying a refusal that is deterministically going
to fail again, polluting logs.
Replace the raw text with a structured signal:
- New ``MCPSessionErrorPhase.BOX_UNAVAILABLE`` enum value. The stdio
refusal path sets it before raising and uses a short opaque
discriminator (``box_disabled_in_config`` / ``box_unavailable``) as
the message body — never user-facing
- ``_lifecycle_loop_with_retry`` short-circuits on
``BOX_UNAVAILABLE``: surfaces the error immediately, no retries,
no "Failed after N attempts" prefix. Silences the warning storm
seen during smoke-testing
- ``MCPServerRuntimeInfo`` (TS type) now declares ``error_phase``,
``retry_count``, ``box_session_id``, ``box_enabled`` to match what
the backend already returns in get_runtime_info_dict()
- Both MCP detail forms (``mcp/components/mcp-form/MCPForm.tsx`` and
``plugins/mcp-server/mcp-form/MCPFormDialog.tsx``) detect
``error_phase === 'box_unavailable'`` and render a two-line
localized notice: state line ("Box disabled / unreachable") plus
remediation line ("enable Box or switch to http/sse")
- 8 locale files (en/zh-Hans/zh-Hant/ja/ru/vi/th/es) get
``mcp.boxDisabledStdioRefused``, ``mcp.boxUnavailableStdioRefused``,
``mcp.boxStdioRefusedSuggestion``
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ilable When Box is disabled in config (``box.enabled = false``) or unreachable, saving a new MCP server in stdio mode produced one that could never start — the user would only learn that from the runtime error on the detail page. Stop the user before they save instead. Both MCP forms (the page-level ``MCPForm.tsx`` and the older dialog ``MCPFormDialog.tsx``) now: - Disable the ``stdio`` option in the mode select when Box is unavailable, with a small "(requires Box)" suffix so the reason is obvious. Existing stdio configs still display their current value - Show ``BoxUnavailableNotice`` inline under the mode select when the currently-selected mode is stdio and Box is unavailable, so editing a stale stdio config makes the cause visible - Disable the Save / Submit button while stdio is selected under that condition. ``MCPForm`` exposes a new ``onSaveBlockedChange`` prop so the parent ``MCPDetailContent`` can disable both its Submit and Save buttons. ``MCPFormDialog`` disables its Save button locally - Refuse the submit handler too (Enter-key path) with a toast carrying the same i18n message i18n: ``mcp.boxRequired`` (short tag in the disabled option) and ``mcp.stdioBlockedByBoxToast`` added to all 8 locales. Backend runtime gate (``_init_stdio_python_server`` refusal + ``BOX_UNAVAILABLE`` error_phase + retry short-circuit) stays in place as the last line of defence for API bypass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…le source Skills now flow exclusively through the Box runtime. Every read and write method funnels through ``_box_service()``; when Box is unavailable (disabled in config, connection failed, or simply not installed) the operation either returns an empty surface (``list_skills`` → []) or raises with a clear ``Box runtime ... not initialised / disabled / unavailable: ...`` message via the new ``_require_box(action)`` helper. Why: the legacy local-fallback path scanned ``data/skills/``, but Box manages its own ``box.local.skills_root`` (default ``data/box/skills/``). The two diverging directories caused stale / phantom skill lists when Box flapped, and the local-fallback writes silently bypassed all the sandboxing the operator had configured. SkillService (``api/http/service/skill.py``): - New ``_require_box(action)`` returns the box service or raises a structured ValueError. ``_require_box_for_write`` kept as alias - ``list_skills`` → returns [] when Box is down so the UI can render the disabled banner cleanly - ``get_skill`` / ``get_skill_by_name`` → return None - All read-file / write-file / scan-dir / create / update / delete / install / preview methods → ``_require_box`` then box delegate. Local fallback bodies (shutil.copytree, tempfile.mkdtemp, preview pipelines) removed entirely SkillManager (``pkg/skill/manager.py``): - ``reload_skills`` returns early with empty cache when Box is down. data/skills/ discovery loop removed - ``refresh_skill_from_disk`` now just reports cache presence; the on-disk re-parse is gone since Box is the only writer Tests: - Drop 11 obsolete test_skill_service.py tests that exercised the removed local-fallback paths (create/install/file/delete/update) - Add list-empty + read-refused tests; flip the legacy-allow test to legacy-refuses-too - Rewrite refresh_skill_from_disk test to match the new behaviour Several helper methods (_managed_skill_path, _resolve_skill_path, _preview_skill_candidates, _install_preview_candidates, etc.) are now unreachable; a follow-up commit will prune them so this diff stays reviewable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…migration
Follow-up to the Box-only refactor. The previous commit removed the
local-fallback BRANCHES from every public method; this one removes the
HELPERS those branches called, which are now unreachable.
SkillService (service/skill.py): 787 → 449 lines
Removed: scan_directory (sync), _read_skill_package, _write_skill_md,
_resolve_create_field, _managed_skill_path,
_managed_install_root_for_package, _normalize_package_root,
_resolve_skill_path, _find_skill_entry, _discover_skill_directories,
_safe_extract_zip, _extract_uploaded_skill_to_temp,
_download_github_skill_to_temp, _resolve_github_source_root,
_build_preview_target_dir, _preview_skill_candidates,
_select_preview_candidates, _install_preview_candidates,
_preview_source_root, _resolve_installed_skills, plus the
module-level _FRONTMATTER_FIELDS and _build_skill_md.
Kept (still needed by the surviving GitHub-import path):
_download_github_asset, _download_github_skill_directory_as_zip,
_find_github_skill_archive_entry, _copy_github_skill_directory_to_zip,
_is_github_skill_md_url, _parse_github_skill_md_url,
_resolve_github_skill_md_package_name, _validate_github_asset_url,
_uploaded_skill_target_stem, _validate_skill_name.
Imports dropped: shutil, tempfile, yaml, ....utils.paths.
SkillManager (skill/manager.py): 187 → 88 lines
Removed: get_managed_skills_root, _discover_skill_directories,
_find_skill_entry, _load_skill_file, _normalize_package_root.
Imports dropped: datetime, parse_frontmatter, paths.
Tests:
- test_skill_service.py: drop the 3 sync scan_directory tests +
skill_service fixture + _create_skill_file helper
- test_skill_tools.py: drop test_load_skill_file_success; rename
TestSkillManagerPackageLoading → TestSkillManagerCache
Full unit suite: 277 passed, 1 skipped. ``ruff check`` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| return | ||
| result = reload_skills() | ||
| if inspect.isawaitable(result): | ||
| await result |
The contributor's original PR (#1917) appended an ``Available Skills`` index to the system prompt before the LLM saw the user message, so the LLM could decide whether to activate a skill. ``7145447b`` removed the text-marker activation flow and, together with it, the entire system prompt injection — but the Tool Call replacement only put the available skills inside the ``activate`` tool's description. In practice the LLM ignores tool descriptions for selection and goes straight to native tools, so user-visible skill activation silently broke. Restore the injection, adapted for the Tool Call era: - SkillManager regains ``get_skill_index(bound_skills)`` and ``build_skill_aware_prompt_addition(bound_skills)``. The addendum carries only ``name (display_name): description`` for each pipeline-visible skill plus one instruction line pointing at the ``activate`` tool. No SKILL.md contents — KV cache stays clean - PreProcessor appends the addendum to the first system message (or inserts a new one) of ``query.prompt.messages`` for the local-agent runner. Handles plain-string and ContentElement[] bodies. Skips cleanly when no skills are visible - 3 new test_preproc cases: injection happens, bound-skills subset honoured, empty addendum touches nothing. 280 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Until now ``BoxService.get_status`` returned ``available: true`` whenever
the runtime connector was healthy, even if the runtime itself reported
``backend: { available: false }`` (operator selected nsjail without the
binary, Docker daemon crashed mid-session, E2B credentials wrong, ...).
The dashboard / ``useBoxStatus`` hook / skill_service gate consumed the
top-level flag and showed "connected" while every actual call to native
exec or skill management would fail.
The native-tool loader already polled ``status.backend.available``
independently and hid its tools correctly, but every other consumer
(dashboard banner, the disabled-state hint, the LLM-facing message)
disagreed with it.
Combine the two in the payload: ``available = self._available AND
status.backend.available``. When ``backend.available`` is false we now
also surface a ``connector_error`` that names the backend
("Configured sandbox backend \"nsjail\" is unavailable") so the dialog
shows the actionable reason instead of an empty error pane. The
detailed ``backend`` object is preserved unchanged for the dialog.
Internal ``box_service.available`` (used by ``skill_service`` writes,
``mcp_stdio.uses_box_stdio``, the reconnect callback) is intentionally
NOT changed — it still tracks connector health only, so a backend blip
does not trigger spurious reconnect loops.
Tests:
- ``test_get_status_downgrades_available_when_backend_dead`` — exercise
the new branch (connector OK, backend.available=false → top-level
available=false, connector_error mentions the backend name)
- ``test_get_status_keeps_available_true_when_backend_ok`` — guard
against regressing the happy path
Live-verified with ``box.backend: nsjail`` on macOS (no nsjail binary):
``GET /api/v1/box/status`` now returns ``available: false`` with the
named connector_error, instead of the previous misleading
``available: true``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When Box is configured but the runtime reports its backend is dead
(e.g. ``box.backend = nsjail`` but the binary is missing, or Docker
daemon crashed), the backend now returns a structured
``connector_error`` like ``Configured sandbox backend "nsjail" is
unavailable``. The previous notice only said "Box sandbox is
unavailable" + a generic "enable Box" hint, hiding the actionable
detail.
- ``useBoxStatus``: derive ``reason`` from ``status.connector_error``.
Only exposed for the failed-state (``hint === 'boxUnavailable'``),
since the disabled-by-config message already carries its reason
- ``BoxUnavailableNotice``: insert the reason as a small monospaced
line between the state message and the action hint. The disabled
variant is unchanged (operator chose the state)
- Wire ``reason`` through every existing call site (Skills page +
detail, PipelineExtension, both MCP forms). Old unused ``context``
prop dropped
Net layout (3 lines, still compact):
⚠ Box sandbox is unavailable — sandbox tools, skill add/edit, ...
Configured sandbox backend "nsjail" is unavailable
This feature requires the Box runtime. Enable it in config ...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve conflicts in: - .github/workflows/run-tests.yml: keep master's src/langbot/** paths plus feat/** push branch - src/langbot/pkg/plugin/connector.py: keep both branches' marketplace MCP/skill install logic (HEAD) and runtime/wait helpers (master); add missing return in _inspect_plugin_package so LOCAL/GITHUB install paths get author/name back - tests/unit_tests/pipeline/test_n8nsvapi.py: keep HEAD's try/finally sys.modules save/restore pattern - web/src/app/home/components/dynamic-form/DynamicFormComponent.tsx: union imports + keep HEAD's disable_if/tooltip support and master's QrCodeLoginDialog - web/src/i18n/locales/*: union of disjoint top-level keys from both branches - web/src/app/home/market/page.tsx: accept our deletion (unified extensions page) - uv.lock: regenerate via uv sync --dev
The merge from master brought in new unit tests that target pre-refactor APIs on feat/sandbox. Reconcile each: - factories/app.py: FakeApp now exposes a Mock skill_mgr (with empty .skills dict + inert prompt-addition builder) and a Mock pipeline_service so the PreProcessor skill-index injection branch can run end-to-end in tests. - pipeline/conftest.py: eagerly import langbot.pkg.pipeline.pipelinemgr so pipeline.stage is fully initialised before any individual stage test (preproc, longtext, ...) tries to lazy-load it. Without this preload, running test_preproc.py in isolation hit a circular-import error via the stage -> app -> pipelinemgr -> stage chain. - provider/test_tool_manager.py: ToolManager now probes four loaders (native -> plugin -> mcp -> skill). Inject inert native + skill mocks in the execute_func_call fixture and assert all four shutdowns fire. - utils/test_paths.py: drop the three cwd-dependent _check_if_source_install cases. The refactor walks Path(__file__).resolve().parents looking for pyproject.toml + main.py, so cwd no longer factors in and there's no file read to mock-fail. The positive case and caching test still apply. - utils/test_version.py: delete entirely. is_newer and compare_version_str were removed when VersionManager was refactored to use the Space API for release checks (1b4107a); the tests targeted a surface that no longer exists.
| # any individual stage test (e.g. preproc, longtext) tries to import it. Without | ||
| # this, running a stage test in isolation triggers a circular-import error: | ||
| # stage.py → core.app → pipelinemgr → stage.stage_class (not yet bound). | ||
| import langbot.pkg.pipeline.pipelinemgr # noqa: F401 |
Mirror the plugin runtime: box is now started through the same CLI entry point (langbot_plugin.cli) instead of the box module directly. - docker-compose.yaml: langbot_box command runs `langbot_plugin.cli ... box` (WebSocket is the default transport, no flag needed — matches `rt`). - box/connector.py: both subprocess launch sites (_start_local_stdio and the Windows _start_subprocess_then_ws path) invoke `langbot_plugin.cli.__init__ box`, using `-s` for the stdio transport. - docs/review: update stale `-m langbot_plugin.box[.server]` references. Pairs with the SDK change that removes box's direct-launch entry points (python -m langbot_plugin.box / .box.server) and the legacy --mode flag.
CI on feat/sandbox failed across Unit Tests, Lint and Build Dev Image.
Root causes and fixes:
- pyproject.toml had a [tool.uv.sources] editable override pinning
langbot-plugin to ../langbot-plugin-sdk. That path only exists in a
paired local checkout, so `uv sync` failed on every CI runner
("Distribution not found"). Remove the override and regenerate uv.lock
so langbot-plugin==0.4.0b1 resolves from PyPI, matching master.
- tests/integration/api/test_pipelines.py: the pipeline extensions
endpoint now calls ap.skill_service.list_skills(); add the missing
skill_service mock to the fake_pipeline_app fixture (the test came
from master, the endpoint change from feat/sandbox).
- Apply ruff format to three src files and prettier to three web files
that had committed formatting drift, failing `ruff format --check`
and `pnpm lint`.
This was referenced May 21, 2026
…_bot The dashboard pipeline-debug WebSocket (/api/v1/pipelines/<uuid>/ws/connect) and the embed widget WebSocket (/api/v1/embed/<bot_uuid>/ws/connect) already live on separate paths, but the debug handler ran `_find_owner_bot(pipeline_uuid)` and, when the same pipeline happened to be bound to a web_page_bot, passed that bot as `owner_bot` into `handle_websocket_message`. The adapter then used the page bot's listeners + adapter for the request, so debug sessions were logged as "page bot" activity in the dashboard. Debug sessions must always run under the built-in websocket_proxy_bot. Remove `_find_owner_bot`, drop the `owner_bot` parameter from the debug-path `_handle_receive`, and call `handle_websocket_message` without it so the adapter takes its default proxy-bot branch. The embed handler still resolves and passes its `runtime_bot` for the page-bot path, so attribution there is unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Details
LangBot Box:沙箱执行系统
概述
本 PR 引入 LangBot Box,让 LLM Agent、MCP Server,以及后续的 Skill/工具执行都能在隔离环境中运行 Shell 命令、Python 脚本和长生命周期进程。
当前实现已经不是"代码主要都在 LangBot 主仓"那种结构了,职责现在明确拆成两层:
sandbox_exec工具暴露、Profile/宿主机路径策略、MCP Box-stdio 集成、状态接口,以及运行时连接管理。langbot-plugin-sdk:负责 Box Runtime 底座,包括协议、模型、错误类型、Session 生命周期、Backend 抽象、Docker/Podman/nsjail 执行后端,以及独立运行的 Box Server。换句话说,这个 PR 现在本质上是一个跨仓协作的沙箱能力接入:LangBot 侧负责接入和策略,SDK 侧承载大部分可复用的运行时实现。
分支:
feat/sandbox功能
sandbox_exec原生工具:LLM Agent 获得一个原生工具,可在隔离环境中运行 Shell 命令和 Python 脚本,用于精确计算、结构化解析、临时文件处理和代码执行。stdio模式的 MCP Server 在 Box 可用时会自动运行在沙箱中,支持依赖安装、路径重写和 stdio-over-WebSocket 桥接。Podman、Docker和nsjail三种 Backend,统一走同一套BoxRuntime生命周期管理。/api/v1/box/status、/api/v1/box/sessions、/api/v1/box/errors供运维和调试使用。架构
分层与职责
pkg/box/service.pypkg/box/connector.pyprovider/tools/loaders/native.pysandbox_exec给模型provider/tools/loaders/mcp.pystdioMCP Server 接入 Box Session / managed processapi/http/controller/groups/box.pylangbot_plugin.box.actions/client.pylangbot_plugin.box.modelsBoxSpec、BoxProfile、BoxSessionInfo等共享模型langbot_plugin.box.runtimebackend.py/nsjail_backend.pylangbot_plugin.box.server核心设计决策
1. Runtime 底座下沉到 SDK
现在 Box 的核心不再放在 LangBot 主仓,而是下沉到
langbot-plugin-sdk/src/langbot_plugin/box/。这样做的原因是:LangBot 主仓保留的是产品语义相关能力:是否暴露工具、如何应用 Profile、哪些宿主机路径允许挂载、MCP 如何接入、HTTP 如何观测。
2. 同进程架构
Box Runtime 作为 LangBot 的子进程运行,通过 stdio 与 LangBot 主进程通信。无论本地开发还是 Docker 部署,行为一致:
BoxRuntimeConnector启动python -m langbot_plugin.box.server --port 5410子进程,并用 stdio 建立连接。docker.sock即可,Box Runtime 子进程直接访问宿主 Docker 引擎。如需将 Box Runtime 部署到独立主机,可在
config.yaml中显式配置runtime_url,此时 LangBot 通过 WebSocket 连接远程 Runtime。3. Session 复用
Session 是 Box 的核心调度单元。
BoxRuntime维护一个session_id -> RuntimeSession映射:sandbox_exec默认以query_id作为session_idmcp-{uuid}形式持有独立 SessionSession 带 TTL(默认 300 秒)。回收条件是:
last_used_at超过 TTL这保证了:
sandbox_exec可以在同一次对话里做多步有状态执行4. Profile 体系在 LangBot 层生效
sandbox_exec不直接把所有隔离参数完全裸露给模型,而是先通过 LangBot 的BoxService应用 Profile:timeout_sec会被 clamp 到profile.max_timeout_sec当前内置 Profile 仍包括:
defaultoffline_readonlynetwork_basicnetwork_extended5. Backend 抽象与探测顺序
SDK 里的
BoxRuntime现在统一从以下顺序探测可用 Backend:PodmanBackendDockerBackendNsjailBackend三者都实现同一套
BaseSandboxBackend接口,上层BoxService/BoxRuntimeConnector/ActionRPCBoxClient都不感知底层具体是容器还是 nsjail。6. MCP Box-stdio 模式
LangBot 中的
RuntimeMCPSession在检测到stdioMCP 且 Box 可用时,会执行下面这条链路:BoxService.create_session()创建 Sessionpyproject.toml/requirements.txt自动安装依赖/workspace/...start_managed_process()启动 MCP 进程MCP 协议语义仍然在 LangBot 侧,SDK 里的 Box Runtime 只负责"把一个托管进程安全地跑起来并提供 attach 能力"。
7. Host Path 挂载
Box 把宿主机目录挂载到沙箱内固定的
/workspace:sandbox_exec:默认取config.yaml中的box.default_host_workspacebox.host_pathDocker 部署下,LangBot 容器挂载宿主机目录(如
./data/box:/workspaces),Box Runtime 子进程运行在同一容器内,直接访问该挂载目录并据此创建实际容器挂载。LangBot 侧负责路径白名单校验。核心接口
LangBot:
BoxServiceSDK:
BoxSpecSDK:
BaseSandboxBackend通信方式
Action RPC
Box 复用
langbot_plugin.runtime.io这一套 Action RPC / Connection / Handler 基础设施。当前 Box Runtime 暴露的动作包括:box_healthbox_statusbox_execbox_create_sessionbox_get_sessionbox_get_sessionsbox_delete_sessionbox_start_managed_processbox_get_managed_processbox_get_backend_infobox_shutdown传输模式
langbot_plugin.box.server子进程并通过 stdio 通信runtime_url的远程部署ws://<remote-host>:5411WebSocket Relay
Box Runtime 还会在
:5410起一个轻量 aiohttp 服务,用于 MCP 托管进程 attach:GET /v1/sessions/{session_id}/managed-process/ws该接口负责把 WebSocket 文本消息桥接到托管进程的 stdin/stdout。
部署方式
本地开发
无需额外服务编排。LangBot 会自动启动本地 Box Runtime 子进程。
宿主机需要具备至少一种可用后端:
Podman、Docker或nsjail。Docker Compose
Box Runtime 作为子进程运行在 LangBot 容器内,无需单独容器。LangBot 容器需挂载容器运行时 socket:
LangBot 启动时自动拉起 Box Runtime 子进程,通过 stdio 通信,通过
http://127.0.0.1:5410访问 managed-process relay。远程部署(可选)
如需将 Box Runtime 部署到独立主机,可在
config.yaml中配置runtime_url:此时 LangBot 通过 WebSocket 连接远程 Runtime,不再启动本地子进程。
安全模型
/etc、/proc、/sys、/dev、/root、/boot、容器运行时 socket 等路径被硬编码阻断。Windows 环境额外阻断C:\Windows、C:\Program Files等系统路径。allowed_host_mount_roots下的路径才允许挂载到/workspace。nsjailBackend 也固定以只读系统挂载为核心模型。langbot.box=true容器。Skill / 插件如何接入
1. 通过
sandbox_exec最简单的接入方式仍然是把
sandbox_exec放进模型工具列表,让模型在需要时自行调用。2. 直接调用
BoxService适合插件、Skill 或平台内部逻辑明确需要执行固定命令的场景:
3. MCP Server in Box
stdioMCP Server 在 Box 可用时自动运行在沙箱内,并支持通过box字段覆盖镜像、网络、挂载模式、启动超时等参数:{ "name": "my-mcp-server", "mode": "stdio", "command": "python", "args": ["server.py"], "box": { "image": "node:20", "network": "on", "host_path_mode": "ro", "startup_timeout_sec": 180 } }文件结构
LangBot 主仓
langbot-plugin-sdk部署与测试
测试覆盖
BoxService、BoxRuntimeConnector、sandbox_exec接入、MCP Box 配置与路径改写等逻辑。nsjailBackend 的探测、执行、Session 清理与隔离行为。Q&A
Q: Profile 是全局的吗?模型能覆盖哪些参数?
是全局配置,来源于
config.yaml的box.profile。未锁定字段可被模型覆盖;锁定字段始终回退到 Profile 值。Q: MCP Server 为什么不走 Profile?
因为 MCP Server 是管理员显式配置的可信进程,需求和 LLM 生成代码不同。它默认需要更高可用性,比如联网安装依赖,所以走
MCPServerBoxConfig独立配置。Q: Session TTL 会不会把 MCP Server 提前清掉?
不会。只要 Session 上还有运行中的 managed process,TTL 回收逻辑就会跳过它。
Q: 现在没有 Docker / Podman 怎么办?
Runtime 会按
Podman -> Docker -> nsjail的顺序探测可用 Backend。三者都没有时,BoxService.available = False,sandbox_exec不会暴露给模型,stdioMCP 也会回退到宿主机直接运行。Q:
nsjail现在是什么状态?已经接入当前代码路径,不再只是规划。它是
BoxRuntime的正式候选 Backend 之一,只是在实际部署中是否命中它,取决于宿主机上是否安装并可用。Q: 如何接入新的 Backend?
实现
BaseSandboxBackend接口并加入BoxRuntime.backends探测列表即可。LangBot 集成层、Action RPC 协议、工具定义都不需要改。Q: 为什么 Box Runtime 不需要独立容器?
Box Runtime 进程本身只是一个纯调度进程:通过 docker socket 或 nsjail 命令创建和管理沙箱,不执行任何用户代码,也不直接操作文件系统。与 Plugin Runtime 不同(插件会直接操作文件系统、安装依赖、运行第三方代码),Box Runtime 没有隔离需求,作为子进程运行在 LangBot 容器内更简单,也避免了跨容器的路径映射和网络跳转。
Q: Windows 支持情况?
Windows 平台仅支持 Docker 后端(通过 Docker Desktop)。Podman 和 nsjail 依赖 Linux 内核特性(namespace、cgroups 等),仅限 Linux 环境使用。