Skip to content

Feat/sandbox #2072

Open
huanghuoguoguo wants to merge 131 commits into
masterfrom
feat/sandbox
Open

Feat/sandbox #2072
huanghuoguoguo wants to merge 131 commits into
masterfrom
feat/sandbox

Conversation

@huanghuoguoguo
Copy link
Copy Markdown
Collaborator

@huanghuoguoguo huanghuoguoguo commented Mar 22, 2026

  • host_root 配置在首次部署时的自动生成问题
  • langbot 侧 skill 与 box 的文件系统中的 skill 完整性不符时的挂载操作容错
  • 使 box 机制可选,检查 docker compose 文件、文档、产品内状态检查、需求方(Agent、Skills、stdio MCP)对应表述和适配
  • 验证多种部署模式下的box可用性
  • 合入后的新部署模式说明:可选启用 box、各个系统兼容性问题

Details

LangBot Box:沙箱执行系统

概述

本 PR 引入 LangBot Box,让 LLM Agent、MCP Server,以及后续的 Skill/工具执行都能在隔离环境中运行 Shell 命令、Python 脚本和长生命周期进程。

当前实现已经不是"代码主要都在 LangBot 主仓"那种结构了,职责现在明确拆成两层:

  • LangBot 主仓:负责产品集成层,包括 sandbox_exec 工具暴露、Profile/宿主机路径策略、MCP Box-stdio 集成、状态接口,以及运行时连接管理。
  • langbot-plugin-sdk:负责 Box Runtime 底座,包括协议、模型、错误类型、Session 生命周期、Backend 抽象、Docker/Podman/nsjail 执行后端,以及独立运行的 Box Server。

换句话说,这个 PR 现在本质上是一个跨仓协作的沙箱能力接入:LangBot 侧负责接入和策略,SDK 侧承载大部分可复用的运行时实现。

分支: feat/sandbox


功能

  1. sandbox_exec 原生工具:LLM Agent 获得一个原生工具,可在隔离环境中运行 Shell 命令和 Python 脚本,用于精确计算、结构化解析、临时文件处理和代码执行。
  2. MCP Server 隔离运行(Box-stdio)stdio 模式的 MCP Server 在 Box 可用时会自动运行在沙箱中,支持依赖安装、路径重写和 stdio-over-WebSocket 桥接。
  3. 可配置的安全边界:支持网络开关、CPU/内存/PID 限制、只读根文件系统、宿主机挂载白名单、危险路径阻断。
  4. 可插拔执行后端:当前运行时支持 PodmanDockernsjail 三种 Backend,统一走同一套 BoxRuntime 生命周期管理。
  5. 可观测性接口:LangBot 暴露 /api/v1/box/status/api/v1/box/sessions/api/v1/box/errors 供运维和调试使用。

架构

                         LangBot 主进程
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
  NativeToolLoader      RuntimeMCPSession       BoxService
  (sandbox_exec)        (MCP Box-stdio)   (策略 / Profile / 校验)
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
                BoxRuntimeConnector (LangBot)
                               │
                ActionRPCBoxClient (SDK)
                               │
               ┌───────────────┴────────────────┐
               │ stdio(默认)                   │ WebSocket(显式配置 runtime_url)
               │ 子进程(本地 & Docker 部署)     │ ws://<remote-host>:5411
               ▼                                 ▼
          langbot_plugin.box.server(SDK 独立服务)
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
  BoxServerHandler       BoxRuntime            aiohttp WS Relay
  (Action RPC)      (Session / 进程管理)      (:5410, MCP attach)
                               │
                  ┌────────────┼────────────┐
                  │            │            │
             PodmanBackend DockerBackend NsjailBackend
                               │
                  容器 / nsjail 进程隔离环境

分层与职责

仓库 主要模块 职责
集成层 LangBot pkg/box/service.py Profile 应用、宿主机路径校验、输出裁剪、对外 API
连接层 LangBot pkg/box/connector.py 选择 stdio 子进程或远程 WebSocket 连接 Box Runtime
工具接入层 LangBot provider/tools/loaders/native.py 暴露 sandbox_exec 给模型
MCP 集成层 LangBot provider/tools/loaders/mcp.py stdio MCP Server 接入 Box Session / managed process
HTTP 可观测层 LangBot api/http/controller/groups/box.py 暴露状态、Session、错误列表接口
协议层 SDK langbot_plugin.box.actions / client.py Action RPC 动作定义与客户端调用
模型层 SDK langbot_plugin.box.models BoxSpecBoxProfileBoxSessionInfo 等共享模型
运行时层 SDK langbot_plugin.box.runtime Session TTL、复用、managed process 生命周期
Backend 层 SDK backend.py / nsjail_backend.py Docker / Podman / nsjail 执行抽象
服务层 SDK langbot_plugin.box.server 独立运行的 Box Server + MCP WebSocket Relay

核心设计决策

1. Runtime 底座下沉到 SDK

现在 Box 的核心不再放在 LangBot 主仓,而是下沉到 langbot-plugin-sdk/src/langbot_plugin/box/。这样做的原因是:

  • Box Runtime 本身是一个可独立运行的服务,天然更适合放在共享基础设施层。
  • LangBot 和 Box Runtime 复用了 SDK 里现有的 Action RPC / IO 抽象,不需要在主仓重复维护一套协议栈。
  • Box 的模型、错误、客户端、后端探测、独立服务入口都更偏"运行时底座",不应和 LangBot 产品逻辑耦合在一起。

LangBot 主仓保留的是产品语义相关能力:是否暴露工具、如何应用 Profile、哪些宿主机路径允许挂载、MCP 如何接入、HTTP 如何观测。

2. 同进程架构

Box Runtime 作为 LangBot 的子进程运行,通过 stdio 与 LangBot 主进程通信。无论本地开发还是 Docker 部署,行为一致:

  • LangBot 通过 BoxRuntimeConnector 启动 python -m langbot_plugin.box.server --port 5410 子进程,并用 stdio 建立连接。
  • Box Runtime 进程本身只是一个纯调度进程:它通过 docker socket / nsjail 命令创建和管理沙箱,不执行任何用户代码,也不直接操作文件系统。因此不需要像 Plugin Runtime 那样单独容器隔离。
  • Docker 部署时,LangBot 容器挂载 docker.sock 即可,Box Runtime 子进程直接访问宿主 Docker 引擎。

如需将 Box Runtime 部署到独立主机,可在 config.yaml 中显式配置 runtime_url,此时 LangBot 通过 WebSocket 连接远程 Runtime。

3. Session 复用

Session 是 Box 的核心调度单元。BoxRuntime 维护一个 session_id -> RuntimeSession 映射:

  • sandbox_exec 默认以 query_id 作为 session_id
  • MCP Box-stdio 以 mcp-{uuid} 形式持有独立 Session
  • 同一 Session 内的多次执行会复用已有隔离环境,而不是每次重新创建容器 / nsjail 工作目录

Session 带 TTL(默认 300 秒)。回收条件是:

  • last_used_at 超过 TTL
  • 且当前没有运行中的 managed process

这保证了:

  • sandbox_exec 可以在同一次对话里做多步有状态执行
  • MCP Server 不会因为空闲 TTL 被误清理

4. Profile 体系在 LangBot 层生效

sandbox_exec 不直接把所有隔离参数完全裸露给模型,而是先通过 LangBot 的 BoxService 应用 Profile:

  • 未传的字段由 Profile 默认值补齐
  • 被锁定的字段会强制覆盖用户/模型传参
  • timeout_sec 会被 clamp 到 profile.max_timeout_sec

当前内置 Profile 仍包括:

Profile 网络 CPU 内存 根文件系统 挂载 最大超时
default OFF 1.0 512MB 只读 读写 120s
offline_readonly OFF(锁定) 0.5 256MB 只读(锁定) 只读(锁定) 60s
network_basic ON 1.0 512MB 只读 读写 120s
network_extended ON 2.0 1024MB 可写 读写 300s

MCP Box-stdio 不走这套 Profile,而是走 MCPServerBoxConfig 独立配置,因为它的信任模型与 LLM 生成代码不同。

5. Backend 抽象与探测顺序

SDK 里的 BoxRuntime 现在统一从以下顺序探测可用 Backend:

  1. PodmanBackend
  2. DockerBackend
  3. NsjailBackend

三者都实现同一套 BaseSandboxBackend 接口,上层 BoxService / BoxRuntimeConnector / ActionRPCBoxClient 都不感知底层具体是容器还是 nsjail。

6. MCP Box-stdio 模式

LangBot 中的 RuntimeMCPSession 在检测到 stdio MCP 且 Box 可用时,会执行下面这条链路:

  1. BoxService.create_session() 创建 Session
  2. 根据 pyproject.toml / requirements.txt 自动安装依赖
  3. 把宿主机路径改写为容器内 /workspace/...
  4. start_managed_process() 启动 MCP 进程
  5. 通过 Box Runtime 暴露的 WebSocket Relay 连接到该进程的 stdin/stdout
  6. 再由 LangBot 内部 MCP Client 完成协议初始化和工具发现

MCP 协议语义仍然在 LangBot 侧,SDK 里的 Box Runtime 只负责"把一个托管进程安全地跑起来并提供 attach 能力"。

7. Host Path 挂载

Box 把宿主机目录挂载到沙箱内固定的 /workspace

  • sandbox_exec:默认取 config.yaml 中的 box.default_host_workspace
  • MCP Box-stdio:由 LangBot 从 MCP command/args 推断项目根目录,或使用 MCP 配置里的 box.host_path

Docker 部署下,LangBot 容器挂载宿主机目录(如 ./data/box:/workspaces),Box Runtime 子进程运行在同一容器内,直接访问该挂载目录并据此创建实际容器挂载。LangBot 侧负责路径白名单校验。


核心接口

LangBot:BoxService

class BoxService:
    available: bool

    async def execute_sandbox_tool(
        parameters: dict,
        query: Query,
    ) -> dict

    async def execute_skill_tool(
        skill_data: dict,
        tool_def: dict,
        parameters: dict,
        query: Query,
    ) -> dict

    async def create_session(
        spec_payload: dict,
        skip_host_mount_validation: bool = False,
    ) -> dict

    async def start_managed_process(
        session_id: str,
        process_payload: dict,
    ) -> dict

    def get_managed_process_websocket_url(
        session_id: str,
    ) -> str

SDK:BoxSpec

class BoxSpec(pydantic.BaseModel):
    cmd: str = ''
    workdir: str = '/workspace'
    timeout_sec: int = 30
    network: BoxNetworkMode = OFF
    session_id: str
    env: dict[str, str] = {}
    image: str = 'python:3.11-slim'
    host_path: str | None = None
    host_path_mode: BoxHostMountMode = RW
    cpus: float = 1.0
    memory_mb: int = 512
    pids_limit: int = 128
    read_only_rootfs: bool = True

SDK:BaseSandboxBackend

class BaseSandboxBackend(ABC):
    name: str

    async def is_available() -> bool
    async def start_session(spec: BoxSpec) -> BoxSessionInfo
    async def exec(session: BoxSessionInfo, spec: BoxSpec) -> BoxExecutionResult
    async def stop_session(session: BoxSessionInfo) -> None
    async def start_managed_process(session, spec) -> asyncio.subprocess.Process
    async def cleanup_orphaned_containers(instance_id: str) -> None

通信方式

Action RPC

Box 复用 langbot_plugin.runtime.io 这一套 Action RPC / Connection / Handler 基础设施。当前 Box Runtime 暴露的动作包括:

Action 含义
box_health 健康检查
box_status 获取运行时状态
box_exec 在 Session 内执行命令
box_create_session 创建 Session
box_get_session 获取单个 Session
box_get_sessions 获取全部 Session
box_delete_session 删除 Session
box_start_managed_process 启动托管进程
box_get_managed_process 获取托管进程状态
box_get_backend_info 获取当前 Backend 信息
box_shutdown 优雅关闭 Runtime

传输模式

模式 场景 实现
stdio(默认) 本地开发、Docker 部署 LangBot 拉起 langbot_plugin.box.server 子进程并通过 stdio 通信
WebSocket 显式配置 runtime_url 的远程部署 LangBot 连接 ws://<remote-host>:5411

WebSocket Relay

Box Runtime 还会在 :5410 起一个轻量 aiohttp 服务,用于 MCP 托管进程 attach:

  • GET /v1/sessions/{session_id}/managed-process/ws

该接口负责把 WebSocket 文本消息桥接到托管进程的 stdin/stdout。


部署方式

本地开发

无需额外服务编排。LangBot 会自动启动本地 Box Runtime 子进程。

box:
  profile: 'default'
  default_host_workspace: './data/box-workspaces/default'
  allowed_host_mount_roots:
    - './data/box-workspaces'
    - '/tmp'

宿主机需要具备至少一种可用后端:PodmanDockernsjail

Docker Compose

Box Runtime 作为子进程运行在 LangBot 容器内,无需单独容器。LangBot 容器需挂载容器运行时 socket:

services:
  langbot:
    image: rockchin/langbot:latest
    volumes:
      - ./data:/app/data
      - ./data/box:/workspaces
      # Mount container runtime socket for Box sandbox (Docker backend).
      - /var/run/docker.sock:/var/run/docker.sock

LangBot 启动时自动拉起 Box Runtime 子进程,通过 stdio 通信,通过 http://127.0.0.1:5410 访问 managed-process relay。

远程部署(可选)

如需将 Box Runtime 部署到独立主机,可在 config.yaml 中配置 runtime_url

box:
  runtime_url: 'http://remote-box-host:5410'

此时 LangBot 通过 WebSocket 连接远程 Runtime,不再启动本地子进程。


安全模型

  1. 禁止挂载路径/etc/proc/sys/dev/root/boot、容器运行时 socket 等路径被硬编码阻断。Windows 环境额外阻断 C:\WindowsC:\Program Files 等系统路径。
  2. 允许挂载根目录白名单:只有 allowed_host_mount_roots 下的路径才允许挂载到 /workspace
  3. Profile 锁定:安全关键字段可由管理员锁定,模型侧无法覆盖。
  4. 资源限制:CPU、内存、PID 限制由 Backend 在容器 / nsjail 层实际执行。
  5. 只读根文件系统:容器 Backend 默认开启;nsjail Backend 也固定以只读系统挂载为核心模型。
  6. 输出截断:原始 stdout/stderr 各自有 1MB 上限,避免高吞吐命令把内存打满。
  7. Session TTL:空闲 Session 默认 300 秒自动回收,但有运行中 managed process 时不会被回收。
  8. 孤儿清理:容器 Backend 启动时会清理前一实例残留的 langbot.box=true 容器。
  9. Windows 支持:通过 Docker Desktop 支持 Windows 平台(仅 Docker 后端;Podman 和 nsjail 仅限 Linux)。

Skill / 插件如何接入

1. 通过 sandbox_exec

最简单的接入方式仍然是把 sandbox_exec 放进模型工具列表,让模型在需要时自行调用。

2. 直接调用 BoxService

适合插件、Skill 或平台内部逻辑明确需要执行固定命令的场景:

result = await ap.box_service.execute_sandbox_tool(
    parameters={'cmd': 'python3 -c "print(42)"', 'timeout_sec': 10},
    query=query,
)

3. MCP Server in Box

stdio MCP Server 在 Box 可用时自动运行在沙箱内,并支持通过 box 字段覆盖镜像、网络、挂载模式、启动超时等参数:

{
  "name": "my-mcp-server",
  "mode": "stdio",
  "command": "python",
  "args": ["server.py"],
  "box": {
    "image": "node:20",
    "network": "on",
    "host_path_mode": "ro",
    "startup_timeout_sec": 180
  }
}

文件结构

LangBot 主仓

src/langbot/pkg/box/
├── __init__.py
├── connector.py        # BoxRuntimeConnector,选择 stdio / ws 连接
└── service.py          # BoxService,Profile / 安全策略 / 对外 API

src/langbot/pkg/provider/tools/loaders/
├── native.py           # sandbox_exec 工具定义
└── mcp.py              # MCP Box-stdio 集成

src/langbot/pkg/api/http/controller/groups/
└── box.py              # /api/v1/box/status /sessions /errors

langbot-plugin-sdk

src/langbot_plugin/box/
├── __init__.py
├── __main__.py
├── actions.py          # Box Action RPC 动作枚举
├── backend.py          # BaseSandboxBackend + Docker / Podman Backend
├── client.py           # BoxRuntimeClient / ActionRPCBoxClient
├── errors.py           # Box 错误类型
├── models.py           # BoxSpec / BoxProfile / BoxSessionInfo 等
├── nsjail_backend.py   # nsjail Backend
├── runtime.py          # BoxRuntime,Session / managed process 生命周期
├── security.py         # 宿主机路径与安全校验
└── server.py           # 独立 Box Server + WebSocket Relay

部署与测试

LangBot/docker/docker-compose.yaml                       # 容器编排(Box Runtime 内嵌于 LangBot 容器)
LangBot/src/langbot/templates/config.yaml               # box 配置段

LangBot/tests/unit_tests/box/                           # BoxService / Connector 单测
LangBot/tests/unit_tests/provider/test_mcp_box_integration.py
LangBot/tests/integration_tests/box/                    # 端到端集成测试
langbot-plugin-sdk/tests/box/test_nsjail_backend.py     # nsjail Backend 单测

测试覆盖

  • LangBot 单测:覆盖 BoxServiceBoxRuntimeConnectorsandbox_exec 接入、MCP Box 配置与路径改写等逻辑。
  • LangBot 集成测试:覆盖端到端执行、Session 持久化、超时、网络隔离、managed process 生命周期、MCP Server in Box。
  • SDK 单测:覆盖 nsjail Backend 的探测、执行、Session 清理与隔离行为。

Q&A

Q: Profile 是全局的吗?模型能覆盖哪些参数?

是全局配置,来源于 config.yamlbox.profile。未锁定字段可被模型覆盖;锁定字段始终回退到 Profile 值。

Q: MCP Server 为什么不走 Profile?

因为 MCP Server 是管理员显式配置的可信进程,需求和 LLM 生成代码不同。它默认需要更高可用性,比如联网安装依赖,所以走 MCPServerBoxConfig 独立配置。

Q: Session TTL 会不会把 MCP Server 提前清掉?

不会。只要 Session 上还有运行中的 managed process,TTL 回收逻辑就会跳过它。

Q: 现在没有 Docker / Podman 怎么办?

Runtime 会按 Podman -> Docker -> nsjail 的顺序探测可用 Backend。三者都没有时,BoxService.available = Falsesandbox_exec 不会暴露给模型,stdio MCP 也会回退到宿主机直接运行。

Q: nsjail 现在是什么状态?

已经接入当前代码路径,不再只是规划。它是 BoxRuntime 的正式候选 Backend 之一,只是在实际部署中是否命中它,取决于宿主机上是否安装并可用。

Q: 如何接入新的 Backend?

实现 BaseSandboxBackend 接口并加入 BoxRuntime.backends 探测列表即可。LangBot 集成层、Action RPC 协议、工具定义都不需要改。

Q: 为什么 Box Runtime 不需要独立容器?

Box Runtime 进程本身只是一个纯调度进程:通过 docker socket 或 nsjail 命令创建和管理沙箱,不执行任何用户代码,也不直接操作文件系统。与 Plugin Runtime 不同(插件会直接操作文件系统、安装依赖、运行第三方代码),Box Runtime 没有隔离需求,作为子进程运行在 LangBot 容器内更简单,也避免了跨容器的路径映射和网络跳转。

Q: Windows 支持情况?

Windows 平台仅支持 Docker 后端(通过 Docker Desktop)。Podman 和 nsjail 依赖 Linux 内核特性(namespace、cgroups 等),仅限 Linux 环境使用。

@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. eh: Feature enhance: 新功能添加 / add new features m: Tools 工具(ToolUse、内容函数)相关 / Function Calling or tools management pd: Need testing pending: 待测试的PR / PR waiting to be tested labels Mar 22, 2026
Comment thread web/src/router.tsx Fixed
Comment thread src/langbot/pkg/provider/runners/localagent.py Fixed
Comment thread src/langbot/pkg/provider/runners/localagent.py Fixed
Comment thread src/langbot/pkg/utils/paths.py Fixed
Comment thread src/langbot/pkg/utils/paths.py Fixed
Comment thread src/langbot/pkg/api/http/service/skill.py Fixed
Comment thread src/langbot/pkg/provider/tools/loaders/mcp.py Fixed
Comment thread src/langbot/pkg/provider/tools/loaders/skill.py Fixed
@RockChinQ RockChinQ force-pushed the feat/sandbox branch 2 times, most recently from 0d18fae to fa74c75 Compare April 19, 2026 12:26
Comment thread src/langbot/pkg/api/http/service/skill.py Fixed
Comment thread src/langbot/pkg/provider/runners/localagent.py Fixed
Comment thread src/langbot/pkg/provider/runners/localagent.py Fixed
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels May 4, 2026
…uncation

  - Implement head+tail output truncation (60/40 split) so LLM sees both
    beginning and final results; add streaming byte-limited reads in backend
    to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
  - Define BoxProfile model with locked fields and max_timeout_sec clamping
  - Add four built-in profiles: default, offline_readonly, network_basic,
    network_extended with differentiated resource and security constraints
  - Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
    read_only_rootfs) and pass corresponding container CLI flags
    (--cpus, --memory, --pids-limit, --read-only, --tmpfs)
  - Profile loaded from config (box.profile), applied in service layer
    before BoxSpec validation; locked fields cannot be overridden by
    tool-call parameters
…kill cache

The Box backends behave inconsistently when extra_mounts reference a
missing host directory (nsjail aborts the entire sandbox start, Docker
silently creates a root-owned empty dir on the host, E2B silently skips
the upload). The cache in skill_mgr.skills is only refreshed on
in-process mutations, so out-of-band changes — container rebuilds,
manual rm in the box volume, anything the LangBot API didn't drive —
leave a stale skill that later produces one of those bad mount paths.

- box/service.py: build_skill_extra_mounts now filters skills whose
  package_root is not isdir on the LangBot-visible filesystem and logs
  a warning, instead of passing the bad mount through to the backend
- skill/manager.py: reload_skills (Box path) drops skills whose
  package_root is missing on the LangBot-side filesystem before they
  reach the in-memory cache, with a summary warning
- api/http/controller/groups/skills.py: file/CRUD handlers now also
  catch BoxError (RuntimeError subclass, previously slipping past
  ``except ValueError`` and surfacing as 500); list/get handlers gain
  a try/except so a transient Box RPC failure becomes a clean 400
  instead of a stack trace

Tests added for build_skill_extra_mounts (skip missing, skip empty,
no skill manager) and SkillManager.reload_skills (drop missing on Box
path). Full unit suite: 279 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RockChinQ and others added 3 commits May 20, 2026 17:07
Make the Box sandbox runtime optional. When ``box.enabled`` is false in
config (or when an enabled Box fails to connect), every dependent feature
degrades to the same disabled-state UX rather than crashing or silently
falling back to less safe code paths.

Backend:

- config.yaml: new top-level ``box.enabled: true`` flag (default true)
- BoxService:
  - Read box.enabled on construction
  - initialize() short-circuits when disabled — no remote WS connect, no
    stdio subprocess fork
  - _on_runtime_disconnect is a no-op when disabled (no reconnect loop
    on a deliberately-off service)
  - get_status() now exposes ``enabled`` so the frontend can tell
    "disabled in config" from "configured but failed"
- MCP stdio loader (mcp_stdio.uses_box_stdio): requires box_service to
  be available, not just installed
- MCP _init_stdio_python_server: when ap.box_service exists but is
  unavailable, refuse the stdio server with an actionable error instead
  of silently falling through to host-stdio (which bypasses the sandbox
  the operator asked for). Setups without ap.box_service installed at
  all keep the legacy host-stdio fallback for pre-Box dev mode
- SkillService._require_box_for_write: refuses create/update/install/
  write_skill_file when ap.box_service is installed but unavailable.
  Distinguishes disabled vs failed in the error message so the UI can
  surface the right hint. Legacy setups (no ap.box_service) keep the
  local fallback path — that distinction is what keeps the existing
  local-skills tests valid

Tests:
- Box disabled-state behavior (4 cases)
- Skill write refusal in disabled & failed states (7 cases)
- MCP stdio runtime info policy updated to match new refuse-when-down
  behavior

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When Box is disabled in config (``box.enabled = false``) or fails to
connect, every dependent UI surface now degrades visibly:

- ``useBoxStatus`` hook: shared, polled 30s, exposes ``available``,
  ``disabled`` (config-off) and a single ``hint`` key so callers don't
  have to re-derive the three states
- ``BoxUnavailableNotice`` reusable Alert banner driven by that hint
- Dashboard SystemStatusCards: three-state dot + label
  (connected / disabled-gray / disconnected-red); disabled state shows
  the ``boxDisabled`` hint, failed state continues to show the connector
  error. Plugin block kept untouched
- Skills page (create view) and SkillDetailContent (edit view):
  Save button disabled and banner inserted above the form when Box is
  unavailable — matches the backend gate added in the previous commit
- PipelineExtension skill section: ``enable_all_skills`` switch, Add
  Skill button and Remove buttons all gate on Box availability;
  banner inline under the section header
- PipelineFormComponent: banner above the ``local-agent`` stage card
  when Box is unavailable, since that stage carries the sandbox-bound
  ``box-session-id-template`` field
- Box status payload type (``ApiRespBoxStatus.enabled``) and 8 locale
  files updated with ``boxDisabled`` / ``boxUnavailable`` /
  ``boxRequiredHint`` strings

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docker-compose: move ``langbot_box`` under compose profiles
  (``box`` and ``all``) so ``docker compose up`` no longer requires
  the sandbox container. Inline comment explains how to pair the
  profile choice with ``box.enabled`` so the langbot service does not
  thrash trying to reach a runtime that was never started
- docs/review/box-architecture.md:
  - Annotate ``box.enabled`` in the config.yaml example, listing the
    exact side effects (no remote/stdio connect; tools/skills/MCP
    stdio off; reads still work)
  - Replace the bare compose snippet with the actual profile-driven
    invocation and the BOX__ENABLED pairing
  - New "关闭/连接失败时的行为矩阵" section: a single table mapping
    every consumer (native tools, activate/register_skill, stdio MCP,
    skill list/CRUD, pipeline AI config, extensions page, dashboard)
    to its disabled-state behavior, plus the legacy ``ap.box_service``
    distinguisher note

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/langbot/pkg/api/http/service/skill.py Fixed
RockChinQ and others added 5 commits May 20, 2026 17:42
… tooltip

The previous commit hard-coded a BoxUnavailableNotice banner above the
``local-agent`` stage card. That works, but it shouts at the user about
every field in that stage when in reality only one field —
``box-session-id-template`` — depends on the sandbox.

Use the dynamic-form schema's existing variable-injection mechanism
(``__system.*`` references via ``systemContext``) and add a sibling to
``show_if``: ``disable_if`` + ``disabled_tooltip``. The field stays
visible, becomes inert, and an info icon next to its label exposes the
reason on hover. The rest of the AI tab is left untouched.

- entities/form/dynamic.ts: extend IDynamicFormItemSchema with
  ``disable_if: IShowIfCondition`` and ``disabled_tooltip: I18nObject``
- DynamicFormComponent: evaluate disable_if with the same resolver as
  show_if; OR the result into isFieldDisabled; render an Info tooltip
  trigger next to the label when the condition matches
- ai.yaml metadata: attach disable_if (__system.box_available eq false)
  and a localized disabled_tooltip to box-session-id-template
- PipelineFormComponent: drop the BoxUnavailableNotice import and the
  per-stage banner; pass ``systemContext={ box_available: boxAvailable }``
  only for the local-agent stage so other stages aren't paying the
  re-render cost

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the MCP detail dialog dumped the raw RuntimeError text from
``_init_stdio_python_server`` — English-only, prefixed with "Failed
after 4 attempts", and exposing internal config names. The retry
wrapper also kept retrying a refusal that is deterministically going
to fail again, polluting logs.

Replace the raw text with a structured signal:

- New ``MCPSessionErrorPhase.BOX_UNAVAILABLE`` enum value. The stdio
  refusal path sets it before raising and uses a short opaque
  discriminator (``box_disabled_in_config`` / ``box_unavailable``) as
  the message body — never user-facing
- ``_lifecycle_loop_with_retry`` short-circuits on
  ``BOX_UNAVAILABLE``: surfaces the error immediately, no retries,
  no "Failed after N attempts" prefix. Silences the warning storm
  seen during smoke-testing
- ``MCPServerRuntimeInfo`` (TS type) now declares ``error_phase``,
  ``retry_count``, ``box_session_id``, ``box_enabled`` to match what
  the backend already returns in get_runtime_info_dict()
- Both MCP detail forms (``mcp/components/mcp-form/MCPForm.tsx`` and
  ``plugins/mcp-server/mcp-form/MCPFormDialog.tsx``) detect
  ``error_phase === 'box_unavailable'`` and render a two-line
  localized notice: state line ("Box disabled / unreachable") plus
  remediation line ("enable Box or switch to http/sse")
- 8 locale files (en/zh-Hans/zh-Hant/ja/ru/vi/th/es) get
  ``mcp.boxDisabledStdioRefused``, ``mcp.boxUnavailableStdioRefused``,
  ``mcp.boxStdioRefusedSuggestion``

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ilable

When Box is disabled in config (``box.enabled = false``) or unreachable,
saving a new MCP server in stdio mode produced one that could never
start — the user would only learn that from the runtime error on the
detail page. Stop the user before they save instead.

Both MCP forms (the page-level ``MCPForm.tsx`` and the older dialog
``MCPFormDialog.tsx``) now:

- Disable the ``stdio`` option in the mode select when Box is
  unavailable, with a small "(requires Box)" suffix so the reason is
  obvious. Existing stdio configs still display their current value
- Show ``BoxUnavailableNotice`` inline under the mode select when the
  currently-selected mode is stdio and Box is unavailable, so editing
  a stale stdio config makes the cause visible
- Disable the Save / Submit button while stdio is selected under that
  condition. ``MCPForm`` exposes a new ``onSaveBlockedChange`` prop
  so the parent ``MCPDetailContent`` can disable both its Submit and
  Save buttons. ``MCPFormDialog`` disables its Save button locally
- Refuse the submit handler too (Enter-key path) with a toast carrying
  the same i18n message

i18n: ``mcp.boxRequired`` (short tag in the disabled option) and
``mcp.stdioBlockedByBoxToast`` added to all 8 locales.

Backend runtime gate (``_init_stdio_python_server`` refusal +
``BOX_UNAVAILABLE`` error_phase + retry short-circuit) stays in place
as the last line of defence for API bypass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…le source

Skills now flow exclusively through the Box runtime. Every read and write
method funnels through ``_box_service()``; when Box is unavailable
(disabled in config, connection failed, or simply not installed) the
operation either returns an empty surface (``list_skills`` → []) or
raises with a clear ``Box runtime ... not initialised / disabled /
unavailable: ...`` message via the new ``_require_box(action)`` helper.

Why: the legacy local-fallback path scanned ``data/skills/``, but Box
manages its own ``box.local.skills_root`` (default ``data/box/skills/``).
The two diverging directories caused stale / phantom skill lists when
Box flapped, and the local-fallback writes silently bypassed all the
sandboxing the operator had configured.

SkillService (``api/http/service/skill.py``):
- New ``_require_box(action)`` returns the box service or raises a
  structured ValueError. ``_require_box_for_write`` kept as alias
- ``list_skills`` → returns [] when Box is down so the UI can render
  the disabled banner cleanly
- ``get_skill`` / ``get_skill_by_name`` → return None
- All read-file / write-file / scan-dir / create / update / delete /
  install / preview methods → ``_require_box`` then box delegate.
  Local fallback bodies (shutil.copytree, tempfile.mkdtemp, preview
  pipelines) removed entirely

SkillManager (``pkg/skill/manager.py``):
- ``reload_skills`` returns early with empty cache when Box is down.
  data/skills/ discovery loop removed
- ``refresh_skill_from_disk`` now just reports cache presence; the
  on-disk re-parse is gone since Box is the only writer

Tests:
- Drop 11 obsolete test_skill_service.py tests that exercised the
  removed local-fallback paths (create/install/file/delete/update)
- Add list-empty + read-refused tests; flip the legacy-allow test to
  legacy-refuses-too
- Rewrite refresh_skill_from_disk test to match the new behaviour

Several helper methods (_managed_skill_path, _resolve_skill_path,
_preview_skill_candidates, _install_preview_candidates, etc.) are now
unreachable; a follow-up commit will prune them so this diff stays
reviewable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/langbot/pkg/skill/manager.py Fixed
Comment thread src/langbot/pkg/api/http/service/skill.py Fixed
…migration

Follow-up to the Box-only refactor. The previous commit removed the
local-fallback BRANCHES from every public method; this one removes the
HELPERS those branches called, which are now unreachable.

SkillService (service/skill.py): 787 → 449 lines
  Removed: scan_directory (sync), _read_skill_package, _write_skill_md,
  _resolve_create_field, _managed_skill_path,
  _managed_install_root_for_package, _normalize_package_root,
  _resolve_skill_path, _find_skill_entry, _discover_skill_directories,
  _safe_extract_zip, _extract_uploaded_skill_to_temp,
  _download_github_skill_to_temp, _resolve_github_source_root,
  _build_preview_target_dir, _preview_skill_candidates,
  _select_preview_candidates, _install_preview_candidates,
  _preview_source_root, _resolve_installed_skills, plus the
  module-level _FRONTMATTER_FIELDS and _build_skill_md.
  Kept (still needed by the surviving GitHub-import path):
  _download_github_asset, _download_github_skill_directory_as_zip,
  _find_github_skill_archive_entry, _copy_github_skill_directory_to_zip,
  _is_github_skill_md_url, _parse_github_skill_md_url,
  _resolve_github_skill_md_package_name, _validate_github_asset_url,
  _uploaded_skill_target_stem, _validate_skill_name.
  Imports dropped: shutil, tempfile, yaml, ....utils.paths.

SkillManager (skill/manager.py): 187 → 88 lines
  Removed: get_managed_skills_root, _discover_skill_directories,
  _find_skill_entry, _load_skill_file, _normalize_package_root.
  Imports dropped: datetime, parse_frontmatter, paths.

Tests:
  - test_skill_service.py: drop the 3 sync scan_directory tests +
    skill_service fixture + _create_skill_file helper
  - test_skill_tools.py: drop test_load_skill_file_success; rename
    TestSkillManagerPackageLoading → TestSkillManagerCache

Full unit suite: 277 passed, 1 skipped. ``ruff check`` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
return
result = reload_skills()
if inspect.isawaitable(result):
await result
RockChinQ and others added 5 commits May 20, 2026 22:37
The contributor's original PR (#1917) appended an ``Available Skills``
index to the system prompt before the LLM saw the user message, so the
LLM could decide whether to activate a skill. ``7145447b`` removed the
text-marker activation flow and, together with it, the entire system
prompt injection — but the Tool Call replacement only put the available
skills inside the ``activate`` tool's description. In practice the LLM
ignores tool descriptions for selection and goes straight to native
tools, so user-visible skill activation silently broke.

Restore the injection, adapted for the Tool Call era:

- SkillManager regains ``get_skill_index(bound_skills)`` and
  ``build_skill_aware_prompt_addition(bound_skills)``. The addendum
  carries only ``name (display_name): description`` for each
  pipeline-visible skill plus one instruction line pointing at the
  ``activate`` tool. No SKILL.md contents — KV cache stays clean
- PreProcessor appends the addendum to the first system message (or
  inserts a new one) of ``query.prompt.messages`` for the local-agent
  runner. Handles plain-string and ContentElement[] bodies. Skips
  cleanly when no skills are visible
- 3 new test_preproc cases: injection happens, bound-skills subset
  honoured, empty addendum touches nothing. 280 passed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Until now ``BoxService.get_status`` returned ``available: true`` whenever
the runtime connector was healthy, even if the runtime itself reported
``backend: { available: false }`` (operator selected nsjail without the
binary, Docker daemon crashed mid-session, E2B credentials wrong, ...).
The dashboard / ``useBoxStatus`` hook / skill_service gate consumed the
top-level flag and showed "connected" while every actual call to native
exec or skill management would fail.

The native-tool loader already polled ``status.backend.available``
independently and hid its tools correctly, but every other consumer
(dashboard banner, the disabled-state hint, the LLM-facing message)
disagreed with it.

Combine the two in the payload: ``available = self._available AND
status.backend.available``. When ``backend.available`` is false we now
also surface a ``connector_error`` that names the backend
("Configured sandbox backend \"nsjail\" is unavailable") so the dialog
shows the actionable reason instead of an empty error pane. The
detailed ``backend`` object is preserved unchanged for the dialog.

Internal ``box_service.available`` (used by ``skill_service`` writes,
``mcp_stdio.uses_box_stdio``, the reconnect callback) is intentionally
NOT changed — it still tracks connector health only, so a backend blip
does not trigger spurious reconnect loops.

Tests:
- ``test_get_status_downgrades_available_when_backend_dead`` — exercise
  the new branch (connector OK, backend.available=false → top-level
  available=false, connector_error mentions the backend name)
- ``test_get_status_keeps_available_true_when_backend_ok`` — guard
  against regressing the happy path

Live-verified with ``box.backend: nsjail`` on macOS (no nsjail binary):
``GET /api/v1/box/status`` now returns ``available: false`` with the
named connector_error, instead of the previous misleading
``available: true``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When Box is configured but the runtime reports its backend is dead
(e.g. ``box.backend = nsjail`` but the binary is missing, or Docker
daemon crashed), the backend now returns a structured
``connector_error`` like ``Configured sandbox backend "nsjail" is
unavailable``. The previous notice only said "Box sandbox is
unavailable" + a generic "enable Box" hint, hiding the actionable
detail.

- ``useBoxStatus``: derive ``reason`` from ``status.connector_error``.
  Only exposed for the failed-state (``hint === 'boxUnavailable'``),
  since the disabled-by-config message already carries its reason
- ``BoxUnavailableNotice``: insert the reason as a small monospaced
  line between the state message and the action hint. The disabled
  variant is unchanged (operator chose the state)
- Wire ``reason`` through every existing call site (Skills page +
  detail, PipelineExtension, both MCP forms). Old unused ``context``
  prop dropped

Net layout (3 lines, still compact):

  ⚠ Box sandbox is unavailable — sandbox tools, skill add/edit, ...
    Configured sandbox backend "nsjail" is unavailable
    This feature requires the Box runtime. Enable it in config ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve conflicts in:
- .github/workflows/run-tests.yml: keep master's src/langbot/** paths plus feat/** push branch
- src/langbot/pkg/plugin/connector.py: keep both branches' marketplace MCP/skill
  install logic (HEAD) and runtime/wait helpers (master); add missing return in
  _inspect_plugin_package so LOCAL/GITHUB install paths get author/name back
- tests/unit_tests/pipeline/test_n8nsvapi.py: keep HEAD's try/finally sys.modules
  save/restore pattern
- web/src/app/home/components/dynamic-form/DynamicFormComponent.tsx: union
  imports + keep HEAD's disable_if/tooltip support and master's QrCodeLoginDialog
- web/src/i18n/locales/*: union of disjoint top-level keys from both branches
- web/src/app/home/market/page.tsx: accept our deletion (unified extensions page)
- uv.lock: regenerate via uv sync --dev
The merge from master brought in new unit tests that target pre-refactor
APIs on feat/sandbox. Reconcile each:

- factories/app.py: FakeApp now exposes a Mock skill_mgr (with empty .skills
  dict + inert prompt-addition builder) and a Mock pipeline_service so the
  PreProcessor skill-index injection branch can run end-to-end in tests.

- pipeline/conftest.py: eagerly import langbot.pkg.pipeline.pipelinemgr so
  pipeline.stage is fully initialised before any individual stage test
  (preproc, longtext, ...) tries to lazy-load it. Without this preload,
  running test_preproc.py in isolation hit a circular-import error via the
  stage -> app -> pipelinemgr -> stage chain.

- provider/test_tool_manager.py: ToolManager now probes four loaders
  (native -> plugin -> mcp -> skill). Inject inert native + skill mocks in
  the execute_func_call fixture and assert all four shutdowns fire.

- utils/test_paths.py: drop the three cwd-dependent _check_if_source_install
  cases. The refactor walks Path(__file__).resolve().parents looking for
  pyproject.toml + main.py, so cwd no longer factors in and there's no
  file read to mock-fail. The positive case and caching test still apply.

- utils/test_version.py: delete entirely. is_newer and compare_version_str
  were removed when VersionManager was refactored to use the Space API for
  release checks (1b4107a); the tests targeted a surface that no longer
  exists.
# any individual stage test (e.g. preproc, longtext) tries to import it. Without
# this, running a stage test in isolation triggers a circular-import error:
# stage.py → core.app → pipelinemgr → stage.stage_class (not yet bound).
import langbot.pkg.pipeline.pipelinemgr # noqa: F401
RockChinQ added 4 commits May 21, 2026 13:21
Mirror the plugin runtime: box is now started through the same CLI entry
point (langbot_plugin.cli) instead of the box module directly.

- docker-compose.yaml: langbot_box command runs `langbot_plugin.cli ... box`
  (WebSocket is the default transport, no flag needed — matches `rt`).
- box/connector.py: both subprocess launch sites (_start_local_stdio and
  the Windows _start_subprocess_then_ws path) invoke
  `langbot_plugin.cli.__init__ box`, using `-s` for the stdio transport.
- docs/review: update stale `-m langbot_plugin.box[.server]` references.

Pairs with the SDK change that removes box's direct-launch entry points
(python -m langbot_plugin.box / .box.server) and the legacy --mode flag.
CI on feat/sandbox failed across Unit Tests, Lint and Build Dev Image.
Root causes and fixes:

- pyproject.toml had a [tool.uv.sources] editable override pinning
  langbot-plugin to ../langbot-plugin-sdk. That path only exists in a
  paired local checkout, so `uv sync` failed on every CI runner
  ("Distribution not found"). Remove the override and regenerate uv.lock
  so langbot-plugin==0.4.0b1 resolves from PyPI, matching master.

- tests/integration/api/test_pipelines.py: the pipeline extensions
  endpoint now calls ap.skill_service.list_skills(); add the missing
  skill_service mock to the fake_pipeline_app fixture (the test came
  from master, the endpoint change from feat/sandbox).

- Apply ruff format to three src files and prettier to three web files
  that had committed formatting drift, failing `ruff format --check`
  and `pnpm lint`.
…_bot

The dashboard pipeline-debug WebSocket
(/api/v1/pipelines/<uuid>/ws/connect) and the embed widget WebSocket
(/api/v1/embed/<bot_uuid>/ws/connect) already live on separate paths,
but the debug handler ran `_find_owner_bot(pipeline_uuid)` and, when
the same pipeline happened to be bound to a web_page_bot, passed that
bot as `owner_bot` into `handle_websocket_message`. The adapter then
used the page bot's listeners + adapter for the request, so debug
sessions were logged as "page bot" activity in the dashboard.

Debug sessions must always run under the built-in websocket_proxy_bot.
Remove `_find_owner_bot`, drop the `owner_bot` parameter from the
debug-path `_handle_receive`, and call `handle_websocket_message`
without it so the adapter takes its default proxy-bot branch. The
embed handler still resolves and passes its `runtime_bot` for the
page-bot path, so attribution there is unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

eh: Feature enhance: 新功能添加 / add new features m: Tools 工具(ToolUse、内容函数)相关 / Function Calling or tools management pd: Need testing pending: 待测试的PR / PR waiting to be tested size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants