Retire SbxBackend stdin/stdout transport + investigate large-message pipe wedge #43

Description

@magix022

Two related follow-ups from #37, with three temporarily-skipped tests to re-enable.

A. Investigate: large host-tool message wedge under CI load (possible real bug)

tests/runtime_contracts/test_tool_contract.py::test_large_host_tool_request_round_trips intermittently times out at 10s on CI. It is a 950KB × 12 stress test, originally written for the removed sbx exec transport, and it fails on both of these runtime seams:

  • internal/python-runner-jsonrpc (deprecated SbxBackend stdin/stdout — dead path), and
  • python-runner/direct-process (DirectPythonBackend, shared base pipe transport — production).

It passes in ~0.3s on every local run; the timeout reproduces only under CI load.

Open question: is this a genuine large-message deadlock in SupervisorClient pipe I/O (the reader-thread drain racing the inline stdout read on a message larger than the ~64KB pipe buffer), or just CI slowness? If genuine, it's a real bug in DirectPythonBackend, not dead code.
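
For reference, the general hazard behind this question can be sketched outside the codebase. The snippet below is a minimal illustration and does not model SupervisorClient: the echo child and the echo_round_trip helper are hypothetical stand-ins. It shows the pattern that avoids a wedge on messages larger than the pipe buffer, where a reader thread drains the child's stdout while the parent is still writing. Without the concurrent drain, the child can block on a full stdout pipe while the parent blocks on a full stdin pipe once more than one large message is in flight.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for a runner: echoes each stdin line back on stdout.
ECHO_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


def echo_round_trip(messages, timeout=10.0):
    """Write newline-free messages to the child while a thread drains its stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", ECHO_CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    received = []

    def drain():
        # Runs concurrently with the writes below, so the child never stays
        # blocked on a full stdout pipe while the parent is blocked on stdin.
        for line in proc.stdout:
            received.append(line.rstrip("\n"))

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    try:
        for message in messages:
            proc.stdin.write(message + "\n")
        proc.stdin.close()  # flushes, then EOF lets the child exit
        reader.join(timeout)
        if reader.is_alive():
            raise TimeoutError("child output was not fully drained")
        proc.wait(timeout=timeout)
        proc.stdout.close()
    finally:
        if proc.poll() is None:
            proc.kill()
    return received
```

If the real transport mixes a background drain with an inline read on the same stream, the thing to check is whether both can ever wait on the same bytes; this sketch sidesteps that by giving stdout a single reader.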

The test is currently marked @pytest.mark.local, so it runs locally and is skipped in CI.
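
One possible way to tell a deadlock from plain slowness when the test is run under CI again is to dump every thread's stack shortly before the 10s timeout fires. The helper below is a sketch using the standard-library faulthandler module; the name dump_stacks_if_slow and the idea of wrapping the test body with it are assumptions, not something already present in the repository.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_stacks_if_slow(seconds, file=None):
    """Dump all thread stacks to `file` if the body runs longer than `seconds`.

    `file` must expose a real file descriptor (fileno()); it defaults to
    sys.stderr resolved at call time.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(seconds, exit=False, file=target)
    try:
        yield
    finally:
        # Cancel the watchdog so fast runs produce no output.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the round-trip call with, say, dump_stacks_if_slow(8) would show whether the reader thread and the inline read are both parked in a read call when the test stalls (which would point to a wedge) or whether the process is still making progress (which would point to CI slowness).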

B. Retire the SbxBackend stdin/stdout transport (tech debt)

SbxBackend's stdin/stdout transport (_supervisor_command/_runner_command, _start_stdout_reader, _read_stdout_line, and the non-websocket branches) is test-only — production always uses websocket (_uses_websocket_transport() returns self._supervisor_command is None, and no production caller sets _supervisor_command). Remove it and:

  • Migrate the SbxBackend-specific coverage that lives only in TestSbxBackendLocalRunner — verbose output, debug logging, host-tool synced-file writeback, staging-root lifecycle (~8–12 tests) — to TestSbxBackendLocalWebSocketRunner. (Generic execute/tools/timeouts/errors/submit/files behaviors are already covered by the runtime-contracts matrix on direct-process/jspi, so they don't need re-homing.)
  • Rehome the custom-supervisor-command recovery tests (dead-runner restart, silent-runner timeout) onto websocket equivalents — these don't map mechanically since websocket has its own recovery path.
  • Drop the internal/python-runner-jsonrpc runtime seam from tests/runtime_contracts/backends.py.
  • Sweep _supervisor_command usages in tests/test_response_id_resync.py and tests/test_workspace.py.

Coverage note (why deletion isn't lossless)

TestSbxBackendLocalWebSocketRunner (15 tests) and TestSbxBackendLocalRunner (29 tests) have zero name overlap; 9 of the websocket runner's tests are predict() reconstruction tests, leaving only ~6 transport tests. So the websocket runner is not a superset — the SbxBackend-specific behaviors above must be migrated, not dropped.

Re-enable when addressed

  • TestSbxBackendLocalRunner::test_verbose_prints_output_tool_calls_and_errors
  • TestPythonRunnerProtocol::test_stale_concurrent_tool_calls_do_not_poison_later_execute
  • test_large_host_tool_request_round_trips (whole test)
