Two related follow-ups from #37, with three temporarily skipped tests to re-enable.
A. Investigate: large host-tool message wedge under CI load (possible real bug)
tests/runtime_contracts/test_tool_contract.py::test_large_host_tool_request_round_trips (a 950KB × 12 stress test, originally written for the removed sbx exec transport) intermittently times out at 10s on CI for both:
- internal/python-runner-jsonrpc (deprecated SbxBackend stdin/stdout transport; dead path), and
- python-runner/direct-process (DirectPythonBackend, shared base pipe transport; production).
It passes instantly (~0.3s) everywhere locally; only reproduces under CI load.
Open question: is this a genuine large-message deadlock in SupervisorClient pipe I/O (the reader-thread drain racing the inline stdout read on a message larger than the ~64KB pipe buffer), or just CI slowness? If genuine, it's a real bug in DirectPythonBackend, not dead code.
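One way to rule the race hypothesis in or out is to compare against a pattern where a single reader thread owns the child's stdout and the caller only ever consumes from a queue. The sketch below is not SupervisorClient's code: the echo child, `start_reader`, and `round_trip` are hypothetical stand-ins, and it assumes newline-delimited messages. Because the reader drains continuously, a reply far larger than the ~64KB pipe buffer cannot back up into the child, and there is no second reader to race with.

```python
import queue
import subprocess
import sys
import threading
from typing import IO, List, Optional

# Hypothetical stand-in for the runner: echoes each newline-delimited message back.
ECHO_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


def start_reader(stream: IO[str], out: "queue.Queue[Optional[str]]") -> threading.Thread:
    """Make one thread the sole reader of `stream`, forwarding each line to `out`."""

    def pump() -> None:
        for line in iter(stream.readline, ""):
            out.put(line)
        out.put(None)  # EOF sentinel so waiters do not block forever

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


def round_trip(messages: List[str], timeout: float = 10.0) -> List[str]:
    """Send each message and wait for its echo, reading replies only via the queue."""
    proc = subprocess.Popen(
        [sys.executable, "-c", ECHO_CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    lines: "queue.Queue[Optional[str]]" = queue.Queue()
    reader = start_reader(proc.stdout, lines)
    replies = []
    try:
        for msg in messages:
            proc.stdin.write(msg + "\n")
            proc.stdin.flush()
            # No inline proc.stdout.readline() here: the reader thread is the only consumer.
            reply = lines.get(timeout=timeout)  # raises queue.Empty if the pipe wedges
            if reply is None:
                raise RuntimeError("child closed stdout before replying")
            replies.append(reply.rstrip("\n"))
    finally:
        proc.stdin.close()
        proc.wait(timeout=timeout)
        reader.join(timeout=timeout)
        proc.stdout.close()
    return replies
```

For example, `round_trip(["x" * 300_000, "ok"])` pushes a reply several times the pipe buffer size through without wedging. If the real client has both a background drain and an inline stdout read on the same pipe, each consumer could take part of the stream the other is waiting for, which is one concrete way the hypothesized race might show up.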
Currently marked @pytest.mark.local (runs locally, skipped in CI).
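To separate a genuine wedge from CI slowness, it may help to capture every thread's stack shortly before the 10s timeout fires: a deadlock would show threads parked in pipe read/write calls, while plain slowness would show them making progress. pytest's `faulthandler_timeout` ini option does this per test; the sketch below does the same for a single block using the standard library's `faulthandler`. The helper name `dump_stacks_if_slow` and the 8-second threshold in the comment are illustrative assumptions.

```python
import contextlib
import faulthandler
import sys
from typing import Iterator, TextIO


@contextlib.contextmanager
def dump_stacks_if_slow(seconds: float, file: TextIO = sys.__stderr__) -> Iterator[None]:
    """Dump all thread stacks to `file` if the body is still running after `seconds`.

    `file` must expose a real file descriptor, because faulthandler writes to it directly.
    """
    faulthandler.dump_traceback_later(seconds, repeat=False, file=file)
    try:
        yield
    finally:
        # Cancel the watchdog so a body that finishes in time produces no output.
        faulthandler.cancel_dump_traceback_later()


# Hypothetical use inside the stress test, just under its 10s timeout:
# with dump_stacks_if_slow(8.0):
#     run_large_host_tool_round_trips()
```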
B. Retire the SbxBackend stdin/stdout transport (tech debt)
SbxBackend's stdin/stdout transport (_supervisor_command/_runner_command, _start_stdout_reader, _read_stdout_line, and the non-websocket branches) is test-only: production always uses websocket, because _uses_websocket_transport() returns True exactly when self._supervisor_command is None, and no production caller sets _supervisor_command. Remove it and:
- Migrate the SbxBackend-specific coverage that lives only in TestSbxBackendLocalRunner (verbose output, debug logging, host-tool synced-file writeback, staging-root lifecycle; ~8–12 tests) to TestSbxBackendLocalWebSocketRunner. (Generic execute/tools/timeouts/errors/submit/files behaviors are already covered by the runtime-contracts matrix on direct-process/jspi, so they don't need re-homing.)
- Re-home the custom-supervisor-command recovery tests (dead-runner restart, silent-runner timeout) onto websocket equivalents. These don't map mechanically, since websocket has its own recovery path.
- Drop the internal/python-runner-jsonrpc runtime seam from tests/runtime_contracts/backends.py.
- Sweep _supervisor_command usages in tests/test_response_id_resync.py and tests/test_workspace.py.
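For orientation, the gate being retired reduces to roughly the following. This is a hypothetical sketch, not the real class: only `_supervisor_command` and `_uses_websocket_transport()` come from this issue, while `SbxBackendSketch`, its constructor, `start_transport`, and the returned strings are invented for illustration.

```python
from typing import Optional, Sequence


class SbxBackendSketch:
    """Hypothetical reduction of SbxBackend's transport selection (not the real class)."""

    def __init__(self, supervisor_command: Optional[Sequence[str]] = None) -> None:
        # Production callers never pass this, so it stays None outside tests.
        self._supervisor_command = supervisor_command

    def _uses_websocket_transport(self) -> bool:
        return self._supervisor_command is None

    def start_transport(self) -> str:
        if self._uses_websocket_transport():
            return "websocket"
        # Test-only stdin/stdout branch proposed for deletion, along with
        # the stdout reader helpers it depends on.
        return "stdio"
```

Once the stdio branch is gone, the gate is always true and can be inlined, which is why every test that sets `_supervisor_command` has to be migrated or swept first.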
Coverage note (why deletion isn't lossless)
TestSbxBackendLocalWebSocketRunner (15 tests) and TestSbxBackendLocalRunner (29 tests) have zero name overlap; 9 of the websocket runner's tests are predict() reconstruction tests, leaving only ~6 transport tests. So the websocket runner is not a superset — the SbxBackend-specific behaviors above must be migrated, not dropped.
Re-enable when addressed
- TestSbxBackendLocalRunner::test_verbose_prints_output_tool_calls_and_errors
- TestPythonRunnerProtocol::test_stale_concurrent_tool_calls_do_not_poison_later_execute
- test_large_host_tool_request_round_trips (whole test)