feat(entrypoints): add control-plane HTTP server sidecar (--control-port) #305
qywu wants to merge 17 commits
Adds tokenspeed.runtime.entrypoints.http_server, a FastAPI/uvicorn server that wraps Engine directly, bypassing the smg+gRPC stack. Useful for:
- RL training: /pause_generation, /continue_generation (PR lightseekorg#270), /init_weights_update_group, /update_weights_from_distributed, /release_memory_occupation, /resume_memory_occupation
- Benchmarking: direct HTTP access without smg overhead
- Testing: simpler startup, /readiness probe, no smg dependency

Endpoints:
- GET /health, /readiness, /get_server_info, /v1/models
- POST /generate, /v1/completions, /v1/chat/completions (streaming supported)
- POST /flush_cache, /start_profile, /stop_profile
- POST /pause_generation, /continue_generation (requires PR lightseekorg#270; returns 501 until merged)
- POST /init_weights_update_group, /update_weights_from_distributed
- POST /release_memory_occupation, /resume_memory_occupation

CLI: `tokenspeed http-server --host 0.0.0.0 --port 8080 --model <path> ...` (engine ServerArgs passed through after --host/--port)
Standalone: `tokenspeed-http-server --model <path> ...`
Python API: `from tokenspeed.runtime.entrypoints.http_server import run`

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
Adds a lightweight control-plane HTTP server that runs alongside the smg gateway on a separate port, proxying engine control calls to smg.

Architecture:
    Client (generation) ──► smg gateway :8080 ──► gRPC engine
    Client (control)    ──► http_server :8081 ──► smg gateway :8080

Changes:
- python/tokenspeed/runtime/entrypoints/http_server.py: FastAPI server that proxies /pause_generation, /continue_generation (PR lightseekorg#270), weight-update, /flush_cache, /start_profile, /stop_profile, /get_server_info, /health, /readiness to the smg gateway.
- python/tokenspeed/cli/_argsplit.py: adds --control-port to _ORCH_FLAGS and OrchestratorOpts.control_port (int | None).
- python/tokenspeed/cli/serve_smg.py: after smg is ready, starts the control server in a daemon thread when --control-port is set.
- test/runtime/test_http_server.py: unit tests for the control server endpoints and --control-port arg parsing.

Usage:
    tokenspeed serve --model <path> --port 8080 --control-port 8081

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
…equired

Remove the requirement to pass --control-port explicitly. The control HTTP server now starts automatically as a sidecar on user_port+1 whenever tokenspeed serve runs. --control-port remains available as an override for cases where port+1 is taken.

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 584c4dea0c
    @app.post("/release_memory_occupation")
    async def release_memory_occupation(request: Request):
        return await _proxy("POST", "/release_memory_occupation", await request.json())
Accept empty bodies for no-argument control calls
For POST /release_memory_occupation (and the analogous resume/pause default case), a client that sends the usual no-body control request hits await request.json() before the proxy call, so an empty body raises a JSON decode error instead of reaching the gateway. The runtime API for release_memory_occupation has no required input fields, so this sidecar should treat an empty body as {}/None rather than returning a 500 for that valid call pattern.
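One way to act on this suggestion is to read the raw body and decode it only when it is non-empty. The sketch below is stdlib-only and illustrative; `parse_optional_json` is a hypothetical helper name, not something taken from this PR.

```python
import json
from typing import Any


def parse_optional_json(raw: bytes) -> dict[str, Any]:
    """Return the decoded JSON object, or {} when the body is empty."""
    if not raw.strip():
        # A no-body control call is treated as "no arguments".
        return {}
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("control payload must be a JSON object")
    return payload
```

In a FastAPI handler this could plausibly be called as `parse_optional_json(await request.body())` in place of `await request.json()`, assuming the proxy helper accepts an empty dict.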
    _start_control_server(
        gateway_url=f"http://{user_host}:{user_port}",
        host=user_host,
        port=control_port,
    )
    sys.stdout.write(
        f"ts control server ready on http://{user_host}:{control_port}\n"
    )
Verify the control server starts before declaring readiness
When the requested/default control port is already in use (or uvicorn fails to bind for any other reason), _start_control_server only launches a daemon thread and returns immediately, so this code still prints ts control server ready and the orchestrator continues with no working control plane. This affects deployments that pass --control-port and rely on this readiness line for automation; the startup path should synchronize with uvicorn binding or surface the failure before advertising the sidecar as ready.
Signed-off-by: Qingyang Wu <willqywu@gmail.com>
…readiness only Signed-off-by: Qingyang Wu <willqywu@gmail.com>
…le stubs Signed-off-by: Qingyang Wu <willqywu@gmail.com>
…hCheck, Abort Signed-off-by: Qingyang Wu <willqywu@gmail.com>
…ouble-encoding Signed-off-by: Qingyang Wu <willqywu@gmail.com>
Review fixes for lightseekorg#305:
- Reuse a single shared gRPC channel/stub instead of creating (and leaking) a new channel on every /get_server_info, /get_model_info, /health_check, /abort call.
- Map grpc.aio.AioRpcError to a clean 503 JSON response instead of an unhandled 500 + stack trace.
- Fix stale "Health (local)" comment (now proxied).

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
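The two behavioural fixes in this commit, channel reuse and error mapping, can be sketched without gRPC installed. Everything here is a stand-in: `SharedChannel`, `UpstreamError`, and `call_with_error_mapping` are hypothetical names, and `UpstreamError` only plays the role that `grpc.aio.AioRpcError` plays in the real code.

```python
from typing import Any, Callable, Optional


class UpstreamError(Exception):
    """Hypothetical stand-in for an RPC-layer error such as grpc.aio.AioRpcError."""


class SharedChannel:
    """Create the channel lazily once and hand the same object to every caller."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._channel: Optional[Any] = None

    def get(self) -> Any:
        if self._channel is None:
            # First use: build the channel. Later calls reuse it.
            self._channel = self._factory()
        return self._channel


def call_with_error_mapping(rpc: Callable[[], dict]) -> tuple[int, dict]:
    """Run an RPC and map upstream failures to a 503 status with a JSON-style body."""
    try:
        return 200, rpc()
    except UpstreamError as exc:
        return 503, {"error": str(exc)}
```

The point of the lazy holder is that every endpoint handler asks the same object for the channel, so no per-request channel is created or left unclosed.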
The `async with ClientSession()` block closed the session as soon as _proxy_request returned the StreamingResponse, but FastAPI consumes the body iterator afterward, so the upstream connection was closed mid-stream and streaming requests raised "Connection closed." (caught only against a real engine; a fast mock fully buffers before close and hides it).

Now the session is created without a context manager and closed in the generator's finally (streaming) or after read() (non-streaming).

Verified against a live `tokenspeed serve` (Qwen2.5-0.5B): SSE tokens stream incrementally through the sidecar with a proper data: [DONE] terminator; non-streaming, /health, and gRPC-direct endpoints all 200.

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
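The session-lifetime fix described above can be illustrated with a stdlib-only sketch. `FakeSession` is a hypothetical stand-in for an HTTP client session, and `stream_body` and `open_stream` are illustrative names rather than the PR's functions. The key point is that the session is closed in the generator's `finally`, so it stays open until the consumer has finished iterating.

```python
import asyncio
from typing import AsyncIterator


class FakeSession:
    """Hypothetical stand-in for an HTTP client session."""

    def __init__(self) -> None:
        self.closed = False

    async def chunks(self) -> AsyncIterator[bytes]:
        for part in (b"data: a\n\n", b"data: b\n\n", b"data: [DONE]\n\n"):
            if self.closed:
                raise ConnectionError("Connection closed.")
            yield part
            await asyncio.sleep(0)

    async def close(self) -> None:
        self.closed = True


async def stream_body(session: FakeSession) -> AsyncIterator[bytes]:
    """Yield upstream chunks; close the session only when iteration ends."""
    try:
        async for part in session.chunks():
            yield part
    finally:
        await session.close()


async def open_stream() -> tuple[FakeSession, AsyncIterator[bytes]]:
    # No `async with` here: the session must outlive this function,
    # because the caller consumes the iterator after we return.
    session = FakeSession()
    return session, stream_body(session)


async def demo() -> tuple[list[bytes], bool, bool]:
    session, body = await open_stream()
    closed_before = session.closed
    received = [part async for part in body]
    return received, closed_before, session.closed
```

Wrapping the session in `async with` inside `open_stream` would set `closed` before the first chunk is read, which is the failure mode the commit message describes.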
Superseded by #308: the same change squashed into a single commit, plus comprehensive regression tests (test/runtime/test_http_server.py) for the streaming session-lifetime, double-encoding, gRPC channel-reuse, and gRPC-error-mapping bugs found during development.
Address review feedback on lightseekorg#305/lightseekorg#308: the orchestrator printed "ts control server ready" right after spawning the uvicorn thread, before the socket was bound. If the port was in use the thread died silently and automation waiting on that line would hit a dead endpoint.

http_server.build_server() now returns an unstarted uvicorn.Server, and _start_control_server() polls server.started (uvicorn sets it only after the socket binds), returning False if the thread dies or times out. The ready line is gated on success; a bind failure prints a WARNING and serving continues (the smg gateway is independent).

Tests cover both the ready-after-bind and port-in-use paths.

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
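A minimal sketch of the polling described in this commit, using only the standard library. `wait_until_started`, `FakeServer`, and `start` are hypothetical names; `FakeServer` merely imitates an object with a `started` flag and is not uvicorn. The timeout and interval values are arbitrary assumptions.

```python
import threading
import time
from typing import Callable


def wait_until_started(
    is_started: Callable[[], bool],
    thread: threading.Thread,
    timeout: float = 5.0,
    interval: float = 0.01,
) -> bool:
    """Poll until is_started() is true; give up if the thread dies or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_started():
            return True
        if not thread.is_alive():
            # The server thread exited (for example, a bind failure).
            return is_started()
        time.sleep(interval)
    return False


class FakeServer:
    """Hypothetical stand-in for a server object exposing a `started` flag."""

    def __init__(self, fail: bool) -> None:
        self.started = False
        self._fail = fail

    def run(self) -> None:
        if self._fail:
            return  # simulates the thread dying before the socket binds
        self.started = True
        time.sleep(0.05)  # simulates serving for a short while


def start(fail: bool) -> bool:
    server = FakeServer(fail)
    thread = threading.Thread(target=server.run, daemon=True)
    thread.start()
    return wait_until_started(lambda: server.started, thread, timeout=1.0)
```

A caller would print the ready line only when `start(...)` returns `True`, and print a warning otherwise, which matches the gating the commit message describes.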
Summary
Adds an HTTP server sidecar that starts automatically alongside `tokenspeed serve` on `main_port + 1` (override with `--control-port`).
Architecture:
```
Client ──► http_server :8001
├─ /health, /get_server_info, /get_model_info,
│ /health_check, /abort ──► gRPC engine (direct)
└─ /generate, /v1/*, /flush_cache
──► smg gateway :8000 ──► gRPC engine
```
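The split shown in the diagram can be written down as a small lookup. This is only an illustration of the diagram: `backend_for` and the constant names are hypothetical, and the `"unknown"` fallback is an assumption, since the summary does not say how other paths are handled.

```python
# Paths the diagram shows going straight to the gRPC engine.
DIRECT_GRPC_PATHS = {
    "/health",
    "/get_server_info",
    "/get_model_info",
    "/health_check",
    "/abort",
}
# Paths the diagram shows being proxied through the smg gateway.
GATEWAY_PATHS = {"/generate", "/flush_cache"}
GATEWAY_PREFIXES = ("/v1/",)


def backend_for(path: str) -> str:
    """Return which backend the diagram assigns to a request path."""
    if path in DIRECT_GRPC_PATHS:
        return "grpc"
    if path in GATEWAY_PATHS or path.startswith(GATEWAY_PREFIXES):
        return "gateway"
    return "unknown"  # assumption: not specified in the summary
```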
Endpoints
Usage
```bash
# Auto-starts on port+1 (no flag needed)
tokenspeed serve --model <path> --port 8000
# → smg gateway on :8000, HTTP server on :8001

# Override control port
tokenspeed serve --model <path> --port 8000 --control-port 9000
```
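The default-port rule in the usage above reduces to a few lines. `resolve_control_port` is a hypothetical helper name, and the range check is an added assumption rather than behaviour confirmed by the PR.

```python
from typing import Optional


def resolve_control_port(user_port: int, control_port: Optional[int] = None) -> int:
    """Use the explicit override when given, otherwise fall back to user_port + 1."""
    port = control_port if control_port is not None else user_port + 1
    if not 0 < port <= 65535:
        # Assumption: reject values outside the valid TCP port range.
        raise ValueError(f"control port out of range: {port}")
    return port
```

With the values from the usage block, `resolve_control_port(8000)` gives 8001 and `resolve_control_port(8000, 9000)` gives 9000.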
Changes
Notes