feat(entrypoints): add control-plane HTTP server sidecar (--control-port) by qywu · Pull Request #305 · lightseekorg/tokenspeed

qywu · 2026-05-28T22:49:31Z

Summary

Adds an HTTP server sidecar that starts automatically alongside `tokenspeed serve` on `main_port + 1` (override with `--control-port`).

Architecture:
```
Client ──► http_server :8001
             ├─ /health, /get_server_info, /get_model_info,
             │  /health_check, /abort ──► gRPC engine (direct)
             └─ /generate, /v1/*, /flush_cache
                ──► smg gateway :8000 ──► gRPC engine
```

Endpoints

| Method | Path | Backend |
| --- | --- | --- |
| GET | `/health` | local (always 200) |
| GET | `/get_server_info` | gRPC direct |
| GET | `/get_model_info` | gRPC direct |
| GET | `/health_check` | gRPC direct |
| POST | `/abort` | gRPC direct |
| GET/POST | `/generate` | smg passthrough |
| POST | `/v1/completions` | smg passthrough (streaming supported) |
| POST | `/v1/chat/completions` | smg passthrough (streaming supported) |
| GET | `/v1/models` | smg passthrough |
| POST | `/v1/messages` | smg passthrough (Anthropic API) |
| POST | `/v1/responses` | smg passthrough (OpenAI Responses API) |
| POST | `/flush_cache` | smg passthrough |
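
The backend split in the table can be pictured as a plain lookup from request path to backend. This is only an illustrative sketch: `GRPC_DIRECT`, `SMG_EXACT`, `SMG_PREFIX`, and `backend_for` are hypothetical names and are not taken from `http_server.py`, which registers FastAPI routes instead.

```python
# Hypothetical routing lookup that mirrors the endpoint table; not the PR's code.
GRPC_DIRECT = {"/get_server_info", "/get_model_info", "/health_check", "/abort"}
SMG_EXACT = {"/generate", "/flush_cache"}
SMG_PREFIX = "/v1/"


def backend_for(path: str) -> str:
    """Return the backend the endpoint table assigns to a request path."""
    if path == "/health":
        return "local"  # answered by the sidecar itself, per the table
    if path in GRPC_DIRECT:
        return "grpc"  # forwarded straight to the gRPC engine
    if path in SMG_EXACT or path.startswith(SMG_PREFIX):
        return "smg"  # passed through to the smg gateway
    raise KeyError(f"no route for {path}")
```

For example, `backend_for("/v1/chat/completions")` yields `"smg"`, while `backend_for("/abort")` yields `"grpc"`.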

Usage

```bash
# Auto-starts on port+1 (no flag needed)
tokenspeed serve --model <path> --port 8000
# → smg gateway on :8000, HTTP server on :8001

# Override control port
tokenspeed serve --model <path> --port 8000 --control-port 9000
```
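
The port rule shown in the usage (default to the main port plus one, unless `--control-port` is given) can be sketched as below. The function name `resolve_control_port` and its range and collision checks are assumptions made for illustration; the PR does not state that the orchestrator validates the port this way.

```python
from typing import Optional


def resolve_control_port(main_port: int, control_port: Optional[int] = None) -> int:
    """Pick the sidecar port: the explicit override if given, else main_port + 1."""
    port = control_port if control_port is not None else main_port + 1
    # The two checks below are assumed safeguards, not behavior confirmed by the PR.
    if not 0 < port <= 65535:
        raise ValueError(f"control port {port} is out of range")
    if port == main_port:
        raise ValueError("control port must differ from the gateway port")
    return port
```

With `--port 8000` this gives 8001 by default and 9000 when `--control-port 9000` is passed.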

Changes

- `python/tokenspeed/runtime/entrypoints/http_server.py`: FastAPI server with gRPC direct + smg passthrough
- `python/tokenspeed/cli/_argsplit.py`: `--control-port` orchestrator flag
- `python/tokenspeed/cli/serve_smg.py`: auto-start sidecar after smg ready, passing both `gateway_url` and `engine_grpc_addr`
- `test/runtime/test_http_server.py`: unit tests for `/health` and `--control-port` parsing

Notes

- `start_profile` / `stop_profile` are not yet exposed (there is no gRPC method for them in the proto; they are sent over ZMQ internally).
- `pause_generation` / `continue_generation` will be added once PR #270 (feat(engine): add pause_generation / continue_generation API) merges.

Adds tokenspeed.runtime.entrypoints.http_server, a FastAPI/uvicorn server that wraps Engine directly, bypassing the smg+gRPC stack.

Useful for:
- RL training: /pause_generation, /continue_generation (PR lightseekorg#270), /init_weights_update_group, /update_weights_from_distributed, /release_memory_occupation, /resume_memory_occupation
- Benchmarking: direct HTTP access without smg overhead
- Testing: simpler startup, /readiness probe, no smg dependency

Endpoints:
- GET /health, /readiness, /get_server_info, /v1/models
- POST /generate, /v1/completions, /v1/chat/completions (streaming supported)
- POST /flush_cache, /start_profile, /stop_profile
- POST /pause_generation, /continue_generation (requires PR lightseekorg#270; returns 501 until merged)
- POST /init_weights_update_group, /update_weights_from_distributed
- POST /release_memory_occupation, /resume_memory_occupation

CLI: `tokenspeed http-server --host 0.0.0.0 --port 8080 --model <path> ...` (engine ServerArgs passed through after --host/--port)

Standalone: `tokenspeed-http-server --model <path> ...`

Python API: `from tokenspeed.runtime.entrypoints.http_server import run`

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

Adds a lightweight control-plane HTTP server that runs alongside the smg gateway on a separate port, proxying engine control calls to smg.

Architecture:
- Client (generation) ──► smg gateway :8080 ──► gRPC engine
- Client (control) ──► http_server :8081 ──► smg gateway :8080

Changes:
- python/tokenspeed/runtime/entrypoints/http_server.py: FastAPI server that proxies /pause_generation, /continue_generation (PR lightseekorg#270), weight-update, /flush_cache, /start_profile, /stop_profile, /get_server_info, /health, /readiness to the smg gateway.
- python/tokenspeed/cli/_argsplit.py: adds --control-port to _ORCH_FLAGS and OrchestratorOpts.control_port (int | None).
- python/tokenspeed/cli/serve_smg.py: after smg is ready, starts the control server in a daemon thread when --control-port is set.
- test/runtime/test_http_server.py: unit tests for the control server endpoints and --control-port arg parsing.

Usage: tokenspeed serve --model <path> --port 8080 --control-port 8081

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

…equired

Remove the requirement to pass --control-port explicitly. The control HTTP server now starts automatically as a sidecar on user_port+1 whenever tokenspeed serve runs. --control-port remains available as an override for cases where port+1 is taken.

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 584c4dea0c

chatgpt-codex-connector · 2026-05-29T00:09:54Z

+
+@app.post("/release_memory_occupation")
+async def release_memory_occupation(request: Request):
+    return await _proxy("POST", "/release_memory_occupation", await request.json())


Accept empty bodies for no-argument control calls

For POST /release_memory_occupation (and the analogous resume/pause default case), a client that sends the usual no-body control request hits await request.json() before the proxy call, so an empty body raises a JSON decode error instead of reaching the gateway. The runtime API for release_memory_occupation has no required input fields, so this sidecar should treat an empty body as {}/None rather than returning a 500 for that valid call pattern.
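
One way to accept the no-body call pattern described above is to read the raw bytes first and decode them only when something is there. The helper below is a minimal sketch; `parse_optional_json` is a hypothetical name, and rejecting non-object payloads is an added assumption rather than something the review asks for. In a FastAPI handler the bytes would come from `await request.body()`.

```python
import json
from typing import Any, Dict


def parse_optional_json(raw: bytes) -> Dict[str, Any]:
    """Decode a request body, treating an empty or whitespace-only body as {}."""
    if not raw.strip():
        return {}  # a no-argument control call carries no body
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        # Assumed policy: control calls take a JSON object or nothing at all.
        raise ValueError("expected a JSON object")
    return payload
```

A handler could then forward `parse_optional_json(await request.body())` to the proxy helper instead of `await request.json()`.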

chatgpt-codex-connector · 2026-05-29T00:09:54Z

+        _start_control_server(
+            gateway_url=f"http://{user_host}:{user_port}",
+            host=user_host,
+            port=control_port,
+        )
+        sys.stdout.write(
+            f"ts control server ready on http://{user_host}:{control_port}\n"
+        )


Verify the control server starts before declaring readiness

When the requested/default control port is already in use (or uvicorn fails to bind for any other reason), _start_control_server only launches a daemon thread and returns immediately, so this code still prints ts control server ready and the orchestrator continues with no working control plane. This affects deployments that pass --control-port and rely on this readiness line for automation; the startup path should synchronize with uvicorn binding or surface the failure before advertising the sidecar as ready.

Review fixes for lightseekorg#305:
- Reuse a single shared gRPC channel/stub instead of creating (and leaking) a new channel on every /get_server_info, /get_model_info, /health_check, /abort call.
- Map grpc.aio.AioRpcError to a clean 503 JSON response instead of an unhandled 500 + stack trace.
- Fix stale "Health (local)" comment (now proxied).

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
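
The two behavioral fixes in this commit (one shared channel, RPC failures mapped to 503) can be sketched without grpcio. Everything named here is a stand-in: `LazyShared`, `UpstreamRpcError`, and `call_or_503` are hypothetical, and `UpstreamRpcError` merely plays the role of `grpc.aio.AioRpcError` so that the example runs on the standard library alone.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class LazyShared:
    """Create a resource on first use and return the same object afterwards."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._instance: Optional[Any] = None

    def get(self) -> Any:
        if self._instance is None:
            self._instance = self._factory()  # created once, reused by every call
        return self._instance


class UpstreamRpcError(Exception):
    """Stand-in for grpc.aio.AioRpcError so the sketch needs no grpcio install."""


def call_or_503(fn: Callable[[], Dict[str, Any]]) -> Tuple[int, Dict[str, Any]]:
    """Run an RPC wrapper; map an upstream RPC failure to a 503 JSON payload."""
    try:
        return 200, fn()
    except UpstreamRpcError as exc:
        return 503, {"error": str(exc)}
```

In a real handler the factory would open the gRPC channel, and the returned status and payload would be wrapped in a JSON response.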

The `async with ClientSession()` block closed the session as soon as _proxy_request returned the StreamingResponse, but FastAPI consumes the body iterator afterward, so the upstream connection was closed mid-stream and streaming requests raised "Connection closed." This was caught only against a real engine; a fast mock fully buffers before close and hides it.

Now the session is created without a context manager and closed in the generator's finally (streaming) or after read() (non-streaming).

Verified against a live `tokenspeed serve` (Qwen2.5-0.5B): SSE tokens stream incrementally through the sidecar with a proper data: [DONE] terminator; non-streaming, /health, and gRPC-direct endpoints all return 200.

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
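
The session-lifetime fix can be illustrated with a small asyncio sketch in which the generator, not the caller, owns the close. `FakeSession`, `stream_body`, and `collect` are hypothetical stand-ins for the real HTTP client session and proxy code; the fake raises the same "Connection closed." error if it is closed while chunks are still pending.

```python
import asyncio
from typing import AsyncIterator, List


class FakeSession:
    """Stand-in for an HTTP client session; records whether it has been closed."""

    def __init__(self, chunks: List[bytes]) -> None:
        self._chunks = chunks
        self.closed = False

    async def iter_chunks(self) -> AsyncIterator[bytes]:
        for chunk in self._chunks:
            if self.closed:
                raise RuntimeError("Connection closed.")
            yield chunk

    async def close(self) -> None:
        self.closed = True


async def stream_body(session: FakeSession) -> AsyncIterator[bytes]:
    """Yield upstream chunks and close the session only once the stream ends."""
    try:
        async for chunk in session.iter_chunks():
            yield chunk
    finally:
        await session.close()  # runs after the consumer has drained the body


async def collect(session: FakeSession) -> List[bytes]:
    """Consume the stream the way a response writer would."""
    return [chunk async for chunk in stream_body(session)]
```

Closing the session before handing `stream_body` to the consumer would reproduce the mid-stream failure described in the commit message.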

qywu · 2026-05-29T05:27:08Z

Superseded by #308 — same change squashed into a single commit, plus comprehensive regression tests (test/runtime/test_http_server.py) for the streaming session-lifetime, double-encoding, gRPC channel-reuse, and gRPC-error-mapping bugs found during development.

Address review feedback on lightseekorg#305/lightseekorg#308: the orchestrator printed "ts control server ready" right after spawning the uvicorn thread, before the socket was bound. If the port was in use the thread died silently and automation waiting on that line would hit a dead endpoint.

http_server.build_server() now returns an unstarted uvicorn.Server, and _start_control_server() polls server.started (uvicorn sets it only after the socket binds), returning False if the thread dies or times out. The ready line is gated on success; a bind failure prints a WARNING and serving continues (the smg gateway is independent).

Tests cover both the ready-after-bind and port-in-use paths.

Signed-off-by: Qingyang Wu <willqywu@gmail.com>
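
The polling described in this commit might look roughly like the helper below. It is a sketch under assumptions: `wait_until_started`, its default timeout, and its poll interval are hypothetical, and the only thing it relies on is that the server object exposes a boolean `started` attribute, as `uvicorn.Server` does.

```python
import threading
import time


def wait_until_started(server, thread: threading.Thread,
                       timeout: float = 10.0, interval: float = 0.05) -> bool:
    """Poll server.started; return False if the thread dies or the timeout passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if getattr(server, "started", False):
            return True  # the socket is bound, so it is safe to announce readiness
        if not thread.is_alive():
            return False  # the server thread exited, for example on a bind failure
        time.sleep(interval)
    return False  # timed out without the server ever reporting started
```

The caller would print the ready line only when this returns True and log a warning otherwise.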

qywu added 2 commits May 28, 2026 22:49

qywu changed the title ~~feat(entrypoints): add lightweight HTTP server (no smg gateway)~~ feat(entrypoints): add control-plane HTTP server sidecar (--control-port) May 28, 2026

qywu mentioned this pull request May 28, 2026

feat: expose POST /release_memory_occupation and /resume_memory_occupation #272

Closed

6 tasks

qywu marked this pull request as ready for review May 29, 2026 00:07

qywu requested a review from a team as a code owner May 29, 2026 00:07

qywu marked this pull request as draft May 29, 2026 00:07

chatgpt-codex-connector Bot reviewed May 29, 2026

chore(http-server): remove RL/weight-update endpoints for now

f7559d3

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

qywu force-pushed the feat/http-server branch from 7b63f30 to f7559d3 Compare May 29, 2026 00:12

qywu added 13 commits May 29, 2026 00:14

chore(http-server): strip unimplemented proxy endpoints, keep health/…

8d52288

…readiness only Signed-off-by: Qingyang Wu <willqywu@gmail.com>

chore(http-server): remove /readiness endpoint

6c3ee05

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

feat(http-server): proxy generation endpoints to smg

c002f06

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

feat(http-server): add flush_cache, get_server_info, start/stop_profi…

437b81a

…le stubs Signed-off-by: Qingyang Wu <willqywu@gmail.com>

feat(http-server): add passthrough for /v1/messages and /v1/responses

c5f680e

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

fix(http-server): proxy /flush_cache to smg instead of stub

d6150ab

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

feat(http-server): direct gRPC for GetServerInfo, GetModelInfo, Healt…

4d58e74

…hCheck, Abort Signed-off-by: Qingyang Wu <willqywu@gmail.com>

fix(http-server): use raw Response for non-streaming proxy to avoid d…

9154ccf

…ouble-encoding Signed-off-by: Qingyang Wu <willqywu@gmail.com>

feat(http-server): add start_profile/stop_profile smg passthrough

abe9c91

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

chore: remove test_http_server.py

013f263

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

feat(http-server): proxy /health to smg for real engine health status

62de7bd

Signed-off-by: Qingyang Wu <willqywu@gmail.com>

qywu mentioned this pull request May 29, 2026

feat(entrypoints): add HTTP server sidecar alongside smg gateway #308

Merged

qywu closed this May 29, 2026

qywu wants to merge 17 commits into lightseekorg:main from qywu:feat/http-server
