feat(grpc): support /get_server_info for TokenSpeed gRPC routers by YzXiao101 · Pull Request #1574 · lightseekorg/smg

YzXiao101 · 2026-05-31T16:38:50Z

Description

/get_server_info was available on the SMG side, but TokenSpeed gRPC routers did not implement the bridge, so the request could not return backend server info.

This tiny PR just fills that gap. It wires /get_server_info through the gRPC router/client path and returns the TokenSpeed payload in SMG's admin response.

Changes

add get_server_info to the regular gRPC router
add get_server_info to the PD gRPC router
bridge TokenSpeed server_args / scheduler_info from protobuf Struct to JSON
keep only the public server_args subset
update the admin API doc
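
The "keep only the public server_args subset" step can be pictured as an allowlist filter over the converted map. The following is a minimal sketch under stated assumptions, not the PR's code: `FieldValue` is a hypothetical stand-in for the protobuf value type, `PUBLIC_SERVER_ARGS` is an illustrative allowlist whose keys are copied from the sample response in this description, and `api_key` in the usage note is an invented example of a non-public field.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a protobuf `Value`; real code would use the prost type.
#[derive(Debug, Clone, PartialEq)]
enum FieldValue {
    Text(String),
    Number(f64),
}

/// Illustrative allowlist (assumption): keys taken from the sample response,
/// not from the PR's actual list.
const PUBLIC_SERVER_ARGS: &[&str] = &[
    "model",
    "served_model_name",
    "tokenizer",
    "max_total_tokens",
    "load_balance_method",
];

/// Keep only allowlisted keys; every other entry is dropped from the result.
fn filter_public_args(raw: &BTreeMap<String, FieldValue>) -> BTreeMap<String, FieldValue> {
    raw.iter()
        .filter(|(key, _)| PUBLIC_SERVER_ARGS.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

Calling `filter_public_args` on a map holding `model`, `max_total_tokens`, and a hypothetical `api_key` entry would return only the first two. An allowlist fails closed: a field added to the backend later stays hidden until someone lists it explicitly.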

Validation

Unit tests:

added Rust tests for TokenSpeed and SGLang Struct -> JSON conversion

Setup:

Sun May 31 15:12:24 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 NVL                On  |   00000000:C1:00.0 Off |                    0 |
| N/A   35C    P0             89W /  310W |    6039MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Result:

curl -fsS http://127.0.0.1:30000/get_server_info | python -m json.tool
{
    "server_args": {
        "model": "unsloth/Llama-3.2-1B-Instruct",
        "served_model_name": "llama-3.2-1b-instruct",
        "tokenizer": "unsloth/Llama-3.2-1B-Instruct",
        "max_total_tokens": 8192.0,
        "load_balance_method": "shortest_queue"
    },
    "scheduler_info": {
        "chunked_prefill_size": 8192.0,
        "max_model_len": 4096.0,
        "max_num_seqs": 160.0,
        "max_req_input_len": 4095.0,
        "max_total_num_tokens": 8192.0,
        "status": "ready"
    },
    "active_requests": 0,
    "is_paused": false,
    "uptime_seconds": 113.65587878227234,
    "max_total_num_tokens": 8192,
    "tokenspeed_version": "unknown",
    "start_time": {
        "seconds": 1780239400,
        "nanos": 0
    }
}

NOTE: I have identified and fixed the JSON formatting issue where integer-like values in server_args and scheduler_info were being emitted as floats (for example, 8192.0), and I am running the full SMG + TokenSpeed validation again.
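
For context on why values such as 8192.0 appear: a protobuf `Struct` stores every number as a double, so a naive conversion emits floats even for whole numbers. One possible way to normalize them is sketched below. This is an assumption about the shape of the fix, not the PR's actual change; `render_number` and `MAX_EXACT_INTEGER` are hypothetical names.

```rust
/// 2^53: up to this magnitude every whole number is exactly representable as an f64.
const MAX_EXACT_INTEGER: f64 = 9_007_199_254_740_992.0;

/// Render a protobuf number as JSON text, emitting whole numbers without ".0".
fn render_number(n: f64) -> String {
    if !n.is_finite() {
        // JSON has no NaN or Infinity literal; this sketch falls back to null.
        "null".to_string()
    } else if n.fract() == 0.0 && n.abs() <= MAX_EXACT_INTEGER {
        // Whole number within the exactly representable range: print as an integer.
        format!("{}", n as i64)
    } else {
        // Genuinely fractional (or very large) values keep their float form.
        format!("{}", n)
    }
}
```

With this sketch, `render_number(8192.0)` yields `"8192"` while `render_number(0.5)` stays `"0.5"`. The range check avoids silently rounding doubles that are too large to be exact integers.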

Summary by CodeRabbit

New Features
- Enhanced the Get Server Info API to return richer server metrics (scheduler info, active requests, uptime, version/tokenspeed, start time, max tokens) with backend-specific response shapes.
Documentation
- Updated API docs to describe the expanded response, differences for HTTP vs gRPC runtimes, and curated server_args exposure with an updated example.

coderabbitai · 2026-05-31T16:39:05Z

Walkthrough

This PR implements a complete get_server_info endpoint across gRPC routers: ServerInfo protobuf-to-JSON conversion with backend-aware filtering, handler logic in GrpcRouter and GrpcPDRouter with worker selection and error mapping, and updated admin API docs showing the new response shape.

Changes

Server Info Endpoint Implementation

Layer / File(s)	Summary
ServerInfo JSON serialization with filtering `model_gateway/src/routers/grpc/client.rs`	`ServerInfo::to_public_json()` converts protobuf to curated JSON, filtering `server_args` by allowlist for SGLang/TokenSpeed, adding scheduler/uptime/version/active_requests/start_time/max_total_num_tokens fields, and updating `flat_labels`. Includes unit tests.
GrpcRouter get_server_info handler `model_gateway/src/routers/grpc/router.rs`	Adds `select_first_worker` and `get_server_info_impl` to pick a regular gRPC worker, acquire a client, call `get_server_info`, convert to public JSON, and map/log errors; adjusts imports and `Json` usage.
GrpcPDRouter get_server_info handler `model_gateway/src/routers/grpc/pd_router.rs`	Adds helpers to choose the first prefill gRPC worker and call `get_server_info` with error logging/mapping for missing worker, unconfigured client, client acquisition failure, and gRPC call errors; implements `RouterTrait::get_server_info`.
API documentation update `docs/reference/api/admin.md`	Replaces generic example with backend-specific response description including `server_args`, `scheduler_info`, and runtime state/timing fields (`active_requests`, `is_paused`, `uptime_seconds`, `start_time`).

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant GrpcRouter
  participant WorkerPool
  participant GrpcClient
  participant ServerInfo
  Client->>GrpcRouter: GET /get_server_info
  GrpcRouter->>WorkerPool: select_first_worker
  alt Worker Available
    WorkerPool-->>GrpcRouter: worker
    GrpcRouter->>GrpcClient: acquire client
    GrpcClient->>ServerInfo: get_server_info()
    ServerInfo-->>GrpcClient: proto response
    GrpcClient-->>GrpcRouter: response
    GrpcRouter->>GrpcRouter: to_public_json()
    GrpcRouter-->>Client: 200 JSON
  else No Worker
    GrpcRouter-->>Client: 503 Service Unavailable
  end
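
The flow in the diagram can be sketched in plain Rust as follows. This is only an illustration under assumptions: `Worker`, `Response`, and the `fetch` callback are hypothetical stand-ins for the router's real types and for the "acquire client, call get_server_info, convert to public JSON" steps, and the 500 status for backend failures is a placeholder because the status used for those errors is not stated here.

```rust
/// Hypothetical worker record; the real router types are not shown in this thread.
struct Worker {
    url: String,
    is_grpc: bool,
}

/// Simplified handler result: an HTTP status code plus a body.
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// Pick the first gRPC worker, mirroring the `select_first_worker` step.
fn select_first_worker(workers: &[Worker]) -> Option<&Worker> {
    workers.iter().find(|worker| worker.is_grpc)
}

/// `fetch` stands in for acquiring a client, calling the backend, and converting to JSON.
fn get_server_info<F>(workers: &[Worker], fetch: F) -> Response
where
    F: Fn(&Worker) -> Result<String, String>,
{
    match select_first_worker(workers) {
        // No worker available: the diagram's 503 branch.
        None => Response {
            status: 503,
            body: "no gRPC worker available".to_string(),
        },
        Some(worker) => match fetch(worker) {
            Ok(json) => Response { status: 200, body: json },
            // Assumption: placeholder status for backend failures.
            Err(err) => Response { status: 500, body: err },
        },
    }
}
```

Passing the backend call in as a closure keeps the selection and error-mapping logic testable without a live gRPC server.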

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

lightseekorg/smg#817: Adds the max_total_num_tokens field to GetServerInfoResponse protobuf message, which is now included in the public JSON response via the new ServerInfo::to_public_json() method.

Suggested labels

tests

Suggested reviewers

CatherineSue
key4ng
slin1237

Poem

🐰 A rabbit hops through router lanes,
Filtering secrets from server chains,
Protos melt to JSON streams,
Workers wake and answer dreams,
Server info now gleams! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 62.07% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main objective of the pull request—adding /get_server_info endpoint support for TokenSpeed gRPC routers.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

mergify · 2026-05-31T16:39:32Z

Hi @YzXiao101, the DCO sign-off check has failed. All commits must include a Signed-off-by line.

To fix existing commits:

# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease

To sign off future commits automatically:

Use git commit -s every time, or
VSCode: enable Git: Always Sign Off in Settings
PyCharm: enable Sign-off commit in the Commit tool window

gemini-code-assist

Code Review

This pull request implements the get_server_info endpoint for gRPC routers (GrpcRouter and GrpcPDRouter), allowing them to fetch and return server information from backend workers. It introduces a to_public_json method to format and filter the gRPC server arguments into a curated JSON response, and updates the API documentation and tests accordingly. The review feedback suggests a performance and idiomatic improvement in struct_to_json_map by removing redundant sorting of BTreeMap fields and using .collect() instead of manual loop insertion.

gemini-code-assist · 2026-05-31T16:40:05Z

+        fn struct_to_json_map(struct_value: &prost_types::Struct) -> serde_json::Map<String, Value> {
+            let mut entries: Vec<_> = struct_value.fields.iter().collect();
+            entries.sort_by(|(left, _), (right, _)| left.cmp(right));
+
+            let mut map = serde_json::Map::new();
+            for (key, value) in entries {
+                map.insert(key.clone(), prost_value_to_json(value));
+            }
+            map
+        }


The struct_value.fields field is a BTreeMap<String, Value> in prost_types::Struct, which means its keys are already sorted. Collecting the entries into a Vec and sorting them again is redundant and introduces unnecessary heap allocations and CPU overhead. You can iterate over struct_value.fields directly. Additionally, prefer using .collect() on the iterator to populate the map instead of manually looping to insert elements, as it is more idiomatic and concise.

fn struct_to_json_map(struct_value: &prost_types::Struct) -> serde_json::Map<String, Value> {
    struct_value
        .fields
        .iter()
        .map(|(key, value)| (key.clone(), prost_value_to_json(value)))
        .collect()
}

References

Prefer using .collect() on an iterator to populate a collection instead of manually looping to insert elements, as it automatically pre-sizes the collection and is more idiomatic and concise.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/reference/api/admin.md`:
- Around line 393-394: Add a blank line between the "**Response:** `200 OK`"
line and the opening JSON code fence (```) so the code fence is separated from
the preceding bold line; this resolves markdownlint MD031 for the block that
starts with the JSON fence and ensures the snippet under the "**Response:** `200
OK`" heading renders and lints correctly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: da2410a0-9930-4df3-ab44-07dde805fb8f

📥 Commits

Reviewing files that changed from the base of the PR and between f4597b3 and a466a0a.

📒 Files selected for processing (4)

docs/reference/api/admin.md
model_gateway/src/routers/grpc/client.rs
model_gateway/src/routers/grpc/pd_router.rs
model_gateway/src/routers/grpc/router.rs

Signed-off-by: YzXiao101 <yzxiao101@outlook.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53594a5570

chatgpt-codex-connector · 2026-05-31T17:01:33Z

+    fn to_public_json_sglang_keeps_pb_shape_and_filters_server_args() {
+        let server_info = ServerInfo::Sglang(Box::new(sglang_proto::GetServerInfoResponse {
+            server_args: Some(prost_types::Struct {
+                fields: BTreeMap::from([


Use the prost Struct map type

In prost-types 0.14, prost_types::Struct::fields is a HashMap<String, Value> by default, but these new tests initialize it with BTreeMap::from(...). When the test module is compiled, this assignment won't type-check, so cargo test for this crate fails before running the new coverage. Use HashMap here (and in the other new Struct initializers) or collect into the field's actual map type.
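
The last option, collecting into the field's actual map type, can be illustrated with standard-library maps alone. The reviews in this thread disagree on which map type the field uses, so the sketch below deliberately does not name one. `sample_fields` is a hypothetical helper and the `f64` values are simplified stand-ins for protobuf values.

```rust
use std::collections::{BTreeMap, HashMap};
use std::iter::FromIterator;

/// Build sample fields without naming the concrete map type: the caller's
/// annotation (or the struct field being assigned) decides what `collect()` builds.
fn sample_fields<M>() -> M
where
    M: FromIterator<(String, f64)>,
{
    vec![("max_total_tokens", 8192.0), ("max_num_seqs", 160.0)]
        .into_iter()
        .map(|(key, value)| (key.to_string(), value))
        .collect()
}

/// The same helper fills either map type.
fn build_both() -> (HashMap<String, f64>, BTreeMap<String, f64>) {
    (sample_fields(), sample_fields())
}
```

Written this way, a test initializer keeps compiling if the generated code switches between a hash map and a B-tree map. The same effect is available inline by ending the initializer expression with `.into_iter().collect()`.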

slin1237 · 2026-05-31T23:04:32Z

This is not the right implementation. Get server info should not be a new point. It's already abstracted in worker

slin1237

Router shouldn't be changed for this
Worker is the right abstraction. Not router

YzXiao101 · 2026-06-01T16:58:51Z

> Router shouldn't be changed for this
> Worker is the right abstraction. Not router

@slin1237 Thanks for your feedback.

Does this mean /get_server_info should be implemented at the WorkerManager / worker layer instead of the router, similar to /get_loads?
Just to confirm: what is the intended definition of /get_server_info in multi-worker or multi-backend setups? Should it remain a single selected backend's server info (i.e., the current HTTP path), or should it mean something else? 🤔

YzXiao101 requested review from CatherineSue, key4ng and slin1237 as code owners May 31, 2026 16:38

github-actions Bot added the documentation (Improvements or additions to documentation), grpc (gRPC client and router changes), and model-gateway (Model gateway crate changes) labels May 31, 2026

gemini-code-assist Bot reviewed May 31, 2026

coderabbitai Bot requested changes May 31, 2026

feat(grpc): bridge gRPC get_server_info

53594a5

Signed-off-by: YzXiao101 <yzxiao101@outlook.com>

YzXiao101 force-pushed the feat/grpc-get-server-info branch from a466a0a to 53594a5 Compare May 31, 2026 16:58

chatgpt-codex-connector Bot reviewed May 31, 2026

coderabbitai Bot approved these changes May 31, 2026

lightseek-bot assigned CatherineSue and slin1237 May 31, 2026

slin1237 requested changes May 31, 2026

YzXiao101 marked this pull request as draft June 1, 2026 13:15
