feat(grpc): support /get_server_info for TokenSpeed gRPC routers#1574

Draft
YzXiao101 wants to merge 1 commit into
lightseekorg:mainfrom
YzXiao101:feat/grpc-get-server-info

Conversation


@YzXiao101 YzXiao101 commented May 31, 2026

Description

/get_server_info was available on the SMG side, but TokenSpeed gRPC routers did not implement the bridge, so the request could not return backend server info.

This tiny PR just fills that gap. It wires /get_server_info through the gRPC router/client path and returns the TokenSpeed payload in SMG's admin response.

Changes

  • add get_server_info to the regular gRPC router
  • add get_server_info to the PD gRPC router
  • bridge TokenSpeed server_args / scheduler_info from protobuf Struct to JSON
  • keep only the public server_args subset
  • update the admin API doc
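
A minimal sketch of the "keep only the public server_args subset" step, assuming an allowlist approach. The constant `PUBLIC_SERVER_ARGS`, the function `filter_public_server_args`, and the use of plain `String` values are hypothetical simplifications; the key names are taken from the sample response in this description, and the actual list in the PR may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical allowlist. The key names mirror the sample response in this
/// PR description; the real list may differ.
const PUBLIC_SERVER_ARGS: &[&str] = &[
    "model",
    "served_model_name",
    "tokenizer",
    "max_total_tokens",
    "load_balance_method",
];

/// Keep only allowlisted keys and drop everything else.
/// Values are plain strings here to keep the sketch std-only.
fn filter_public_server_args(raw: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    raw.iter()
        .filter(|(key, _)| PUBLIC_SERVER_ARGS.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

An allowlist fails closed: a backend argument added later stays hidden until someone adds it to the list explicitly.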

Validation

Unit tests:

  • added Rust tests for TokenSpeed and SGLang Struct -> JSON conversion

Setup:

Sun May 31 15:12:24 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 NVL                On  |   00000000:C1:00.0 Off |                    0 |
| N/A   35C    P0             89W /  310W |    6039MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Result:

curl -fsS http://127.0.0.1:30000/get_server_info | python -m json.tool
{
    "server_args": {
        "model": "unsloth/Llama-3.2-1B-Instruct",
        "served_model_name": "llama-3.2-1b-instruct",
        "tokenizer": "unsloth/Llama-3.2-1B-Instruct",
        "max_total_tokens": 8192.0,
        "load_balance_method": "shortest_queue"
    },
    "scheduler_info": {
        "chunked_prefill_size": 8192.0,
        "max_model_len": 4096.0,
        "max_num_seqs": 160.0,
        "max_req_input_len": 4095.0,
        "max_total_num_tokens": 8192.0,
        "status": "ready"
    },
    "active_requests": 0,
    "is_paused": false,
    "uptime_seconds": 113.65587878227234,
    "max_total_num_tokens": 8192,
    "tokenspeed_version": "unknown",
    "start_time": {
        "seconds": 1780239400,
        "nanos": 0
    }
}

NOTE: I have identified and fixed the JSON formatting issue where integer-like values in server_args and scheduler_info were being emitted as floats (for example, 8192.0), and I am running the full SMG + TokenSpeed validation again.
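
For context, protobuf `Value` stores every number as a double, which is why `8192` shows up as `8192.0` after a naive conversion. The sketch below shows one possible normalization, not necessarily the fix applied in this PR; `JsonNumber` and `normalize_number` are hypothetical names used only for illustration.

```rust
/// Simplified stand-in for a JSON number (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum JsonNumber {
    Int(i64),
    Float(f64),
}

/// Emit an integer when the double is finite, has no fractional part, and is
/// small enough to be represented exactly; otherwise keep it as a float.
fn normalize_number(n: f64) -> JsonNumber {
    // 2^53: every integer up to this magnitude is exactly representable in f64.
    const MAX_EXACT: f64 = 9_007_199_254_740_992.0;
    if n.is_finite() && n.fract() == 0.0 && n.abs() <= MAX_EXACT {
        JsonNumber::Int(n as i64)
    } else {
        JsonNumber::Float(n)
    }
}
```

With this rule, a value such as `max_total_tokens` would serialize as an integer, while `uptime_seconds` would stay a float.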

Summary by CodeRabbit

  • New Features

    • Enhanced the Get Server Info API to return richer server metrics (scheduler info, active requests, uptime, version/tokenspeed, start time, max tokens) with backend-specific response shapes.
  • Documentation

    • Updated API docs to describe the expanded response, differences for HTTP vs gRPC runtimes, and curated server_args exposure with an updated example.

@github-actions github-actions Bot added the documentation (Improvements or additions to documentation), grpc (gRPC client and router changes), and model-gateway (Model gateway crate changes) labels May 31, 2026

coderabbitai Bot commented May 31, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements a complete get_server_info endpoint across gRPC routers: ServerInfo protobuf-to-JSON conversion with backend-aware filtering, handler logic in GrpcRouter and GrpcPDRouter with worker selection and error mapping, and updated admin API docs showing the new response shape.

Changes

Server Info Endpoint Implementation

| Layer / File(s) | Summary |
|---|---|
| ServerInfo JSON serialization with filtering (model_gateway/src/routers/grpc/client.rs) | ServerInfo::to_public_json() converts protobuf to curated JSON, filtering server_args by allowlist for SGLang/TokenSpeed, adding scheduler/uptime/version/active_requests/start_time/max_total_num_tokens fields, and updating flat_labels. Includes unit tests. |
| GrpcRouter get_server_info handler (model_gateway/src/routers/grpc/router.rs) | Adds select_first_worker and get_server_info_impl to pick a regular gRPC worker, acquire a client, call get_server_info, convert to public JSON, and map/log errors; adjusts imports and Json usage. |
| GrpcPDRouter get_server_info handler (model_gateway/src/routers/grpc/pd_router.rs) | Adds helpers to choose the first prefill gRPC worker and call get_server_info with error logging/mapping for missing worker, unconfigured client, client acquisition failure, and gRPC call errors; implements RouterTrait::get_server_info. |
| API documentation update (docs/reference/api/admin.md) | Replaces generic example with backend-specific response description including server_args, scheduler_info, and runtime state/timing fields (active_requests, is_paused, uptime_seconds, start_time). |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant GrpcRouter
  participant WorkerPool
  participant GrpcClient
  participant ServerInfo
  Client->>GrpcRouter: GET /get_server_info
  GrpcRouter->>WorkerPool: select_first_worker
  alt Worker Available
    WorkerPool-->>GrpcRouter: worker
    GrpcRouter->>GrpcClient: acquire client
    GrpcClient->>ServerInfo: get_server_info()
    ServerInfo-->>GrpcClient: proto response
    GrpcClient-->>GrpcRouter: response
    GrpcRouter->>GrpcRouter: to_public_json()
    GrpcRouter-->>Client: 200 JSON
  else No Worker
    GrpcRouter-->>Client: 503 Service Unavailable
  end
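
The flow in the diagram can be sketched roughly as follows. This is a simplified, std-only illustration, not the PR's handler: `Worker`, `Response`, the `healthy` flag, and the `fetch` callback (standing in for the gRPC call plus JSON conversion) are all hypothetical, and the status code used for a failed backend call is an assumption. Only the 503 for "no worker" comes from the diagram.

```rust
/// Hypothetical, simplified worker record; the real router types are richer.
struct Worker {
    url: String,
    healthy: bool,
}

/// Handler outcome reduced to an HTTP status code and a body.
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// Pick the first usable worker. Filtering on `healthy` is an assumption.
fn select_first_worker(workers: &[Worker]) -> Option<&Worker> {
    workers.iter().find(|w| w.healthy)
}

/// `fetch` stands in for "acquire client, call get_server_info, convert to JSON".
fn get_server_info<F>(workers: &[Worker], fetch: F) -> Response
where
    F: Fn(&Worker) -> Result<String, String>,
{
    let Some(worker) = select_first_worker(workers) else {
        // Matches the "No Worker" branch of the diagram.
        return Response { status: 503, body: "no available worker".to_string() };
    };
    match fetch(worker) {
        Ok(json) => Response { status: 200, body: json },
        // Assumed mapping; the PR may map backend errors differently.
        Err(err) => Response { status: 500, body: err },
    }
}
```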

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lightseekorg/smg#817: Adds the max_total_num_tokens field to GetServerInfoResponse protobuf message, which is now included in the public JSON response via the new ServerInfo::to_public_json() method.

Suggested labels

tests

Suggested reviewers

  • CatherineSue
  • key4ng
  • slin1237

Poem

🐰 A rabbit hops through router lanes,
Filtering secrets from server chains,
Protos melt to JSON streams,
Workers wake and answer dreams,
Server info now gleams! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.07% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main objective of the pull request: adding /get_server_info endpoint support for TokenSpeed gRPC routers. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Contributor

mergify Bot commented May 31, 2026

Hi @YzXiao101, the DCO sign-off check has failed. All commits must include a Signed-off-by line.

To fix existing commits:

# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease

To sign off future commits automatically:

  • Use git commit -s every time, or
  • VSCode: enable Git: Always Sign Off in Settings
  • PyCharm: enable Sign-off commit in the Commit tool window

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements the get_server_info endpoint for gRPC routers (GrpcRouter and GrpcPDRouter), allowing them to fetch and return server information from backend workers. It introduces a to_public_json method to format and filter the gRPC server arguments into a curated JSON response, and updates the API documentation and tests accordingly. The review feedback suggests a performance and idiomatic improvement in struct_to_json_map by removing redundant sorting of BTreeMap fields and using .collect() instead of manual loop insertion.

Comment on lines +758 to +767
    fn struct_to_json_map(struct_value: &prost_types::Struct) -> serde_json::Map<String, Value> {
        let mut entries: Vec<_> = struct_value.fields.iter().collect();
        entries.sort_by(|(left, _), (right, _)| left.cmp(right));

        let mut map = serde_json::Map::new();
        for (key, value) in entries {
            map.insert(key.clone(), prost_value_to_json(value));
        }
        map
    }
medium

The struct_value.fields field is a BTreeMap<String, Value> in prost_types::Struct, which means its keys are already sorted. Collecting the entries into a Vec and sorting them again is redundant and introduces unnecessary heap allocations and CPU overhead. You can iterate over struct_value.fields directly. Additionally, prefer using .collect() on the iterator to populate the map instead of manually looping to insert elements, as it is more idiomatic and concise.

        fn struct_to_json_map(struct_value: &prost_types::Struct) -> serde_json::Map<String, Value> {
            struct_value
                .fields
                .iter()
                .map(|(key, value)| (key.clone(), prost_value_to_json(value)))
                .collect()
        }
References
  1. Prefer using .collect() on an iterator to populate a collection instead of manually looping to insert elements, as it automatically pre-sizes the collection and is more idiomatic and concise.
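
To illustrate the reviewer's point with the standard library alone: a `BTreeMap` always iterates in ascending key order, so collecting its entries into a `Vec` and sorting them again changes nothing. The sketch below uses `f64` values and a hypothetical `convert` callback in place of `prost_types::Value` and `prost_value_to_json`; it is a demonstration of the ordering guarantee, not the PR's code.

```rust
use std::collections::BTreeMap;

/// Keys come out in ascending order regardless of insertion order.
fn keys_in_iteration_order(fields: &BTreeMap<String, f64>) -> Vec<&str> {
    fields.keys().map(String::as_str).collect()
}

/// Map and collect directly, with no intermediate Vec and no explicit sort.
/// `convert` is a hypothetical stand-in for `prost_value_to_json`.
fn convert_fields(
    fields: &BTreeMap<String, f64>,
    convert: fn(&f64) -> String,
) -> BTreeMap<String, String> {
    fields
        .iter()
        .map(|(key, value)| (key.clone(), convert(value)))
        .collect()
}
```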

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/reference/api/admin.md`:
- Around line 393-394: Add a blank line between the "**Response:** `200 OK`"
line and the opening JSON code fence (```) so the code fence is separated from
the preceding bold line; this resolves markdownlint MD031 for the block that
starts with the JSON fence and ensures the snippet under the "**Response:** `200
OK`" heading renders and lints correctly.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: da2410a0-9930-4df3-ab44-07dde805fb8f

📥 Commits

Reviewing files that changed from the base of the PR and between f4597b3 and a466a0a.

📒 Files selected for processing (4)
  • docs/reference/api/admin.md
  • model_gateway/src/routers/grpc/client.rs
  • model_gateway/src/routers/grpc/pd_router.rs
  • model_gateway/src/routers/grpc/router.rs

Comment thread docs/reference/api/admin.md
Signed-off-by: YzXiao101 <yzxiao101@outlook.com>
@YzXiao101 YzXiao101 force-pushed the feat/grpc-get-server-info branch from a466a0a to 53594a5 Compare May 31, 2026 16:58

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53594a5570

    fn to_public_json_sglang_keeps_pb_shape_and_filters_server_args() {
        let server_info = ServerInfo::Sglang(Box::new(sglang_proto::GetServerInfoResponse {
            server_args: Some(prost_types::Struct {
                fields: BTreeMap::from([
P1 Badge Use the prost Struct map type

In prost-types 0.14, prost_types::Struct::fields is a HashMap<String, Value> by default, but these new tests initialize it with BTreeMap::from(...). When the test module is compiled, this assignment won't type-check, so cargo test for this crate fails before running the new coverage. Use HashMap here (and in the other new Struct initializers) or collect into the field's actual map type.
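
One way to "collect into the field's actual map type" is to avoid naming the map type at the call site at all. The std-only sketch below assumes nothing about prost: `build_fields`, `as_btree`, and `as_hash` are hypothetical helpers, and `f64` stands in for `prost_types::Value`. It only shows that `collect()` targets whichever map type the surrounding code expects.

```rust
use std::collections::{BTreeMap, HashMap};
use std::iter::FromIterator;

/// Build a map from key/value pairs without naming the concrete map type.
/// Any type implementing `FromIterator<(String, f64)>` works.
fn build_fields<M>(pairs: &[(&str, f64)]) -> M
where
    M: FromIterator<(String, f64)>,
{
    pairs.iter().map(|(key, value)| (key.to_string(), *value)).collect()
}

/// The same helper satisfies a field typed as `BTreeMap` ...
fn as_btree(pairs: &[(&str, f64)]) -> BTreeMap<String, f64> {
    build_fields(pairs)
}

/// ... or one typed as `HashMap`, with no change to the construction code.
fn as_hash(pairs: &[(&str, f64)]) -> HashMap<String, f64> {
    build_fields(pairs)
}
```

In a struct literal the target type is inferred from the field, so a test written this way would not need to change if the map type differs between crate versions or build configurations.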

@slin1237
Collaborator

This is not the right implementation. Get server info should not be a new point. It's already abstracted in worker

Collaborator

@slin1237 slin1237 left a comment

Router shouldn't be changed for this
Worker is the right abstraction. Not router

@YzXiao101 YzXiao101 marked this pull request as draft June 1, 2026 13:15
@YzXiao101
Author

> Router shouldn't be changed for this
> Worker is the right abstraction. Not router

@slin1237 thx for your feedback.

  1. Does it mean /get_server_info should be implemented at the WorkerManager / worker layer instead of the router, similar to /get_loads?
  2. Just to confirm: what is the intended definition of /get_server_info in multi-worker or multi-backend setups? Should it remain a single selected backend’s server info (i.e., the current HTTP path), or should it mean something else? 🤔

Labels

documentation (Improvements or additions to documentation), grpc (gRPC client and router changes), model-gateway (Model gateway crate changes)

3 participants