Improve proxy server #4354
base: main
Conversation
Pull request overview
This PR refactors the proxy server implementation by extracting node management into a dedicated module, introducing shared aiohttp session management, and simplifying the proxy request handling flow for Hybrid vs DistServe serving modes.
Changes:
- Split proxy node management logic into a new `NodeManager`/`Node` implementation with a shared aiohttp `ClientSession`.
- Refactored proxy request handling into helper functions for the Hybrid and DistServe flows (prefill/decode + PD connection).
- Renamed the OpenAI API server context from `VariableInterface` to `ServerContext` and updated type hints/references accordingly.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `lmdeploy/serve/proxy/proxy.py` | Refactors proxy request routing and the DistServe flow; introduces `AppSettings` and new node API models. |
| `lmdeploy/serve/proxy/node_manager.py` | New module implementing a shared aiohttp connector/session, `Node`, and `NodeManager` with routing strategies. |
| `lmdeploy/serve/openai/api_server.py` | Renames the global server context class to `ServerContext` and updates usages. |
| `lmdeploy/serve/openai/serving_generate.py` | Updates `check_request` typing to use `ServerContext`. |
| `lmdeploy/serve/openai/serving_completion.py` | Updates `check_request` typing to use `ServerContext`. |
| `lmdeploy/serve/openai/serving_chat_completion.py` | Updates `check_request` typing to use `ServerContext`. |
Comments suppressed due to low confidence (1)
lmdeploy/serve/openai/api_server.py:59
Renaming `VariableInterface` to `ServerContext` is a breaking change for in-repo callers (e.g., tests/imports) and any downstream code importing `VariableInterface`. Consider providing a backwards-compatible alias (`VariableInterface = ServerContext`) with deprecation messaging to avoid breaking existing imports.
```python
class ServerContext:
    """An IO interface maintaining variables."""
    async_engine: AsyncEngine = None
    api_keys: Optional[List[str]] = None
    request_hosts = []
```
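A minimal sketch of the suggested alias, placed at module level in `api_server.py`; the deprecation wording is illustrative, not the repo's actual code:

```python
import warnings

# Simplest form: a plain alias keeps `from ... import VariableInterface` working.
VariableInterface = ServerContext

# Alternatively, to emit a warning on first access, drop the plain alias above
# and rely on a PEP 562 module-level __getattr__ instead:
def __getattr__(name):
    if name == 'VariableInterface':
        warnings.warn('VariableInterface is deprecated; use ServerContext instead.',
                      DeprecationWarning, stacklevel=2)
        return ServerContext
    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')
```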
```diff
 if api_keys is not None:
     if isinstance(api_keys, str):
         api_keys = api_keys.split(',')
-    from lmdeploy.serve.openai.api_server import VariableInterface
-    VariableInterface.api_keys = api_keys
+    app_settings.api_keys = api_keys
```
Copilot AI (Feb 12, 2026):
proxy() stores API keys in app_settings.api_keys, but the check_api_key dependency reads ServerContext.api_keys (from lmdeploy.serve.openai.api_server). As a result, proxy API key protection will be effectively disabled even when api_keys is provided. Set the keys on ServerContext.api_keys (or implement a proxy-local check_api_key that uses app_settings.api_keys).
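A minimal sketch of the proxy-local option; the dependency name `proxy_check_api_key` and the bearer-token handling are illustrative, assuming `app_settings.api_keys` holds the parsed key list:

```python
from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

_bearer = HTTPBearer(auto_error=False)

async def proxy_check_api_key(auth: HTTPAuthorizationCredentials = Depends(_bearer)):
    """Validate the Bearer token against the proxy's own key list."""
    if not app_settings.api_keys:  # no keys configured: auth disabled
        return None
    if auth is None or auth.credentials not in app_settings.api_keys:
        raise HTTPException(status_code=401, detail='Invalid or missing API key')
    return auth.credentials
```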
```python
if node.status is None:
    from lmdeploy.serve.openai.api_client import APIClient
    node.status = Status(models=APIClient(api_server_url=node.url).available_models)
await node_manager.add(node.url, node.status)
```
Copilot AI (Feb 12, 2026):
add_node is an async endpoint but uses the synchronous APIClient (requests) to fetch /v1/models, which will block the event loop under load. Consider moving this call into a threadpool (run_in_threadpool) or switching to an async client (reusing the shared aiohttp session) to avoid proxy latency spikes.
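Two possible non-blocking variants, sketched under the assumption that the node's `/v1/models` endpoint returns an OpenAI-style `{"data": [{"id": ...}]}` payload and that the shared connector is reachable from this scope:

```python
from starlette.concurrency import run_in_threadpool

from lmdeploy.serve.openai.api_client import APIClient

# Option 1: keep the synchronous client, but run it off the event loop.
models = await run_in_threadpool(lambda: APIClient(api_server_url=node.url).available_models)

# Option 2: reuse the shared aiohttp session instead of the blocking client.
session = await connector.get_session()
async with session.get(f'{node.url}/v1/models') as resp:
    resp.raise_for_status()
    data = await resp.json()
    models = [m['id'] for m in data['data']]

node.status = Status(models=models)
```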
```diff
 async def terminate_node_all():
     """Terminate all nodes."""
-    try:
-        success = node_manager.terminate_all_nodes()
-        if not success:
-            return 'Failed to terminate all nodes'
-        return 'All nodes terminated successfully'
-    except:  # noqa
-        logger.error('Failed to terminate all nodes')
-        return 'Failed to terminate all nodes.'
+    await node_manager.terminate_all_nodes()
+    return JSONResponse(status_code=200, content={'message': 'All nodes terminated successfully'})
```
Copilot AI (Feb 12, 2026):
/nodes/terminate_all terminates nodes but does not deregister endpoints from pd_connection_pool. This can leave stale decode/prefill endpoints and connections behind. Consider deregistering each terminated node URL (or resetting the pool) after successful termination.
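A sketch of one fix, assuming `dereg_instance` (the helper `/nodes/remove` already uses) accepts a node URL; the URLs are snapshotted first since termination empties the manager:

```python
async def terminate_node_all():
    """Terminate all nodes and drop their PD connection state."""
    urls = list(node_manager.nodes.keys())
    await node_manager.terminate_all_nodes()
    for url in urls:
        # Deregister each terminated endpoint from the PD connection pool.
        app_settings.pd_connection_pool.dereg_instance(url)
    return JSONResponse(status_code=200, content={'message': 'All nodes terminated successfully'})
```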
```python
async def terminate_node(self, node_url: str):
    """Terminate a node."""
    if node_url not in self.nodes:
        raise KeyError(f'Node {node_url} does not exist')

    self.nodes.pop(node_url)
    self._invalidate_nodes_cache()

    session = await connector.get_session()
    async with session.get(f'{node_url}/terminate', headers={'accept': 'application/json'}) as response:
        if response.status != 200:
            text = await response.text()
            raise RuntimeError(f'Failed to terminate node {node_url}, status={response.status}, msg={text}')
```
Copilot AI (Feb 12, 2026):
terminate_node() pops the node and invalidates the cache but never calls connector.update(len(self.nodes)) (unlike add/remove). This means aiohttp connection limits may remain sized for the old host count. Update the connector limits after node removal/termination for consistent behavior.
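A sketch of the suggested fix for the removal path, assuming `connector.update` is a coroutine as the add/remove paths suggest:

```python
self.nodes.pop(node_url)
self._invalidate_nodes_cache()
# Keep aiohttp connection limits in step with the live host count,
# mirroring what add/remove already do.
await connector.update(len(self.nodes))
```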
```python
def remove_stale_nodes_by_expiration(self):
    """Remove stale nodes."""
    headers = {'accept': 'application/json'}
    to_be_deleted = [url for url in self.nodes.keys() if not self._check_node_health(url, headers)]
    for node_url in to_be_deleted:
        # Note: remove is async but we can't await here in sync method.
        # The node will be removed from dict, but async cleanup won't happen.
        if node_url in self.nodes:
            self.nodes.pop(node_url)
            self._invalidate_nodes_cache()
            logger.info(f'Removed node {node_url} due to heart beat expiration')
```
Copilot AI (Feb 12, 2026):
remove_stale_nodes_by_expiration() mutates self.nodes but skips connector.update() (and any other async cleanup) due to being sync. This can leave the shared session sized for removed hosts and may keep stale resources around. Consider scheduling connector.update() onto the running event loop (e.g., via asyncio.run_coroutine_threadsafe) or making the heartbeat coroutine-based so cleanup can be awaited.
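A sketch of the thread-safe scheduling option; it assumes the heartbeat thread keeps a reference to the proxy's running event loop (here stored as `self.loop`, which is not in the current code):

```python
import asyncio

def remove_stale_nodes_by_expiration(self):
    """Remove stale nodes; runs on the plain heartbeat thread."""
    headers = {'accept': 'application/json'}
    to_be_deleted = [url for url in self.nodes.keys() if not self._check_node_health(url, headers)]
    for node_url in to_be_deleted:
        self.nodes.pop(node_url, None)
        self._invalidate_nodes_cache()
        logger.info(f'Removed node {node_url} due to heart beat expiration')
    if to_be_deleted:
        # Hand the async connector resize to the proxy's event loop.
        asyncio.run_coroutine_threadsafe(connector.update(len(self.nodes)), self.loop)
```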
```python
# Cleanup
if not app_settings.dummy_prefill:
    prefill_id = prefill_info.get('id')
    if prefill_id:
        app_settings.pd_connection_pool.unshelf_prefill_session((p_url, d_node.url), prefill_id)
```
Copilot AI (Feb 12, 2026):
unshelf_prefill_session() is called during cleanup even when handle_shelf is false / no shelf_prefill_session() was performed. PDConnectionPool.unshelf_prefill_session uses set.remove() and will raise KeyError, which can break DistServe chat requests. Only unshelf if the session was actually shelved (e.g., guard with handle_shelf or track whether shelving happened).
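One way to guard the cleanup, sketched with a local `shelved` flag; the surrounding control flow is abbreviated and the forwarding step is a placeholder:

```python
shelved = False
if handle_shelf and not app_settings.dummy_prefill:
    app_settings.pd_connection_pool.shelf_prefill_session((p_url, d_node.url), prefill_id)
    shelved = True
try:
    ...  # forward the request to the decode node
finally:
    # Only undo the shelving that actually happened, avoiding KeyError
    # from set.remove() inside unshelf_prefill_session.
    if shelved:
        app_settings.pd_connection_pool.unshelf_prefill_session((p_url, d_node.url), prefill_id)
```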
```python
def report_model_not_found(content: str):
    """Report model not found error."""
    return JSONResponse(status_code=404, content={'error_code': ErrorCodes.MODEL_NOT_FOUND.value, 'error_msg': content})
```
Copilot AI (Feb 12, 2026):
report_model_not_found returns {error_code, error_msg}, while other proxy errors (e.g., API timeout) use {error_code, text} based on err_msg. This inconsistency is likely to break clients that parse the proxy error shape. Consider standardizing on the existing text field (or updating all error paths consistently).
Suggested change:

```diff
-return JSONResponse(status_code=404, content={'error_code': ErrorCodes.MODEL_NOT_FOUND.value, 'error_msg': content})
+return JSONResponse(
+    status_code=404,
+    content={'error_code': ErrorCodes.MODEL_NOT_FOUND.value, 'text': content},
+)
```
```diff
 async def terminate_node(node: NodeModel):
     """Terminate a node."""
-    try:
-        node_url = node.url
-        success = node_manager.terminate_node(node_url)
-        if not success:
-            return f'Failed to terminate node {node_url}'
-        return 'Terminated successfully'
-    except:  # noqa
-        logger.error(f'Terminate node {node_url} failed.')
-        return 'Failed to terminate node {node_url}, please check the input url.'
+    await node_manager.terminate_node(node.url)
+    logger.info(f'Terminated node {node.url} successfully')
+    return JSONResponse(status_code=200, content={'message': 'Terminated successfully', 'url': node.url})
```
Copilot AI (Feb 12, 2026):
/nodes/terminate removes a node but does not deregister it from pd_connection_pool (unlike /nodes/remove). This can leave stale endpoints/links in PDConnectionPool and affect later connection management. Consider calling app_settings.pd_connection_pool.dereg_instance(node.url) here as well.
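A sketch of the suggested addition, reusing `dereg_instance` the same way `/nodes/remove` does:

```python
async def terminate_node(node: NodeModel):
    """Terminate a node and drop its PD connection state."""
    await node_manager.terminate_node(node.url)
    # Deregister the endpoint from the PD connection pool as well.
    app_settings.pd_connection_pool.dereg_instance(node.url)
    logger.info(f'Terminated node {node.url} successfully')
    return JSONResponse(status_code=200, content={'message': 'Terminated successfully', 'url': node.url})
```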
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry; just open the pull request and seek help from the maintainers.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist