feat(inference): multi-route proxy with alias-based model routing #618
Open
cosmicnet wants to merge 4 commits into NVIDIA:main from
Conversation
All contributors have signed the DCO ✍️ ✅

Author: I have read the DCO document and I hereby sign the DCO.
Pull request overview
Adds multi-route inference proxying so sandboxes can route inference.local requests to multiple LLM backends by using a model alias in the request body.
Changes:
- Extends the inference proto + gateway storage to support multiple `(alias, provider_name, model_id)` entries per route.
- Adds alias-first route selection in the router and passes a `model_hint` extracted from sandbox request bodies.
- Expands sandbox L7 inference patterns and adds an Ollama provider profile + endpoint validation probe.
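Conceptually, each `(alias, provider_name, model_id)` entry in a route's config expands into its own resolved route. A minimal sketch of that expansion, using illustrative types (the PR's actual `ResolvedRoute` carries more fields):

```rust
// Illustrative types; the real definitions in openshell-server differ.
#[derive(Debug, Clone, PartialEq)]
struct ModelEntry {
    alias: String,
    provider_name: String,
    model_id: String,
}

#[derive(Debug, Clone, PartialEq)]
struct ResolvedRoute {
    alias: String,
    provider_name: String,
    model_id: String,
}

/// Expand a multi-model config into one resolved route per alias.
fn resolve_entries(entries: &[ModelEntry]) -> Vec<ResolvedRoute> {
    entries
        .iter()
        .map(|e| ResolvedRoute {
            alias: e.alias.clone(),
            provider_name: e.provider_name.clone(),
            model_id: e.model_id.clone(),
        })
        .collect()
}
```

Keeping one route per alias lets the router treat a multi-model route like a list of ordinary single-model candidates.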
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| proto/inference.proto | Adds InferenceModelEntry and models fields for multi-model inference config. |
| crates/openshell-server/src/inference.rs | Implements multi-model upsert + resolves each alias into separate ResolvedRoute entries. |
| crates/openshell-sandbox/src/proxy.rs | Extracts model from JSON body and forwards it as model_hint to the router. |
| crates/openshell-sandbox/src/l7/inference.rs | Adds Codex + Ollama native API patterns and tests. |
| crates/openshell-router/src/lib.rs | Adds select_route() and extends proxy APIs to accept model_hint. |
| crates/openshell-router/src/backend.rs | Adds Ollama validation probe and changes backend URL construction behavior. |
| crates/openshell-router/tests/backend_integration.rs | Updates tests for new proxy function signatures and /v1 endpoint expectations. |
| crates/openshell-core/src/inference.rs | Adds OLLAMA_PROFILE (protocols/base URL/config keys). |
| crates/openshell-cli/src/run.rs | Adds gateway_inference_set_multi() to send multi-model configs. |
| crates/openshell-cli/src/main.rs | Adds --model-alias ALIAS=PROVIDER/MODEL CLI flag and dispatch. |
| architecture/inference-routing.md | Documents alias-based route selection, new patterns, and multi-model route behavior. |
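The changed backend URL construction noted for `backend.rs` (stripping a `/v1` prefix so both versioned and non-versioned upstreams work) could look roughly like this; the function name matches the PR, but the body is an assumption:

```rust
// Sketch only: join a provider base URL with an incoming request path,
// dropping the path's /v1 prefix so backends whose base URL already
// encodes the version (or has none, like Ollama's native API) both work.
fn build_backend_url(base_url: &str, path: &str) -> String {
    let stripped = path.strip_prefix("/v1").unwrap_or(path);
    format!("{}{}", base_url.trim_end_matches('/'), stripped)
}
```

With this shape, a base URL of `https://api.openai.com/v1` and a sandbox path of `/v1/chat/completions` join without a doubled version segment.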
Add pattern detection, provider profile, and validation probe for Ollama's native `/api/chat`, `/api/tags`, and `/api/show` endpoints.

Proxy changes (`l7/inference.rs`):
- `POST /api/chat` -> `ollama_chat` protocol
- `GET /api/tags` -> `ollama_model_discovery` protocol
- `POST /api/show` -> `ollama_model_discovery` protocol

Provider profile (`openshell-core/inference.rs`):
- New `ollama` provider type with default endpoint `http://host.openshell.internal:11434`
- Supports `ollama_chat`, `ollama_model_discovery`, and OpenAI-compatible protocols (`openai_chat_completions`, `openai_completions`, `model_discovery`)
- Credential lookup via `OLLAMA_API_KEY`, base URL via `OLLAMA_BASE_URL`

Validation (`backend.rs`):
- Ollama validation probe sends a minimal `/api/chat` request with `stream: false`

Tests: 4 new tests for pattern detection (ollama chat, tags, show, and `GET /api/chat` rejection).

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
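The method-and-path mapping this commit describes can be sketched as a small classifier; the protocol names come from the commit message, but the function itself is illustrative rather than the PR's actual pattern-matching code:

```rust
/// Map an Ollama native API request to a protocol label, per the
/// patterns listed above. Unmatched combinations (e.g. GET /api/chat)
/// return None and are rejected by the proxy.
fn classify_ollama(method: &str, path: &str) -> Option<&'static str> {
    match (method, path) {
        ("POST", "/api/chat") => Some("ollama_chat"),
        ("GET", "/api/tags") => Some("ollama_model_discovery"),
        ("POST", "/api/show") => Some("ollama_model_discovery"),
        _ => None,
    }
}
```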
- Proto: add `InferenceModelEntry` message with alias/provider/model fields; add repeated `models` field to `ClusterInferenceConfig`, Set/Get request/response
- Server: add `upsert_multi_model_route()` for storing multiple model entries under a single route slot; update `resolve_route_by_name()` to expand multi-model configs into per-alias `ResolvedRoute` entries
- Router: add `select_route()` with alias-first, protocol-fallback strategy; add `model_hint` parameter to `proxy_with_candidates()` variants
- Sandbox proxy: extract `model` field from JSON body as routing hint
- Tests: 7 new tests covering `select_route`, multi-model resolution, and bundle expansion; all 291 existing tests continue to pass

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
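The alias-first, protocol-fallback strategy from this commit might look like the following sketch; the `Route` type and field names are assumptions, not the router's real definitions:

```rust
// Illustrative candidate type; the PR's ResolvedRoute has more fields.
#[derive(Debug, Clone)]
struct Route {
    alias: Option<String>,
    protocol: String,
}

/// Alias-first selection: prefer the candidate whose alias equals the
/// model hint taken from the request body; otherwise fall back to the
/// first candidate that speaks the requested protocol.
fn select_route<'a>(
    candidates: &'a [Route],
    protocol: &str,
    model_hint: Option<&str>,
) -> Option<&'a Route> {
    if let Some(hint) = model_hint {
        if let Some(r) = candidates.iter().find(|r| r.alias.as_deref() == Some(hint)) {
            return Some(r);
        }
    }
    candidates.iter().find(|r| r.protocol == protocol)
}
```

The fallback branch keeps existing single-route behavior working when a request carries no recognized alias.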
- Add `--model-alias` flag to `inference set` for multi-model config (e.g. `--model-alias gpt=openai/gpt-4 --model-alias claude=anthropic/claude-sonnet-4-20250514`)
- Add `gateway_inference_set_multi()` handler in `run.rs`
- Update inference get/print to display multi-model entries
- Import `InferenceModelEntry` proto type in CLI
- Fix `build_backend_url` to always strip `/v1` prefix for codex paths
- Add `/v1/codex/*` inference pattern for `openai_responses` protocol
- Fix backend tests to use `/v1` endpoint suffix

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
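Parsing the repeatable `ALIAS=PROVIDER/MODEL` value could be done as below; the function name and error messages are illustrative, not the CLI's actual code:

```rust
/// Split "gpt=openai/gpt-4" into (alias, provider, model). Only the
/// first '=' and the first '/' after it are significant, so model IDs
/// containing further '/' characters survive intact.
fn parse_model_alias(arg: &str) -> Result<(String, String, String), String> {
    let (alias, rest) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected ALIAS=PROVIDER/MODEL, got '{arg}'"))?;
    let (provider, model) = rest
        .split_once('/')
        .ok_or_else(|| format!("expected PROVIDER/MODEL after '=', got '{rest}'"))?;
    if alias.is_empty() || provider.is_empty() || model.is_empty() {
        return Err("alias, provider, and model must be non-empty".into());
    }
    Ok((alias.to_string(), provider.to_string(), model.to_string()))
}
```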
…te guard

- Add `timeout_secs` parameter to `gateway_inference_set_multi` and pass through to `SetClusterInferenceRequest`
- Add `print_timeout` to multi-model output display
- Add `timeout` field to router test helper `make_route` (upstream added timeout to `ResolvedRoute`)
- Add system route guard: `upsert_multi_model_route` rejects `route_name == sandbox-system` with `InvalidArgument`
- Add `timeout_secs: 0` to multi-model test `ClusterInferenceConfig` structs
- Add `upsert_multi_model_route_rejects_system_route` test

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
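The system-route guard amounts to a reserved-name check before the upsert proceeds. A sketch, with a plain error string standing in for the gateway's `InvalidArgument` status:

```rust
/// Reject multi-model upserts that would overwrite the reserved
/// sandbox-system route; the error type here is a stand-in.
fn guard_route_name(route_name: &str) -> Result<(), String> {
    if route_name == "sandbox-system" {
        return Err("InvalidArgument: 'sandbox-system' is a reserved route name".into());
    }
    Ok(())
}
```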
Author: @pimlock Happy to address any feedback or questions. Let me know if you'd like anything restructured or split differently.
Summary
Adds multi-route inference proxy support, allowing sandboxed agents to reach multiple LLM providers (OpenAI, Anthropic, NVIDIA, Ollama) through a single `inference.local` endpoint. Agents select a backend by setting the `model` field to an alias name. Also adds Ollama native API support and Codex URL pattern matching.

Related Issue
Closes #203
Changes
- Proto: add `InferenceModelEntry` message (`alias`, `provider_name`, `model_id`); add `models` repeated field to set/get request/response messages
- Server: `upsert_multi_model_route()` validates and stores multiple alias→provider mappings; resolves each entry into a separate `ResolvedRoute` at bundle time
- Router: `select_route()` implements alias-first, protocol-fallback selection; `proxy_with_candidates`/`proxy_with_candidates_streaming` accept optional `model_hint`
- Sandbox proxy: extracts the `model` field from the request body as `model_hint` for route selection
- Patterns: add `/v1/codex/*`, `/api/chat`, `/api/tags`, `/api/show` inference patterns
- Backend: `build_backend_url()` always strips the `/v1` prefix to support both versioned and non-versioned endpoints (e.g. Codex)
- Core: add `OLLAMA_PROFILE` provider profile with native + OpenAI-compat protocols
- CLI: add `--model-alias ALIAS=PROVIDER/MODEL` flag (repeatable, conflicts with `--provider`/`--model`)
- Docs: update `inference-routing.md` with all new sections

Testing
`mise run pre-commit` passes

Checklist