adding support for multiple ollama #19
base: master
Conversation
Pull request overview
This pull request adds support for multiple Ollama instances, enabling requests to be load-balanced across several Ollama servers. The implementation uses a channel-based pool to distribute requests across the available instances.
Changes:
- Refactored the Ollama client to support multiple URIs with a channel-based instance pool for concurrent request handling (see the sketch after this list)
- Modified the CLI configuration to accept multiple Ollama URLs instead of a single URL
- Added three new Qwen3 embedding models (0.6b, 4b, 8b variants)
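
As a rough illustration of the channel-based pool described above, a minimal sketch follows. The `pool` type and its field names are assumptions for illustration, not the PR's actual code:

```go
package main

import "fmt"

// pool distributes work across several Ollama base URIs by handing out
// instance indices over a buffered channel.
type pool struct {
	uris      []string
	available chan int // holds the indices of idle instances
}

func newPool(uris []string) *pool {
	p := &pool{uris: uris, available: make(chan int, len(uris))}
	for i := range uris {
		p.available <- i // every instance starts out idle
	}
	return p
}

// withInstance blocks until an instance is free, runs fn against its URI,
// and returns the instance to the pool afterwards.
func (p *pool) withInstance(fn func(uri string) error) error {
	idx := <-p.available
	defer func() { p.available <- idx }()
	return fn(p.uris[idx])
}

func main() {
	p := newPool([]string{"http://ollama-1:11434", "http://ollama-2:11434"})
	_ = p.withInstance(func(uri string) error {
		fmt.Println("would send request to", uri)
		return nil
	})
}
```

Because the channel is buffered with one slot per URI, a receive blocks only when every instance is checked out, which gives the blocking-until-available behavior the diff below relies on.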
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| services/ollama/ollama.go | Refactored to support multiple URIs with pooling mechanism and improved response body handling |
| services/ollama/llm.go | Added instance selection logic for generation requests with enhanced logging |
| services/ollama/models.go | Added three new Qwen3 embedding model variants |
| bellmand/bellamnd.go | Updated CLI to accept multiple Ollama URLs via StringSliceFlag (see the sketch after this table) |
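
For reference, accepting a repeatable URL flag with a StringSliceFlag looks roughly like this. This assumes the project uses urfave/cli (where that flag type lives); the flag name `ollama-url` is also an assumption:

```go
package main

import (
	"fmt"
	"os"

	"github.com/urfave/cli/v2"
)

func main() {
	app := &cli.App{
		Flags: []cli.Flag{
			// NOTE: flag name assumed for illustration; check the PR for the real one.
			&cli.StringSliceFlag{
				Name:  "ollama-url",
				Usage: "base URL of an Ollama instance (repeatable)",
			},
		},
		Action: func(c *cli.Context) error {
			// e.g. --ollama-url http://a:11434 --ollama-url http://b:11434
			fmt.Println("ollama urls:", c.StringSlice("ollama-url"))
			return nil
		},
	}
	if err := app.Run(os.Args); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```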
Comments suppressed due to low confidence (1)
services/ollama/models.go:170
- The three new Qwen3 embedding models (EmbedModel_qwen3_06b, EmbedModel_qwen3_4b, EmbedModel_qwen3_8b) are defined but not added to the EmbedModels map. This means they cannot be discovered or used by the application. Add entries for all three models to the EmbedModels map.
```go
var EmbedModels = map[string]embed.Model{
	EmbedModel_nomic_embed_text.Name:        EmbedModel_nomic_embed_text,
	EmbedModel_mxbai_embed_large.Name:       EmbedModel_mxbai_embed_large,
	EmbedModel_paraphrase_multilingual.Name: EmbedModel_paraphrase_multilingual,
	EmbedModel_bge_large.Name:               EmbedModel_bge_large,
	EmbedModel_bge_m3.Name:                  EmbedModel_bge_m3,
}
```
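
A sketch of the fix the review suggests, adding entries for the three new variants (variable names taken from the review comment above):

```go
var EmbedModels = map[string]embed.Model{
	EmbedModel_nomic_embed_text.Name:        EmbedModel_nomic_embed_text,
	EmbedModel_mxbai_embed_large.Name:       EmbedModel_mxbai_embed_large,
	EmbedModel_paraphrase_multilingual.Name: EmbedModel_paraphrase_multilingual,
	EmbedModel_bge_large.Name:               EmbedModel_bge_large,
	EmbedModel_bge_m3.Name:                  EmbedModel_bge_m3,
	// New Qwen3 variants, registered so they can be discovered at runtime.
	EmbedModel_qwen3_06b.Name:               EmbedModel_qwen3_06b,
	EmbedModel_qwen3_4b.Name:                EmbedModel_qwen3_4b,
	EmbedModel_qwen3_8b.Name:                EmbedModel_qwen3_8b,
}
```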
```go
// Acquire an available instance (blocks if all instances are busy)
instanceIdx := <-g.available
defer func() {
	// Return the instance to the pool when done
	g.available <- instanceIdx
}()

uri := g.uris[instanceIdx]
g.log("[embed] using instance", "index", instanceIdx, "uri", uri, "request", reqc)

var embeddings [][]float64
var tokenTotal int
for _, text := range request.Texts {
```
Copilot AI · Jan 15, 2026
The instance is acquired once at the function level but multiple HTTP requests are made in the loop (lines 83-130). This holds the instance for the entire duration of processing all texts sequentially, preventing other goroutines from using it. Consider either: (1) moving the instance acquisition inside the loop to allow interleaving of requests across instances, or (2) keeping the current design if you want to ensure all texts in a single request use the same instance for consistency.
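
For illustration, option (1) might look roughly like the sketch below, following the names in the snippet above. The `embedOne` helper is hypothetical, standing in for the HTTP call currently inlined in the loop:

```go
for _, text := range request.Texts {
	// Acquire an instance for this text only, so other goroutines
	// can use the pool between iterations.
	instanceIdx := <-g.available
	uri := g.uris[instanceIdx]

	emb, tokens, err := g.embedOne(uri, text) // hypothetical helper wrapping one HTTP request
	g.available <- instanceIdx                // return the instance immediately
	if err != nil {
		return nil, err
	}
	embeddings = append(embeddings, emb)
	tokenTotal += tokens
}
```

The trade-off the review names is real: per-text acquisition improves throughput under contention, while per-request acquisition pins all texts of one request to a single instance.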
```go
if resp.StatusCode != http.StatusOK {
	d, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	return nil, fmt.Errorf("unexpected status code, %d, %s", resp.StatusCode, string(d))
}

var respModel embedResponse
err = json.NewDecoder(resp.Body).Decode(&respModel)
resp.Body.Close()
```
Copilot AI · Jan 15, 2026
The error returned by resp.Body.Close() on line 118 is discarded. If the JSON decoding on line 117 fails, the body is still closed, but the error handling on line 120 proceeds without knowing whether the close succeeded. The current logic works, but closing the body manually on each path is fragile: a future early return could skip the close, or close the body a second time.
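
One conventional way to harden this is to move the status check and decode into a helper that closes the body exactly once via defer. This is a sketch using the same standard-library packages as the snippet above; the helper name is an assumption:

```go
func decodeEmbedResponse(resp *http.Response) (*embedResponse, error) {
	// Close exactly once, on every return path. The Close error is
	// deliberately ignored here, but could be logged instead.
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		d, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("unexpected status code, %d, %s", resp.StatusCode, string(d))
	}

	var respModel embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&respModel); err != nil {
		return nil, fmt.Errorf("decoding embed response: %w", err)
	}
	return &respModel, nil
}
```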