Conversation

@agatons agatons commented Jan 15, 2026

No description provided.

Copilot AI left a comment

Pull request overview

This pull request adds support for multiple Ollama instances, enabling load balancing across several Ollama servers. The implementation uses a channel-based pool to distribute requests across the available instances.
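As background, here is a minimal sketch of such a channel-based pool. Only the uris and available field names are taken from the diff excerpts below; the type, constructor, and URLs are illustrative assumptions:

package main

import "fmt"

// generator holds the instance URIs and a buffered channel of free indices.
type generator struct {
	uris      []string
	available chan int
}

// newGenerator seeds the pool so that every instance starts out available.
func newGenerator(uris []string) *generator {
	g := &generator{uris: uris, available: make(chan int, len(uris))}
	for i := range uris {
		g.available <- i
	}
	return g
}

func main() {
	g := newGenerator([]string{"http://localhost:11434", "http://localhost:11435"})
	idx := <-g.available // blocks if every instance is busy
	fmt.Println("using", g.uris[idx])
	g.available <- idx // return the instance to the pool
}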

Changes:

  • Refactored the Ollama client to support multiple URIs with a channel-based instance pool for concurrent request handling
  • Modified the CLI configuration to accept multiple Ollama URLs instead of a single URL
  • Added three new Qwen3 embedding models (0.6b, 4b, 8b variants)

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • services/ollama/ollama.go: Refactored to support multiple URIs with a pooling mechanism and improved response body handling
  • services/ollama/llm.go: Added instance selection logic for generation requests with enhanced logging
  • services/ollama/models.go: Added three new Qwen3 embedding model variants
  • bellmand/bellamnd.go: Updated CLI to accept multiple Ollama URLs via StringSliceFlag (see the sketch after this list)
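A StringSliceFlag lets one flag be passed multiple times and collected as a slice. A rough sketch of the CLI change, assuming the urfave/cli v2 package (where StringSliceFlag is defined; whether bellmand uses that package is an assumption) and a hypothetical ollama-url flag name:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/urfave/cli/v2"
)

func main() {
	app := &cli.App{
		Flags: []cli.Flag{
			// Repeatable: --ollama-url http://a:11434 --ollama-url http://b:11434
			&cli.StringSliceFlag{
				Name:  "ollama-url",
				Usage: "URL of an Ollama instance (repeat for multiple instances)",
			},
		},
		Action: func(c *cli.Context) error {
			uris := c.StringSlice("ollama-url")
			fmt.Println("configured Ollama instances:", uris)
			return nil
		},
	}
	if err := app.Run(os.Args); err != nil {
		log.Fatal(err)
	}
}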
Comments suppressed due to low confidence (1)

services/ollama/models.go:170

  • The three new Qwen3 embedding models (EmbedModel_qwen3_06b, EmbedModel_qwen3_4b, EmbedModel_qwen3_8b) are defined but not added to the EmbedModels map. This means they cannot be discovered or used by the application. Add entries for all three models to the EmbedModels map.
var EmbedModels = map[string]embed.Model{
	EmbedModel_nomic_embed_text.Name:        EmbedModel_nomic_embed_text,
	EmbedModel_mxbai_embed_large.Name:       EmbedModel_mxbai_embed_large,
	EmbedModel_paraphrase_multilingual.Name: EmbedModel_paraphrase_multilingual,
	EmbedModel_bge_large.Name:               EmbedModel_bge_large,
	EmbedModel_bge_m3.Name:                  EmbedModel_bge_m3,
}
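A minimal sketch of the fix the comment asks for, registering the three new variables named above and keying them by .Name like the existing entries:

var EmbedModels = map[string]embed.Model{
	EmbedModel_nomic_embed_text.Name:        EmbedModel_nomic_embed_text,
	EmbedModel_mxbai_embed_large.Name:       EmbedModel_mxbai_embed_large,
	EmbedModel_paraphrase_multilingual.Name: EmbedModel_paraphrase_multilingual,
	EmbedModel_bge_large.Name:               EmbedModel_bge_large,
	EmbedModel_bge_m3.Name:                  EmbedModel_bge_m3,
	EmbedModel_qwen3_06b.Name:               EmbedModel_qwen3_06b,
	EmbedModel_qwen3_4b.Name:                EmbedModel_qwen3_4b,
	EmbedModel_qwen3_8b.Name:                EmbedModel_qwen3_8b,
}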


Comment on lines +71 to 83
// Acquire an available instance (blocks if all instances are busy)
instanceIdx := <-g.available
defer func() {
	// Return the instance to the pool when done
	g.available <- instanceIdx
}()

uri := g.uris[instanceIdx]
g.log("[embed] using instance", "index", instanceIdx, "uri", uri, "request", reqc)

var embeddings [][]float64
var tokenTotal int
for _, text := range request.Texts {
Copilot AI Jan 15, 2026

The instance is acquired once at the function level but multiple HTTP requests are made in the loop (lines 83-130). This holds the instance for the entire duration of processing all texts sequentially, preventing other goroutines from using it. Consider either: (1) moving the instance acquisition inside the loop to allow interleaving of requests across instances, or (2) keeping the current design if you want to ensure all texts in a single request use the same instance for consistency.
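A minimal sketch of option (1), acquiring and releasing an instance per text rather than per call. The embedEach and embedOne names are hypothetical stand-ins for the real loop body and its HTTP request; only the uris and available fields come from the diff:

package main

import "fmt"

type generator struct {
	uris      []string
	available chan int
}

// embedEach acquires an instance per text, so other goroutines can
// interleave their requests between iterations.
func (g *generator) embedEach(texts []string) {
	for _, text := range texts {
		idx := <-g.available // acquire just for this request
		g.embedOne(g.uris[idx], text)
		g.available <- idx // release before the next iteration
	}
}

// embedOne is a placeholder for the HTTP request to a single instance.
func (g *generator) embedOne(uri, text string) {
	fmt.Println("embedding", text, "via", uri)
}

func main() {
	g := &generator{
		uris:      []string{"http://localhost:11434", "http://localhost:11435"},
		available: make(chan int, 2),
	}
	for i := range g.uris {
		g.available <- i
	}
	g.embedEach([]string{"alpha", "beta", "gamma"})
}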

Comment on lines 110 to +118
if resp.StatusCode != http.StatusOK {
	d, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	return nil, fmt.Errorf("unexpected status code, %d, %s", resp.StatusCode, string(d))
}

var respModel embedResponse
err = json.NewDecoder(resp.Body).Decode(&respModel)
resp.Body.Close()
Copilot AI Jan 15, 2026

The error returned by resp.Body.Close() on line 118 is discarded: if the JSON decoding on line 117 fails, the body is still closed, but the error handling on line 120 proceeds without knowing whether the close succeeded. The current logic works, though surfacing the Close error, or routing all exits through a single close, would make the cleanup explicit and guard against an accidental double close if the early-return paths change.
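One way to make the cleanup explicit, as a hedged sketch: decodeAndClose is a hypothetical helper, not part of the PR. It closes exactly once and surfaces the Close error only when decoding itself succeeded, so the more informative decode error always wins.

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeAndClose decodes JSON from r and closes it exactly once.
// A Close error is reported only if decoding succeeded.
func decodeAndClose(r io.ReadCloser, v any) error {
	decodeErr := json.NewDecoder(r).Decode(v)
	closeErr := r.Close()
	if decodeErr != nil {
		return fmt.Errorf("decoding response: %w", decodeErr)
	}
	return closeErr
}

func main() {
	var out struct{ Ok bool }
	body := io.NopCloser(strings.NewReader(`{"Ok": true}`))
	if err := decodeAndClose(body, &out); err != nil {
		panic(err)
	}
	fmt.Println("decoded:", out.Ok)
}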
