feat: Local LLM support via Ollama for plan execution #145
Labels: enhancement (New feature or request)
Summary
Add support for using local LLMs (via Ollama) as an alternative to Claude Haiku for executing development plans. This enables:
- Zero API costs after hardware investment
- Privacy — code never leaves the machine
- Offline execution — no internet required
Motivation
Models like Qwen3-Coder-Next-80B now rival Claude on coding benchmarks and can run locally on Apple Silicon Macs with 64GB+ unified memory. For teams with suitable hardware, this eliminates per-token costs entirely.
Proposed Implementation
1. Ollama-compatible executor agent
- Use Ollama's OpenAI-compatible API (localhost:11434/v1/chat/completions)
- New executor template that works with local models
- Configurable model selection (qwen3-coder-next, codellama, deepseek-coder, etc.)
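A minimal sketch of what the executor's request path could look like against Ollama's OpenAI-compatible endpoint (the helper names are hypothetical, and the model tag and default port are taken from this issue; nothing here is existing DevPlan code):

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # Ollama's default port

def build_chat_request(model: str, system_prompt: str, user_prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload for Ollama."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,
    }

def execute_step(payload: dict, base_url: str = OLLAMA_BASE_URL) -> str:
    """POST to Ollama's OpenAI-compatible endpoint (requires a running Ollama)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Response shape follows the OpenAI chat-completions schema
    return body["choices"][0]["message"]["content"]
```

Because the endpoint mirrors the OpenAI schema, an existing OpenAI-style client could likely be pointed at `baseUrl` with only the model name changed.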
2. Configuration options
{
"executor": {
"provider": "ollama",
"model": "qwen3-coder-next",
"baseUrl": "http://localhost:11434",
"contextWindow": 128000
}
}
3. Prompt tuning
- May need DevPlan-specific system prompts optimized for open models
- Test and document which models work best with DevPlan format
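Loading the configuration block above might look like this (field names come from the JSON sketch; the defaults and the loader itself are assumptions, not existing code):

```python
from dataclasses import dataclass

@dataclass
class ExecutorConfig:
    provider: str = "ollama"
    model: str = "qwen3-coder-next"          # assumed default
    base_url: str = "http://localhost:11434"  # Ollama's default port
    context_window: int = 128000

def load_executor_config(raw: dict) -> ExecutorConfig:
    """Map the 'executor' block of the config file onto typed settings,
    falling back to defaults for any missing field."""
    executor = raw.get("executor", {})
    return ExecutorConfig(
        provider=executor.get("provider", "ollama"),
        model=executor.get("model", "qwen3-coder-next"),
        base_url=executor.get("baseUrl", "http://localhost:11434"),
        context_window=executor.get("contextWindow", 128000),
    )
```

Keeping every field optional means an empty `executor` block still yields a working local setup.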
Hardware Requirements
| Model | Min RAM | Speed (M4 Pro) |
|---|---|---|
| Qwen3-Coder-Next-80B (Q4) | 64GB | ~10-15 tok/s |
| DeepSeek-Coder-33B (Q4) | 24GB | ~25-30 tok/s |
| CodeLlama-34B (Q4) | 24GB | ~25-30 tok/s |
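The RAM figures follow roughly from Q4 quantization: about 4 bits per weight plus scale metadata, which this back-of-envelope approximates as 4.5 effective bits per parameter (the exact figure varies by quant format, and KV cache plus runtime overhead come on top):

```python
def q4_weight_gb(params_billion: float, bits_per_param: float = 4.5) -> float:
    """Approximate in-memory weight size for a Q4-quantized model.
    4.5 bits/param is an assumed average across Q4 variants."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# 80B at Q4 -> ~45 GB of weights alone, hence the 64 GB minimum;
# 33-34B models fit weights in under 20 GB, hence 24 GB.
print(round(q4_weight_gb(80), 1))
```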
Success Criteria
- Executor agent can use Ollama as backend
- At least one model (Qwen3-Coder-Next) tested end-to-end with a sample plan
- Documentation for local LLM setup
- Performance comparison vs Haiku (speed, quality)
Related
- Ollama OpenAI compatibility: https://ollama.com/blog/openai-compatibility
- Qwen3-Coder-Next: https://ollama.com/library/qwen3-coder-next