
feat: Local LLM support via Ollama for plan execution #145

@mmorris35

Description

Summary

Add support for using local LLMs (via Ollama) as an alternative to Claude Haiku for executing development plans. This enables:

  • Zero API costs after hardware investment
  • Privacy — code never leaves the machine
  • Offline execution — no internet required

Motivation

Models like Qwen3-Coder-Next-80B now rival Claude on coding benchmarks and can run locally on Apple Silicon Macs with 64GB+ unified memory. For teams with suitable hardware, this eliminates per-token costs entirely.

Proposed Implementation

1. Ollama-compatible executor agent

  • Use Ollama's OpenAI-compatible API (http://localhost:11434/v1/chat/completions)
  • New executor template that works with local models
  • Configurable model selection (qwen3-coder-next, codellama, deepseek-coder, etc.)
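Since Ollama exposes an OpenAI-compatible endpoint, the executor only needs to build a standard chat-completions request against the local base URL. A minimal sketch of that request construction (function and parameter names here are illustrative, not existing DevPlan APIs):

```python
import json

def build_chat_request(base_url: str, model: str, messages: list[dict]) -> tuple[str, str]:
    """Build the URL and JSON body for an OpenAI-style /v1/chat/completions call."""
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    payload = {
        "model": model,        # e.g. "qwen3-coder-next", "deepseek-coder"
        "messages": messages,  # standard OpenAI-style chat messages
        "stream": False,       # plan steps are easier to handle as whole responses
    }
    return url, json.dumps(payload)

url, body = build_chat_request(
    "http://localhost:11434",
    "qwen3-coder-next",
    [{"role": "user", "content": "Implement step 1 of the plan."}],
)
```

Because the wire format matches OpenAI's, switching between Claude Haiku and a local model could be as small as changing the base URL and model name.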

2. Configuration options

{
  "executor": {
    "provider": "ollama",
    "model": "qwen3-coder-next",
    "baseUrl": "http://localhost:11434",
    "contextWindow": 128000
  }
}
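A sketch of how the executor could read this config, assuming the field names above and falling back to defaults for anything omitted (the loader function and default values are assumptions, not existing behavior):

```python
import json

# Assumed defaults for omitted fields; only "model" is required for ollama.
DEFAULTS = {
    "provider": "ollama",
    "baseUrl": "http://localhost:11434",
    "contextWindow": 128000,
}

def load_executor_config(raw: str) -> dict:
    """Merge the user's executor config over the defaults."""
    cfg = {**DEFAULTS, **json.loads(raw).get("executor", {})}
    if cfg["provider"] == "ollama" and "model" not in cfg:
        raise ValueError("ollama provider requires an explicit 'model'")
    return cfg

cfg = load_executor_config(
    '{"executor": {"provider": "ollama", "model": "qwen3-coder-next"}}'
)
```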

3. Prompt tuning

  • May need DevPlan-specific system prompts optimized for open models
  • Test and document which models work best with DevPlan format

Hardware Requirements

| Model                     | Min RAM | Speed (M4 Pro) |
|---------------------------|---------|----------------|
| Qwen3-Coder-Next-80B (Q4) | 64GB    | ~10-15 tok/s   |
| DeepSeek-Coder-33B (Q4)   | 24GB    | ~25-30 tok/s   |
| CodeLlama-34B (Q4)        | 24GB    | ~25-30 tok/s   |
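The RAM figures above follow from quantized weight size. A back-of-envelope estimate, assuming roughly 4.5 effective bits per parameter for Q4 quantization (a common approximation; it ignores KV cache and runtime overhead, which is why the minimums are higher than the weights alone):

```python
def q4_weight_gb(params_billion: float, bits_per_param: float = 4.5) -> float:
    """Approximate on-disk/in-memory size of Q4-quantized weights in GB."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

q4_weight_gb(80)  # -> 45.0 GB of weights alone for an 80B model
q4_weight_gb(33)  # -> ~18.6 GB for a 33B model, fitting in 24GB with headroom
```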

Success Criteria

  • Executor agent can use Ollama as backend
  • At least one model (Qwen3-Coder-Next) tested end-to-end with a sample plan
  • Documentation for local LLM setup
  • Performance comparison vs Haiku (speed, quality)

Related

Labels: enhancement (New feature or request)