Problem
Nanocoder is local-first, but many of its defaults (prompt size, tool count, context management) are tuned for larger models. When using small local models (1B-8B parameters via Ollama), several things break down:
| Problem | Detail |
|---|---|
| Large system prompt | ~2,000 words (~14KB) before any conversation starts — eats 10-15% of a small model's context |
| 36+ tool definitions | XML fallback injects all tool schemas into the system prompt, adding ~7,000 tokens |
| Small context windows | Most small models have 4K-32K context — the system prompt + tools leave little room for actual work |
| Unreliable tool calling | Malformed XML leads to correction loops that waste tokens and context |
| Weak multi-step reasoning | Small models struggle to chain 5+ tool calls effectively |
The goal is to make Nanocoder work well with small local models — either standalone or in a hybrid setup alongside a frontier model.
Proposed Features
1. Slim System Prompt
Create a drastically reduced prompt (~200-300 words instead of ~2,000). Strip out philosophy, examples, and edge-case guidance. Small models perform better with short, direct instructions — they can't effectively use long prompts anyway.
A `main-prompt-slim.md` alongside the existing `main-prompt.md`, selected based on mode or model size.
2. Tool Subsetting
Instead of giving the model all 36+ tools, provide a focused subset. This is likely the single highest-impact change for small model usability.
Static tool profiles — predefined sets the user can select:
- code-edit: read_file, string_replace, write_file, find_files, search_file_contents, execute_bash
- explore: read_file, find_files, search_file_contents, list_directory
- git: git_status, git_diff, git_log, git_add, git_commit
- minimal: read_file, string_replace, execute_bash
Dynamic tool selection — use keyword matching on the user's message to pick ~5-8 relevant tools per turn.
Progressive tool loading — start with read-only tools, unlock write tools after the model reads relevant files.
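The dynamic variant could be sketched as a keyword scorer. The keyword map, scoring, and fallback set below are illustrative, not Nanocoder's actual implementation; the tool names match the profiles above.

```typescript
// Map each tool to trigger keywords (illustrative).
const TOOL_KEYWORDS: Record<string, string[]> = {
  read_file: ["read", "show", "open", "look"],
  string_replace: ["replace", "change", "fix", "edit"],
  write_file: ["create", "write", "new file"],
  find_files: ["find", "locate", "where"],
  search_file_contents: ["search", "grep", "contains"],
  execute_bash: ["run", "test", "build", "install"],
};

function selectTools(userMessage: string, max = 8): string[] {
  const msg = userMessage.toLowerCase();
  // Score each tool by how many of its keywords appear in the message.
  const scored = Object.entries(TOOL_KEYWORDS)
    .map(([tool, words]) => ({
      tool,
      score: words.filter((w) => msg.includes(w)).length,
    }))
    .filter((t) => t.score > 0)
    .sort((a, b) => b.score - a.score);
  const picked = scored.slice(0, max).map((t) => t.tool);
  // Always keep a minimal fallback set so the model can still act.
  return picked.length > 0 ? picked : ["read_file", "execute_bash"];
}
```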
3. Single-Tool Mode
Force the model to call one tool at a time instead of attempting to chain multiple calls in one response. Small models are much more reliable when focused on one action per turn.
Implemented as a prompt instruction combined with validation that rejects multi-tool responses and asks the model to pick one.
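A sketch of that validation, assuming the XML tool-call format; the KNOWN_TOOLS list and retry wording are illustrative:

```typescript
const KNOWN_TOOLS = ["read_file", "write_file", "string_replace", "execute_bash"];

// Count opening tool tags in the response. Closing tags (</tool>) are
// not counted because "<tool" does not match "</tool".
function countToolCalls(response: string): number {
  return KNOWN_TOOLS.reduce(
    (n, tool) =>
      n + (response.match(new RegExp(`<${tool}[\\s>]`, "g"))?.length ?? 0),
    0
  );
}

function validateSingleTool(response: string): { ok: boolean; retryPrompt?: string } {
  const calls = countToolCalls(response);
  if (calls <= 1) return { ok: true };
  return {
    ok: false,
    retryPrompt: `You called ${calls} tools. Call exactly ONE tool: the most important next step.`,
  };
}
```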
4. Guided Workflows
Instead of open-ended "figure it out" prompting, provide step-by-step scaffolding:
- User says "fix this bug" → Nanocoder injects: "Step 1: Read the file. Step 2: Identify the issue. Step 3: Apply the fix."
- The model only handles one step at a time, reducing reasoning load
- Implemented as prompt templates per task type (fix, explain, refactor, etc.)
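One possible shape for those per-task templates; the step wording below is illustrative:

```typescript
// Step lists per task type; only one step is shown to the model at a time.
const WORKFLOWS: Record<string, string[]> = {
  fix: [
    "Read the file mentioned by the user.",
    "Identify the issue and state it in one sentence.",
    "Apply the fix with a single edit.",
  ],
  explain: [
    "Read the relevant file.",
    "Summarise what it does in plain language.",
  ],
};

// Returns the prompt for the current step, or null when the workflow is done.
function nextStepPrompt(task: string, stepIndex: number): string | null {
  const steps = WORKFLOWS[task];
  if (!steps || stepIndex >= steps.length) return null;
  return `Step ${stepIndex + 1} of ${steps.length}: ${steps[stepIndex]}`;
}
```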
5. Frontier Model as Planner (Hybrid Mode)
Use a frontier model (via OpenRouter or other provider) for planning and a local small model for execution:
- Frontier model analyses the task and creates a step-by-step plan with specific tool calls
- Small model executes each step (or Nanocoder auto-executes the plan)
- Dramatically cheaper — frontier model called once for planning, small model handles the rest
- Could also use the frontier model as a fallback when the small model's XML is repeatedly malformed
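The hand-off might look like this; `callPlanner` and `executeStep` are hypothetical stand-ins for the actual provider calls, and the PlanStep shape is illustrative:

```typescript
// A structured plan step produced by the frontier model.
interface PlanStep {
  tool: string;
  args: Record<string, string>;
  note: string;
}

// One frontier call produces the plan; the cheap local model (or
// Nanocoder itself) executes each step in order.
async function runHybrid(
  task: string,
  callPlanner: (task: string) => Promise<PlanStep[]>,
  executeStep: (step: PlanStep) => Promise<string>
): Promise<string[]> {
  const plan = await callPlanner(task); // single expensive call
  const results: string[] = [];
  for (const step of plan) {
    results.push(await executeStep(step)); // cheap local execution
  }
  return results;
}
```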
6. Aggressive Context Management
Small models need tighter context management than the current defaults:
- Lower auto-compact threshold (40% instead of 60%)
- More aggressive compression (shorter summaries, smaller truncation limits)
- Auto-drop tool result contents after they've been consumed
- Sliding window that keeps only the last N messages at full fidelity
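The sliding window could be sketched as follows; the message shape and summary wording are illustrative:

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Keep the system prompt and the last N messages at full fidelity;
// collapse everything older into a one-line placeholder.
function slidingWindow(history: Message[], keepLast = 6): Message[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  if (rest.length <= keepLast) return history;
  const dropped = rest.slice(0, rest.length - keepLast);
  const summary: Message = {
    role: "system",
    content: `[${dropped.length} earlier messages compacted]`,
  };
  return [...system, summary, ...rest.slice(-keepLast)];
}
```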
7. Simplified Tool Schemas
For XML fallback, current schemas include full descriptions, parameter types, and examples. For small models:
- One-line descriptions
- Skip optional parameters
- Remove examples from schemas
- Simpler parameter names
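A sketch of the slimming step, assuming a hypothetical FullSchema shape for the current definitions:

```typescript
interface Param {
  name: string;
  type: string;
  required: boolean;
  description?: string;
}

interface FullSchema {
  name: string;
  description: string;
  params: Param[];
  examples?: string[];
}

// Keep only the first sentence of the description, drop optional
// parameters, and omit examples entirely.
function slimSchema(s: FullSchema): { name: string; description: string; params: string[] } {
  return {
    name: s.name,
    description: s.description.split(". ")[0],
    params: s.params.filter((p) => p.required).map((p) => p.name),
  };
}
```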
8. Prefill / Constrained Output
For XML tool calling, prefill the assistant response with the opening XML tag when the expected tool is known. For example, after asking the model to read a file, prefill with `<read_file>` so it only needs to fill in the parameters. Some providers support this via an assistant message prefix.
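A minimal sketch of the prefill step; the message shape is illustrative, and whether the trailing assistant message is honoured as a completion prefix depends on the provider:

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Append an assistant message containing the opening tag. Providers that
// support prefix completion will continue from it, so the model only
// has to emit the parameters and closing tag.
function withPrefill(messages: ChatMessage[], expectedTool?: string): ChatMessage[] {
  if (!expectedTool) return messages;
  return [...messages, { role: "assistant", content: `<${expectedTool}>` }];
}
```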
9. Smart Retry with Simplification
When a small model fails (malformed XML, wrong tool), instead of sending the same error back:
- Simplify the available tools (remove irrelevant ones for the retry)
- Rephrase the instruction more directly
- Provide the specific XML template to fill in
- After N failures, offer to hand off to a frontier model
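A sketch of the escalation ladder; MAX_RETRIES and the retry wording are illustrative:

```typescript
const MAX_RETRIES = 3;

// Each failed attempt gets a progressively more constrained retry:
// first a direct restatement, then the exact template to fill in,
// and finally a hand-off to the frontier model.
function buildRetry(
  attempt: number,
  tool: string,
  template: string
): { prompt: string; escalate: boolean } {
  if (attempt >= MAX_RETRIES) {
    return { prompt: "", escalate: true }; // hand off to the frontier model
  }
  if (attempt === 0) {
    return {
      prompt: `That tool call was malformed. Use the ${tool} tool.`,
      escalate: false,
    };
  }
  return {
    prompt: `Fill in this template exactly:\n${template}`,
    escalate: false,
  };
}
```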
10. Task-Specific Micro-Agents
Pre-built prompt + tool combos for common tasks small models handle well:
- Read and explain — read_file only, no write tools needed
- Find and replace — search_file_contents + string_replace
- Run and fix — execute_bash + read_file + string_replace
User selects the micro-agent, or Nanocoder picks based on intent.
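A registry sketch pairing each micro-agent with its tool subset; the prompts are illustrative:

```typescript
// Each micro-agent bundles a focused prompt with the small tool set
// listed above.
const MICRO_AGENTS: Record<string, { tools: string[]; prompt: string }> = {
  "read-and-explain": {
    tools: ["read_file"],
    prompt: "Read the requested file and explain what it does. Do not edit anything.",
  },
  "find-and-replace": {
    tools: ["search_file_contents", "string_replace"],
    prompt: "Find the pattern the user describes, then apply exactly one replacement.",
  },
  "run-and-fix": {
    tools: ["execute_bash", "read_file", "string_replace"],
    prompt: "Run the command, read the failing file, apply one fix, then re-run.",
  },
};

// Look up a micro-agent by name; returns null when it doesn't exist.
function microAgent(name: string): { tools: string[]; prompt: string } | null {
  return MICRO_AGENTS[name] ?? null;
}
```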
11. Auto-Detection
Automatically enable small-model optimisations based on the model name. If the name contains a size indicator like `1b`, `3b`, `7b`, or `8b`, enable the mode without requiring manual configuration. Users can override this.
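A sketch of the detection, treating anything at or below an assumed 8B parameter threshold as small:

```typescript
// Match a size suffix like "3b", "7b", or "3.8b" in the model name
// and compare it against the threshold (in billions of parameters).
function isSmallModel(modelName: string, maxParamsB = 8): boolean {
  const m = modelName.toLowerCase().match(/(\d+(?:\.\d+)?)b\b/);
  return m !== null && parseFloat(m[1]) <= maxParamsB;
}
```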
Configuration
```json
{
  "smallModelMode": {
    "enabled": true,
    "slimPrompt": true,
    "toolProfile": "code-edit",
    "maxToolsPerTurn": 1,
    "aggressiveCompact": true,
    "simplifiedSchemas": true,
    "plannerModel": "openrouter/claude-sonnet-4-5"
  }
}
```
Or at the provider/model level:
```json
{
  "providers": [
    {
      "name": "ollama",
      "baseUrl": "http://localhost:11434/v1",
      "models": ["llama3.2:3b"],
      "smallModelMode": {
        "enabled": true,
        "toolProfile": "minimal"
      }
    }
  ]
}
```
Implementation Priority
Phase 1 — High impact, lower effort
- Slim system prompt
- Static tool profiles (tool subsetting)
- Single-tool mode
- Aggressive auto-compact defaults for small models
Phase 2 — High impact, higher effort
- Simplified tool schemas
- Smart retry with simplification
- Auto-detection by model name/size
Phase 3 — Architectural
- Frontier-as-planner hybrid mode
- Guided workflows with task classification
- Task-specific micro-agents
- Prefill / constrained output