[Feature] Small Model Mode #391

@will-lamerton

Problem

Nanocoder is local-first, but many of its defaults (prompt size, tool count, context management) are tuned for larger models. When using small local models (1B-8B parameters via Ollama), several things break down:

| Problem | Detail |
| --- | --- |
| Large system prompt | ~2,000 words (~14KB) before any conversation starts; eats 10-15% of a small model's context |
| 36+ tool definitions | XML fallback injects all tool schemas into the system prompt, adding ~7,000 tokens |
| Small context windows | Most small models have 4K-32K context; the system prompt + tools leave little room for actual work |
| Unreliable tool calling | Malformed XML leads to correction loops that waste tokens and context |
| Weak multi-step reasoning | Small models struggle to chain 5+ tool calls effectively |
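
To make the budget problem concrete, here is a rough back-of-the-envelope sketch using the figures above. The 4-bytes-per-token ratio is a common rule of thumb and an assumption here, not a measurement of any particular tokenizer, and the function names are made up for illustration.

```typescript
// ASSUMPTION: ~4 bytes per token is a rough heuristic, not a measured value.
const BYTES_PER_TOKEN = 4;

function estimateTokens(bytes: number): number {
  return Math.ceil(bytes / BYTES_PER_TOKEN);
}

// Tokens left for the conversation after the fixed overhead is paid.
function remainingContext(
  contextWindow: number,
  systemPromptBytes: number,
  toolSchemaTokens: number,
): number {
  return contextWindow - estimateTokens(systemPromptBytes) - toolSchemaTokens;
}

const promptBytes = 14000; // "~14KB" system prompt
const toolTokens = 7000;   // "~7,000 tokens" of XML tool schemas

const left8k = remainingContext(8192, promptBytes, toolTokens);   // -2308
const left32k = remainingContext(32768, promptBytes, toolTokens); // 22268
```

Under these assumptions an 8K model is over budget before the first user message, which is why the prompt and tool list are the first things to shrink.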

The goal is to make Nanocoder work well with small local models — either standalone or in a hybrid setup alongside a frontier model.


Proposed Features

1. Slim System Prompt

Create a drastically reduced prompt (~200-300 words instead of ~2,000). Strip out philosophy, examples, and edge-case guidance. Small models perform better with short, direct instructions — they can't effectively use long prompts anyway.

Add a main-prompt-slim.md alongside the existing main-prompt.md, and select between them based on mode or model size.

2. Tool Subsetting

Instead of giving the model all 36+ tools, provide a focused subset. This is likely the single highest-impact change for small model usability.

Static tool profiles — predefined sets the user can select:

  • code-edit: read_file, string_replace, write_file, find_files, search_file_contents, execute_bash
  • explore: read_file, find_files, search_file_contents, list_directory
  • git: git_status, git_diff, git_log, git_add, git_commit
  • minimal: read_file, string_replace, execute_bash
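
A minimal sketch of how the static profiles could be represented and applied. The profile contents come from the list above; the `ToolProfile` type and `filterTools` helper are hypothetical names, not existing Nanocoder APIs.

```typescript
type ToolProfile = "code-edit" | "explore" | "git" | "minimal";

const TOOL_PROFILES: Record<ToolProfile, readonly string[]> = {
  "code-edit": [
    "read_file", "string_replace", "write_file",
    "find_files", "search_file_contents", "execute_bash",
  ],
  explore: ["read_file", "find_files", "search_file_contents", "list_directory"],
  git: ["git_status", "git_diff", "git_log", "git_add", "git_commit"],
  minimal: ["read_file", "string_replace", "execute_bash"],
};

// Keep only the tool definitions whose name is in the chosen profile.
function filterTools<T extends { name: string }>(
  tools: readonly T[],
  profile: ToolProfile,
): T[] {
  const allowed = new Set(TOOL_PROFILES[profile]);
  return tools.filter((tool) => allowed.has(tool.name));
}
```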

Dynamic tool selection — use keyword matching on the user's message to pick ~5-8 relevant tools per turn.
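
One possible shape for the keyword matcher, as a sketch only. The keyword table below is invented for illustration (the proposal does not specify which words map to which tools), and `selectTools` is a hypothetical helper.

```typescript
// HYPOTHETICAL keyword table; tune or replace for real use.
const KEYWORD_TOOLS: Array<{ pattern: RegExp; tools: string[] }> = [
  {
    pattern: /\b(git|commit|diff|branch)\b/i,
    tools: ["git_status", "git_diff", "git_log", "git_add", "git_commit"],
  },
  { pattern: /\b(find|search|where|grep)\b/i, tools: ["find_files", "search_file_contents"] },
  {
    pattern: /\b(fix|edit|change|replace|refactor)\b/i,
    tools: ["read_file", "string_replace", "write_file"],
  },
  { pattern: /\b(run|test|build|install)\b/i, tools: ["execute_bash"] },
];

// Fallback when nothing matches: a small read-only set.
const DEFAULT_TOOLS = ["read_file", "find_files", "search_file_contents"];

function selectTools(message: string, maxTools = 8): string[] {
  const picked = new Set<string>();
  for (const { pattern, tools } of KEYWORD_TOOLS) {
    if (pattern.test(message)) {
      tools.forEach((tool) => picked.add(tool));
    }
  }
  if (picked.size === 0) {
    DEFAULT_TOOLS.forEach((tool) => picked.add(tool));
  }
  return Array.from(picked).slice(0, maxTools);
}
```

Whole-word matching keeps this cheap and predictable, at the cost of missing inflected forms such as "changes"; a stemmer or embedding lookup would be the obvious next step if that matters.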

Progressive tool loading — start with read-only tools, unlock write tools after the model reads relevant files.
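
Progressive loading could be as simple as a gate that tracks which files have been read. This sketch assumes "at least one successful read_file" is the unlock condition and that `execute_bash` counts as a write tool; both are assumptions, as are the class name and the `path` argument name.

```typescript
const READ_ONLY_TOOLS = ["read_file", "find_files", "search_file_contents", "list_directory"];
// ASSUMPTION: execute_bash is grouped with write tools because it can modify files.
const WRITE_TOOLS = ["string_replace", "write_file", "execute_bash"];

class ProgressiveToolGate {
  private filesRead = new Set<string>();

  // Call this after each tool call that completed successfully.
  recordToolCall(tool: string, args: { path?: string }): void {
    if (tool === "read_file" && args.path) {
      this.filesRead.add(args.path);
    }
  }

  // Read-only tools at first; write tools join once something has been read.
  availableTools(): string[] {
    return this.filesRead.size > 0
      ? READ_ONLY_TOOLS.concat(WRITE_TOOLS)
      : READ_ONLY_TOOLS.slice();
  }
}
```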

3. Single-Tool Mode

Force the model to call one tool at a time instead of attempting to chain multiple calls in one response. Small models are much more reliable when focused on one action per turn.

This would be implemented as a prompt instruction combined with validation that rejects multi-tool responses and asks the model to pick one.
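
The validation half might look like the following sketch. The `ToolCall` shape and the wording of the correction message are assumptions for illustration.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type SingleToolResult =
  | { ok: true; call: ToolCall | null } // null means the reply had no tool call
  | { ok: false; correction: string };  // message to send back to the model

function enforceSingleTool(calls: ToolCall[]): SingleToolResult {
  if (calls.length <= 1) {
    return { ok: true, call: calls[0] ?? null };
  }
  const names = calls.map((call) => call.name).join(", ");
  return {
    ok: false,
    correction:
      `You requested ${calls.length} tools (${names}). ` +
      "Call exactly one tool, then wait for its result.",
  };
}
```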

4. Guided Workflows

Instead of open-ended "figure it out" prompting, provide step-by-step scaffolding:

  • User says "fix this bug" → Nanocoder injects: "Step 1: Read the file. Step 2: Identify the issue. Step 3: Apply the fix."
  • The model only handles one step at a time, reducing reasoning load
  • Implemented as prompt templates per task type (fix, explain, refactor, etc.)
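
A sketch of per-task templates, feeding the model one step at a time. Only the "fix" steps come from the example above; the "explain" and "refactor" steps are placeholders invented here, and `stepPrompt` is a hypothetical helper.

```typescript
type TaskType = "fix" | "explain" | "refactor";

const WORKFLOWS: Record<TaskType, string[]> = {
  fix: ["Read the file.", "Identify the issue.", "Apply the fix."],
  // HYPOTHETICAL steps for the remaining task types.
  explain: ["Read the file.", "Explain what it does."],
  refactor: ["Read the file.", "Describe the change.", "Apply the change."],
};

// Returns the instruction for one step, or null once the workflow is done.
function stepPrompt(task: TaskType, stepIndex: number): string | null {
  const steps = WORKFLOWS[task];
  const step = steps[stepIndex];
  if (stepIndex < 0 || step === undefined) {
    return null;
  }
  return `Step ${stepIndex + 1} of ${steps.length}: ${step} Do only this step.`;
}
```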

5. Frontier Model as Planner (Hybrid Mode)

Use a frontier model (via OpenRouter or other provider) for planning and a local small model for execution:

  • Frontier model analyses the task and creates a step-by-step plan with specific tool calls
  • Small model executes each step (or Nanocoder auto-executes the plan)
  • Cheaper than running the frontier model throughout: it is called once for planning, and the small model handles the rest
  • Could also use the frontier model as a fallback when the small model's XML is repeatedly malformed
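
The control flow for hybrid mode could be sketched as below. The planner and executor are passed in as functions so the sketch stays independent of any provider SDK; `PlanStep`, `Planner`, `Executor`, and `runHybrid` are all hypothetical names.

```typescript
interface PlanStep {
  tool: string;
  args: Record<string, unknown>;
}

// Frontier model: one call that turns the task into concrete tool calls.
type Planner = (task: string) => Promise<PlanStep[]>;
// Local small model (or Nanocoder itself) carrying out a single step.
type Executor = (step: PlanStep) => Promise<string>;

async function runHybrid(
  task: string,
  plan: Planner,
  execute: Executor,
): Promise<string[]> {
  const steps = await plan(task); // the only frontier call
  const results: string[] = [];
  for (const step of steps) {
    // Steps run strictly in order; each result is collected for the caller.
    results.push(await execute(step));
  }
  return results;
}
```

Running steps sequentially matches the single-tool idea above; a real implementation would also need to decide what happens when a step fails mid-plan, which this sketch leaves out.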

6. Aggressive Context Management

Small models need tighter context management than the current defaults:

  • Lower auto-compact threshold (40% instead of 60%)
  • More aggressive compression (shorter summaries, smaller truncation limits)
  • Auto-drop tool result contents after they've been consumed
  • Sliding window that keeps only the last N messages at full fidelity
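
A sketch of two of these ideas: the lower compaction threshold, and a sliding window that truncates older tool results while leaving the last N messages untouched. The 80-character stub length, the `Message` shape, and the function names are assumptions.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

const SMALL_MODEL_COMPACT_THRESHOLD = 0.4; // 40% instead of the current 60%

function shouldCompact(
  usedTokens: number,
  contextWindow: number,
  threshold = SMALL_MODEL_COMPACT_THRESHOLD,
): boolean {
  return usedTokens / contextWindow >= threshold;
}

// Keep the last `keepLast` messages at full fidelity; shorten older tool results.
function applySlidingWindow(
  messages: Message[],
  keepLast: number,
  stubLength = 80, // ASSUMPTION: arbitrary illustrative limit
): Message[] {
  const cutoff = Math.max(0, messages.length - keepLast);
  return messages.map((message, index) => {
    const isRecent = index >= cutoff;
    if (isRecent || message.role !== "tool" || message.content.length <= stubLength) {
      return message;
    }
    return { ...message, content: message.content.slice(0, stubLength) + " [truncated]" };
  });
}
```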

7. Simplified Tool Schemas

For XML fallback, current schemas include full descriptions, parameter types, and examples. For small models:

  • One-line descriptions
  • Skip optional parameters
  • Remove examples from schemas
  • Simpler parameter names
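
As an illustration, a schema simplifier covering the first three bullets might look like this. The `ToolSchema` shape is an assumption about how schemas are stored. Renaming parameters is left out because it would need a mapping back to the real names at call time.

```typescript
interface ParamSchema {
  type: string;
  description: string;
  required: boolean;
}

interface ToolSchema {
  name: string;
  description: string;
  parameters: Record<string, ParamSchema>;
  examples?: string[];
}

function simplifySchema(schema: ToolSchema): ToolSchema {
  // One-line description: keep only the first line.
  const firstLine = (schema.description.split("\n")[0] ?? "").trim();

  // Skip optional parameters.
  const parameters: Record<string, ParamSchema> = {};
  for (const name of Object.keys(schema.parameters)) {
    const param = schema.parameters[name];
    if (param && param.required) {
      parameters[name] = param;
    }
  }

  // Examples are dropped by simply not copying them over.
  return { name: schema.name, description: firstLine, parameters };
}
```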

8. Prefill / Constrained Output

For XML tool calling, prefill the assistant response with the opening XML tag when the expected tool is known. For example, after asking the model to read a file, prefill with <read_file> so it only needs to fill in the parameters. Some providers support this via assistant message prefix.
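
A sketch of the prefill idea, assuming a provider that treats a trailing assistant message as a prefix to continue (this is provider-specific and not guaranteed). Both helper names are hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Append a partial assistant turn containing the opening tag.
function withPrefill(messages: ChatMessage[], expectedTool: string): ChatMessage[] {
  return [...messages, { role: "assistant", content: `<${expectedTool}>` }];
}

// Providers usually return only the continuation, so re-attach the opening tag
// before parsing. If the model repeated the tag anyway, leave the text as-is.
function completeToolCall(expectedTool: string, completion: string): string {
  const open = `<${expectedTool}>`;
  const trimmed = completion.replace(/^\s+/, "");
  return trimmed.indexOf(open) === 0 ? trimmed : open + completion;
}
```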

9. Smart Retry with Simplification

When a small model fails (malformed XML, wrong tool), instead of sending the same error back:

  • Simplify the available tools (remove irrelevant ones for the retry)
  • Rephrase the instruction more directly
  • Provide the specific XML template to fill in
  • After N failures, offer to hand off to a frontier model
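
The escalation steps above could be sketched as a single decision function. The XML template shape is hypothetical (the real fallback format may differ), as are the names and the default of three failures before handoff.

```typescript
interface RetryState {
  failures: number;         // failures so far for this step
  availableTools: string[]; // tools offered on the last attempt
}

type RetryAction =
  | { kind: "retry"; tools: string[]; prompt: string }
  | { kind: "handoff"; reason: string };

// HYPOTHETICAL XML shape, used only to show "a template to fill in".
function xmlTemplate(tool: string, params: string[]): string {
  const body = params.map((p) => `  <${p}>...</${p}>`).join("\n");
  return `<${tool}>\n${body}\n</${tool}>`;
}

function planRetry(
  state: RetryState,
  intendedTool: string,
  params: string[],
  maxFailures = 3,
): RetryAction {
  const failures = state.failures + 1;
  if (failures >= maxFailures) {
    return { kind: "handoff", reason: `${failures} failed attempts at ${intendedTool}` };
  }
  // Narrow the tool list to the one the model was evidently trying to call.
  const tools =
    state.availableTools.indexOf(intendedTool) >= 0 ? [intendedTool] : state.availableTools;
  const prompt =
    `Call ${intendedTool} now. Reply with only this XML, replacing each "...":\n` +
    xmlTemplate(intendedTool, params);
  return { kind: "retry", tools, prompt };
}
```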

10. Task-Specific Micro-Agents

Pre-built prompt + tool combos for common tasks small models handle well:

  • Read and explain — read_file only, no tool output needed
  • Find and replace — search_file_contents + string_replace
  • Run and fix — execute_bash + read_file + string_replace

User selects the micro-agent, or Nanocoder picks based on intent.
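
A micro-agent is essentially a named pairing of a prompt and a tool list, so a sketch can be small. The tool lists come from the bullets above; the identifiers and prompt wording are invented for illustration.

```typescript
interface MicroAgent {
  prompt: string;
  tools: string[];
}

type MicroAgentName = "read-and-explain" | "find-and-replace" | "run-and-fix";

const MICRO_AGENTS: Record<MicroAgentName, MicroAgent> = {
  // HYPOTHETICAL prompt text throughout.
  "read-and-explain": {
    prompt: "Read the file the user names, then explain it in plain language.",
    tools: ["read_file"],
  },
  "find-and-replace": {
    prompt: "Find the text the user describes, then replace it as requested.",
    tools: ["search_file_contents", "string_replace"],
  },
  "run-and-fix": {
    prompt: "Run the command. If it fails, read the failing file and fix it.",
    tools: ["execute_bash", "read_file", "string_replace"],
  },
};

function getMicroAgent(name: MicroAgentName): MicroAgent {
  return MICRO_AGENTS[name];
}
```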

11. Auto-Detection

Automatically enable small model optimisations based on model name. If the model name contains size indicators like 1b, 3b, 7b, 8b, enable the mode without requiring manual configuration. Users can override this.
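
Name-based detection could be done with a single regular expression, sketched below. The 8B cutoff follows the 1B-8B range mentioned in the problem statement; the function names are hypothetical. The pattern requires the size to stand alone, so version numbers such as the "3.2" in llama3.2 are not mistaken for a size, and names it cannot parse (for example mixture-of-experts names like 8x7b) return null and are left to manual configuration.

```typescript
// Extract a parameter count in billions from names like "llama3.2:3b".
function parseParamBillions(model: string): number | null {
  const match = /(?:^|[^a-z0-9.])(\d+(?:\.\d+)?)b(?![a-z0-9])/i.exec(model);
  return match && match[1] ? parseFloat(match[1]) : null;
}

// An explicit user setting always wins over detection.
function isSmallModel(
  model: string,
  userOverride?: boolean,
  maxBillions = 8,
): boolean {
  if (userOverride !== undefined) {
    return userOverride;
  }
  const size = parseParamBillions(model);
  return size !== null && size <= maxBillions;
}
```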


Configuration

{
  "smallModelMode": {
    "enabled": true,
    "slimPrompt": true,
    "toolProfile": "code-edit",
    "maxToolsPerTurn": 1,
    "aggressiveCompact": true,
    "simplifiedSchemas": true,
    "plannerModel": "openrouter/claude-sonnet-4-5"
  }
}

Or at the provider/model level:

{
  "providers": [
    {
      "name": "ollama",
      "baseUrl": "http://localhost:11434/v1",
      "models": ["llama3.2:3b"],
      "smallModelMode": {
        "enabled": true,
        "toolProfile": "minimal"
      }
    }
  ]
}
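
For reference, a sketch of how the two config levels might be typed and merged. The field names mirror the JSON above; making every field optional and letting provider-level settings override the global block are assumptions, as is the `resolveSmallModelMode` name.

```typescript
interface SmallModelMode {
  enabled?: boolean;
  slimPrompt?: boolean;
  toolProfile?: string;
  maxToolsPerTurn?: number;
  aggressiveCompact?: boolean;
  simplifiedSchemas?: boolean;
  plannerModel?: string;
}

// ASSUMPTION: provider-level settings take precedence over the global block.
function resolveSmallModelMode(
  globalMode: SmallModelMode | undefined,
  providerMode: SmallModelMode | undefined,
): SmallModelMode {
  return { ...globalMode, ...providerMode };
}
```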

Implementation Priority

Phase 1 — High impact, lower effort

  • Slim system prompt
  • Static tool profiles (tool subsetting)
  • Single-tool mode
  • Aggressive auto-compact defaults for small models

Phase 2 — High impact, higher effort

  • Simplified tool schemas
  • Smart retry with simplification
  • Auto-detection by model name/size

Phase 3 — Architectural

  • Frontier-as-planner hybrid mode
  • Guided workflows with task classification
  • Task-specific micro-agents
  • Prefill / constrained output

Metadata

Labels: enhancement (New feature or request)