Add mobius improve: self-diagnosing improvement loop #5

@AaronGoldsmith

Description

Summary

Mobius can evolve agent prompts via mobius evolve, but it never asks "what kind of agent am I missing?" or "what capability gap keeps causing failures?" This issue proposes a meta-layer (mobius improve) that diagnoses system-level gaps and proposes structural improvements.

Current State

The existing loop is: compete → judge → elo → evolve prompts → repeat

Everything improves within the existing agent pool. The system never:

  • Identifies missing agent types or specializations
  • Detects capability gaps from repeated failures
  • Retires stale/underperforming agents
  • Tracks why the system changed over time

Proposed: mobius improve

Three phases:

Phase 1: Diagnose (analyst.py + DiagnosisReport)

Analyze recent match history for patterns:

  • Repeated failures — tasks where all agents score low → capability gap
  • Narrow wins — barely-edged-out tasks → weak coverage
  • Missing specializations — cluster tasks by topic, find uncovered areas
  • Stale agents — high match count, declining win rate
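A minimal sketch of the Phase 1 heuristics, covering two of the patterns above (repeated failures and stale agents). Names like `MatchRecord`, `diagnose`, and the thresholds are illustrative placeholders, not existing Mobius code:

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class MatchRecord:
    task: str
    agent_id: str
    score: float   # judge score in [0.0, 1.0]
    won: bool

@dataclass
class DiagnosisReport:
    capability_gaps: list = field(default_factory=list)  # tasks no agent handles well
    stale_agents: list = field(default_factory=list)     # veterans with declining win rates

def diagnose(history, fail_threshold=0.4, stale_matches=20):
    report = DiagnosisReport()

    # Repeated failures: tasks where even the best agent scored low
    by_task = {}
    for m in history:
        by_task.setdefault(m.task, []).append(m.score)
    for task, scores in by_task.items():
        if max(scores) < fail_threshold:
            report.capability_gaps.append(task)

    # Stale agents: enough matches played, but the recent half is mostly losses
    by_agent = {}
    for m in history:
        by_agent.setdefault(m.agent_id, []).append(m.won)
    for agent, wins in by_agent.items():
        if len(wins) >= stale_matches and mean(wins[len(wins) // 2:]) < 0.5:
            report.stale_agents.append(agent)

    return report
```

Narrow wins and missing specializations would need margin data and task clustering respectively, so they are omitted from this sketch.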

Phase 2: Propose (ImprovementProposal model)

Generate typed proposals:

  • create_agent — new agent for uncovered specialization
  • retire_agent — remove underperformers
  • split_agent — break generic agent into specialists
  • add_capability — equip agents with new tools (e.g., WebSearch)
  • system_change — structural improvements
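The issue proposes `ImprovementProposal` as a pydantic model; a dependency-free sketch of the same shape using a stdlib dataclass (field names beyond those listed above are guesses):

```python
from dataclasses import dataclass, field

# The five proposal types from Phase 2
VALID_TYPES = ("create_agent", "retire_agent", "split_agent",
               "add_capability", "system_change")

@dataclass
class ImprovementProposal:
    type: str
    description: str
    proposed_by: str = "system"           # agent_id or 'system'
    evidence: list = field(default_factory=list)  # match IDs that motivated this

    def __post_init__(self):
        # Mimic pydantic-style validation on the type discriminator
        if self.type not in VALID_TYPES:
            raise ValueError(f"unknown proposal type: {self.type}")
```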

Phase 3: Act (three modes)

| Mode | Behavior |
|------|----------|
| `--dry-run` | Print proposals only |
| `--suggest` | Create tracked proposals; a human approves each |
| `--auto` | Execute proposals directly |
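The three modes above could be wired up as a mutually exclusive flag group; a sketch with stdlib `argparse` (the actual CLI may use a different framework, and `observation` is a hypothetical positional argument):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="mobius-improve")
    # Optional free-text hint, e.g. "We keep failing at web research tasks"
    p.add_argument("observation", nargs="?", help="free-text description of the gap")
    mode = p.add_mutually_exclusive_group()
    mode.add_argument("--dry-run", action="store_true",
                      help="print proposals only")
    mode.add_argument("--suggest", action="store_true",
                      help="create tracked proposals; a human approves each")
    mode.add_argument("--auto", action="store_true",
                      help="execute proposals directly")
    return p
```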

Agent Factory Pattern

Instead of hardcoding improvement logic, make it a competition:

mobius improve "We keep failing at web research tasks"
  → Spawns "architect" agents
  → Each proposes a different solution
  → Judge picks best proposal
  → Winner's proposal gets executed

Architect agents themselves evolve — they get better at proposing improvements because their proposals are judged on subsequent match outcomes.
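The factory flow above reduces to a small selection loop; a sketch with stub types (`Architect`, `run_improvement_round`, and the callable judge are all hypothetical):

```python
class Architect:
    """Stub for an architect agent; strategy maps a prompt to a proposal."""
    def __init__(self, name, strategy):
        self.name = name
        self.strategy = strategy

    def propose(self, prompt):
        return self.strategy(prompt)

def run_improvement_round(architects, prompt, judge):
    # Each architect drafts a proposal for the stated gap,
    # the judge scores each, and the highest-scoring one wins.
    candidates = [(a, a.propose(prompt)) for a in architects]
    return max(candidates, key=lambda pair: judge(pair[1]))
```

In the real system the judge would be an LLM scoring proposal quality (or, longer-term, subsequent match outcomes), not a pure function.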

New DB Table: proposals

proposals (
  id TEXT PRIMARY KEY,
  type TEXT,           -- create_agent, retire_agent, split_agent, add_capability, system_change
  description TEXT,
  proposed_by TEXT,    -- agent_id or 'system'
  status TEXT,         -- pending, accepted, rejected, implemented
  evidence TEXT,       -- match IDs that motivated this
  outcome TEXT,        -- what happened after implementation
  created_at TEXT,
  resolved_at TEXT
)
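Since the implementation plan puts this table in `db.py`, a runnable sketch with stdlib `sqlite3` (the `DEFAULT` clauses are assumptions, not part of the schema above):

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS proposals (
  id          TEXT PRIMARY KEY,
  type        TEXT NOT NULL,
  description TEXT,
  proposed_by TEXT,
  status      TEXT DEFAULT 'pending',
  evidence    TEXT,
  outcome     TEXT,
  created_at  TEXT DEFAULT (datetime('now')),
  resolved_at TEXT
)
"""

def init_db(path=":memory:"):
    # Open (or create) the database and ensure the proposals table exists
    conn = sqlite3.connect(path)
    conn.execute(DDL)
    return conn
```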

Implementation Plan

  1. src/mobius/analyst.py — match history analysis, DiagnosisReport model
  2. ImprovementProposal pydantic model in models.py
  3. proposals table in db.py
  4. mobius improve CLI command
  5. /mobius-improve skill (free Opus-powered version)

Related Ideas

  • Agents that can create other agents and equip them with custom skills (stored in DB, not repo)
  • Benchmark self-review that feeds back into the improvement loop
  • Meta-learning: track which improvement strategies actually helped

🤖 Generated with Claude Code
