Add mobius improve: self-diagnosing improvement loop #5

@AaronGoldsmith

Description

Summary

Mobius can evolve agent prompts via mobius evolve, but it never asks "what kind of agent am I missing?" or "what capability gap keeps causing failures?" This issue proposes a meta-layer (mobius improve) that diagnoses system-level gaps and proposes structural improvements.

Current State

The existing loop is: compete → judge → elo → evolve prompts → repeat

Everything improves within the existing agent pool. The system never:

  • Identifies missing agent types or specializations
  • Detects capability gaps from repeated failures
  • Retires stale/underperforming agents
  • Tracks why the system changed over time

Proposed: mobius improve

Three phases:

Phase 1: Diagnose (analyst.py + DiagnosisReport)

Analyze recent match history for patterns:

  • Repeated failures — tasks where all agents score low → capability gap
  • Narrow wins — barely-edged-out tasks → weak coverage
  • Missing specializations — cluster tasks by topic, find uncovered areas
  • Stale agents — high match count, declining win rate
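A minimal sketch of the Phase 1 heuristics, covering two of the patterns above (repeated failures and stale agents). Names like `MatchRecord`, `diagnose`, and the thresholds are illustrative placeholders, not existing Mobius code:

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class MatchRecord:
    task: str
    agent_id: str
    score: float   # judge score in [0.0, 1.0]
    won: bool

@dataclass
class DiagnosisReport:
    capability_gaps: list = field(default_factory=list)  # tasks no agent handles well
    stale_agents: list = field(default_factory=list)     # veterans with declining win rates

def diagnose(history, fail_threshold=0.4, stale_matches=20):
    report = DiagnosisReport()

    # Repeated failures: tasks where even the best agent scored low
    by_task = {}
    for m in history:
        by_task.setdefault(m.task, []).append(m.score)
    for task, scores in by_task.items():
        if max(scores) < fail_threshold:
            report.capability_gaps.append(task)

    # Stale agents: enough matches played, but the recent half is mostly losses
    by_agent = {}
    for m in history:
        by_agent.setdefault(m.agent_id, []).append(m.won)
    for agent, wins in by_agent.items():
        if len(wins) >= stale_matches and mean(wins[len(wins) // 2:]) < 0.5:
            report.stale_agents.append(agent)

    return report
```

Narrow wins and missing specializations would need margin data and task clustering respectively, so they are omitted from this sketch.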

Phase 2: Propose (ImprovementProposal model)

Generate typed proposals:

  • create_agent — new agent for uncovered specialization
  • retire_agent — remove underperformers
  • split_agent — break generic agent into specialists
  • add_capability — equip agents with new tools (e.g., WebSearch)
  • system_change — structural improvements
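The issue proposes `ImprovementProposal` as a pydantic model; a dependency-free sketch of the same shape using a stdlib dataclass (field names beyond those listed above are guesses):

```python
from dataclasses import dataclass, field

# The five proposal types from Phase 2
VALID_TYPES = ("create_agent", "retire_agent", "split_agent",
               "add_capability", "system_change")

@dataclass
class ImprovementProposal:
    type: str
    description: str
    proposed_by: str = "system"           # agent_id or 'system'
    evidence: list = field(default_factory=list)  # match IDs that motivated this

    def __post_init__(self):
        # Mimic pydantic-style validation on the type discriminator
        if self.type not in VALID_TYPES:
            raise ValueError(f"unknown proposal type: {self.type}")
```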

Phase 3: Act (three modes)

| Mode | Behavior |
|------|----------|
| `--dry-run` | Print proposals only |
| `--suggest` | Create tracked proposals; a human approves each |
| `--auto` | Execute proposals directly |
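The three modes above could be wired up as a mutually exclusive flag group; a sketch with stdlib `argparse` (the actual CLI may use a different framework, and `observation` is a hypothetical positional argument):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="mobius-improve")
    # Optional free-text hint, e.g. "We keep failing at web research tasks"
    p.add_argument("observation", nargs="?", help="free-text description of the gap")
    mode = p.add_mutually_exclusive_group()
    mode.add_argument("--dry-run", action="store_true",
                      help="print proposals only")
    mode.add_argument("--suggest", action="store_true",
                      help="create tracked proposals; a human approves each")
    mode.add_argument("--auto", action="store_true",
                      help="execute proposals directly")
    return p
```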

Agent Factory Pattern

Instead of hardcoding improvement logic, make it a competition:

mobius improve "We keep failing at web research tasks"
  → Spawns "architect" agents
  → Each proposes a different solution
  → Judge picks best proposal
  → Winner's proposal gets executed

Architect agents themselves evolve — they get better at proposing improvements because their proposals are judged on subsequent match outcomes.
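The factory flow above reduces to a small selection loop; a sketch with stub types (`Architect`, `run_improvement_round`, and the callable judge are all hypothetical):

```python
class Architect:
    """Stub for an architect agent; strategy maps a prompt to a proposal."""
    def __init__(self, name, strategy):
        self.name = name
        self.strategy = strategy

    def propose(self, prompt):
        return self.strategy(prompt)

def run_improvement_round(architects, prompt, judge):
    # Each architect drafts a proposal for the stated gap,
    # the judge scores each, and the highest-scoring one wins.
    candidates = [(a, a.propose(prompt)) for a in architects]
    return max(candidates, key=lambda pair: judge(pair[1]))
```

In the real system the judge would be an LLM scoring proposal quality (or, longer-term, subsequent match outcomes), not a pure function.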

New DB Table: proposals

proposals (
  id TEXT PRIMARY KEY,
  type TEXT,           -- create_agent, retire_agent, split_agent, add_capability, system_change
  description TEXT,
  proposed_by TEXT,    -- agent_id or 'system'
  status TEXT,         -- pending, accepted, rejected, implemented
  evidence TEXT,       -- match IDs that motivated this
  outcome TEXT,        -- what happened after implementation
  created_at TEXT,
  resolved_at TEXT
)
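Since the implementation plan puts this table in `db.py`, a runnable sketch with stdlib `sqlite3` (the `DEFAULT` clauses are assumptions, not part of the schema above):

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS proposals (
  id          TEXT PRIMARY KEY,
  type        TEXT NOT NULL,
  description TEXT,
  proposed_by TEXT,
  status      TEXT DEFAULT 'pending',
  evidence    TEXT,
  outcome     TEXT,
  created_at  TEXT DEFAULT (datetime('now')),
  resolved_at TEXT
)
"""

def init_db(path=":memory:"):
    # Open (or create) the database and ensure the proposals table exists
    conn = sqlite3.connect(path)
    conn.execute(DDL)
    return conn
```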

Implementation Plan

  1. src/mobius/analyst.py — match history analysis, DiagnosisReport model
  2. ImprovementProposal pydantic model in models.py
  3. proposals table in db.py
  4. mobius improve CLI command
  5. /mobius-improve skill (free Opus-powered version)

Related Ideas

  • Agents that can create other agents and equip them with custom skills (stored in DB, not repo)
  • Benchmark self-review that feeds back into the improvement loop
  • Meta-learning: track which improvement strategies actually helped

🤖 Generated with Claude Code
