What this is
This is a process guide, not a feature request. It documents the audit methodology we ran on the img-forge repo using parallel Claude Haiku subagents, and proposes running the same process on llm-providers within its own domain.
The img-forge run produced 15 organized issues covering provider expansion, image editing, video generation, and infrastructure improvements — plus surfaced a cross-repo boundary issue (the `ImageProvider` overlap with img-forge, now tracked as #59). That cross-repo finding only emerged at the synthesis step, after agents had independently read both codebases. That's the part that's hard to get from a single-developer review pass.
The process
Step 1 — Scope your investigation angles
Don't use one big agent with a long prompt. Spawn 4-6 parallel agents, each scoped to a distinct, bounded slice of the codebase. They should not overlap. Each agent reads its slice and reports structured findings.
For img-forge, the slices were:
- Gateway API structure and endpoints
- Orchestrator / Durable Object state machine
- MCP tools and model provider integrations
- Image editing and video generation gaps
- Repo metadata (existing issues, deps, wrangler configs, API keys)
The metadata agent is critical and easy to forget — it prevents duplicate issues and tells you what's already wired up.
Step 2 — Write agent prompts that produce structured output
Each prompt should:
- State exactly what to read (specific directories or files, not "look around")
- Ask for findings in a consistent format (file path + line number for every claim)
- End with a clear deliverable: "This audit will become GitHub issues — be specific"
Vague prompts produce vague findings. Prescribe the output shape.
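As a reference point, a prompt in roughly this shape worked on img-forge. The slice name and directory below are placeholders, not the real layout of either repo:

```text
You are one of 6 parallel audit agents. Your slice: circuit breaker and
reliability behavior. Read only src/reliability/ and its tests.

Report findings in this format:
- [file path:line] specific claim, tied to the code at that line
- Gaps: what is missing or broken in this slice
- Opportunities: improvements that are not bugs

This audit will become GitHub issues — be specific. Do not speculate
about code you did not read.
```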
Step 3 — Synthesize before filing anything
Don't file issues as each agent returns. Wait for all agents to complete, then read across all reports looking for:
- Cross-cutting findings that no single agent saw (e.g., two agents both touching the same abstraction from different sides)
- Dependency order — which issues are prerequisites for which
- Boundary questions — anything that implicates another repo in the org
The img-forge synthesis step revealed that `ImageProvider` in this repo was a frozen copy of img-forge internals — a finding that required reading both codebases, and produced issue #59 here and issue #64 on img-forge.
Step 4 — File issues grouped by category with explicit dependency callouts
Group issues by theme (provider coverage, infrastructure, DX, etc.). In each issue body:
- Reference specific file paths and line numbers from agent findings
- Call out prerequisite issues explicitly ("Depends on: #X")
- Write acceptance criteria as checkboxes, not prose
Don't write issues that say "we should improve X." Write issues that say "change `file.ts:line` from A to B because C."
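A skeleton for the issue body. Every name, path, line number, and issue number here is a placeholder:

```markdown
## Problem
src/providers/acme.ts:142 retries a failed request against the same
provider before consulting the fallback chain.

Depends on: #12 (fallback chain refactor)

## Acceptance criteria
- [ ] Failed requests route to the next provider in the chain
- [ ] Unit test covering the all-providers-open case
- [ ] No retry against a provider whose breaker is open
```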
Suggested investigation angles for llm-providers
These are the natural slices given this repo's domain:
1. Provider completeness and parity audit
- Which providers are implemented? Which are stubs or partial?
- For each provider: what features are missing (streaming, tool use, JSON mode, vision, caching, seed)?
- Which models in the catalog are outdated or deprecated by the provider?
- What new providers are missing entirely (Mistral, Cohere, DeepSeek, xAI Grok, Amazon Bedrock)?
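One concrete artifact the parity agent can produce is a feature matrix. A minimal TypeScript sketch, where the provider names and all cell values are illustrative placeholders, not audit results:

```ts
// Features worth auditing per provider (from the questions above).
type Feature = "streaming" | "toolUse" | "jsonMode" | "vision" | "caching" | "seed";

// true = implemented, false = missing, null = stub or partial.
type ParityRow = Record<Feature, boolean | null>;

const parity: Record<string, ParityRow> = {
  providerA: { streaming: true, toolUse: true,  jsonMode: true,  vision: true,  caching: true,  seed: true },
  providerB: { streaming: true, toolUse: true,  jsonMode: false, vision: true,  caching: null,  seed: false },
  providerC: { streaming: null, toolUse: null,  jsonMode: null,  vision: null,  caching: null,  seed: null },
};

// Each false or null cell becomes a candidate issue.
```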
2. Circuit breaker and reliability behavior
- How does the circuit breaker open/close? What are the thresholds?
- What happens when all providers in the fallback chain are open simultaneously?
- Are there edge cases where a failed request is retried against the same provider?
- Does the circuit breaker state persist across Worker invocations (important for CF Workers)?
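As a concrete reference while reading, here is a minimal sketch of the kind of breaker these questions probe. The thresholds and names are hypothetical, not the repo's implementation; in particular, note that in-memory state like this does not survive CF Worker isolate recycling, which is exactly the persistence question above:

```ts
// Minimal in-memory circuit breaker sketch (hypothetical).
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5, // consecutive failures before opening
    private cooldownMs = 30_000,  // how long to stay open
  ) {}

  canRequest(now = Date.now()): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // allow one probe request through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  recordFailure(now = Date.now()): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
// Because this state lives in isolate memory, each fresh Worker isolate
// starts with a closed breaker, which is the persistence question above.
```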
3. Cost tracking and CreditLedger accuracy
- How is token usage counted? Is it pre-call estimation or post-call from response headers?
- Which providers return accurate token counts vs. estimates?
- Does streaming mode correctly accumulate token counts?
- Are there requests that bypass cost tracking (tool call results, cached responses)?
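On the streaming question specifically, a sketch of the accumulation pattern to look for. The chunk shape is hypothetical; many providers report usage only on the final chunk:

```ts
// Hypothetical streaming chunk: usage typically arrives on the last chunk only.
interface StreamChunk {
  text?: string;
  usage?: { inputTokens: number; outputTokens: number };
}

async function countStreamedTokens(stream: AsyncIterable<StreamChunk>) {
  let usage = { inputTokens: 0, outputTokens: 0 };
  let sawUsage = false;

  for await (const chunk of stream) {
    if (chunk.usage) {
      usage = chunk.usage; // provider-reported counts win over estimates
      sawUsage = true;
    }
  }
  // Audit question: what does the ledger record when sawUsage is false?
  return { usage, estimated: !sawUsage };
}
```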
4. TypeScript API surface and ergonomics
- Is the public API surface minimal and well-typed, or does it leak internal types?
- Are discriminated union types used consistently for provider-specific params?
- Is `LLMProviders.fromEnv()` robust across CF Workers, Node.js, and Deno?
- Any `unknown` or `any` escape hatches that could be tightened?
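A hedged illustration of the discriminated-union question. These provider names and params are examples, not the package's actual types:

```ts
// Provider-specific params as a discriminated union: the `provider` tag
// narrows which options the compiler accepts.
type ProviderParams =
  | { provider: "openai"; options: { seed?: number; jsonMode?: boolean } }
  | { provider: "anthropic"; options: { cacheControl?: "ephemeral" } };

function buildRequest(params: ProviderParams): string {
  switch (params.provider) {
    case "openai":
      return `openai seed=${params.options.seed ?? "none"}`;
    case "anthropic":
      return `anthropic cache=${params.options.cacheControl ?? "off"}`;
  }
}
// An API that instead takes `options: any` is the escape hatch to flag.
```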
5. Vision / multimodal input handling
- Which providers support `LLMImageInput` in messages?
- What image formats and sizes are accepted per provider?
- Is there input validation / size limiting before sending to providers?
- How does vision fall back if the selected provider doesn't support it?
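A minimal shape for the fallback check described above. The capability map and provider names are hypothetical:

```ts
// Hypothetical capability lookup used to skip non-vision providers.
const supportsVision: Record<string, boolean> = {
  providerA: true,
  providerB: true,
  providerC: false,
};

function visionFallbackChain(chain: string[], hasImageInput: boolean): string[] {
  if (!hasImageInput) return chain;
  const capable = chain.filter((p) => supportsVision[p] === true);
  if (capable.length === 0) {
    // Audit question: does the real library throw, degrade to text-only,
    // or silently drop the image?
    throw new Error("No provider in the chain supports image input");
  }
  return capable;
}
```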
6. Repo health and packaging
- Are there test gaps (untested providers, untested circuit breaker states)?
- Is the published `dist/` clean (no accidental test fixtures or type leaks)?
- Are peer dependencies and engine constraints documented?
- Does the changelog accurately reflect breaking changes?
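For the packaging questions, the package.json fields to check look roughly like this. The values are illustrative, not the package's actual config:

```json
{
  "files": ["dist"],
  "exports": {
    ".": { "types": "./dist/index.d.ts", "import": "./dist/index.js" }
  },
  "engines": { "node": ">=18" }
}
```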
Running the audit
Spawn all 6 agents in a single message so they run in parallel. Each agent should report:
- A findings list with file paths and line numbers
- A "gaps" section: what's missing or broken
- An "opportunities" section: what could be improved without being broken
Collect all 6 reports, read them together, then file issues. Budget 30-60 minutes total including issue writing.
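A sketch of the per-agent report shape (the file path is a placeholder):

```text
## Findings
- src/example.ts:87: <specific claim, tied to the code at that line>

## Gaps
- <what is missing or broken in this slice>

## Opportunities
- <improvement that is not a bug>
```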
Cross-repo concerns to watch for
Based on the img-forge ↔ llm-providers relationship, the following cross-repo concerns are worth checking during synthesis:
- Any code in llm-providers that duplicates img-forge logic — `ImageProvider` was one; audit for others
- Env/config assumptions — does `LLMProviders.fromEnv()` assume env var names that conflict with img-forge's wrangler bindings?
- Versioning coupling — img-forge will import `@stackbilt/llm-providers` at a pinned version; any breaking changes in llm-providers now have a downstream impact on img-forge
Output from the img-forge run (for reference)
The img-forge audit produced 15 issues spanning provider expansion, image editing, video generation, and infrastructure improvements.
The full issue set is at: https://github.com/Stackbilt-dev/img-forge/issues