Test whether your AGENTS.md and coding-agent instructions actually work.
Agent instruction files rot. Setup commands change, docs move, nested packages get missed, and teams guess whether a prompt change helped. AgentFit turns that guess into a local-first score, report, and CI check.
AgentFit discovers AGENTS.md, CLAUDE.md, Cursor rules, Copilot instructions, and other agent harness files. It checks whether instructions are discoverable, commands still work, references resolve, nested packages are covered, and generated repo-specific tasks can be verified in isolated git worktrees.
npx @kingkyylian/agentfit@latest eval --adapter dry-run

AgentFit has been tested against 20 public repositories that already publish coding-agent instructions. The first validation pass found one stale-command issue that became a merged upstream RedisInsight PR, and it exposed AgentFit false positives that were fixed through 0.1.10.
The current feedback ask is narrow: suggest public repos with AGENTS.md, CLAUDE.md, Cursor rules, Copilot instructions, or similar guidance so AgentFit can run deterministic dry-run validation.
Suggest a repo: #9
AgentFit score: 93/100 (A)
Instruction files: 1
Reference issues: 0
Tasks: 5
Task execution: static dry-run preview; generated tasks were not executed.
Runs: 0 executed, 5 previewed
Execute generated tasks in isolated worktrees when you want command-level proof:
npx @kingkyylian/agentfit@latest eval --adapter dry-run --run-tasks

AgentFit's own repository currently scores 100/100 (A) with 5 of 5 generated task runs executed.
- missing referenced docs such as `@docs/setup.md`
- stale commands such as `pnpm lint` after the script was removed
- missing verification commands before agents claim work is done
- monorepo packages with no nested `AGENTS.md`
- instruction changes that look better but lower the score
The included demo starts with a stale AGENTS.md, then compares it with a fixed version:
npx @kingkyylian/agentfit@latest compare examples/reports/demo-before.json examples/reports/demo-after.json --format markdown

AgentFit improved by 28 points: 65/100 (D) -> 93/100 (A).
Fixed checks:
- No nested instruction file found for packages/api.
- Documented command references missing package script "lint".
- No runnable verification command found in instruction files.
- 1 instruction reference is missing or invalid.
See docs/demo.md.
- deterministic instruction discovery
- command and reference checks
- generated repo-specific fitness tasks
- JSON and Markdown reports
- evidence for detected safety and reproducibility signals
- before/after report comparison
- SVG badge output
- GitHub Action support for PRs
- optional real-agent adapters, starting with Codex
Agent-aware repositories are becoming normal. The missing piece is regression testing: once AGENTS.md, CLAUDE.md, or Cursor rules are part of the development workflow, they need the same feedback loop as code. AgentFit gives maintainers a quick answer before and after an instruction change.
On 2026-05-11, AgentFit ran 20 dry-run snapshots against public repositories that already publish coding-agent instructions. Dry-run mode did not call model providers or execute generated tasks.
The clearest finding was in RedisInsight: Cursor rules documented stale root E2E scripts. The maintainers requested a PR and merged the fix:
- Issue: redis/RedisInsight#5887
- PR: redis/RedisInsight#5889
The same validation pass also found AgentFit false positives, including package-local command checks that now resolve nested package scripts in 0.1.10. No endorsement is implied by any repository being tested.
Suggest a public repository for dry-run validation: #9
| Tool Type | Checks Syntax | Runs Repo Tasks | Measures Agent Results | Local-First |
|---|---|---|---|---|
| Heuristic linters | Yes | No | No | Usually |
| Observability tools | No | Sometimes | Yes | Usually no |
| AgentFit | Yes | Yes | Yes | Yes |
Scores are out of 100:
- 20 instruction discoverability
- 15 command freshness
- 15 reference integrity
- 20 evaluation pass rate
- 10 diff discipline
- 10 safety guardrails
- 10 reproducibility
See docs/scoring.md.
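As an illustrative sketch (not AgentFit's actual implementation), the total can be read as a plain sum of per-check scores, each capped at its documented maximum. The type and function names below are hypothetical:

```typescript
// Hypothetical sketch of an AgentFit-style score composition.
// The maxima match the documented weights; the shape is illustrative only.
type CheckScores = {
  discoverability: number;     // out of 20
  commandFreshness: number;    // out of 15
  referenceIntegrity: number;  // out of 15
  evalPassRate: number;        // out of 20
  diffDiscipline: number;      // out of 10
  safetyGuardrails: number;    // out of 10
  reproducibility: number;     // out of 10
};

function totalScore(s: CheckScores): number {
  // Each field is already scaled to its maximum, so the total is a plain sum.
  return Object.values(s).reduce((acc, v) => acc + v, 0);
}

// A repository that aces every check scores 100/100.
const perfect: CheckScores = {
  discoverability: 20,
  commandFreshness: 15,
  referenceIntegrity: 15,
  evalPassRate: 20,
  diffDiscipline: 10,
  safetyGuardrails: 10,
  reproducibility: 10,
};
console.log(totalScore(perfect)); // 100
```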
By default, dry-run mode performs deterministic discovery, reference, command, and task-generation checks. Use --run-tasks or a real adapter when you want generated tasks executed in isolated worktrees.
- uses: kingkyylian/agentfit@v1
  with:
    version: 0.1.10
    adapter: dry-run
    run-tasks: true
    fail-below-score: 70
    task-count: 5
    timeout-seconds: 900
    budget-usd: 1
    format: markdown

AgentFit uses this Action on its own repository with run-tasks: true and a minimum score of 90.
For a complete workflow that updates a pull request comment with the AgentFit report, see docs/pr-comment-workflow.md.
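For quick orientation, a minimal standalone workflow might look like the sketch below; the workflow name, trigger, permissions, and checkout step are assumptions, and only the `uses:` step mirrors the Action configuration above:

```yaml
# Hypothetical minimal workflow; adapt triggers and inputs to your repository.
name: agentfit
on: [pull_request]
jobs:
  agentfit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: kingkyylian/agentfit@v1
        with:
          adapter: dry-run
          fail-below-score: 70
```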
Dry-run snapshots from public repositories:
| Repository | Score | Signal |
|---|---|---|
| hexlet-codebattle/codebattle | 80/100 (B) | stale documented scripts and a nested scope gap |
| Brendonovich/MacroGraph | 73/100 (C) | broad monorepo scope coverage gaps |
| skybrush-io/skybrush-server | 93/100 (A) | healthy single instruction file |
See docs/real-world.md.
MIT
Keep changes local-first, deterministic by default, and transparent in reports. Real-agent adapters should be optional and must report skipped runs clearly when unavailable.
See CONTRIBUTING.md.