Test whether your AGENTS.md and coding-agent instructions actually work.
Agent instruction files rot. Setup commands change, docs move, nested packages get missed, and teams guess whether a prompt change helped. AgentFit turns that guess into a local-first score, report, and CI check.
AgentFit discovers AGENTS.md, CLAUDE.md, Cursor rules, Copilot instructions, and other agent harness files. It checks whether instructions are discoverable, commands still work, references resolve, nested packages are covered, and generated repo-specific tasks can be verified in isolated git worktrees.
npx @kingkyylian/agentfit@latest eval --adapter dry-run

AgentFit has been tested against 20 public repositories that already publish coding-agent instructions. The first validation pass found one stale-command issue that became a merged upstream RedisInsight PR, and it exposed AgentFit false positives that were fixed through 0.1.10.
The current feedback ask is narrow: suggest public repos with AGENTS.md, CLAUDE.md, Cursor rules, Copilot instructions, or similar guidance so AgentFit can run deterministic dry-run validation.
Suggest a repo: #9
AgentFit score: 93/100 (A)
Instruction files: 1
Reference issues: 0
Tasks: 5
Task execution: static dry-run preview; generated tasks were not executed.
Runs: 0 executed, 5 previewed
Execute generated tasks in isolated worktrees when you want command-level proof:
npx @kingkyylian/agentfit@latest eval --adapter dry-run --run-tasks

AgentFit's own repository currently scores 100/100 (A) with 5 of 5 generated task runs executed.
- missing referenced docs such as `@docs/setup.md`
- stale commands such as `pnpm lint` after the script was removed
- missing verification commands before agents claim work is done
- monorepo packages with no nested `AGENTS.md`
- instruction changes that look better but lower the score
The included demo starts with a stale AGENTS.md, then compares it with a fixed version:
npx @kingkyylian/agentfit@latest compare examples/reports/demo-before.json examples/reports/demo-after.json --format markdown

AgentFit improved by 28 points: 65/100 (D) -> 93/100 (A).
Fixed checks:
- No nested instruction file found for packages/api.
- Documented command references missing package script "lint".
- No runnable verification command found in instruction files.
- 1 instruction reference is missing or invalid.
See docs/demo.md.
- deterministic instruction discovery
- command and reference checks
- generated repo-specific fitness tasks
- JSON and Markdown reports
- evidence for detected safety and reproducibility signals
- before/after report comparison
- SVG badge output
- GitHub Action support for PRs
- optional real-agent adapters, starting with Codex
Agent-aware repositories are becoming normal. The missing piece is regression testing: once AGENTS.md, CLAUDE.md, or Cursor rules are part of the development workflow, they need the same feedback loop as code. AgentFit gives maintainers a quick answer before and after an instruction change.
On 2026-05-11, AgentFit ran 20 dry-run snapshots against public repositories that already publish coding-agent instructions. Dry-run mode did not call model providers or execute generated tasks.
The clearest finding was in RedisInsight: Cursor rules documented stale root E2E scripts. The maintainers requested a PR and merged the fix:
- Issue: redis/RedisInsight#5887
- PR: redis/RedisInsight#5889
The same validation pass also found AgentFit false positives, including package-local command checks that now resolve nested package scripts in 0.1.10. No endorsement is implied by any repository being tested.
Suggest a public repository for dry-run validation: #9
| Tool Type | Checks Syntax | Runs Repo Tasks | Measures Agent Results | Local-First |
|---|---|---|---|---|
| Heuristic linters | Yes | No | No | Usually |
| Observability tools | No | Sometimes | Yes | Usually no |
| AgentFit | Yes | Yes | Yes | Yes |
Scores are out of 100:
- 20 instruction discoverability
- 15 command freshness
- 15 reference integrity
- 20 evaluation pass rate
- 10 diff discipline
- 10 safety guardrails
- 10 reproducibility
See docs/scoring.md.
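As an illustrative sketch (not AgentFit's actual implementation), the total can be read as a plain sum of per-check scores, each capped at its documented maximum. The type and function names below are hypothetical:

```typescript
// Hypothetical sketch of an AgentFit-style score composition.
// The maxima match the documented weights; the shape is illustrative only.
type CheckScores = {
  discoverability: number;     // out of 20
  commandFreshness: number;    // out of 15
  referenceIntegrity: number;  // out of 15
  evalPassRate: number;        // out of 20
  diffDiscipline: number;      // out of 10
  safetyGuardrails: number;    // out of 10
  reproducibility: number;     // out of 10
};

function totalScore(s: CheckScores): number {
  // Each field is already scaled to its maximum, so the total is a plain sum.
  return Object.values(s).reduce((acc, v) => acc + v, 0);
}

// A repository that aces every check scores 100/100.
const perfect: CheckScores = {
  discoverability: 20,
  commandFreshness: 15,
  referenceIntegrity: 15,
  evalPassRate: 20,
  diffDiscipline: 10,
  safetyGuardrails: 10,
  reproducibility: 10,
};
console.log(totalScore(perfect)); // 100
```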
By default, dry-run mode performs deterministic discovery, reference, command, and task-generation checks. Use --run-tasks or a real adapter when you want generated tasks executed in isolated worktrees.
- uses: kingkyylian/agentfit@v1
  with:
    version: 0.1.10
    adapter: dry-run
    run-tasks: true
    fail-below-score: 70
    task-count: 5
    timeout-seconds: 900
    budget-usd: 1
    format: markdown

AgentFit uses this Action on its own repository with run-tasks: true and a minimum score of 90.
For a complete workflow that updates a pull request comment with the AgentFit report, see docs/pr-comment-workflow.md.
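For quick orientation, a minimal standalone workflow might look like the sketch below; the workflow name, trigger, permissions, and checkout step are assumptions, and only the `uses:` step mirrors the Action configuration above:

```yaml
# Hypothetical minimal workflow; adapt triggers and inputs to your repository.
name: agentfit
on: [pull_request]
jobs:
  agentfit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: kingkyylian/agentfit@v1
        with:
          adapter: dry-run
          fail-below-score: 70
```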
Dry-run snapshots from public repositories:
| Repository | Score | Signal |
|---|---|---|
| hexlet-codebattle/codebattle | 80/100 (B) | stale documented scripts and a nested scope gap |
| Brendonovich/MacroGraph | 73/100 (C) | broad monorepo scope coverage gaps |
| skybrush-io/skybrush-server | 93/100 (A) | healthy single instruction file |
See docs/real-world.md.
MIT
Keep changes local-first, deterministic by default, and transparent in reports. Real-agent adapters should be optional and must report skipped runs clearly when unavailable.
See CONTRIBUTING.md.