barryroodt/refine-skill

@jumptag/refine-skill

Refine an Agent Skill via the skill-forge judge → hitl loop, in a sandboxed Docker container.

Quick start

export ANTHROPIC_API_KEY=sk-...
npx @jumptag/refine-skill ./path/to/skill

Defaults: at most 3 iterations, the claude-sonnet-4-5 model, and telemetry written to <path>/.refine/log.json.

Requirements

  • Node 20+ (for npx).
  • Docker (Engine 20.10+, daemon running).
  • An API key for one of: Anthropic, OpenAI, Google, xAI, Mistral, Groq, OpenRouter — matched to your --model choice. See MODELS.md for the full list of supported models, env vars, and where to get keys.
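
The requirements above can be checked before the first run. The following is a minimal preflight sketch, not part of refine-skill: the function names are hypothetical, and the API key check assumes the default Anthropic model (other providers use different env vars, per MODELS.md).

```shell
# Preflight sketch for the documented requirements.
# All function names are illustrative; refine-skill does not ship them.

# Extract the major version from a string such as "v20.11.1".
node_major() {
  nm_version="${1#v}"          # drop the leading "v"
  echo "${nm_version%%.*}"     # keep everything before the first dot
}

# Succeed when the given version string satisfies "Node 20+".
node_ok() {
  no_major="$(node_major "$1")"
  case "$no_major" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not installed
  esac
  [ "$no_major" -ge 20 ]
}

# Succeed when the docker CLI exists and the daemon answers.
docker_ok() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

# Run all checks; assumes the default model, hence ANTHROPIC_API_KEY.
preflight() {
  node_ok "$(node --version 2>/dev/null)" || { echo "Node 20+ is required" >&2; return 1; }
  docker_ok || { echo "Docker daemon is not reachable" >&2; return 1; }
  [ -n "${ANTHROPIC_API_KEY:-}" ] || { echo "ANTHROPIC_API_KEY is not set" >&2; return 1; }
}
```

Calling `preflight && npx @jumptag/refine-skill ./path/to/skill` would then fail early with a readable message instead of one of the setup exit codes.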

Usage

npx @jumptag/refine-skill <path> [options]

| Option | Default | Effect |
| --- | --- | --- |
| --iterations N | 3 | Max passes before stopping at the cap |
| --model M | claude-sonnet-4-5 | Any model pi.dev supports |
| --image TAG | ghcr.io/barryroodt/refine-skill:<pkg-version> | Override the image |
| --pull POLICY | missing | Image pull policy: always / never / missing |
| --no-log | off | Skip writing .refine/log.json |
| --dry-run | off | Print the docker invocation and exit |
| --verbose | off | Stream pi output uncut |
| --pi-timeout SECS | 600 | Per-pi-call timeout |

How stopping works

The loop exits at the first matching rule:

  1. Pass 1 → never stops.
  2. Judge produces zero items → all_obsolete.
  3. All items match "already satisfied / no-op / superseded" → all_obsolete.
  4. Score change from previous pass < 2 → delta_below_threshold.
  5. All items match trade-off / diminishing / LOW priority → tradeoff_floor.
  6. Pass > --iterations → max_iterations (exit 1, still successful).
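
The first-match ordering of these rules can be sketched as a small shell function. This is an illustration under stated assumptions, not the tool's actual code: the function name and its six positional arguments are hypothetical, the item counts are assumed to be already classified by the caller, and the delta is compared as a signed integer because the README does not say whether the tool uses a signed or absolute change.

```shell
# stop_reason PASS MAX ITEMS OBSOLETE TRADEOFF DELTA
# Prints the stop reason for the first matching rule, or "continue".
#   PASS      current pass number (1-based)
#   MAX       value of --iterations
#   ITEMS     number of items the judge produced
#   OBSOLETE  items matching "already satisfied / no-op / superseded"
#   TRADEOFF  items matching trade-off / diminishing / LOW priority
#   DELTA     score change from the previous pass (signed integer, assumed)
stop_reason() {
  sr_pass=$1 sr_max=$2 sr_items=$3 sr_obsolete=$4 sr_tradeoff=$5 sr_delta=$6

  # Rule 1: the first pass never stops.
  if [ "$sr_pass" -eq 1 ]; then echo continue; return; fi
  # Rule 2: the judge produced zero items.
  if [ "$sr_items" -eq 0 ]; then echo all_obsolete; return; fi
  # Rule 3: every item is obsolete.
  if [ "$sr_obsolete" -eq "$sr_items" ]; then echo all_obsolete; return; fi
  # Rule 4: the score moved by less than 2.
  if [ "$sr_delta" -lt 2 ]; then echo delta_below_threshold; return; fi
  # Rule 5: every item is a trade-off or low priority.
  if [ "$sr_tradeoff" -eq "$sr_items" ]; then echo tradeoff_floor; return; fi
  # Rule 6: the pass number exceeds the cap.
  if [ "$sr_pass" -gt "$sr_max" ]; then echo max_iterations; return; fi

  echo continue
}
```

Because the rules are checked in order, a pass that both exceeds the cap and has a small delta reports delta_below_threshold rather than max_iterations.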

Telemetry

.refine/log.json records the score, grade, and delta for each pass; the commit message and diff for each item; and the stop reason, model, image tag, and timestamps. Disable with --no-log.

Running without npx

docker run --rm -i \
  -v "$PWD/path/to/skill:/work" \
  -e ANTHROPIC_API_KEY \
  ghcr.io/barryroodt/refine-skill:latest \
  /work --iterations 3 --model claude-sonnet-4-5

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Natural convergence (any of rules 2-5) |
| 1 | Max iterations reached (still successful) |
| 2 | Bad path / missing SKILL.md |
| 3 | Missing / mismatched API key |
| 4 | Docker not available |
| 10 | Pi crash |
| 11 | Judge output malformed |
| 12 | Hitl partial apply |
| 13 | Disk full / OOM |
| 14 | Another refine running on the same path |
| 130 | SIGINT |
| 143 | SIGTERM |
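
Since exit code 1 still counts as success, a calling script should not treat every non-zero status as a failure. Here is a minimal sketch of how a wrapper might group the codes listed above. The function name and the group labels (success, setup, runtime, interrupted, unknown) are hypothetical and not defined by refine-skill.

```shell
# classify_exit CODE
# Maps a refine-skill exit code to a coarse group. The group labels are
# illustrative only; the codes themselves come from the list above.
classify_exit() {
  case "$1" in
    0|1)            echo success ;;      # converged, or hit the iteration cap
    2|3|4)          echo setup ;;        # bad path, API key, or Docker problem
    10|11|12|13|14) echo runtime ;;      # failure while the loop was running
    130|143)        echo interrupted ;;  # SIGINT / SIGTERM
    *)              echo unknown ;;      # code not listed in the README
  esac
}

# Example use in a CI step (commented out so sourcing this file runs nothing):
#   npx @jumptag/refine-skill ./path/to/skill
#   code=$?
#   [ "$(classify_exit "$code")" = success ] || exit "$code"
```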

Spec

specs/2026-05-20-deftly-refine-cli-design.md

Credits

refine-skill is a thin orchestration harness around two existing pieces of work:

  • Skill Forge by @WrathZA: skill-forge-judge + skill-forge-hitl provide all the actual refinement logic (scoring rubric, per-item HITL loop). Apache 2.0; pinned tag 2026.04.30; baked into the image at build time and copied verbatim. See NOTICE.
  • pi.dev coding agent by @mariozechner — provider-agnostic LLM harness that runs the two skills inside the container.

This project (@jumptag/refine-skill, MIT) just wires them together: Node CLI + bash outer loop + deterministic stop rules + telemetry.
