Releases: paperfoot/ritalin-cli
v0.4.1
ritalin 0.4.1
Docs-only patch. The embedded SKILL.md (which ritalin skill install deploys to ~/.claude/skills, ~/.codex/skills, ~/.agents/skills, and ~/.gemini/skills) was rewritten from 139 to 85 lines based on empirical findings from 10 codex test runs.
Upgrade path: cargo install --force ritalin (or brew upgrade ritalin once the tap syncs), then ritalin skill install to refresh the platform skill files.
What the eval surfaced (the basis for the rewrite)
10 runs in a discovery harness (5 default + 5 explicit-invocation) showed three patterns the previous skill failed to prevent:
- Sprawl: one obligation per file rather than per behavior. One scenario produced 6 grep-the-doc obligations against a single design markdown file.
- Self-referential proofs: for research_grounded and model_current claims, agents wrote a markdown doc and made the proof a grep -q '...' against that same doc. There were 9 such proofs across two scenarios. They look rigorous but verify nothing external.
- Over-obligation on trivial work: a one-line typo fix got 2 obligations under explicit invocation.
Run 1 also confirmed that the "Automatic: ..." trigger list in the previous skill was decorative: 5/5 default-prompt scenarios produced 0 ritalin invocations. Codex auto-engages only when a .ritalin/ directory is present; the text-based triggers don't fire.
What changed in SKILL.md
- 4-command mental model front-loaded (init / add / prove / gate) with one example each. No more 5-phase framing.
- "Automatic" trigger list deleted, since it proved decorative.
- Obligation kinds collapsed 11 → 6. Kept: user_path, integration, failure_path, literal_match, literal_regex, other. Dropped: performance, security, research_grounded, code_referenced, model_current. Agents picked them wrong half the time, and fewer choices mean better adherence.
- Three explicit anti-patterns added:
  - One obligation per behavior, not per file. Kills sprawl.
  - Never grep a file you wrote in the same task as the proof. Kills self-referential proofs. Includes concrete external-proof examples (search, gh search repos, curl https://registry.npmjs.org/...).
  - Don't over-obligate trivial work. Explicit permission to skip ritalin on small tasks.
- literal_match and --depends-on examples folded into the core "Add" example instead of separate sections.
- Subagent guidance, gate-blocked guidance, and the useful v0.4 flags (--summary, --all --stale-only, export-contract) preserved.
Test budget
The line budget in tests::embedded_skill_md_is_under_budget_and_has_directives was tightened from 145 to 100 lines, and the anti-drift assertion was updated to match the new wording.
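The notes don't show the body of that test. A minimal sketch of a line-budget plus required-directive check, assuming the skill text is available as a string constant, could look like this. SKILL_MD, REQUIRED_DIRECTIVES and check_skill are hypothetical stand-ins; the phrases pinned here are illustrative, not the real anti-drift wording, and the budget is assumed to be an inclusive upper bound.

```rust
/// Hypothetical stand-in for the embedded SKILL.md text.
const SKILL_MD: &str = "\
# ritalin
init / add / prove / gate
- One obligation per behavior, not per file.
- Never grep a file you wrote in the same task as the proof.
- Don't over-obligate trivial work.
";

/// Budget from the v0.4.1 notes, assumed here to be an inclusive maximum.
const LINE_BUDGET: usize = 100;

/// Hypothetical directive phrases an anti-drift assertion might pin.
const REQUIRED_DIRECTIVES: [&str; 3] = [
    "One obligation per behavior",
    "Never grep a file you wrote",
    "Don't over-obligate",
];

/// Returns every problem found; an empty Vec means the skill passes.
fn check_skill(text: &str) -> Vec<String> {
    let mut problems = Vec::new();
    let lines = text.lines().count();
    if lines > LINE_BUDGET {
        problems.push(format!("{lines} lines exceeds budget of {LINE_BUDGET}"));
    }
    for d in REQUIRED_DIRECTIVES {
        if !text.contains(d) {
            problems.push(format!("missing directive: {d}"));
        }
    }
    problems
}

fn main() {
    let problems = check_skill(SKILL_MD);
    assert!(problems.is_empty(), "{problems:?}");
    println!("SKILL.md: {} lines, ok", SKILL_MD.lines().count());
}
```

Pinning phrases as well as length is what makes this kind of test an anti-drift guard: a rewrite that silently drops a directive fails even if it stays short.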
No CLI changes
Same flags, same exit codes, same JSON envelope. Existing .ritalin/ contracts continue to work.
Stats
- 1 commit since v0.4.0
- 129 tests pass; clippy --all-targets -D warnings is clean
- SKILL.md: 139 → 85 lines (39% shorter)
Full changelog: v0.4.0...v0.4.1
v0.4.0
ritalin 0.4.0
This release closes the gate-cheating attack observed in production (sandboxed Codex agents synthesising passing evidence by writing forged records to evidence.jsonl) and adds the parallel-work UX users have been asking for.
Backward compatible — existing v0.3 contracts (no depends_on field) continue to work unchanged.
Security
fix(gate): Evidence forgery is now detected. gate recomputes proof_hash from the recorded command field instead of trusting the stored proof_hash field, so an attacker can't append a record with command: "<garbage>" and proof_hash: <hash of obligation's proof_cmd> and have it discharge. The forged_evidence_with_matching_proof_hash test in tests/attacks.rs covers this exact attack class.
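A minimal sketch of the recompute-don't-trust idea follows. Obligation, EvidenceRecord, Verdict and check are hypothetical shapes, not ritalin's actual types, and std's DefaultHasher stands in for whatever (presumably cryptographic) hash the tool really uses. The key point is that the stored proof_hash field is never read.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for ritalin's proof hash. DefaultHasher is NOT cryptographic;
/// it only keeps this sketch dependency-free.
fn proof_hash(cmd: &str) -> u64 {
    let mut h = DefaultHasher::new();
    cmd.hash(&mut h);
    h.finish()
}

/// Hypothetical shapes for an obligation and an evidence.jsonl record.
struct Obligation {
    proof_cmd: String,
}

#[allow(dead_code)] // proof_hash is deliberately never read by the check
struct EvidenceRecord {
    command: String,
    proof_hash: u64,
    passed: bool,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Discharged,
    ProofMismatch,
    Failed,
}

fn check(ob: &Obligation, rec: &EvidenceRecord) -> Verdict {
    // Recompute from the recorded command instead of trusting rec.proof_hash,
    // which anyone with write access to evidence.jsonl can set freely.
    if proof_hash(&rec.command) != proof_hash(&ob.proof_cmd) {
        return Verdict::ProofMismatch;
    }
    if rec.passed {
        Verdict::Discharged
    } else {
        Verdict::Failed
    }
}

fn main() {
    let ob = Obligation { proof_cmd: "cargo test --test attacks".to_string() };
    // The forged record copies the obligation's hash but ran something else.
    let forged = EvidenceRecord {
        command: "<garbage>".to_string(),
        proof_hash: proof_hash(&ob.proof_cmd),
        passed: true,
    };
    println!("{:?}", check(&ob, &forged));
}
```

The same comparison is why evidence from a prove --cmd override cannot discharge the original obligation: its recorded command differs from the stored proof.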
Parallel-work UX
feat(add): --depends-on scopes per-obligation freshness to the listed files, so unrelated commits in a parallel session no longer invalidate this obligation's evidence. It takes comma-separated, repo-relative paths (no .., no absolute paths). When omitted, freshness falls back to the global workspace hash (zero behavior change for v0.3 obligations).
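To make the scoping concrete, here is a sketch under stated assumptions: parse_depends_on and scoped_hash are hypothetical names, the working tree is modelled as an in-memory map, and trimming whitespace and rejecting empty entries are assumptions rather than documented behavior. It enforces the path rules above and shows that editing an unlisted file leaves the scoped hash unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::path::{Component, Path};

/// Hypothetical validator for a --depends-on value such as "src/gate.rs,tests/attacks.rs":
/// comma-separated, repo-relative, no `..`, no absolute paths.
fn parse_depends_on(raw: &str) -> Result<Vec<String>, String> {
    let mut deps = Vec::new();
    for part in raw.split(',').map(str::trim) {
        if part.is_empty() {
            return Err("empty path in --depends-on".to_string());
        }
        for comp in Path::new(part).components() {
            match comp {
                Component::ParentDir => return Err(format!("'..' not allowed: {part}")),
                Component::RootDir | Component::Prefix(_) => {
                    return Err(format!("absolute path not allowed: {part}"));
                }
                _ => {}
            }
        }
        deps.push(part.to_string());
    }
    Ok(deps)
}

/// Freshness hash over only the listed files. `files` stands in for the
/// working tree (path -> contents); a missing file hashes as None.
fn scoped_hash(files: &BTreeMap<String, String>, deps: &[String]) -> u64 {
    let mut h = DefaultHasher::new();
    for dep in deps {
        dep.hash(&mut h);
        files.get(dep).hash(&mut h);
    }
    h.finish()
}

fn main() {
    let deps = parse_depends_on("src/gate.rs, tests/attacks.rs").unwrap();
    let mut tree = BTreeMap::new();
    tree.insert("src/gate.rs".to_string(), "fn gate() {}".to_string());
    tree.insert("tests/attacks.rs".to_string(), "#[test] fn t() {}".to_string());
    tree.insert("README.md".to_string(), "v0.4".to_string());

    let before = scoped_hash(&tree, &deps);
    // A parallel session edits an unrelated file: evidence stays fresh.
    tree.insert("README.md".to_string(), "v0.4.1".to_string());
    assert_eq!(before, scoped_hash(&tree, &deps));
    // Editing a listed file changes the hash: evidence goes stale.
    tree.insert("src/gate.rs".to_string(), "fn gate() { todo!() }".to_string());
    assert_ne!(before, scoped_hash(&tree, &deps));
    println!("{:?}", parse_depends_on("../secrets"));
}
```

Checking path components rather than string prefixes catches `..` anywhere in the path (for example a/../b) and handles both Unix roots and Windows prefixes.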
Daily-driver UX
feat(prove): --all / --all --stale-only batch-refresh every obligation in one call, with --stale-only skipping anything already passing and fresh. This replaces the 14-line shell loop users were running after every commit.
feat(prove): workspace-mutation warning when a proof rewrites a file it depends on (formatters, codegen). Other obligations sharing those files may now be stale; the warning surfaces this.
feat(gate): --summary prints one stable, shell-friendly line: verdict=pass critical_open=0 advisory_open=2 total=20 (or with blocking=O-007 on fail). Awk- and grep-friendly for hooks/CI.
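Shell hooks can consume the --summary line with awk or grep. For a hook written in Rust, a minimal parser might look like the following. It assumes only what the example line shows (whitespace-separated key=value tokens); parse_summary is a hypothetical helper, and the exact field set on a failing run is an assumption.

```rust
use std::collections::BTreeMap;

/// Parse a `ritalin gate --summary` line into key/value pairs.
/// Assumes whitespace-separated key=value tokens, as in the example line;
/// unknown keys are kept, so extra fields don't break the parser.
fn parse_summary(line: &str) -> BTreeMap<&str, &str> {
    line.split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .collect()
}

fn main() {
    let line = "verdict=pass critical_open=0 advisory_open=2 total=20";
    let s = parse_summary(line);
    let passed = s.get("verdict") == Some(&"pass");
    let advisory: u32 = s
        .get("advisory_open")
        .and_then(|v| v.parse().ok())
        .unwrap_or(0);
    println!("passed={passed} advisory_open={advisory}");
    // On failure the summary also carries blocking=<obligation id>.
    if let Some(id) = s.get("blocking") {
        eprintln!("blocked by {id}");
    }
}
```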
Obligation kinds
feat(add): the literal_regex kind synthesises grep -E -- <pattern> <file>. It fixes the literal_match brittleness where if (p.crossover != null) matched semantically but failed --literal 'if (p.crossover)' lexically. Patterns are POSIX ERE: use [[:space:]], not \s, and write alternatives as (A|B). AST matching is deferred to v0.5.
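A sketch of that synthesis, building the command without running it, is below. literal_regex_command and non_posix_escapes are hypothetical helpers, and src/example.ts is a placeholder path. The sample pattern matches both if (p.crossover) and if (p.crossover != null) using only POSIX ERE constructs.

```rust
use std::process::Command;

/// Hypothetical mirror of the synthesised proof: grep -E -- <pattern> <file>.
/// `--` ends option parsing, so a pattern starting with '-' is not read as a flag.
fn literal_regex_command(pattern: &str, file: &str) -> Command {
    let mut cmd = Command::new("grep");
    cmd.args(["-E", "--", pattern, file]);
    cmd
}

/// Naive check for Perl-style escapes that POSIX ERE does not define.
fn non_posix_escapes(pattern: &str) -> Vec<&'static str> {
    ["\\s", "\\d", "\\w"]
        .into_iter()
        .filter(|e| pattern.contains(*e))
        .collect()
}

fn main() {
    // Optional group covers both `if (p.crossover)` and `if (p.crossover != null)`.
    let pattern = r"if[[:space:]]*\(p\.crossover([[:space:]]*!=[[:space:]]*null)?\)";
    assert!(non_posix_escapes(pattern).is_empty());
    let cmd = literal_regex_command(pattern, "src/example.ts");
    let args: Vec<&str> = cmd.get_args().map(|a| a.to_str().unwrap()).collect();
    println!("{:?} {:?}", cmd.get_program(), args);
}
```

grep exits 0 when at least one line matches, 1 when none do, and 2 on error. How ritalin maps those statuses onto pass and fail is not spelled out in these notes.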
Docs
docs(skill): SKILL.md tells delegated subagents they're first-class users: run real proofs with full network/CLI access (search, gh, engram, curl), expect the parent's uncommitted changes in the working tree, and never edit evidence.jsonl to fake a pass, because the chain catches it.
docs(agent-info): the machine-readable manifest covers the v0.4 surface, so agents discover new flags without reading SKILL.md.
docs(readme): workflow snippet and feature table updated.
Notes for upgraders
- No migration required for v0.3 contracts: all new fields are #[serde(default)].
- The --cmd override on prove is now explicitly diagnostic-only. The recorded command won't match the obligation's stored proof, so gate rejects it as proof_mismatch. This is intentional behavior, just better documented.
Stats
- 9 commits since v0.3.1
- 129 tests pass; clippy --all-targets -D warnings is clean
- 19 new tests (forgery attack, per-obligation deps, prove --all, workspace-mutation, gate --summary, literal_regex)
Full changelog: v0.3.0...v0.4.0
v0.3.1
v0.3.0
v0.3.0 — literal_match, scope-refresh, export-contract
- New obligation kind literal_match (an anti-approximation-drift shortcut over grep -F).
- New prove scope-refresh line and remaining_open JSON field.
- New export-contract subcommand, which emits a subagent-ready delegation briefing for Claude Code Task/Agent prompts.
- SKILL.md shortened to 121 lines and rewritten as BEFORE/MUST imperatives, following context-engineering research.
All non-breaking: the existing CLI, JSON envelope, ledger format, hook mode, and exit codes are unchanged.
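As a rough illustration of what a grep -F style check does (and of the brittleness literal_regex later addressed in v0.4.0), here is an in-process emulation of line-oriented fixed-string matching. literal_match is a hypothetical helper, not ritalin's code, and it assumes a single-line literal (grep -F treats a newline in the pattern as separating multiple patterns).

```rust
/// Hypothetical emulation of a fixed-string, line-oriented search like
/// `grep -F -- <literal> <file>`: true if any single line contains `literal`.
fn literal_match(contents: &str, literal: &str) -> bool {
    contents.lines().any(|line| line.contains(literal))
}

fn main() {
    let source = "function step(p) {\n  if (p.crossover != null) { mutate(p); }\n}\n";
    // The check is semantically present but not verbatim: prints false.
    println!("{}", literal_match(source, "if (p.crossover)"));
    // An exact substring of the line matches: prints true.
    println!("{}", literal_match(source, "p.crossover != null"));
}
```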
v0.2.0
fix: close 5 critical issues from triple-reviewer audit (v0.2.0)
Three independent reviewers (Claude Opus, Codex GPT-5.4, GPT-5.4 Pro) audited ritalin and converged on 5 priority fixes:
1. Workspace hash: use git ls-files (respects .gitignore), anchored to the project root rather than cwd. Fixes the prove-from-subdir / gate-from-root hash mismatch and build-artifact invalidation.
2. --cmd bypass: proof_hash now hashes the actually executed command, not the stored obligation proof. Override evidence is recorded but won't discharge the original obligation.
3. --force clears: init --force and seed --force now delete the old obligations.jsonl and evidence.jsonl, preventing duplicate IDs and stale state.
4. Marker restore: adding a critical obligation after gate has passed now recreates .task-incomplete, making the contract re-enforceable.
5. Advisory warnings: gate now collects undischarged advisory obligations and emits WARN output instead of silently skipping them.
6 new integration tests cover all fixes; 79 tests total, all passing.
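The notes don't say how ritalin locates the project root. The sketch below assumes it walks up from the working directory until a root marker is found; the predicate is a placeholder for something like a .git/ or .ritalin/ check. It shows why anchoring to the root, rather than cwd, makes prove in a subdirectory and gate at the root agree. find_root and workspace_hash are hypothetical names, the tracked-file list is passed in rather than read from git ls-files, and DefaultHasher stands in for the tool's real hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Walk up from `start` until `is_root` recognises the project root, so a
/// prove in a subdirectory and a gate at the root resolve the same anchor.
fn find_root(start: &Path, is_root: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    start.ancestors().find(|p| is_root(*p)).map(Path::to_path_buf)
}

/// Hash root-relative tracked paths and contents. In the real tool the list
/// would come from `git ls-files` run at the root (so .gitignore'd build
/// artifacts are excluded); here it is passed in directly.
fn workspace_hash(mut tracked: Vec<(String, Vec<u8>)>) -> u64 {
    tracked.sort(); // discovery order must not change the hash
    let mut h = DefaultHasher::new();
    for (path, contents) in &tracked {
        path.hash(&mut h);
        contents.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let root = Path::new("/work/repo");
    let is_root = |p: &Path| p == root; // placeholder root-marker test
    let from_subdir = find_root(Path::new("/work/repo/src/gate"), is_root);
    let from_root = find_root(Path::new("/work/repo"), is_root);
    assert_eq!(from_subdir, from_root);

    let a = workspace_hash(vec![
        ("src/main.rs".to_string(), b"fn main() {}".to_vec()),
        ("Cargo.toml".to_string(), b"[package]".to_vec()),
    ]);
    let b = workspace_hash(vec![
        ("Cargo.toml".to_string(), b"[package]".to_vec()),
        ("src/main.rs".to_string(), b"fn main() {}".to_vec()),
    ]);
    assert_eq!(a, b);
    println!("root={:?} hash={a:x}", from_root);
}
```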
v0.1.1
v0.1.1 — rustls fix for cross-compile
v0.1.0
v0.1.0 — initial release