Releases · paperfoot/ritalin-cli

13 May 17:28

v0.4.1

5d5f226

v0.4.1 Latest

ritalin 0.4.1

Docs-only patch. The embedded SKILL.md (which ritalin skill install deploys to ~/.claude/skills, ~/.codex/skills, ~/.agents/skills, and ~/.gemini/skills) was rewritten from 139 to 85 lines based on empirical findings from 10 codex test runs.

Upgrade path: cargo install --force ritalin (or brew upgrade ritalin once the tap syncs), then ritalin skill install to refresh the platform skill files.

What the eval surfaced (the basis for the rewrite)

10 runs in a discovery harness (5 default + 5 explicit-invocation) showed three patterns the previous skill failed to prevent:

Sprawl — one obligation per file rather than per behavior. One scenario produced 6 grep-the-doc obligations against a single design markdown.
Self-referential proofs — for research_grounded and model_current claims, agents wrote a markdown doc and made the proof a grep -q '...' against the same doc. 9 such proofs across two scenarios. Looks rigorous; verifies nothing external.
Over-obligation on trivial work — a one-line typo fix got 2 obligations under explicit invocation.

In addition, run 1 confirmed that the "Automatic: ..." trigger list in the previous skill was decorative: 5/5 default-prompt scenarios produced 0 ritalin invocations. Codex only auto-engages when a .ritalin/ directory is present; the text-based triggers don't fire.

What changed in SKILL.md

4-command mental model front-loaded (init / add / prove / gate) with one example each. No more 5-phase framing.
"Automatic" trigger list deleted — proven decorative.
Obligation kinds collapsed 11 → 6: kept user_path, integration, failure_path, literal_match, literal_regex, other. Dropped performance, security, research_grounded, code_referenced, model_current. Agents picked these wrong half the time, and fewer choices mean better adherence.
Three explicit anti-patterns added:
1. One obligation per behavior, not per file. Kills sprawl.
2. Never grep a file you wrote in the same task as the proof. Kills self-referential proofs. Includes concrete external-proof examples (search, gh search repos, curl https://registry.npmjs.org/...).
3. Don't over-obligate trivial work. Explicit permission to skip ritalin on small tasks.
literal_match and --depends-on examples folded into the core "Add" example instead of separate sections.
Subagent guidance, gate-blocked guidance, and v0.4 useful flags (--summary, --all --stale-only, export-contract) preserved.

Test budget

The line budget in tests::embedded_skill_md_is_under_budget_and_has_directives was tightened from 145 to 100 lines. The anti-drift assertion was updated to match the new wording.

No CLI changes

Same flags, same exit codes, same JSON envelope. Existing .ritalin/ contracts continue to work.

Stats

1 commit since v0.4.0
129 tests pass, clippy --all-targets -D warnings clean
SKILL.md: 139 → 85 lines (39% shorter)

Full changelog: v0.4.0...v0.4.1

13 May 15:58

github-actions

v0.4.0

07f2f7e

v0.4.0

ritalin 0.4.0

This release closes the gate-cheating attack observed in production (sandboxed Codex agents synthesising passing evidence by writing forged records to evidence.jsonl) and adds the parallel-work UX users have been asking for.

Backward compatible — existing v0.3 contracts (no depends_on field) continue to work unchanged.

Security

fix(gate) Evidence forgery is now detected. gate recomputes proof_hash from the recorded command field instead of trusting the stored proof_hash field, so an attacker can no longer append a record with command: "<garbage>" and proof_hash: <hash of obligation's proof_cmd> and have it discharge the obligation. The forged_evidence_with_matching_proof_hash test in tests/attacks.rs covers this exact attack class.
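
The difference between trusting a stored hash and recomputing it can be sketched in a few lines. This is an illustrative model only, not ritalin's implementation: the field names command and proof_hash come from the note above, while the use of SHA-256, the discharges helper, and the sample proof command are assumptions made for the example.

```python
import hashlib


def proof_hash(command: str) -> str:
    # Assumption: SHA-256 over the UTF-8 command text; ritalin's real hash may differ.
    return hashlib.sha256(command.encode("utf-8")).hexdigest()


def discharges(obligation_proof_cmd: str, record: dict, trust_stored: bool) -> bool:
    """Hypothetical gate check for one evidence record."""
    expected = proof_hash(obligation_proof_cmd)
    if trust_stored:
        # Vulnerable check: believes whatever hash the record claims.
        return record["proof_hash"] == expected
    # Hardened check: recompute the hash from the command that was actually recorded.
    return proof_hash(record["command"]) == expected


proof_cmd = "cargo test --test attacks"  # hypothetical obligation proof
forged = {"command": "<garbage>", "proof_hash": proof_hash(proof_cmd)}
honest = {"command": proof_cmd, "proof_hash": proof_hash(proof_cmd)}

print(discharges(proof_cmd, forged, trust_stored=True))   # forged record slips through
print(discharges(proof_cmd, forged, trust_stored=False))  # forged record is rejected
print(discharges(proof_cmd, honest, trust_stored=False))  # honest record still passes
```

Under these assumptions the forged record only passes when the stored field is trusted; once the hash is derived from the recorded command, a garbage command can no longer match the obligation's proof.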

Parallel-work UX

feat(add) --depends-on scopes per-obligation freshness to the listed files. Unrelated commits in a parallel session no longer invalidate this obligation's evidence. Comma-separated, repo-relative paths (no .., no absolute). Falls back to the global workspace hash when omitted (zero behavior change for v0.3 obligations).
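
A rough model of the path rules and the scoped freshness check described above, under stated assumptions: parse_depends_on and freshness_hash are hypothetical helpers, the workspace is represented as an in-memory dict rather than real files, and SHA-256 stands in for whatever hash ritalin uses.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def parse_depends_on(value: str) -> List[str]:
    """Split a comma-separated list and reject absolute paths and '..' segments."""
    paths = []
    for raw in value.split(","):
        p = PurePosixPath(raw.strip())
        if not p.parts or p.is_absolute() or ".." in p.parts:
            raise ValueError(f"not a repo-relative path: {raw!r}")
        paths.append(str(p))
    return paths


def freshness_hash(files: Dict[str, bytes], depends_on: Optional[List[str]]) -> str:
    """Hash only the listed files, or every file when no list is given."""
    selected = sorted(depends_on) if depends_on else sorted(files)
    h = hashlib.sha256()
    for path in selected:
        h.update(path.encode("utf-8") + b"\0")
        h.update(files.get(path, b"") + b"\0")
    return h.hexdigest()


# A parallel session edits src/b.rs; this obligation only depends on src/a.rs.
before = {"src/a.rs": b"fn a() {}", "src/b.rs": b"fn b() {}"}
after = {"src/a.rs": b"fn a() {}", "src/b.rs": b"fn b() { changed() }"}
deps = parse_depends_on("src/a.rs")

print(freshness_hash(before, deps) == freshness_hash(after, deps))  # scoped: still fresh
print(freshness_hash(before, None) == freshness_hash(after, None))  # global: now stale
```

In this model the scoped hash is unaffected by the unrelated edit, while the global fallback changes, which mirrors the behavior the note describes for obligations with and without --depends-on.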

Daily-driver UX

feat(prove) --all / --all --stale-only batch-refresh every obligation in one call, with --stale-only skipping anything already passing+fresh. Replaces the 14-line shell loop users were running after every commit.
feat(prove) workspace-mutation warning when a proof rewrites a file it depends on (formatters, codegen). Other obligations sharing those files may now be stale; the warning surfaces it.
feat(gate) --summary prints one stable shell-friendly line: verdict=pass critical_open=0 advisory_open=2 total=20 (or with blocking=O-007 on fail). Awk- and grep-friendly for hooks/CI.
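
Because the --summary line is a flat run of key=value tokens, a hook or CI script can parse it without a JSON library. The sketch below assumes only the passing line quoted above; the failing line is a hypothetical reconstruction (the note says blocking=O-007 appears on fail but does not show the full line), and parse_gate_summary is an illustrative helper, not part of ritalin.

```python
from typing import Dict


def parse_gate_summary(line: str) -> Dict[str, str]:
    """Turn 'k1=v1 k2=v2 ...' into a dict; token order does not matter."""
    return dict(token.split("=", 1) for token in line.split())


passing = parse_gate_summary("verdict=pass critical_open=0 advisory_open=2 total=20")

# Hypothetical failing line: only the blocking=O-007 token is taken from the release note.
failing = parse_gate_summary(
    "verdict=fail critical_open=1 advisory_open=2 total=20 blocking=O-007"
)

for summary in (passing, failing):
    if summary["verdict"] == "pass":
        print("gate passed with", summary["advisory_open"], "advisory obligations open")
    else:
        print("gate blocked by", summary.get("blocking", "unknown"))
```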

Obligation kinds

feat(add) literal_regex kind synthesises grep -E -- <pattern> <file>. Fixes the literal_match brittleness where if (p.crossover != null) semantically matched but lexically failed --literal 'if (p.crossover)'. POSIX ERE — use [[:space:]] not \s, alternatives via (A|B). AST matching is deferred to v0.5.
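
The brittleness described above can be reproduced by calling grep directly, since the note says literal_regex synthesises grep -E -- <pattern> <file> and v0.3.0 describes literal_match as a shortcut over grep -F. This sketch assumes a POSIX grep on PATH; the sample file content and the particular pattern are made up for illustration.

```python
import os
import subprocess
import tempfile


def grep_matches(flag: str, pattern: str, path: str) -> bool:
    """Run grep with -F (fixed string) or -E (POSIX ERE); exit code 0 means a match."""
    result = subprocess.run(
        ["grep", flag, "--", pattern, path],
        stdout=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Hypothetical source file where the guard was written with an explicit null check.
with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
    f.write("if (p.crossover != null) {\n  run();\n}\n")
    path = f.name

try:
    # Fixed-string search for the shorter form finds nothing.
    literal_hit = grep_matches("-F", "if (p.crossover)", path)
    # ERE with [[:space:]] (not \s) and an optional group accepts both spellings.
    regex_hit = grep_matches(
        "-E",
        r"if[[:space:]]*\(p\.crossover([[:space:]]*!=[[:space:]]*null)?\)",
        path,
    )
finally:
    os.unlink(path)

print("literal:", literal_hit, "regex:", regex_hit)
```

The optional group is one way to tolerate the != null variant; alternation in the (A|B) form mentioned above would work equally well.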

Docs

docs(skill) SKILL.md tells delegated subagents they're first-class users: run real proofs with full network/CLI access (search, gh, engram, curl), expect the parent's uncommitted changes in the working tree, and never edit evidence.jsonl to fake a pass — the chain catches it.
docs(agent-info) machine-readable manifest covers the v0.4 surface so agents discover new flags without reading SKILL.md.
docs(readme) workflow snippet + feature table updated.

Notes for upgraders

No migration required for v0.3 contracts — all new fields are #[serde(default)].
--cmd override on prove is now explicitly diagnostic-only. The recorded command won't match the obligation's stored proof, so gate rejects it as proof_mismatch. This is intentional behavior, just better documented.

Stats

9 commits since v0.3.1
129 tests pass, clippy --all-targets -D warnings clean
19 new tests (forgery attack, per-obligation deps, prove --all, workspace-mutation, gate --summary, literal_regex)

Full changelog: v0.3.0...v0.4.0

06 May 00:32

github-actions

v0.3.1

65c8291

v0.3.1

23 Apr 12:01

github-actions

v0.3.0

0a13057

v0.3.0

v0.3.0 — literal_match, scope-refresh, export-contract

New obligation kind literal_match (anti-approximation-drift shortcut over
grep -F). New prove scope-refresh line + remaining_open JSON field.
New export-contract subcommand emitting a subagent-ready delegation
briefing for Claude Code Task/Agent prompts. SKILL.md shortened to 121
lines and rewritten as BEFORE/MUST imperatives per context-engineering
research. All non-breaking — existing CLI, JSON envelope, ledger format,
hook mode, and exit codes unchanged.

12 Apr 16:06

github-actions

v0.2.0

ef7ee37

v0.2.0

fix: close 5 critical issues from triple-reviewer audit (v0.2.0)

Three independent reviewers (Claude Opus, Codex GPT-5.4, GPT-5.4 Pro)
audited ritalin and converged on 5 priority fixes:

1. Workspace hash: use git ls-files (respects .gitignore), anchor to
   project root not cwd. Fixes prove-from-subdir / gate-from-root
   hash mismatch and build-artifact invalidation.

2. --cmd bypass: proof_hash now hashes the actually executed command,
   not the stored obligation proof. Evidence from an override is still
   recorded but won't discharge the original obligation.

3. --force clears: init --force and seed --force now delete old
   obligations.jsonl and evidence.jsonl, preventing duplicate IDs
   and stale state.

4. Marker restore: adding a critical obligation after gate passed
   now recreates .task-incomplete, making the contract re-enforceable.

5. Advisory warnings: gate now collects undischarged advisory
   obligations and emits WARN output instead of silently skipping.

6 new integration tests covering all fixes. 79 tests total, all pass.

11 Apr 03:08

github-actions

v0.1.1

923934c

v0.1.1

v0.1.1 — rustls fix for cross-compile

11 Apr 03:03

github-actions

v0.1.0

5051f9b

v0.1.0

v0.1.0 — initial release
