
Releases: paperfoot/ritalin-cli

v0.4.1

13 May 17:28


ritalin 0.4.1

Docs-only patch. The embedded SKILL.md (which ritalin skill install deploys to ~/.claude/skills, ~/.codex/skills, ~/.agents/skills, and ~/.gemini/skills) was rewritten from 139 to 85 lines based on empirical findings from 10 Codex test runs.

Upgrade path: cargo install --force ritalin (or brew upgrade ritalin once the tap syncs), then ritalin skill install to refresh the platform skill files.

What the eval surfaced (the basis for the rewrite)

10 runs in a discovery harness (5 default + 5 explicit-invocation) showed three patterns the previous skill failed to prevent:

  • Sprawl — one obligation per file rather than per behavior. One scenario produced 6 grep-the-doc obligations against a single design markdown.
  • Self-referential proofs — for research_grounded and model_current claims, agents wrote a markdown doc and made the proof a grep -q '...' against the same doc. 9 such proofs across two scenarios. Looks rigorous; verifies nothing external.
  • Over-obligation on trivial work — a one-line typo fix got 2 obligations under explicit invocation.

Run 1 also confirmed that the "Automatic: ..." trigger list in the previous skill was decorative: 5/5 default-prompt scenarios produced 0 ritalin invocations. Codex only auto-engages when a .ritalin/ directory is present; the text-based triggers don't fire.

What changed in SKILL.md

  • 4-command mental model front-loaded (init / add / prove / gate) with one example each. No more 5-phase framing.
  • "Automatic" trigger list deleted — proven decorative.
  • Obligation kinds collapsed 11 → 6: kept user_path, integration, failure_path, literal_match, literal_regex, other. Dropped performance, security, research_grounded, code_referenced, model_current — agents picked them wrong half the time and less choice = better adherence.
  • Three explicit anti-patterns added:
    1. One obligation per behavior, not per file. Kills sprawl.
    2. Never grep a file you wrote in the same task as the proof. Kills self-referential proofs. Includes concrete external-proof examples (search, gh search repos, curl https://registry.npmjs.org/...).
    3. Don't over-obligate trivial work. Explicit permission to skip ritalin on small tasks.
  • literal_match and --depends-on examples folded into the core "Add" example instead of separate sections.
  • Subagent guidance, gate-blocked guidance, and v0.4 useful flags (--summary, --all --stale-only, export-contract) preserved.

Test budget

tests::embedded_skill_md_is_under_budget_and_has_directives budget tightened from 145 → 100 lines. Anti-drift assertion updated to match new wording.
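A test of this kind can be pictured as a line-count budget plus an anti-drift phrase check. The following is a minimal Rust sketch, assuming the real test feeds the embedded SKILL.md text (for example via include_str!) into something like the hypothetical check_skill_budget below; the directive strings a caller would pass are placeholders, not the wording ritalin actually asserts.

```rust
/// Hypothetical budget + anti-drift check for an embedded skill file.
/// The real test's body and required phrases are not shown in these notes.
fn check_skill_budget(skill_md: &str, max_lines: usize, directives: &[&str]) -> Result<(), String> {
    // Budget: fail if the file grew past the allowed number of lines.
    let lines = skill_md.lines().count();
    if lines > max_lines {
        return Err(format!("SKILL.md is {lines} lines; budget is {max_lines}"));
    }
    // Anti-drift: every required directive must still appear verbatim.
    for d in directives {
        if !skill_md.contains(d) {
            return Err(format!("missing directive: {d:?}"));
        }
    }
    Ok(())
}
```

Tightening the budget from 145 to 100 then only means changing the max_lines argument, while the directive list is what has to track wording changes.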

No CLI changes

Same flags, same exit codes, same JSON envelope. Existing .ritalin/ contracts continue to work.

Stats

  • 1 commit since v0.4.0
  • 129 tests pass, clippy --all-targets -D warnings clean
  • SKILL.md: 139 → 85 lines (39% shorter)

Full changelog: v0.4.0...v0.4.1

v0.4.0

13 May 15:58


ritalin 0.4.0

This release closes the gate-cheating attack observed in production (sandboxed Codex agents synthesising passing evidence by writing forged records to evidence.jsonl) and adds the parallel-work UX users have been asking for.

Backward compatible — existing v0.3 contracts (no depends_on field) continue to work unchanged.

Security

  • fix(gate) Evidence forgery is now detected. gate recomputes proof_hash from the recorded command field instead of trusting the stored proof_hash field, so an attacker can't append a record with command: "<garbage>" and proof_hash: <hash of obligation's proof_cmd> and have it discharge. The forged_evidence_with_matching_proof_hash test in tests/attacks.rs covers the exact attack class.
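The difference between the old and new gate checks fits in a few lines. This is a minimal sketch, not ritalin's code: the Obligation and Evidence structs are hypothetical, and std's DefaultHasher stands in for whatever digest the real proof_hash uses. What matters is which value gets compared: the naive check trusts the stored proof_hash field, while the fixed one recomputes the hash from the recorded command field.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest (assumption): keeps the sketch std-only.
fn proof_hash(cmd: &str) -> u64 {
    let mut h = DefaultHasher::new();
    cmd.hash(&mut h);
    h.finish()
}

/// Hypothetical obligation record: only the stored proof command matters here.
struct Obligation {
    proof_cmd: String,
}

/// Hypothetical evidence record as appended to evidence.jsonl.
struct Evidence {
    command: String,
    proof_hash: u64,
    passed: bool,
}

/// Vulnerable: a forged record can copy the obligation's hash into proof_hash.
fn discharges_naive(ob: &Obligation, ev: &Evidence) -> bool {
    ev.passed && ev.proof_hash == proof_hash(&ob.proof_cmd)
}

/// Fixed: ignore the stored hash and recompute it from the command actually recorded.
fn discharges(ob: &Obligation, ev: &Evidence) -> bool {
    ev.passed && proof_hash(&ev.command) == proof_hash(&ob.proof_cmd)
}
```

A record whose command is "<garbage>" can no longer borrow the obligation's hash, because the gate never reads that field.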

Parallel-work UX

  • feat(add) --depends-on scopes per-obligation freshness to the listed files. Unrelated commits in a parallel session no longer invalidate this obligation's evidence. Comma-separated, repo-relative paths (no .., no absolute). Falls back to the global workspace hash when omitted (zero behavior change for v0.3 obligations).
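Validation of a --depends-on value could look like the sketch below. The comma splitting and the two rejection rules (absolute paths, `..` components) come from the note above; trimming whitespace and rejecting empty entries are assumptions, and parse_depends_on is a hypothetical name.

```rust
use std::path::{Component, Path};

/// Parse a comma-separated --depends-on value into repo-relative paths.
/// Rejects absolute/rooted paths and any `..` component (per the release notes);
/// trimming and the empty-entry check are assumptions.
fn parse_depends_on(raw: &str) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    for part in raw.split(',') {
        let p = part.trim();
        if p.is_empty() {
            return Err("empty path in --depends-on".into());
        }
        let path = Path::new(p);
        // has_root also catches `/foo` on Windows, where it is not "absolute".
        if path.is_absolute() || path.has_root() {
            return Err(format!("absolute path not allowed: {p}"));
        }
        // components() keeps `..`, so `src/../x` is caught as well as `../x`.
        if path.components().any(|c| matches!(c, Component::ParentDir)) {
            return Err(format!("`..` not allowed: {p}"));
        }
        out.push(p.to_string());
    }
    Ok(out)
}
```

Only the files accepted here would feed that obligation's freshness hash; when the flag is omitted, the global workspace hash applies as before.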

Daily-driver UX

  • feat(prove) --all / --all --stale-only batch-refresh every obligation in one call, with --stale-only skipping anything already passing+fresh. Replaces the 14-line shell loop users were running after every commit.
  • feat(prove) workspace-mutation warning when a proof rewrites a file it depends on (formatters, codegen). Other obligations sharing those files may now be stale; the warning surfaces it.
  • feat(gate) --summary prints one stable shell-friendly line: verdict=pass critical_open=0 advisory_open=2 total=20 (or with blocking=O-007 on fail). Awk- and grep-friendly for hooks/CI.
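The selection rule behind --all --stale-only and the shape of a --summary line can be sketched together. The Ob struct and the helpers are hypothetical, and the rules are assumptions rather than documented behavior: an obligation counts as open unless it is passing and fresh, only critical obligations fail the gate, and blocking= names the first open critical obligation, appended at the end of the line.

```rust
/// Hypothetical per-obligation state; not ritalin's on-disk schema.
struct Ob {
    id: &'static str,
    critical: bool,
    passing: bool,
    fresh: bool,
}

/// Assumption: open unless the latest evidence passed and is still fresh.
fn is_open(o: &Ob) -> bool {
    !(o.passing && o.fresh)
}

/// `prove --all --stale-only`: skip anything already passing and fresh.
fn to_refresh(obs: &[Ob]) -> Vec<&'static str> {
    obs.iter().filter(|o| is_open(o)).map(|o| o.id).collect()
}

/// `gate --summary`: one space-separated key=value line.
fn summary(obs: &[Ob]) -> String {
    let critical_open: Vec<&Ob> = obs.iter().filter(|o| o.critical && is_open(o)).collect();
    let advisory_open = obs.iter().filter(|o| !o.critical && is_open(o)).count();
    let verdict = if critical_open.is_empty() { "pass" } else { "fail" };
    let mut line = format!(
        "verdict={verdict} critical_open={} advisory_open={advisory_open} total={}",
        critical_open.len(),
        obs.len()
    );
    // Assumption: report the first open critical obligation as the blocker.
    if let Some(first) = critical_open.first() {
        line.push_str(&format!(" blocking={}", first.id));
    }
    line
}
```

Because the fields are space-separated key=value pairs, a hook can match the verdict=pass prefix with grep or split on spaces and = with awk.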

Obligation kinds

  • feat(add) literal_regex kind synthesises grep -E -- <pattern> <file>. Fixes the literal_match brittleness where if (p.crossover != null) semantically matched but lexically failed --literal 'if (p.crossover)'. POSIX ERE — use [[:space:]] not \s, alternatives via (A|B). AST matching is deferred to v0.5.
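The synthesised command can be sketched with std::process::Command. This assumes the arguments reach grep directly rather than through a shell, which the notes above do not specify; literal_regex_cmd and CROSSOVER_GUARD are hypothetical names. The example pattern uses [[:space:]] and an optional group instead of \s, and matches both if (p.crossover) and if (p.crossover != null).

```rust
use std::process::Command;

/// Example POSIX ERE for the crossover case: matches `if (p.crossover)` and
/// `if (p.crossover != null)`. ERE has no `\s`, so `[[:space:]]` is used.
const CROSSOVER_GUARD: &str = r"if[[:space:]]*\(p\.crossover([[:space:]]*!=[[:space:]]*null)?\)";

/// Build `grep -E -- <pattern> <file>` as an argv (hypothetical helper; no shell).
fn literal_regex_cmd(pattern: &str, file: &str) -> Command {
    let mut cmd = Command::new("grep");
    // `--` ends option parsing, so a pattern starting with `-` stays a pattern.
    cmd.arg("-E").arg("--").arg(pattern).arg(file);
    cmd
}
```

Calling .status() on the result would follow grep's usual convention: exit 0 when a line matches, 1 when none does, 2 on error.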

Docs

  • docs(skill) SKILL.md tells delegated subagents they're first-class users: run real proofs with full network/CLI access (search, gh, engram, curl), expect the parent's uncommitted changes in the working tree, and never edit evidence.jsonl to fake a pass — the chain catches it.
  • docs(agent-info) machine-readable manifest covers the v0.4 surface so agents discover new flags without reading SKILL.md.
  • docs(readme) workflow snippet + feature table updated.

Notes for upgraders

  • No migration required for v0.3 contracts — all new fields are #[serde(default)].
  • --cmd override on prove is now explicitly diagnostic-only. The recorded command won't match the obligation's stored proof, so gate rejects it as proof_mismatch. This is intentional behavior, just better documented.

Stats

  • 9 commits since v0.3.1
  • 129 tests pass, clippy --all-targets -D warnings clean
  • 19 new tests (forgery attack, per-obligation deps, prove --all, workspace-mutation, gate --summary, literal_regex)

Full changelog: v0.3.0...v0.4.0

v0.3.1

06 May 00:32


v0.3.1

v0.3.0

23 Apr 12:01


v0.3.0 — literal_match, scope-refresh, export-contract

New obligation kind literal_match (anti-approximation-drift shortcut over
grep -F). New prove scope-refresh line + remaining_open JSON field.
New export-contract subcommand emitting a subagent-ready delegation
briefing for Claude Code Task/Agent prompts. SKILL.md shortened to 121
lines and rewritten as BEFORE/MUST imperatives per context-engineering
research. All non-breaking — existing CLI, JSON envelope, ledger format,
hook mode, and exit codes unchanged.

v0.2.0

12 Apr 16:06


fix: close 5 critical issues from triple-reviewer audit (v0.2.0)

Three independent reviewers (Claude Opus, Codex GPT-5.4, GPT-5.4 Pro)
audited ritalin and converged on 5 priority fixes:

1. Workspace hash: use git ls-files (respects .gitignore), anchor to
   project root not cwd. Fixes prove-from-subdir / gate-from-root
   hash mismatch and build-artifact invalidation.

2. --cmd bypass: proof_hash now hashes the actually executed command,
   not the stored obligation proof. Override evidence records but
   won't discharge the original obligation.

3. --force clears: init --force and seed --force now delete old
   obligations.jsonl and evidence.jsonl, preventing duplicate IDs
   and stale state.

4. Marker restore: adding a critical obligation after gate passed
   now recreates .task-incomplete, making the contract re-enforceable.

5. Advisory warnings: gate now collects undischarged advisory
   obligations and emits WARN output instead of silently skipping.
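Fix 1 amounts to hashing a canonical, root-relative file list. A std-only sketch: the caller supplies (repo-relative path, contents) pairs, as `git ls-files` run at the project root would list them. Sorting by path makes the result independent of listing order, and because the paths never include the cwd, proving from a subdirectory and gating from the root hash the same input. workspace_hash is a hypothetical name, and DefaultHasher is a stand-in for the real digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Order-independent hash over (repo-relative path, file contents) pairs.
/// Assumption: the list already excludes ignored files (e.g. from `git ls-files`).
fn workspace_hash(files: &[(&str, &[u8])]) -> u64 {
    let mut sorted: Vec<&(&str, &[u8])> = files.iter().collect();
    // Canonical order, so listing order never changes the hash.
    sorted.sort_by(|a, b| a.0.cmp(b.0));
    let mut h = DefaultHasher::new();
    for (path, contents) in sorted {
        // Hashing the path too means a rename changes the hash.
        path.hash(&mut h);
        contents.hash(&mut h);
    }
    h.finish()
}
```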

6 new integration tests covering all fixes. 79 tests total, all pass.

v0.1.1

11 Apr 03:08


v0.1.1 — rustls fix for cross-compile

v0.1.0

11 Apr 03:03


v0.1.0 — initial release