Research Proof

Pressure-test research claims with falsifiable evidence plans, adversarial checks, frozen verifiers, evidence certainty checks, and proof ledgers.

The mental model behind Research Proof is simple: treat every promising idea as a claim under test. Borrow the habits that strong research teams use across AI, medicine, mathematics, engineering, and design: freeze the verifier, separate evidence from interpretation, search for counterexamples, charge hidden costs, and upgrade confidence only when the claim survives transfer pressure. References shape the method; they are never treated as proof of a user claim.

Release 1.3.0

This release keeps the skill compact while making the proof path easier to follow.

Highlights:

Makes the selected proof method and next method-specific moves explicit.
Keeps compact and schema-style answers from dropping raw evidence or rejected shortcuts.
Adds practical eval-harness guidance for reusable helpers instead of one-off scripts.
Keeps status language tied to the evidence in the answer.
Updates the Claude Code plugin to 1.3.0.

The references are intentionally plain: proof ladders, causal attribution, live-source review, clinical evidence review, design research, cross-domain transfer, and eval harness structure.

See CHANGELOG.md for release notes.

Install

npx skills add tonyblu331/research-proof --skill research-proof

Global install:

npx skills add tonyblu331/research-proof --skill research-proof -g

List available skills before installing:

npx skills add tonyblu331/research-proof --list

Manual install:

git clone https://github.com/tonyblu331/research-proof.git

Then copy skills/research-proof into your agent's skills directory.

Claude Code Plugin

This repo is a Claude Code marketplace. Install it with:

claude plugin marketplace add tonyblu331/research-proof
claude plugin install research-proof-plugin@research-proof

Invoke it with:

/research-proof-plugin:research-proof

The plugin wrapper lives here:

.claude-plugin/marketplace.json
plugins/research-proof-plugin/
  .claude-plugin/plugin.json
  skills/research-proof/

Local plugin test:

claude plugin marketplace add .\
claude --plugin-dir .\plugins\research-proof-plugin

Optional local checks:

node .\tools\validate-research-skill.mjs
claude plugin validate .
claude plugin validate .\plugins\research-proof-plugin

Use It For

Use Research Proof when a claim is promising but still vague:

Use research-proof to pressure-test this claim: our agent loop can improve a prompt library overnight without human review.

Good fits:

research roadmaps
benchmark reviews
proof ladders
cross-domain mathematical transfer
evaluator-gated loops
research TDD scenarios
clinical or intervention evidence questions
systematic reviews and evidence-certainty checks
causal inference and observational-data claims
mathematical innovation by borrowing invariants, constructions, or proof tools from distant fields
SIGGRAPH-style artifact, rendering, simulation, and perceptual-system claims
tool-grounded scientific workflows and live-source research claims
clinical AI reporting, calibration, validation, and deployment-readiness claims
design research and prototype-readiness claims
research-program strategy and funding decisions
adversarial follow-up tests

What It Produces

Research Proof forces the agent to define:

Claim
Verifier Boundary
Baseline / Candidate Family
Current Evidence
Enemy Terms
Rejection Gates
Proof Ladder / Transfer Path
Verdict
Proof Ledger Decision
Next Pressure

Evidence is labeled as PROVEN, SUPPORTED, REJECTED, or OPEN.
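
One way to picture the sections above is as a single record with one field per heading and a status drawn from the four labels. This is a minimal sketch, not the skill's actual output format: the skill produces prose sections, and the class name, field names, and the `missing_sections` helper are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """The four evidence labels named in this README."""
    PROVEN = "PROVEN"
    SUPPORTED = "SUPPORTED"
    REJECTED = "REJECTED"
    OPEN = "OPEN"


@dataclass
class LedgerEntry:
    """Hypothetical record mirroring the section headings listed above."""
    claim: str
    verifier_boundary: str
    baseline_candidate_family: str
    current_evidence: list[str] = field(default_factory=list)
    enemy_terms: list[str] = field(default_factory=list)
    rejection_gates: list[str] = field(default_factory=list)
    proof_ladder: list[str] = field(default_factory=list)
    verdict: str = ""
    decision: Status = Status.OPEN  # assumed default: nothing is upgraded without evidence
    next_pressure: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if value in ("", [])]
```

A freshly created entry starts as OPEN, and `missing_sections()` lists what still has to be written before a verdict makes sense.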

Quick Example

Messy claim:

Our autonomous loop can improve a prompt library overnight without human review.

Research Proof rewrite:

Claim
For prompt set D and baseline B, candidate loop C wins only if held-out task score improves by at least 5% while latency, token cost, regressions, and human review stay within budget.

Verifier Boundary
The evaluator, held-out tasks, scoring rubric, and regression set are frozen before the loop starts. The candidate can edit prompts only. It cannot inspect held-out answers, change tests, widen budgets, or mark its own outputs as accepted.

Rejection Gates
Reject if the candidate changes the evaluator, fails regression, exceeds token budget, improves only visible tasks, or requires manual cleanup.

Proof Ledger Decision
OPEN until it wins the frozen harness and survives transfer.

Next Pressure
Run a transfer test on a new prompt family with the same scoring rules.

See examples/fuzzy-claim-proof-ledger.md for the full worked example.
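
The verifier boundary and rejection gates in the example above can be sketched as a small check. This is an illustration under stated assumptions, not code from the repository: `fingerprint`, `RunResult`, `rejection_reasons`, and `ledger_decision` are hypothetical names, gains are treated as fractions (0.05 for 5%), and mapping a surviving claim to SUPPORTED is a guess, since the example only says the claim stays OPEN until it wins the frozen harness and survives transfer.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(frozen_files: dict[str, bytes]) -> str:
    """Hash the frozen verifier contents (evaluator, held-out tasks, rubric)."""
    digest = hashlib.sha256()
    for name in sorted(frozen_files):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(frozen_files[name])
        digest.update(b"\0")
    return digest.hexdigest()


@dataclass
class RunResult:
    verifier_fingerprint: str    # fingerprint recomputed after the loop finishes
    regressions_failed: int
    tokens_used: int
    visible_gain: float          # score change on tasks the loop could see
    held_out_gain: float         # score change on the frozen held-out tasks
    manual_cleanup_needed: bool


def rejection_reasons(run: RunResult, frozen_fingerprint: str,
                      token_budget: int, min_gain: float = 0.05) -> list[str]:
    """Apply each rejection gate from the example; an empty list means none fired."""
    reasons = []
    if run.verifier_fingerprint != frozen_fingerprint:
        reasons.append("candidate changed the evaluator")
    if run.regressions_failed > 0:
        reasons.append("regression failure")
    if run.tokens_used > token_budget:
        reasons.append("token budget exceeded")
    if run.visible_gain >= min_gain and run.held_out_gain < min_gain:
        reasons.append("improves only visible tasks")
    if run.manual_cleanup_needed:
        reasons.append("requires manual cleanup")
    return reasons


def ledger_decision(run: RunResult, reasons: list[str],
                    survived_transfer: bool, min_gain: float = 0.05) -> str:
    """Stay OPEN until the run wins the frozen harness and survives transfer."""
    if reasons:
        return "REJECTED"
    if run.held_out_gain >= min_gain and survived_transfer:
        return "SUPPORTED"  # assumed label; the README does not say which one applies
    return "OPEN"
```

The fingerprint is taken once before the loop starts and recomputed afterwards, so any edit to the evaluator or held-out set shows up as a mismatch rather than relying on the candidate to report it.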

Distribution

This repository ships the same skill through Claude Code plugins and the open skills CLI. The source of truth is skills/research-proof; the plugin skill directory is a distribution copy kept in sync by the repo checks.
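
To illustrate what "kept in sync" could mean in practice, the sketch below compares two directory trees file by file. It is not the repository's own check (that lives under tools/ and may work differently); `tree_digests` and `sync_drift` are hypothetical helpers, and comparing raw bytes is an assumption that ignores any line-ending normalization.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def sync_drift(source: Path, copy: Path) -> list[str]:
    """Return relative paths that are missing on one side or differ in content."""
    src, dst = tree_digests(source), tree_digests(copy)
    return sorted(name for name in src.keys() | dst.keys() if src.get(name) != dst.get(name))
```

Run from the repository root, `sync_drift(Path("skills/research-proof"), Path("plugins/research-proof-plugin/skills/research-proof"))` would return an empty list when the distribution copy matches the source of truth.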

Source: github.com/tonyblu331/research-proof
Releases: github.com/tonyblu331/research-proof/releases

Repository Layout

assets/
examples/
plugins/research-proof-plugin/
skills/research-proof/
  SKILL.md
  evals/evals.json
  references/
tools/
  lib/
  validate-research-skill.mjs
.github/

License

MIT

