Merged
.claude-plugin/marketplace.json (3 changes: 1 addition & 2 deletions)

```diff
@@ -23,8 +23,7 @@
 "./skills/investigate",
 "./skills/critique",
 "./skills/synthesize",
-"./skills/search-arxiv",
-"./skills/search-iacr"
+"./skills/search-paper"
 ]
 }
 ]
```
.github/workflows/ci.yml (6 changes: 3 additions & 3 deletions)

```diff
@@ -35,7 +35,7 @@ jobs:
 - name: Install Python dependencies
 run: pip install pytest arxiv requests beautifulsoup4
 - name: Run integration tests (search scripts against live APIs)
-run: pytest tests/test_search_arxiv.py tests/test_search_iacr.py -v
+run: pytest tests/test_search_paper.py -v

 npx-skills-discovery:
 runs-on: ubuntu-latest
@@ -60,7 +60,7 @@ jobs:
 exit 1
 fi
 ls -la "$target"
-expected="reaper clarify-goal analyze-paper review-literature formalize-problem brainstorm investigate critique synthesize search-arxiv search-iacr"
+expected="reaper clarify-goal analyze-paper review-literature formalize-problem brainstorm investigate critique synthesize search-paper"
 missing=0
 for skill in $expected; do
 if [ ! -f "$target/$skill/SKILL.md" ]; then
@@ -69,7 +69,7 @@ jobs:
 fi
 done
 # Co-located Python scripts must travel with their skill dirs
-for script in search-arxiv/search_arxiv.py search-iacr/search_iacr.py; do
+for script in search-paper/arxiv.py search-paper/iacr.py search-paper/semantic_scholar.py search-paper/dblp.py search-paper/openalex.py; do
 if [ ! -f "$target/$script" ]; then
 echo "::error::Missing required asset: $target/$script"
 missing=$((missing + 1))
```
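The discovery job's post-install check can be mirrored in Python, for example as a quick local sanity check before tagging a release. This is only a sketch and is not part of this PR. The function name `missing_assets` and the idea of running it locally are assumptions. The skill and script lists are copied from the updated `expected=` line and `for script in` loop above. Like the shell `-f` test, it only checks that each asset exists as a regular file.

```python
from pathlib import Path
from typing import List

# Copied from the updated ci.yml discovery job (10 skills after this PR).
EXPECTED_SKILLS = (
    "reaper clarify-goal analyze-paper review-literature formalize-problem "
    "brainstorm investigate critique synthesize search-paper"
).split()

# Co-located Python drivers that must travel with the search-paper skill dir.
EXPECTED_SCRIPTS = [
    "search-paper/arxiv.py",
    "search-paper/iacr.py",
    "search-paper/semantic_scholar.py",
    "search-paper/dblp.py",
    "search-paper/openalex.py",
]


def missing_assets(target: Path) -> List[str]:
    """Return every expected SKILL.md or script that is absent under target.

    Hypothetical helper: mirrors the two shell loops in the CI job.
    """
    missing = [
        f"{skill}/SKILL.md"
        for skill in EXPECTED_SKILLS
        if not (target / skill / "SKILL.md").is_file()
    ]
    missing += [script for script in EXPECTED_SCRIPTS if not (target / script).is_file()]
    return missing
```

To use it, call `missing_assets(Path(target))` on the installed skills folder and fail if the returned list is non-empty. This matches the CI job's `missing` counter.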
CLAUDE.md (6 changes: 3 additions & 3 deletions)

```diff
@@ -4,11 +4,11 @@ AI-native scientific research pipeline distributed as a host-agnostic skills pac

 ## Project structure

-- `skills/` — 11 composable skills (each has a `SKILL.md` defining its behavior; the `/<skill>` form is the canonical display convention used in all user-facing docs)
+- `skills/` — 10 composable skills (each has a `SKILL.md` defining its behavior; the `/<skill>` form is the canonical display convention used in all user-facing docs)
 - `/reaper` — Main orchestrator that chains all other skills
 - `/clarify-goal` — Interactive goal clarification (asks user targeted questions before pipeline runs)
 - `/analyze-paper`, `/review-literature`, `/formalize-problem`, `/brainstorm`, `/investigate`, `/critique`, `/synthesize` — Pipeline stages
-- `/search-arxiv`, `/search-iacr` — Academic search via Python scripts
+- `/search-paper` — Academic search + citation graph + venue resolution. Bundles five Python drivers (`arxiv.py`, `iacr.py`, `semantic_scholar.py`, `dblp.py`, `openalex.py`); the `SKILL.md` itself orchestrates the layered venue lookup.
 - `tests/` — Python tests for skill structure and search scripts
 - `evals/` — Test cases with quality criteria (`evals.json`)
 - `dev/` — Development docs including `ROADMAP.md` (full methodology and design)
@@ -32,7 +32,7 @@ pip install arxiv requests beautifulsoup4
 - Runtime state goes in `reaper-workspace/` (gitignored). Never commit workspace artifacts.
 - The six methodology principles (separation of concerns, fixed evaluation signal, structured results log, keep-or-discard loop, never stop, clarity and simplicity) govern how skills behave.
 - Domain-specific content (impossibility results, trust model checklists, venue tiers, definitional standards) lives in `skills/reaper/references/`, not inline in skills. Skills reference these files but remain domain-agnostic — the reference files can be swapped for a different research domain.
-- Python scripts live alongside the skill that uses them (e.g., `skills/search-arxiv/search_arxiv.py`).
+- Python scripts live alongside the skill that uses them (e.g., `skills/search-paper/arxiv.py`).
 - No JavaScript/TypeScript in this project — it's `SKILL.md` files + Python only.
 - The license is Apache-2.0. Any plugin manifest that references a license field must say `"Apache-2.0"`.
 - When cutting a release tag, the tag message should summarize changes since the last tag (use `git log <last-tag>..HEAD`).
```
README.md (9 changes: 4 additions & 5 deletions)

````diff
@@ -22,7 +22,7 @@ How you invoke a skill depends on the host agent. The `/<skill>` form above is t

 - **Autonomous multi-stage pipeline** — goal clarification, paper analysis, literature review, hypothesis formalization, parallel investigation, critique, and synthesis all chain automatically
 - **Parallel investigation with keep-or-discard discipline** — multiple hypotheses are investigated concurrently; only genuine progress advances the working state, while dead ends stay logged
-- **Built-in academic search** — arXiv and IACR ePrint search with PDF download and citation graph tracing
+- **Built-in academic search** — paper search, PDF download, citation graph tracing, and venue resolution across arXiv, IACR ePrint, Semantic Scholar, DBLP, and OpenAlex
 - **Domain-agnostic design** — ships with cryptography and distributed systems references, but swap the reference files to adapt to any research domain
 - **Multi-model AI consultation** — optionally consult Codex, Gemini, DeepSeek, or local models for a second opinion at every pipeline stage
 - **Composable skills** — each pipeline stage is an independent skill you can run standalone
@@ -71,8 +71,7 @@ Each skill can be used independently or composed by the orchestrator. Invoke by
 | `/investigate` | Run investigation cycles with keep-or-discard discipline |
 | `/critique` | Provide critique via human feedback, Codex consultation, or self-review (can trigger more investigation) |
 | `/synthesize` | Generate a structured research report from investigation results |
-| `/search-arxiv` | Search arXiv papers, download PDFs, and trace citation graphs |
-| `/search-iacr` | Search IACR ePrint archive for cryptography papers |
+| `/search-paper` | Find papers, download PDFs, trace citation graphs, and resolve publication venues across arXiv, IACR ePrint, Semantic Scholar, DBLP, and OpenAlex |

 > The `/<skill>` form is the canonical display convention used throughout these docs. Slash-command hosts (Claude Code) invoke them directly that way (e.g. `/clarify-goal`). Auto-discovery hosts (Cursor, Codex CLI, Cline, Continue, Gemini CLI, Copilot, Windsurf, …) invoke them by the bare skill name — drop the leading `/` when asking the agent to run a skill.

@@ -90,7 +89,7 @@ pip install arxiv requests beautifulsoup4

 ### Install via `npx skills` (recommended — works on 45+ agents)

-Reaper is distributed as standard `SKILL.md` folders. The cross-agent installer [`vercel-labs/skills`](https://github.com/vercel-labs/skills) shallow-clones this repository and copies all 11 skill directories — including Python scripts and reference files — into your agent's conventional skills folder.
+Reaper is distributed as standard `SKILL.md` folders. The cross-agent installer [`vercel-labs/skills`](https://github.com/vercel-labs/skills) shallow-clones this repository and copies all 10 skill directories — including Python scripts and reference files — into your agent's conventional skills folder.

 ```bash
 # Latest from the default branch
@@ -204,7 +203,7 @@ See [`dev/ROADMAP.md`](dev/ROADMAP.md) for the full methodology and development
 See [`dev/ROADMAP.md`](dev/ROADMAP.md) for the full roadmap.

 - **Horizon 1 (The Pipeline)**: Core skills, orchestrator, and eval framework — *complete; LaTeX report output planned*
-- **Horizon 2 (The Library)**: arXiv/ePrint search via Python scripts + citation graph — *complete*
+- **Horizon 2 (The Library)**: arXiv/ePrint search via Python scripts + citation graph + venue resolution (Semantic Scholar / DBLP / OpenAlex) — *complete*
 - **Horizon 3 (The Committee)**: Multi-model critique via the `/critique` skill's `--codex` mode — *Codex complete, Gemini/DeepSeek/local planned*
 - **Horizon 3.5 (The Polyglot)**: Cross-agent distribution via `npx skills` and host-agnostic skill prose — *complete; per-host orchestration polish ongoing*
 - **Horizon 4 (The Academy)**: Broader topic search (Scholar/DBLP), author-centric and venue-centric search — *planned*
````
dev/ROADMAP.md (38 changes: 22 additions & 16 deletions)

```diff
@@ -6,7 +6,7 @@

 ### Value Proposition

-Today, AI can answer questions about papers. Reaper goes further: it *does research*. Given a paper on a new consensus protocol and the goal "determine if this is secure under asynchrony," Reaper will read the paper, search arXiv and IACR ePrint for related work, formalize the problem, attempt a proof or construct a counterexample, seek feedback from other AI models, and produce a structured research report with full reasoning traces.
+Today, AI can answer questions about papers. Reaper goes further: it *does research*. Given a paper on a new consensus protocol and the goal "determine if this is secure under asynchrony," Reaper will read the paper, search academic sources (arXiv, IACR ePrint, Semantic Scholar, DBLP, OpenAlex) for related work, formalize the problem, attempt a proof or construct a counterexample, seek feedback from other AI models, and produce a structured research report with full reasoning traces.

 The key insight: research is a *pipeline* of distinct, composable activities (read, search, formalize, analyze, verify, synthesize), not a monolithic task. Reaper decomposes this pipeline into individual skills that can be invoked independently or orchestrated together, and leverages parallel subagents and multi-model feedback to approximate the quality of collaborative human research.

@@ -133,12 +133,13 @@ reaper/
 │ ├── investigate/SKILL.md # Stage 3: investigate (proof/analysis cycles)
 │ ├── critique/SKILL.md # Stage 3 sub-step: human / external-model / self review
 │ ├── synthesize/SKILL.md # Stage 4: synthesize (report generation)
-│ ├── search-arxiv/ # Crypto/CS topic search via arXiv
-│ │ ├── SKILL.md
-│ │ └── search_arxiv.py # arXiv API + Semantic Scholar citations
-│ └── search-iacr/ # Crypto-specific IACR ePrint search
-│ ├── SKILL.md
-│ └── search_iacr.py # IACR ePrint scraper
+│ └── search-paper/ # Unified academic search + venue resolution
+│ ├── SKILL.md # Orchestrates the layered venue lookup
+│ ├── arxiv.py # arXiv API
+│ ├── iacr.py # IACR ePrint scraper
+│ ├── semantic_scholar.py # Citations + venue lookup (by arXiv ID or title)
+│ ├── dblp.py # CS-authoritative venue lookup (by title)
+│ └── openalex.py # Broad-coverage venue lookup (by title)
 ├── tests/ # Python tests
 ├── dev/
 │ ├── ROADMAP.md # This file
@@ -271,24 +272,29 @@ And each skill works standalone: invoke `analyze-paper paper.pdf` for just a str

 **Methodology stage:** Enriches Stage 1b (establish baseline from literature) with real academic paper servers.

-**Goal:** Upgrade `/review-literature` from generic web search to structured academic search — arXiv, IACR ePrint, citation graph traversal — using lightweight Python scripts (no MCP dependency). Also enable `/investigate` to pull in new references mid-loop when a cycle reveals a gap in context.
+**Goal:** Upgrade `/review-literature` from generic web search to structured academic search — a unified `/search-paper` skill that fans out over arXiv, IACR ePrint, Semantic Scholar, DBLP, and OpenAlex for paper search, citation graph traversal, and publication-venue resolution — using lightweight Python scripts (no MCP dependency). Also enable `/investigate` to pull in new references mid-loop when a cycle reveals a gap in context.

-**What success looks like:** invoking the `/review-literature` skill with `"post-quantum threshold signatures"` automatically searches arXiv and IACR ePrint, traces forward/backward citations via Semantic Scholar, and produces a structured literature survey with precise references.
+**What success looks like:** invoking the `/review-literature` skill with `"post-quantum threshold signatures"` delegates to `/search-paper` for paper search, citation graph traversal, and layered venue resolution, and produces a structured literature survey with real publication venues (e.g. CRYPTO 2023) in the references.

 #### Search Tools

 | Script | Location | Capabilities | Dependencies |
 |--------|----------|--------------|-------------|
-| `search_arxiv.py` | `skills/search-arxiv/` | `search`, `download`, `citations` (via Semantic Scholar) | `pip install arxiv requests` |
-| `search_iacr.py` | `skills/search-iacr/` | `search`, `recent`, `download`, `url` | `pip install requests beautifulsoup4` |
+| `arxiv.py` | `skills/search-paper/` | `search`, `recent`, `download`, `journal-ref` | `pip install arxiv` |
+| `iacr.py` | `skills/search-paper/` | `search`, `recent`, `download`, `url`, `pubinfo` | `pip install requests beautifulsoup4` |
+| `semantic_scholar.py` | `skills/search-paper/` | `venue` (by arXiv ID or title), `citations` | `pip install requests` |
+| `dblp.py` | `skills/search-paper/` | `venue` (by title + author) | `pip install requests` |
+| `openalex.py` | `skills/search-paper/` | `venue` (by title) | `pip install requests` |

+The `/search-paper` `SKILL.md` orchestrates a layered venue resolver: Semantic Scholar → archive's own field (arXiv `journal_ref` / ePrint `Publication info`) → DBLP → OpenAlex → `(preprint)` label.
+
 #### Tasks

-- [x] Build `/search-arxiv` skill with Python script (arXiv API + Semantic Scholar citations)
-- [x] Build `/search-iacr` skill with Python script (IACR ePrint scraper)
-- [x] Write `references/search-tools.md` — catalog of search tools with usage patterns and decision tree
-- [x] Update `/review-literature` skill: structured search as primary, WebSearch as fallback, citation graph, recent papers
+- [x] Build `/search-paper` skill bundling arXiv + IACR + Semantic Scholar + DBLP + OpenAlex drivers
+- [x] Write `references/search-tools.md` — catalog of search tools with usage patterns, decision tree, and venue-resolution protocol
+- [x] Update `/review-literature` skill: structured search as primary, WebSearch as fallback, citation graph, recent papers, layered venue resolution per kept paper
 - [x] Update `/investigate` skill: mid-cycle literature search via search scripts
+- [x] Update `/synthesize` skill: References section uses resolved venues, never raw archive IDs
 - [x] Handle graceful degradation when search scripts are unavailable
 - [x] Document Python prerequisites in README
 - [ ] Test: given a seed paper, can Reaper find and summarize the 10 most relevant related works?
@@ -415,7 +421,7 @@ Different models have different strengths. The critique skill should route consu
 | **Cross-agent installer** | `npx skills add SebastianElvis/reaper` | ✓ |
 | **Pin syntax** | `npx skills add SebastianElvis/reaper#v0.3.9` (git tags) | ✓ |
 | **Inter-skill calls** | Host-agnostic prose ("invoke the `<name>` skill") | ✓ |
-| **Python script bundling** | Whole-directory copy includes `search_arxiv.py`, `search_iacr.py`, `references/` | ✓ |
+| **Python script bundling** | Whole-directory copy includes `arxiv.py`, `iacr.py`, `semantic_scholar.py`, `dblp.py`, `openalex.py`, `references/` | ✓ |
 | **Frontmatter compatibility** | Claude-only keys (`user-invocable`, `argument-hint`, hooks) preserved as opaque YAML, no-op on other hosts | ✓ |
 | **CI validation** | Frontmatter regex check + strict `npx skills add` discovery test (verifies every expected skill, Python script, and reference file is present after install; fails the build if any asset is missing) | ✓ |
 | **Claude Code plugin path** | `.claude-plugin/marketplace.json` for slash-command routing | ✓ |
```
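The layered venue resolver described in the ROADMAP hunk queries its sources in a fixed order and stops at the first success. The sketch below shows only that control flow, under several assumptions. Each layer is modeled as a Python callable that takes a paper title and returns either `None` or a record with the `found` / `venue` / `venue_full` / `year` fields named in `evals.json`. The field types are assumed. In the real skill, the `SKILL.md` prose drives separate scripts, and some layers look papers up by arXiv ID rather than by title. `resolve_venue` and the lambda stand-ins are hypothetical and are not the drivers' actual APIs.

```python
from typing import Callable, List, Optional, Tuple, TypedDict


class VenueRecord(TypedDict):
    # Field names from the venue-resolver eval; value types are assumptions.
    found: bool
    venue: Optional[str]
    venue_full: Optional[str]
    year: Optional[int]


# Hypothetical layer: maps a paper title to a record, or None when there is no hit.
Layer = Callable[[str], Optional[VenueRecord]]


def resolve_venue(title: str, layers: List[Tuple[str, Layer]]) -> Tuple[str, str]:
    """Try each layer in order and return (source, venue label) from the first hit.

    Falls back to the "(preprint)" label when every layer misses or fails.
    """
    for source, lookup in layers:
        try:
            record = lookup(title)
        except Exception:
            # A failing source (network error, rate limit) counts as a miss,
            # so the next layer still gets a chance.
            continue
        if record and record["found"] and record["venue"]:
            year = record["year"]
            label = f"{record['venue']} {year}" if year else record["venue"]
            return source, label
    return "none", "(preprint)"


# Usage with stand-in layers (not the real drivers), ordered as in the ROADMAP:
# Semantic Scholar, archive field, DBLP, OpenAlex.
no_hit: VenueRecord = {"found": False, "venue": None, "venue_full": None, "year": None}
layers = [
    ("semantic_scholar", lambda title: None),
    ("archive_field", lambda title: no_hit),
    ("dblp", lambda title: {"found": True, "venue": "CRYPTO",
                            "venue_full": "Advances in Cryptology - CRYPTO 2023",
                            "year": 2023}),
    ("openalex", lambda title: None),
]
print(resolve_venue("example title", layers))  # ('dblp', 'CRYPTO 2023')
```

Treating an exception as a miss follows from the "graceful degradation" task in the ROADMAP. One unreachable service should push the lookup down to the next layer instead of aborting it.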
evals/evals.json (12 changes: 8 additions & 4 deletions)

```diff
@@ -106,12 +106,16 @@
 "test": "report.md has a one-sentence central finding, bulleted refutable contributions, each finding starts with a concrete example, no chronological narration, open questions are specific"
 },
 {
-"skill": "search-arxiv",
-"test": "search command returns valid JSON array with fields: arxiv_id, title, authors, year, abstract, pdf_url. citations command returns references and citations arrays. Tested via: python -m pytest tests/test_search_arxiv.py"
+"skill": "search-paper (arxiv.py)",
+"test": "search command returns valid JSON array with fields: arxiv_id, title, authors, year, abstract, pdf_url, journal_ref. journal-ref command returns the author-supplied venue field. Tested via: python -m pytest tests/test_search_paper.py"
 },
 {
-"skill": "search-iacr",
-"test": "search command returns valid JSON array with fields: eprint_id, title, pdf_url, url. url command returns correct ePrint URLs. Tested via: python -m pytest tests/test_search_iacr.py"
+"skill": "search-paper (iacr.py)",
+"test": "search command returns valid JSON array with fields: eprint_id, title, pdf_url, url. url and pubinfo commands return correct ePrint URLs and 'Publication info' text. Tested via: python -m pytest tests/test_search_paper.py"
+},
+{
+"skill": "search-paper (venue resolver)",
+"test": "semantic_scholar.py / dblp.py / openalex.py venue commands return JSON with `found`, `venue`, `venue_full`, `year` fields. Layered protocol in SKILL.md walks them in order and stops at first success. Tested via: python -m pytest tests/test_search_paper.py"
 },
 {
 "skill": "review-literature (H2)",
```
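The new `search-paper (venue resolver)` eval entry fixes an output contract for the drivers. Each `venue` command returns JSON with `found`, `venue`, `venue_full`, and `year`. The sketch below shows one way a test could check that shape. It assumes each driver prints a single JSON object on stdout, that `found` is a boolean, and that `year` is an integer or null. The eval text names the fields but not their types. `check_venue_json` is a hypothetical helper and is not taken from `tests/test_search_paper.py`.

```python
import json

# Field names from the venue-resolver eval entry.
REQUIRED_FIELDS = ("found", "venue", "venue_full", "year")


def check_venue_json(stdout: str) -> dict:
    """Parse one driver's venue output and check the fields named in evals.json.

    Raises ValueError on a malformed record. The type checks are assumptions.
    """
    record = json.loads(stdout)
    if not isinstance(record, dict):
        raise ValueError("venue output must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(record["found"], bool):
        raise ValueError("'found' must be a boolean")
    if record["year"] is not None and not isinstance(record["year"], int):
        raise ValueError("'year' must be an integer or null")
    return record
```

A "not found" result passes the check as long as all four keys are present, for example with `venue`, `venue_full`, and `year` set to `null`. This lets the layered protocol tell "no hit" apart from "broken driver output".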
skills/investigate/SKILL.md (2 changes: 1 addition & 1 deletion)

```diff
@@ -272,7 +272,7 @@ Run all N cycles. The only valid early stop is **genuine convergence**: all hypo

 ## When Stuck

-If a cycle is going nowhere, follow the escalation protocol in `{{REAPER_SKILL_DIR}}/references/methodology.md` (section "When Stuck: 8-Step Escalation"). The steps progress from re-reading existing materials, through searching for new literature (see `{{REAPER_SKILL_DIR}}/references/search-tools.md` for search commands, which use `search_arxiv.py` and `search_iacr.py`), to trying radically different approaches.
+If a cycle is going nowhere, follow the escalation protocol in `{{REAPER_SKILL_DIR}}/references/methodology.md` (section "When Stuck: 8-Step Escalation"). The steps progress from re-reading existing materials, through searching for new literature (see `{{REAPER_SKILL_DIR}}/references/search-tools.md` for search commands, which use the `arxiv.py` and `iacr.py` scripts in the `/search-paper` skill), to trying radically different approaches.

 When searching for new literature mid-investigation, download relevant papers to `reaper-workspace/papers/`, write per-paper notes (`<id>-notes.md`), and **integrate findings into `reaper-workspace/notes/literature.md` inline** — add new entries to the appropriate existing sections rather than appending a separate "Mid-Investigation Additions" section. Log the search as a cycle with action-type `literature-search` in `notes/results.md`.

```