A hybrid AI–human research pipeline that takes you from a research idea to a submission-ready paper. Claude handles reasoning (ideation, validation, writing, review); Python handles execution (data loading, code generation, statistics, LaTeX compilation). State is persisted between stages so you can stop, resume, or rerun any stage at will.
The pipeline is built on top of the Claude Code CLI and integrates two community skills from Claes Bäckman: review-paper (Stage 6 peer review) and review-paper-code (Stage 4.7 code review).
7 core stages plus 2 human checkpoints:
| Stage | Name | Type | What it does |
|---|---|---|---|
| 1 | Discovery | Auto | Finds datasets and 3 seed papers (Path A: by topic, Path B: from your data) |
| 1.5 | Data Loading | Auto | Downloads and profiles candidate datasets (Path A only) |
| 2 | Ideation | Auto | Generates 8–10 research ideas ranked by novelty × feasibility × impact |
| 2.5 | Idea Selection | Human | You pick 1 of the top 3 ideas, or reject all and re-ideate |
| 3 | Validation | Auto | 8-step evaluation collapsed into 4 calls; literature review via Semantic Scholar |
| 3.3 | Quick Empirical Test | Auto | Pre-trends, permutation, magnitude checks — fail-fast before code generation |
| 3.5 | Strategy Review | Human | You approve the identification strategy or loop back |
| 3.7 | Referee Preview | Auto | Adversarial referee scan for fatal flaws (selection bias, weak instruments) |
| 4 | Strategy & Code | Auto | Strategy memo + numbered Python scripts (load → clean → analyze → output) |
| 4.5 | Data Audit | Auto | Validates reproducibility of intermediate data outputs |
| 4.7 | Code Review | Auto | review-paper-code skill + auto-correction loop (max 3 rounds) |
| 5 | Writing | Auto | Drafts LaTeX paper from results; compiles to PDF |
| 6 | Peer Review | Auto | review-paper skill — 6 parallel agents + R&R loop (max 3 rounds) |
| 7 | Submission | Auto | Replication audit, integration validation, journal targeting |
Requirements:
- Claude Max subscription — the pipeline routes every LLM call through your authenticated Claude session. A pay-per-token Anthropic API key is not sufficient: Stages 2, 3, 4, 5, and 6 launch multi-turn conversations and parallel agents whose volume only fits within the Max plan's quotas.
- Google Antigravity — the pipeline is designed to run inside Antigravity's agentic IDE, which hosts the Claude Code session, the long-running terminal context required for Stages 4 and 5 (manual intervention), and the file system access used by the Python orchestrator.
- Python 3.11+ (tested on 3.14 on Windows)
- LaTeX distribution with `pdflatex` on PATH (TeX Live or MiKTeX)
- Standard Python packages: `pandas`, `numpy`, `requests` (imported ad-hoc — install on first run)
Setup:
```bash
git clone https://github.com/jnichor/Top_papers_creator.git
cd Top_papers_creator
pip install pandas numpy requests
```

Open the cloned folder inside Google Antigravity, sign in with the Google account linked to your Claude Max subscription, and run the pipeline from Antigravity's integrated terminal. All LLM calls flow through the authenticated session — no Anthropic API key is required. The pipeline also uses free public APIs (Dataverse, Zenodo, GitHub, Semantic Scholar) for dataset discovery and literature review.
```bash
# Path A — Discover datasets by topic
python run_pipeline.py --topic "Digital Wallets in Peru"

# Path B — Bring your own dataset
python run_pipeline.py --topic "Labor Markets" --data "./panel.csv"

# Path C — Data-first (find the best public datasets, then suggest topics)
python run_pipeline.py --path-c

# Show progress of a project
python run_pipeline.py --status my_project_20260502_145708

# Resume from a specific stage
python run_pipeline.py --from-stage 4 --project my_project_20260502_145708

# Resume from a human checkpoint
python run_pipeline.py --from-stage 2.5 --project my_project_20260502_145708

# Stop after a specific stage
python run_pipeline.py --topic "Macro" --to-stage 3
```
Windows note: if `python` is not on PATH, use the `py` launcher instead (`py run_pipeline.py ...`).
If --project is not provided, the pipeline derives the name from the topic:
{topic_normalized}_{YYYYMMDD_HHMMSS}. Accents are stripped (NFKD), spaces become underscores, and Windows-invalid characters (: ? * / \ " < > |) are removed.
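A minimal sketch of that normalization, assuming the standard-library approach (`unicodedata` NFKD plus a regex strip); the exact helper name and casing rules inside `run_pipeline.py` may differ:

```python
import re
import unicodedata
from datetime import datetime

def derive_project_name(topic: str) -> str:
    # Strip accents via NFKD decomposition, keeping only ASCII characters
    ascii_topic = unicodedata.normalize("NFKD", topic).encode("ascii", "ignore").decode()
    # Remove Windows-invalid characters, then turn whitespace runs into underscores
    cleaned = re.sub(r'[:?*/\\"<>|]', "", ascii_topic)
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    return f"{cleaned}_{datetime.now():%Y%m%d_%H%M%S}"

# derive_project_name("Digital Wallets in Perú") -> "Digital_Wallets_in_Peru_20260502_145708"
```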
Calls Dataverse, Zenodo, GitHub, and Semantic Scholar to find candidate datasets and seed papers.
| Path | Trigger | Behavior |
|---|---|---|
| A | `--topic` only | Searches GitHub for replication packages (≥50 stars or ≥20 forks), retrieves 3 seed papers |
| B | `--topic` + `--data` | Profiles your dataset (rows, cols, NA patterns), finds 3 papers matching your data context |
| C | `--path-c` | Searches for the best public datasets first, then suggests feasible topics |
Output: stage1_discovery.md with dataset list and seed papers.
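For illustration, the Path A repository search amounts to something like the following sketch against the GitHub search API; the query string, pagination, and ranking here are assumptions, and only the star/fork thresholds come from the table above:

```python
import requests

def find_replication_repos(topic: str, min_stars: int = 50, min_forks: int = 20) -> list[str]:
    """Search GitHub for replication packages related to a topic (illustrative query)."""
    resp = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": f'"replication package" {topic}', "sort": "stars", "per_page": 30},
        headers={"Accept": "application/vnd.github+json"},
        timeout=15,
    )
    resp.raise_for_status()
    return [
        repo["full_name"]
        for repo in resp.json().get("items", [])
        if repo["stargazers_count"] >= min_stars or repo["forks_count"] >= min_forks
    ]
```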
Generates 8–10 ideas scored by 0.4 × novelty + 0.3 × feasibility + 0.3 × impact. For Path B, the prompt is constrained to use real variable names from your dataset.
Output: stage2_ideation.md with the ranked top 3.
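The ranking itself is just the weighted sum above; a one-line sketch (a 0–10 score scale is assumed):

```python
def idea_score(novelty: float, feasibility: float, impact: float) -> float:
    # Stage 2 ranking: novelty carries the largest weight
    return 0.4 * novelty + 0.3 * feasibility + 0.3 * impact

# idea_score(8, 6, 7) == 0.4*8 + 0.3*6 + 0.3*7 == 7.1
```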
Interactive prompt: pick 1 of the top 3 or reject all to re-ideate.
Eight-step evaluation collapsed into 4 calls (happy path: A → 5 → 6 → C):
| Step | Purpose |
|---|---|
| A (1+2) | Evaluate idea + critique the evaluation |
| B (3+4) | Pivot + re-evaluate (only if A scores low) |
| 5 | Literature review via Semantic Scholar |
| 6 | Verify literature review |
| C (7+8) | Final verdict + review |
Loops: up to MAX_STAGE3_PIVOTS (default 2) before marking STALLED. final_score < 5 → REJECTED_WEAK_ID, returns to Stage 2.5.
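Step 5's literature lookup is, in essence, a single call to the Semantic Scholar Graph API. A hedged sketch follows; the field list and result limit are assumptions, while the 15-second timeout and fail-soft behavior mirror the limitation noted at the end of this README (no retry/backoff):

```python
import requests

def search_literature(query: str, limit: int = 10) -> list[dict]:
    """Fetch related work from Semantic Scholar; fail soft if rate-limited."""
    try:
        resp = requests.get(
            "https://api.semanticscholar.org/graph/v1/paper/search",
            params={"query": query, "limit": limit,
                    "fields": "title,year,abstract,citationCount"},
            timeout=15,  # Stage 3 uses a 15s timeout and no retry/backoff
        )
        resp.raise_for_status()
        return resp.json().get("data", [])
    except requests.RequestException:
        return []  # the literature step proceeds with whatever it has
```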
Runs cheap empirical checks on the real data before investing in full code generation (a sketch of the permutation check appears below):
- Package availability (statsmodels, econml, etc.)
- Pre-trends — joint F-test on pre-treatment dummies
- Permutation test — randomization inference
- Country/region trends — survives unit-specific linear trends
- Economic magnitude — effect ≥ domain threshold
On failure: retry, proceed with capped score, or supply new data.
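A minimal sketch of the randomization-inference check, assuming a binary treatment and a difference-in-means statistic; the pipeline's actual test statistic and permutation count may differ:

```python
import numpy as np

def permutation_pvalue(outcome, treated, n_perm: int = 1000, seed: int = 0) -> float:
    """Share of shuffled treatment assignments whose mean difference is at least
    as extreme as the observed one (two-sided)."""
    rng = np.random.default_rng(seed)
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    observed = outcome[treated].mean() - outcome[~treated].mean()
    draws = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(treated)
        draws[i] = outcome[shuffled].mean() - outcome[~shuffled].mean()
    return float(np.mean(np.abs(draws) >= abs(observed)))
```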
Approve or reject the identification strategy. Reject → back to Stage 2.5.
An adversarial referee agent scans the design for fatal flaws and can veto. Veto → back to Stage 2.5.
Two sub-stages:
4a Strategy Memo — formalizes method, key variables, and causal assumptions. Identification tier is scored:
| Tier | Methods | Score range |
|---|---|---|
| 1 — Causal | DiD, IV, RDD, RCT, event study, synthetic control | 75–90 |
| 2 — Panel-causal | TWFE + shock, Arellano-Bond, CRE | 65–80 |
| 3 — Panel-descriptive | FE without causal ID | 40–55 |
| 4 — Cross-section | OLS, matching, decomposition | 20–35 |
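Tier assignment is keyword-based string matching on the strategy memo (this is flagged again under the limitations at the end). A minimal sketch with hypothetical, abbreviated keyword lists; the real lists and score mapping live in `pipeline/stage4_strategy.py::_score_identification()`:

```python
TIER_KEYWORDS = {  # hypothetical, abbreviated lists
    1: ["difference-in-differences", "did", "instrumental variable", "iv", "rdd",
        "regression discontinuity", "rct", "event study", "synthetic control"],
    2: ["twfe", "arellano-bond", "correlated random effects"],
    3: ["fixed effects"],
    4: ["ols", "matching", "decomposition"],
}

def classify_tier(memo_text: str) -> int:
    """Return the strongest (lowest-numbered) tier whose keywords appear in the memo."""
    text = memo_text.lower()
    for tier in sorted(TIER_KEYWORDS):
        if any(kw in text for kw in TIER_KEYWORDS[tier]):
            return tier
    return 4  # default to the weakest tier if nothing matches
```

Substring matching is also why canonical terminology matters: a memo that never uses a recognized method name can land in a lower tier than it deserves.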
4b/c Code Generation & Execution — generates numbered scripts (1_load.py, 2_clean.py, 3_analyze.py, 4_output.py), auto-creates requirements.txt, runs each script via subprocess, captures errors, and feeds them back to Claude for fixes. A coder-critic must score ≥ 70 (CRITIC_GATE); below threshold triggers up to 2 revision rounds.
Knobs (env vars):
- `PYTHON_TIMEOUT` — per-script timeout in seconds (default 600)
- `MAX_CODE_RETRIES` — error-fix attempts (default 1)
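Execution is plain `subprocess` with the timeout knob above; a sketch of a per-script runner (error handling and retry bookkeeping in the real orchestrator are richer):

```python
import os
import subprocess
import sys

TIMEOUT = int(os.environ.get("PYTHON_TIMEOUT", 600))  # per-script timeout in seconds

def run_script(path: str) -> tuple[bool, str]:
    """Run one numbered analysis script and capture its output."""
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=TIMEOUT,
        )
    except subprocess.TimeoutExpired:
        return False, f"{path} timed out after {TIMEOUT}s"
    ok = result.returncode == 0
    # On failure, the captured stderr is what gets fed back to Claude for a fix
    return ok, result.stdout if ok else result.stderr
```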
Compares pre-computed result snapshots with re-execution outputs to flag non-determinism.
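Conceptually this is a hash comparison over the Stage 4 outputs. A sketch assuming CSV outputs in a snapshot directory versus a re-run directory; the real audit may cover more file types and metadata:

```python
import hashlib
from pathlib import Path

def md5_of(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()

def audit_outputs(snapshot_dir: Path, rerun_dir: Path) -> list[str]:
    """List output files whose re-executed contents drift from the Stage 4 snapshot."""
    drift = []
    for snap in sorted(snapshot_dir.glob("*.csv")):
        rerun = rerun_dir / snap.name
        if not rerun.exists() or md5_of(snap) != md5_of(rerun):
            drift.append(snap.name)
    return drift
```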
Invokes the review-paper-code skill with two agents focused on reproducibility, code quality, and paper-code alignment. Auto-corrects scripts and re-reviews up to 3 times. Backs up scripts to scripts/python/backup_r1/, backup_r2/, etc.
Drafts the LaTeX paper section by section, then compiles to PDF. Auto-fixes to main.tex are restricted to the bibliography window and gated by a structural integrity check (\begin{document}, \end{document}, file ≥ 1000 bytes); failed fixes roll back via .bak snapshots.
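The structural integrity gate boils down to three checks; a sketch (the byte threshold comes from the description above, the function name is hypothetical):

```python
from pathlib import Path

def tex_is_structurally_sound(tex_path: Path, min_bytes: int = 1000) -> bool:
    """Refuse any auto-fix to main.tex that would leave the document un-compilable."""
    if not tex_path.exists() or tex_path.stat().st_size < min_bytes:
        return False
    text = tex_path.read_text(encoding="utf-8", errors="ignore")
    return "\\begin{document}" in text and "\\end{document}" in text
```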
Invokes the review-paper skill with 6 parallel agents:
| Agent | Focus |
|---|---|
| 1 | Spelling, grammar, academic style |
| 2 | Internal consistency and cross-references |
| 3 | Unsupported claims, identification integrity |
| 4 | Mathematics, equations, notation |
| 5 | Tables, figures, documentation |
| 6 | Adversarial contribution referee |
Decision is computed deterministically from agent scores:
| Avg score | Issues | Decision |
|---|---|---|
| ≥ 75 | None fatal | ACCEPT |
| ≥ 60 | Any | MINOR_REVISIONS (no re-review) |
| 40–60 | Any | MAJOR_REVISIONS → loop to Stage 5 |
| < 40 | Fatal | REJECT |
R&R loop caps at MAX_RR_ROUNDS rounds (default 3).
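A sketch of that deterministic mapping, assuming a plain average of the six agent scores and a boolean fatal-issue flag; boundary handling in the real reviewer may differ:

```python
def review_decision(agent_scores: list[float], fatal_issues: bool) -> str:
    """Map agent scores to an editorial decision (illustrative thresholds from the table)."""
    avg = sum(agent_scores) / len(agent_scores)
    if avg >= 75 and not fatal_issues:
        return "ACCEPT"
    if avg >= 60:
        return "MINOR_REVISIONS"   # accepted without another review round
    if avg >= 40:
        return "MAJOR_REVISIONS"   # loops back to Stage 5
    return "REJECT"
```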
Five phases:
- Replication audit — re-runs all scripts and MD5-hashes outputs against Stage 4 snapshots
- Integration validation — paper claims vs. result tables, methods vs. scripts
- Quality gate — every component ≥ 70, aggregate ≥ 85 (`SUBMISSION_GATE`; see the sketch after this list)
- Journal targeting — suggested journals based on method and topic (only if gate passes)
- Feedback PDF — diagnostics and improvement recommendations
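The quality gate reduces to two inequalities; a sketch assuming the aggregate is a plain mean and using a hypothetical `COMPONENT_FLOOR` name for the per-component minimum:

```python
SUBMISSION_GATE = 85   # aggregate threshold (configurable in pipeline/config.py)
COMPONENT_FLOOR = 70   # hypothetical name for the per-component minimum

def passes_quality_gate(component_scores: dict[str, float]) -> bool:
    aggregate = sum(component_scores.values()) / len(component_scores)
    return (all(score >= COMPONENT_FLOOR for score in component_scores.values())
            and aggregate >= SUBMISSION_GATE)
```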
After a full run, projects/<project_name>/ contains:
```
pipeline_state.json            State (current stage, per-stage metadata)
stage1_discovery.md
stage2_ideation.md
selected_idea.md
stage3_validation.md
strategy/
  strategy_memo.md             Method, variables, causal assumptions
  referee_checklist.md
paper/
  main.tex                     LaTeX manuscript
  sections/                    Per-section .tex files
  tables/                      Publication-ready .tex tables
  figures/                     Publication-ready figures
  PROMPT.md                    Research prompt used by the writer agent
data/
  clean/                       CSV outputs from Stage 4 scripts
scripts/
  python/                      Numbered analysis scripts
  python/backup_r1/, ...       Snapshots from Stage 4.7 corrections
quality_reports/               Validator reports
reviews/                       Stage 6 referee reports
```
State lives in projects/<name>/pipeline_state.json and is written atomically (tempfile + os.replace()), so an interrupted run never leaves a half-written file. If the JSON is ever corrupted, it is renamed to pipeline_state.corrupt-<ts>.json and the pipeline starts from a clean skeleton.
Each stage records its result under stages.stage<N> with at minimum a status field. Selected idea, validation result, identification score, generated scripts, and review decisions are all persisted, which is what makes --from-stage work cleanly.
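The atomic write is the standard temp-file-then-rename idiom; a minimal sketch (the real writer also records per-stage metadata and handles the corrupt-file rename described above):

```python
import json
import os
import tempfile

def save_state(state: dict, path: str) -> None:
    """Write pipeline_state.json atomically so an interrupted run never truncates it."""
    fd, tmp = tempfile.mkstemp(suffix=".json")  # system tempdir; see the same-volume caveat below
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp, path)  # atomic rename onto the live state file
    except OSError:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```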
- Windows is the primary tested platform. `os.replace()` atomicity assumes the project directory and the system tempdir live on the same volume — fine on a local disk, not guaranteed on a network share.
- Path C is partially implemented. The flag is wired through but Stage 1 does not yet branch on it; treat it as experimental.
- Stage 6 R&R has no escalation. After 3 rounds the paper is marked incomplete; there is no automatic fallback to a less ambitious target.
- Method classification is keyword-based. The identification tier in Stage 4a relies on string matching ("did", "iv", "rdd"). Misspelled or non-standard method names may be misclassified — keep your strategy memo terminology canonical.
- No retry/backoff on Semantic Scholar. Stage 3 uses a 15s timeout but no retry logic; rate-limited responses cause the literature step to fail soft and proceed.
- Stage thresholds — edit `pipeline/config.py` (`CRITIC_GATE`, `SUBMISSION_GATE`, `MAX_RR_ROUNDS`, `MAX_CODE_RETRIES`).
- Identification scoring — `pipeline/stage4_strategy.py::_score_identification()`.
- Reviewer prompts — edit the matching skill file in `Skills/`.
- Per-project context — drop a `CLAUDE.md` inside `projects/<name>/` to give the agents project-specific instructions.
MIT — free to use, adapt, and share. Skills under Skills/ retain their original MIT license from Claes Bäckman.