EasyLit -- Research Extraction Studio

Project Status

Mode: Maintenance -- Core features complete. Accepting bug reports and minor enhancements.

| Area | Status | Progress |
| --- | --- | --- |
| Core extraction engine | Done | 100% |
| Pre-flight token estimation | Done | 100% |
| CSV / HTML Report / BibTeX export | Done | 100% |
| 10-style citation formatting | Done | 100% |
| Google OAuth + analysis history | Done | 100% |
| Admin dashboard + prompt versioning | Done | 100% |
| Donorware conversion (shared API key) | Done | 100% |
| Droplet deployment (nginx/gunicorn/systemd) | Done | 100% |
| Daily usage cap | Done | 100% |
| Custom domain + SSL | Done | 100% |
| Health meter + Buy Me a Coffee tips | Done | 100% |
| Mendeley integration | Future | 0% |
| Overall (core scope) | Done | 100% |


EasyLit is a web application that automates structured data extraction from academic PDFs stored in your Zotero library. It uses the Claude API to analyze research articles and extract constructs, variables, hypotheses, scales, themes, and research gaps -- with live analytics, trend synthesis, and a full HTML report with bibliography.

Claude API costs are covered (donorware model) -- users only need a free Zotero account.


Where EasyLit Fits in the Landscape

Several well-developed literature review tools already exist, and EasyLit is not trying to replace any of them. It exists to fill one specific gap that the others do not address well.

The existing players

| Tool | Strength | What it is |
| --- | --- | --- |
| Elicit | Semantic search across 125M+ papers, AI-generated summary tables | Discovery and abstract-level extraction |
| Research Rabbit | Citation graph exploration, "Spotify for papers" | Discovery |
| Litmaps | Visual citation maps, alerts on new related work | Discovery and monitoring |
| Connected Papers | Visual graph of related work from a seed paper | Discovery |
| Scite | "Smart citations" showing supporting vs. contradicting claims | Citation analysis |
| Covidence / Rayyan | PRISMA-style screening workflows for systematic reviews | Team screening |
| SciSpace | Chat-with-PDF, extraction tables, paraphrasing | Full-text analysis |

These tools are all strong in their lanes; if you need article discovery, citation graph exploration, or team-based PRISMA screening, use them. EasyLit does not compete at those stages.

The corpus-access problem nobody talks about

All of the discovery-focused tools above run into the same wall: getting actual full text is expensive and legally complicated.

Most of them lean on Semantic Scholar (~200M papers), OpenAlex (~250M works), CrossRef, PubMed, arXiv, or DOAJ for their backing corpus. These are free or near-free sources, but they only reliably provide metadata and abstracts, plus full text for open-access papers. Anything behind a publisher paywall is out of reach without direct licensing deals. Elicit has been working on publisher agreements, but those are slow, expensive, and incomplete. Covidence and Rayyan sidestep the problem entirely by making you upload the PDFs yourself, and SciSpace increasingly does the same.

The overhead this creates is substantial:

  • Embedding and indexing compute for hundreds of millions of papers
  • Ongoing metadata freshness (retractions, DOI changes, new preprint versions)
  • Legal and contractual work to license full text where possible
  • Storage that scales linearly with every user who uploads their own library

All of that is why every full-featured tool in this space charges somewhere between $12 and $20/mo to individuals, or thousands per year to institutions.

What EasyLit does differently

EasyLit assumes you have already solved the hardest problem: you brought the corpus.

Your institutional library paid for the access. Zotero stored the PDFs. You did the relevance filtering and screening. At that point, the remaining job -- actually pulling structured research data out of those PDFs -- is a narrow, well-defined extraction task, and that is the only thing EasyLit does.

Because the tool never maintains a corpus, there is no indexing bill, no metadata graph to keep fresh, no licensing to negotiate, and no storage that scales with every new user. The entire cost surface is just the Claude API calls for the specific PDFs you point at it. That is precisely why the donorware model works here and would not work for Elicit.

Why you might prefer EasyLit

  1. You already curated the corpus. EasyLit is a post-curation extraction tool, not a discovery tool. If you already have a Zotero collection of the papers you care about, this is the straight line from "200 PDFs in a folder" to "structured CSV of constructs, variables, hypotheses, scales, and gaps."
  2. Structured extraction tuned for empirical research. The schema is opinionated: constructs, IV/DV/moderator variables, hypothesis-level row expansion with beta coefficients and effect sizes, measurement scales, sample characteristics, themes, and research gaps. This is specifically aimed at quantitative and mixed-methods dissertation and meta-analysis work.
  3. Donorware -- free to end users. Shared Claude API key with a daily per-user cap. No subscription, no paywall, no credit card.
  4. Zotero-native. Collection tree picker, PDF resolution from local storage or the Zotero API, BibTeX round-trip. No re-uploading a library you already maintain.
  5. Transparent and self-hostable. Public repo. You can run it on your own Droplet with your own Anthropic key. Your PDFs and prompts do not flow through a closed SaaS stack.
  6. Reports out of the box. HTML report with SVG publication timeline, variable co-occurrence matrix, 10 citation styles, and BibTeX export. Most competitors stop at "here is a table, export to CSV."
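
The variable co-occurrence matrix mentioned above is conceptually simple: count how often two extracted variables appear in the same article. As an illustrative sketch (not EasyLit's actual report code; the variable names are made up):

```python
from itertools import combinations
from collections import Counter

def cooccurrence(rows):
    """Count how often pairs of variables appear together in one article.

    `rows` is a list of per-article variable lists (e.g. the IV/DV/moderator
    names extracted for each paper). Returns a Counter keyed by sorted pairs.
    """
    counts = Counter()
    for variables in rows:
        for a, b in combinations(sorted(set(variables)), 2):
            counts[(a, b)] += 1
    return counts

articles = [
    ["job satisfaction", "turnover intention", "tenure"],
    ["job satisfaction", "turnover intention"],
]
print(cooccurrence(articles)[("job satisfaction", "turnover intention")])  # 2
```

Rendering that Counter as an HTML table (variables on both axes, counts in the cells) is all the report needs.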

Where EasyLit is deliberately weaker

  • No discovery or search. If you do not already have a Zotero library, EasyLit cannot help you build one. Use Elicit, Research Rabbit, or Litmaps for that stage.
  • No PRISMA screening workflow. Covidence and Rayyan own that space and do it well.
  • No citation graph visualization. Research Rabbit, Litmaps, and Connected Papers own that.
  • Single-worker Flask app. Not built for large team collaboration the way Covidence is.

The honest pitch: if you are a researcher who already uses Zotero and needs to pull structured empirical data out of a curated collection without paying a monthly subscription or uploading your library to someone else's server, this is for you. It is a sharp tool for a narrow slot, not an Elicit killer.


Features

Extraction

  • Zotero Collection Tree Picker -- Browse and select any collection from your Zotero library directly in the UI
  • Automated PDF Extraction -- Downloads PDFs via the Zotero API and sends them to Claude for analysis
  • Claude-Powered Analysis -- Extracts constructs, independent/dependent variables, moderators, instrumentation, key definitions, themes, theoretical constructs, and research gaps
  • Meta-Analysis Mode -- Captures hypothesis-level numerical data (beta coefficients, effect sizes, sample sizes, R values) for quantitative synthesis
  • Scale & Sample Mode -- Extracts measurement scale details and sample characteristics
  • Running Trend Synthesis -- Periodic Claude-generated summaries of emerging patterns as articles are processed
  • Mock Mode -- Tests the full UI and visualizations with synthetic data at no API cost
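
As a rough sketch of the Zotero side of this pipeline (the helper below is illustrative, not EasyLit's actual code; the live PyZotero calls are shown as comments because they need real credentials):

```python
def pdf_attachment_keys(items):
    """Return the keys of PDF attachments in a list of Zotero item dicts."""
    return [
        item["key"]
        for item in items
        if item.get("data", {}).get("contentType") == "application/pdf"
    ]

# Against the live API (hypothetical IDs), this pairs with PyZotero roughly like:
#   from pyzotero import zotero
#   zot = zotero.Zotero("1234567", "user", "your-zotero-api-key")
#   items = zot.everything(zot.collection_items("ABCD1234"))
#   for key in pdf_attachment_keys(items):
#       zot.dump(key, path="pdfs/")   # saves each attachment to disk

fake = [
    {"key": "AAA", "data": {"contentType": "application/pdf"}},
    {"key": "BBB", "data": {"contentType": "text/html"}},
]
print(pdf_attachment_keys(fake))  # ['AAA']
```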

Pre-flight & Cost Control

  • Token Pre-Estimation -- Before every real run, EasyLit scans all PDFs using count_tokens() (no inference cost) and shows a pre-flight modal with exact input token counts, estimated output tokens, projected API cost, and estimated run time based on historical jobs
  • Daily Usage Cap -- Configurable per-user daily cost limit (default $2/day) to prevent runaway spend
  • Pre-flight confirmation -- The modal requires explicit confirmation before launching; Cancel discards the run entirely
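
The cost projection behind the modal is straightforward arithmetic once input tokens are counted. A minimal sketch (the per-token prices below are placeholders, not a pricing claim; check Anthropic's current rates):

```python
def projected_cost_usd(input_tokens, est_output_tokens,
                       in_per_mtok=3.00, out_per_mtok=15.00):
    """Project API cost in USD from token counts.

    Prices are per million tokens and are assumptions for illustration.
    """
    return (input_tokens * in_per_mtok
            + est_output_tokens * out_per_mtok) / 1_000_000

# Input tokens can be counted without running inference via the Anthropic SDK:
#   from anthropic import Anthropic
#   client = Anthropic()
#   count = client.messages.count_tokens(
#       model="claude-sonnet-4-20250514",
#       messages=[{"role": "user", "content": prompt_with_pdf}],
#   )
#   total_in = count.input_tokens

print(projected_cost_usd(40_000, 2_000))  # 0.15
```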

Output

  • CSV Export -- One row per article (or per hypothesis in Meta mode) with all extracted fields
  • HTML Report -- Rendered research report with synthesis, publication timeline, frequency tables, variable co-occurrence matrix, bibliography, and a citable EasyLit reference
  • BibTeX Export -- .bib file generated from article metadata for direct import into reference managers
  • 10-Style Citation Card -- Live citation card on the analysis panel supporting APA 7, MLA 9, Chicago (Full Note and Author-Date), IEEE, Vancouver, Nature, AMA, Elsevier Harvard, and Bluebook
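
BibTeX generation from item metadata can be sketched in a few lines (this is an illustration, not the repo's reports.py; the field names and key scheme are assumptions):

```python
def bibtex_entry(meta):
    """Build a minimal BibTeX @article entry from a metadata dict.

    Citation key = first author's surname + year (an assumed convention).
    """
    key = f"{meta['author'].split(',')[0].lower()}{meta['year']}"
    fields = "\n".join(
        f"  {k} = {{{meta[k]}}}," for k in ("author", "title", "journal", "year")
    )
    return f"@article{{{key},\n{fields}\n}}"

print(bibtex_entry({
    "author": "Smith, Jane",
    "title": "Turnover Intention Revisited",
    "journal": "Journal of Applied Psychology",
    "year": "2021",
}))
```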

Job Resilience

  • Navigation-safe jobs -- Closing or refreshing the tab does not kill the job; the server thread continues running
  • Auto-reconnect -- On return, EasyLit detects any active or recently completed job and restores the log, download buttons, and citation card
  • Browser notifications -- When the job finishes while the tab is in the background, a browser notification fires (requires one-time permission grant)
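
jobs.py implements this server-side; the underlying pattern is a module-level job registry plus a worker thread, which can be sketched like this (a minimal illustration, not the actual EasyLit code):

```python
import threading
import time
import uuid

JOBS = {}  # job_id -> {"status": ..., "log": [...]}; outlives any one HTTP request

def start_job(articles):
    """Launch a background extraction and return a job ID the client can poll."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "running", "log": []}

    def work():
        for article in articles:
            JOBS[job_id]["log"].append(f"processed {article}")
        JOBS[job_id]["status"] = "done"

    threading.Thread(target=work, daemon=True).start()
    return job_id

job = start_job(["paper1.pdf", "paper2.pdf"])
time.sleep(0.1)             # the browser can disconnect here; the thread keeps running
print(JOBS[job]["status"])  # done
```

On reconnect, the frontend only needs the job ID (or a "most recent job for this user" lookup) to restore the log and download buttons from `JOBS`.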

Admin Dashboard (/admin)

  • Usage dashboard -- Total jobs, articles, pages, API cost, token counts; per-user breakdown and guest activity
  • Prompt versioning -- View, edit, and version all six Claude prompt templates; roll back to any prior version; changes take effect on the next extraction run

Auth & History

  • Google OAuth -- Sign in with Google to save analyses and access history
  • Analysis History -- Re-download CSV, report, or BibTeX for any past extraction without re-running
  • Encrypted Zotero key storage -- Zotero API keys are encrypted at rest in Supabase using Fernet symmetric encryption
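
The Fernet round-trip is a few lines with the cryptography library (a sketch of the technique; the actual key management in db.py may differ):

```python
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()   # in production, load this from an env var, never hard-code
f = Fernet(key)

token = f.encrypt(b"example-zotero-key")  # ciphertext is what gets stored in Supabase
plain = f.decrypt(token)                  # recovered when a job needs the key
print(plain == b"example-zotero-key")     # True
```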

Requirements

  • Python 3.9 or higher
  • A Zotero account with API access
  • A Supabase project (free tier is sufficient)
  • A Google OAuth application (for sign-in)
  • An Anthropic API key (server-side only, not required from users)

Installation

1. Clone the repository

git clone https://github.com/opieeipo/EasyLit.git
cd EasyLit

2. Create a virtual environment

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

# Windows
python -m venv venv
venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Configure environment variables

Copy env.example to .env and fill in your values. Key additions for donorware mode:

# Shared Claude API key (all users share this)
ANTHROPIC_API_KEY=sk-ant-your-key

# Daily cost cap per user in USD
DAILY_COST_CAP_USD=2.00

# App base URL (your domain in production)
APP_BASE_URL=http://localhost:8000

See env.example for the full list.

5. Run the Supabase schema

Run schema.sql and schema_prompt_versions.sql in your Supabase SQL Editor to create all required tables.


Running Locally

python easylit_app.py

Then open your browser to: http://localhost:8000


Configuration

On first launch, the setup wizard guides you through connecting Zotero. Claude API is provided server-side -- no user API key needed.

| Setting | Description |
| --- | --- |
| Zotero API Key | Found at zotero.org/settings/keys |
| Zotero Library ID | Your numeric user ID -- use Auto-detect after entering your key |
| Model | claude-sonnet-4-20250514 recommended; Opus and Haiku also available |
| Delay (seconds) | Pause between PDFs to manage API rate limits |
| Retries | Retry attempts on failed extractions |
| Citation Format | Default style for the citation card and report (10 styles supported) |

Zotero settings for registered users are stored encrypted in Supabase.


Extraction Options

| Option | Description |
| --- | --- |
| Meta-Analysis Mode | Hypothesis rows with IV/DV names, beta coefficients, effect sizes, sample sizes |
| Scale & Sample Mode | Measurement scale details and sample characteristics |
| Remove Empty Columns | Drops columns with no extracted data from the CSV |
| Running Trend Synthesis | Claude summarizes emerging themes every N articles |
| Mock Mode | Synthetic data, no API calls, no cost -- for UI testing only |

Project Structure

EasyLit/
|-- easylit_app.py          # Flask app entry point and route definitions
|-- jobs.py                 # Background extraction thread and job state
|-- estimate.py             # Token pre-estimation thread (count_tokens)
|-- extraction.py           # PDF/data processing helpers
|-- prompts.py              # Claude prompt templates and versioning
|-- reports.py              # HTML report and BibTeX generation
|-- zotero_client.py        # Zotero API helpers
|-- db.py                   # Supabase database layer + usage cap
|-- auth.py                 # Google OAuth blueprint
|-- admin.py                # Admin dashboard blueprint
|-- digest.py               # Usage digest mailer
|-- config.py               # Default config and load/save stubs
|-- deploy/
|   |-- setup.sh            # Droplet provisioning script
|   |-- easylit.nginx       # Nginx reverse proxy config
|   |-- easylit.service     # systemd service unit
|-- schema.sql              # Main Supabase schema
|-- schema_prompt_versions.sql  # Prompt versioning table
|-- requirements.txt        # Python dependencies
|-- templates/
|   |-- index.html          # Main app frontend
|   |-- report.html         # Report template
|   |-- admin/
|       |-- dashboard.html  # Admin dashboard
|       |-- login.html      # Admin login
|-- README.md               # This file

Tech Stack

  • Backend: Python, Flask, gunicorn, PyZotero, Anthropic Python SDK
  • Frontend: Vanilla JavaScript, HTML/CSS
  • AI: Anthropic Claude (claude-sonnet-4-20250514)
  • Database / Auth: Supabase (PostgreSQL + service role key)
  • Google OAuth: Authlib
  • Encryption: Python cryptography library (Fernet)
  • Deployment: DigitalOcean Droplet (nginx + gunicorn + systemd + Let's Encrypt)

Deployment (DigitalOcean Droplet)

  1. Create a $6/mo Droplet (1 vCPU, 1GB RAM, Ubuntu 22.04)
  2. Point your domain's A record to the Droplet IP
  3. SSH in and run: bash deploy/setup.sh your-domain.com
  4. Copy your .env to /opt/easylit/.env
  5. systemctl start easylit

To deploy updates:

cd /opt/easylit && git pull && source venv/bin/activate && pip install -r requirements.txt && systemctl restart easylit

Notes

  • Jobs run as background threads on the server. Closing the browser tab does not interrupt an in-progress extraction.
  • If a PDF cannot be retrieved, the article is logged as skipped and processing continues.
  • The pre-flight token estimation uses count_tokens(), which counts tokens without running model inference, so it adds no generation cost to the run.
  • Prompt versions are loaded from Supabase at job start. Changes in the admin Prompts tab take effect on the next run with no redeployment needed.
  • Daily usage cap is checked at job start. Users who exceed the cap see a clear message with their spend and the limit.
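
The cap check at job start reduces to a comparison plus a user-facing message. A minimal sketch (function name and message wording are illustrative, not db.py's actual code):

```python
def check_daily_cap(spent_today_usd, cap_usd=2.00):
    """Return (allowed, message) before launching a job."""
    if spent_today_usd >= cap_usd:
        return False, (
            f"Daily cap reached: ${spent_today_usd:.2f} of "
            f"${cap_usd:.2f} used today. Try again tomorrow."
        )
    return True, ""

print(check_daily_cap(2.15)[0])  # False
print(check_daily_cap(0.50)[0])  # True
```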

License

EasyLit is released under the Apache License 2.0. You are free to use, modify, and redistribute the code for any purpose, commercial or otherwise, subject to the terms of the license. Attribution is required; warranty is not provided.


Author

M. Opie Frazier
