EasyLit -- Research Extraction Studio

Project Status

Mode: Maintenance -- Core features complete. Accepting bug reports and minor enhancements.

| Area | Status | Progress |
| --- | --- | --- |
| Core extraction engine | Done | 100% |
| Pre-flight token estimation | Done | 100% |
| CSV / HTML Report / BibTeX export | Done | 100% |
| 10-style citation formatting | Done | 100% |
| Google OAuth + analysis history | Done | 100% |
| Admin dashboard + prompt versioning | Done | 100% |
| Donorware conversion (shared API key) | Done | 100% |
| Droplet deployment (nginx/gunicorn/systemd) | Done | 100% |
| Daily usage cap | Done | 100% |
| Custom domain + SSL | Done | 100% |
| Health meter + Buy Me a Coffee tips | Done | 100% |
| Mendeley integration | Future | 0% |
| Overall (core scope) | Done | 100% |


EasyLit is a web application that automates structured data extraction from academic PDFs stored in your Zotero library. It uses the Claude API to analyze research articles and extract constructs, variables, hypotheses, scales, themes, and research gaps -- with live analytics, trend synthesis, and a full HTML report with bibliography.

Claude API costs are covered (donorware model) -- users only need a free Zotero account.


Where EasyLit Fits in the Landscape

Several well-developed literature review tools already exist, and EasyLit is not trying to replace any of them. It exists to fill one specific gap that the others do not address well.

The existing players

| Tool | Strength | What it is |
| --- | --- | --- |
| Elicit | Semantic search across 125M+ papers, AI-generated summary tables | Discovery and abstract-level extraction |
| Research Rabbit | Citation graph exploration, "Spotify for papers" | Discovery |
| Litmaps | Visual citation maps, alerts on new related work | Discovery and monitoring |
| Connected Papers | Visual graph of related work from a seed paper | Discovery |
| Scite | "Smart citations" showing supporting vs. contradicting claims | Citation analysis |
| Covidence / Rayyan | PRISMA-style screening workflows for systematic reviews | Team screening |
| SciSpace | Chat-with-PDF, extraction tables, paraphrasing | Full-text analysis |

These tools are all strong in their lanes; if you need article discovery, citation graph exploration, or team-based PRISMA screening, use them. EasyLit does not compete at those stages.

The corpus-access problem nobody talks about

All of the discovery-focused tools above run into the same wall: getting actual full text is expensive and legally complicated.

Most of them lean on Semantic Scholar (~200M papers), OpenAlex (~250M works), CrossRef, PubMed, arXiv, or DOAJ for their backing corpus. These are free or near-free sources, but they only reliably provide metadata and abstracts, plus full text for open-access papers. Anything behind a publisher paywall is out of reach without direct licensing deals. Elicit has been working on publisher agreements, but those are slow, expensive, and incomplete. Covidence and Rayyan sidestep the problem entirely by making you upload the PDFs yourself, and SciSpace increasingly does the same.

The overhead this creates is substantial:

  • Embedding and indexing compute for hundreds of millions of papers
  • Ongoing metadata freshness (retractions, DOI changes, new preprint versions)
  • Legal and contractual work to license full text where possible
  • Storage that scales linearly with every user who uploads their own library

All of that is why every full-featured tool in this space charges somewhere between $12 and $20/mo to individuals, or thousands per year to institutions.

What EasyLit does differently

EasyLit assumes you have already solved the hardest problem: you brought the corpus.

Your institutional library paid for the access. Zotero stored the PDFs. You did the relevance filtering and screening. At that point, the remaining job -- actually pulling structured research data out of those PDFs -- is a narrow, well-defined extraction task, and that is the only thing EasyLit does.

Because the tool never maintains a corpus, there is no indexing bill, no metadata graph to keep fresh, no licensing to negotiate, and no storage that scales with every new user. The entire cost surface is just the Claude API calls for the specific PDFs you point at it. That is precisely why the donorware model works here and would not work for Elicit.

Why you might prefer EasyLit

  1. You already curated the corpus. EasyLit is a post-curation extraction tool, not a discovery tool. If you already have a Zotero collection of the papers you care about, this is the straight line from "200 PDFs in a folder" to "structured CSV of constructs, variables, hypotheses, scales, and gaps."
  2. Structured extraction tuned for empirical research. The schema is opinionated: constructs, IV/DV/moderator variables, hypothesis-level row expansion with beta coefficients and effect sizes, measurement scales, sample characteristics, themes, and research gaps. This is specifically aimed at quantitative and mixed-methods dissertation and meta-analysis work.
  3. Donorware -- free to end users. Shared Claude API key with a daily per-user cap. No subscription, no paywall, no credit card.
  4. Zotero-native. Collection tree picker, PDF resolution from local storage or the Zotero API, BibTeX round-trip. No re-uploading a library you already maintain.
  5. Transparent and self-hostable. Public repo. You can run it on your own Droplet with your own Anthropic key. Your PDFs and prompts do not flow through a closed SaaS stack.
  6. Reports out of the box. HTML report with SVG publication timeline, variable co-occurrence matrix, 10 citation styles, and BibTeX export. Most competitors stop at "here is a table, export to CSV."
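
The variable co-occurrence matrix mentioned above is conceptually simple: count how often two extracted variables appear in the same article. As an illustrative sketch (not EasyLit's actual report code; the variable names are made up):

```python
from itertools import combinations
from collections import Counter

def cooccurrence(rows):
    """Count how often pairs of variables appear together in one article.

    `rows` is a list of per-article variable lists (e.g. the IV/DV/moderator
    names extracted for each paper). Returns a Counter keyed by sorted pairs.
    """
    counts = Counter()
    for variables in rows:
        for a, b in combinations(sorted(set(variables)), 2):
            counts[(a, b)] += 1
    return counts

articles = [
    ["job satisfaction", "turnover intention", "tenure"],
    ["job satisfaction", "turnover intention"],
]
print(cooccurrence(articles)[("job satisfaction", "turnover intention")])  # 2
```

Rendering that Counter as an HTML table (variables on both axes, counts in the cells) is all the report needs.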

Where EasyLit is deliberately weaker

  • No discovery or search. If you do not already have a Zotero library, EasyLit cannot help you build one. Use Elicit, Research Rabbit, or Litmaps for that stage.
  • No PRISMA screening workflow. Covidence and Rayyan own that space and do it well.
  • No citation graph visualization. Research Rabbit, Litmaps, and Connected Papers own that.
  • Single-worker Flask app. Not built for large team collaboration the way Covidence is.

The honest pitch: if you are a researcher who already uses Zotero and needs to pull structured empirical data out of a curated collection without paying a monthly subscription or uploading your library to someone else's server, this is for you. It is a sharp tool for a narrow slot, not an Elicit killer.


Features

Extraction

  • Zotero Collection Tree Picker -- Browse and select any collection from your Zotero library directly in the UI
  • Automated PDF Extraction -- Downloads PDFs via the Zotero API and sends them to Claude for analysis
  • Claude-Powered Analysis -- Extracts constructs, independent/dependent variables, moderators, instrumentation, key definitions, themes, theoretical constructs, and research gaps
  • Meta-Analysis Mode -- Captures hypothesis-level numerical data (beta coefficients, effect sizes, sample sizes, R values) for quantitative synthesis
  • Scale & Sample Mode -- Extracts measurement scale details and sample characteristics
  • Running Trend Synthesis -- Periodic Claude-generated summaries of emerging patterns as articles are processed
  • Mock Mode -- Tests the full UI and visualizations with synthetic data at no API cost
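
As a rough sketch of the Zotero side of this pipeline (the helper below is illustrative, not EasyLit's actual code; the live PyZotero calls are shown as comments because they need real credentials):

```python
def pdf_attachment_keys(items):
    """Return the keys of PDF attachments in a list of Zotero item dicts."""
    return [
        item["key"]
        for item in items
        if item.get("data", {}).get("contentType") == "application/pdf"
    ]

# Against the live API (hypothetical IDs), this pairs with PyZotero roughly like:
#   from pyzotero import zotero
#   zot = zotero.Zotero("1234567", "user", "your-zotero-api-key")
#   items = zot.everything(zot.collection_items("ABCD1234"))
#   for key in pdf_attachment_keys(items):
#       zot.dump(key, path="pdfs/")   # saves each attachment to disk

fake = [
    {"key": "AAA", "data": {"contentType": "application/pdf"}},
    {"key": "BBB", "data": {"contentType": "text/html"}},
]
print(pdf_attachment_keys(fake))  # ['AAA']
```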

Pre-flight & Cost Control

  • Token Pre-Estimation -- Before every real run, EasyLit scans all PDFs using count_tokens() (no inference cost) and shows a pre-flight modal with exact input token counts, estimated output tokens, projected API cost, and estimated run time based on historical jobs
  • Daily Usage Cap -- Configurable per-user daily cost limit (default $2/day) to prevent runaway spend
  • Pre-flight confirmation -- The modal requires explicit confirmation before launching; Cancel discards the run entirely
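
The cost projection behind the modal is straightforward arithmetic once input tokens are counted. A minimal sketch (the per-token prices below are placeholders, not a pricing claim; check Anthropic's current rates):

```python
def projected_cost_usd(input_tokens, est_output_tokens,
                       in_per_mtok=3.00, out_per_mtok=15.00):
    """Project API cost in USD from token counts.

    Prices are per million tokens and are assumptions for illustration.
    """
    return (input_tokens * in_per_mtok
            + est_output_tokens * out_per_mtok) / 1_000_000

# Input tokens can be counted without running inference via the Anthropic SDK:
#   from anthropic import Anthropic
#   client = Anthropic()
#   count = client.messages.count_tokens(
#       model="claude-sonnet-4-20250514",
#       messages=[{"role": "user", "content": prompt_with_pdf}],
#   )
#   total_in = count.input_tokens

print(projected_cost_usd(40_000, 2_000))  # 0.15
```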

Output

  • CSV Export -- One row per article (or per hypothesis in Meta mode) with all extracted fields
  • HTML Report -- Rendered research report with synthesis, publication timeline, frequency tables, variable co-occurrence matrix, bibliography, and a citable EasyLit reference
  • BibTeX Export -- .bib file generated from article metadata for direct import into reference managers
  • 10-Style Citation Card -- Live citation card on the analysis panel supporting APA 7, MLA 9, Chicago (Full Note and Author-Date), IEEE, Vancouver, Nature, AMA, Elsevier Harvard, and Bluebook
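
BibTeX generation from item metadata can be sketched in a few lines (this is an illustration, not the repo's reports.py; the field names and key scheme are assumptions):

```python
def bibtex_entry(meta):
    """Build a minimal BibTeX @article entry from a metadata dict.

    Citation key = first author's surname + year (an assumed convention).
    """
    key = f"{meta['author'].split(',')[0].lower()}{meta['year']}"
    fields = "\n".join(
        f"  {k} = {{{meta[k]}}}," for k in ("author", "title", "journal", "year")
    )
    return f"@article{{{key},\n{fields}\n}}"

print(bibtex_entry({
    "author": "Smith, Jane",
    "title": "Turnover Intention Revisited",
    "journal": "Journal of Applied Psychology",
    "year": "2021",
}))
```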

Job Resilience

  • Navigation-safe jobs -- Closing or refreshing the tab does not kill the job; the server thread continues running
  • Auto-reconnect -- On return, EasyLit detects any active or recently completed job and restores the log, download buttons, and citation card
  • Browser notifications -- When the job finishes while the tab is in the background, a browser notification fires (requires one-time permission grant)
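
jobs.py implements this server-side; the underlying pattern is a module-level job registry plus a worker thread, which can be sketched like this (a minimal illustration, not the actual EasyLit code):

```python
import threading
import time
import uuid

JOBS = {}  # job_id -> {"status": ..., "log": [...]}; outlives any one HTTP request

def start_job(articles):
    """Launch a background extraction and return a job ID the client can poll."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "running", "log": []}

    def work():
        for article in articles:
            JOBS[job_id]["log"].append(f"processed {article}")
        JOBS[job_id]["status"] = "done"

    threading.Thread(target=work, daemon=True).start()
    return job_id

job = start_job(["paper1.pdf", "paper2.pdf"])
time.sleep(0.1)             # the browser can disconnect here; the thread keeps running
print(JOBS[job]["status"])  # done
```

On reconnect, the frontend only needs the job ID (or a "most recent job for this user" lookup) to restore the log and download buttons from `JOBS`.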

Admin Dashboard (/admin)

  • Usage dashboard -- Total jobs, articles, pages, API cost, token counts; per-user breakdown and guest activity
  • Prompt versioning -- View, edit, and version all six Claude prompt templates; roll back to any prior version; changes take effect on the next extraction run

Auth & History

  • Google OAuth -- Sign in with Google to save analyses and access history
  • Analysis History -- Re-download CSV, report, or BibTeX for any past extraction without re-running
  • Encrypted Zotero key storage -- Zotero API keys are encrypted at rest in Supabase using Fernet symmetric encryption
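
The Fernet round-trip is a few lines with the cryptography library (a sketch of the technique; the actual key management in db.py may differ):

```python
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()   # in production, load this from an env var, never hard-code
f = Fernet(key)

token = f.encrypt(b"example-zotero-key")  # ciphertext is what gets stored in Supabase
plain = f.decrypt(token)                  # recovered when a job needs the key
print(plain == b"example-zotero-key")     # True
```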

Requirements

  • Python 3.9 or higher
  • A Zotero account with API access
  • A Supabase project (free tier is sufficient)
  • A Google OAuth application (for sign-in)
  • An Anthropic API key (server-side only, not required from users)

Installation

1. Clone the repository

git clone https://github.com/opieeipo/EasyLit.git
cd EasyLit

2. Create a virtual environment

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

# Windows
python -m venv venv
venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Configure environment variables

Copy env.example to .env and fill in your values. Key additions for donorware mode:

# Shared Claude API key (all users share this)
ANTHROPIC_API_KEY=sk-ant-your-key

# Daily cost cap per user in USD
DAILY_COST_CAP_USD=2.00

# App base URL (your domain in production)
APP_BASE_URL=http://localhost:8000

See env.example for the full list.

5. Run the Supabase schema

Run schema.sql and schema_prompt_versions.sql in your Supabase SQL Editor to create all required tables.


Running Locally

python easylit_app.py

Then open your browser to: http://localhost:8000


Configuration

On first launch, the setup wizard guides you through connecting Zotero. Claude API is provided server-side -- no user API key needed.

| Setting | Description |
| --- | --- |
| Zotero API Key | Found at zotero.org/settings/keys |
| Zotero Library ID | Your numeric user ID -- use Auto-detect after entering your key |
| Model | claude-sonnet-4-20250514 recommended; Opus and Haiku also available |
| Delay (seconds) | Pause between PDFs to manage API rate limits |
| Retries | Retry attempts on failed extractions |
| Citation Format | Default style for the citation card and report (10 styles supported) |

Zotero settings for registered users are stored encrypted in Supabase.


Extraction Options

| Option | Description |
| --- | --- |
| Meta-Analysis Mode | Hypothesis rows with IV/DV names, beta coefficients, effect sizes, sample sizes |
| Scale & Sample Mode | Measurement scale details and sample characteristics |
| Remove Empty Columns | Drops columns with no extracted data from the CSV |
| Running Trend Synthesis | Claude summarizes emerging themes every N articles |
| Mock Mode | Synthetic data, no API calls, no cost -- for UI testing only |

Project Structure

EasyLit/
|-- easylit_app.py          # Flask app entry point and route definitions
|-- jobs.py                 # Background extraction thread and job state
|-- estimate.py             # Token pre-estimation thread (count_tokens)
|-- extraction.py           # PDF/data processing helpers
|-- prompts.py              # Claude prompt templates and versioning
|-- reports.py              # HTML report and BibTeX generation
|-- zotero_client.py        # Zotero API helpers
|-- db.py                   # Supabase database layer + usage cap
|-- auth.py                 # Google OAuth blueprint
|-- admin.py                # Admin dashboard blueprint
|-- digest.py               # Usage digest mailer
|-- config.py               # Default config and load/save stubs
|-- deploy/
|   |-- setup.sh            # Droplet provisioning script
|   |-- easylit.nginx       # Nginx reverse proxy config
|   |-- easylit.service     # systemd service unit
|-- schema.sql              # Main Supabase schema
|-- schema_prompt_versions.sql  # Prompt versioning table
|-- requirements.txt        # Python dependencies
|-- templates/
|   |-- index.html          # Main app frontend
|   |-- report.html         # Report template
|   |-- admin/
|       |-- dashboard.html  # Admin dashboard
|       |-- login.html      # Admin login
|-- README.md               # This file

Tech Stack

  • Backend: Python, Flask, gunicorn, PyZotero, Anthropic Python SDK
  • Frontend: Vanilla JavaScript, HTML/CSS
  • AI: Anthropic Claude (claude-sonnet-4-20250514)
  • Database / Auth: Supabase (PostgreSQL + service role key)
  • Google OAuth: Authlib
  • Encryption: Python cryptography library (Fernet)
  • Deployment: DigitalOcean Droplet (nginx + gunicorn + systemd + Let's Encrypt)

Deployment (DigitalOcean Droplet)

  1. Create a $6/mo Droplet (1 vCPU, 1GB RAM, Ubuntu 22.04)
  2. Point your domain's A record to the Droplet IP
  3. SSH in and run: bash deploy/setup.sh your-domain.com
  4. Copy your .env to /opt/easylit/.env
  5. systemctl start easylit

To deploy updates:

cd /opt/easylit && git pull && source venv/bin/activate && pip install -r requirements.txt && systemctl restart easylit

Notes

  • Jobs run as background threads on the server. Closing the browser tab does not interrupt an in-progress extraction.
  • If a PDF cannot be retrieved, the article is logged as skipped and processing continues.
  • The pre-flight token estimation uses count_tokens(), which counts tokens without running model inference, so it adds no generation cost to the run.
  • Prompt versions are loaded from Supabase at job start. Changes in the admin Prompts tab take effect on the next run with no redeployment needed.
  • Daily usage cap is checked at job start. Users who exceed the cap see a clear message with their spend and the limit.
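
The cap check at job start reduces to a comparison plus a user-facing message. A minimal sketch (function name and message wording are illustrative, not db.py's actual code):

```python
def check_daily_cap(spent_today_usd, cap_usd=2.00):
    """Return (allowed, message) before launching a job."""
    if spent_today_usd >= cap_usd:
        return False, (
            f"Daily cap reached: ${spent_today_usd:.2f} of "
            f"${cap_usd:.2f} used today. Try again tomorrow."
        )
    return True, ""

print(check_daily_cap(2.15)[0])  # False
print(check_daily_cap(0.50)[0])  # True
```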

License

EasyLit is released under the Apache License 2.0. You are free to use, modify, and redistribute the code for any purpose, commercial or otherwise, subject to the terms of the license. Attribution is required; warranty is not provided.


Author

M. Opie Frazier
