An AI-powered research paper platform that ingests, analyzes, and lets you chat with scientific literature. Upload a paper, and SynapseAI extracts its content, generates expert summaries, tags it intelligently, and cross-references it against your entire corpus.
All AI features run through Claude CLI (as a subprocess), leveraging the fixed-price Pro/Max plan instead of per-token API billing, which keeps heavy LLM usage cost-effective at scale.
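As a rough illustration of the subprocess approach, the sketch below shells out to the CLI from async code. It assumes the `claude` executable is on the PATH and that `claude -p <prompt>` prints a single reply and exits. The helper names (`build_claude_command`, `run_command`, `summarize`) are hypothetical and not taken from the SynapseAI codebase.

```python
import asyncio


def build_claude_command(prompt: str, executable: str = "claude") -> list[str]:
    """Build the argv for a one-shot, non-interactive CLI call.

    Assumption: `claude -p <prompt>` prints the reply to stdout and exits.
    """
    return [executable, "-p", prompt]


async def run_command(argv: list[str], timeout: float = 120.0) -> str:
    """Run a command without blocking the event loop and return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # Do not leave a stuck child process behind.
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        message = stderr.decode().strip()
        raise RuntimeError(f"command failed ({proc.returncode}): {message}")
    return stdout.decode().strip()


async def summarize(text: str) -> str:
    """Hypothetical helper: ask the CLI for a short summary of `text`."""
    prompt = f"Summarize this paper in three sentences:\n\n{text}"
    return await run_command(build_claude_command(prompt))
```

Keeping the argv builder separate from the runner makes the command easy to inspect in tests without invoking the real CLI.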
- Multi-Source Ingestion: Upload PDFs, paste a URL, or enter a DOI. SynapseAI handles acquisition and text extraction automatically.
- AI Summaries: Claude CLI generates short and detailed summaries, key findings, and extracted metadata for every paper.
- Smart Tagging: The AI categorizes papers into a managed taxonomy (sub-domain, technique, pathology, topic).
- Semantic Search: pgvector-powered similarity search across your entire corpus, with full-text fallback.
- RAG Chat: Ask questions about a single paper or your whole library, with answers grounded in your actual research.
- Cross-References: Automatically detects citations, contradictions, and extensions between papers.
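To make the "similarity search with full-text fallback" idea concrete, here is a small pure-Python sketch. It is not the project's pgvector query. It only shows the ranking logic under assumed inputs: chunks as `(text, vector)` pairs, a hypothetical `min_score` threshold, and a plain substring match standing in for full-text search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_text: str,
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    min_score: float = 0.5,  # hypothetical relevance threshold
    limit: int = 5,
) -> list[str]:
    """Rank chunks by vector similarity; fall back to a keyword match."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    hits = sorted((s for s in scored if s[0] >= min_score), reverse=True)
    if hits:
        return [text for _, text in hits[:limit]]
    # Fallback: naive case-insensitive substring match on the chunk text.
    needle = query_text.lower()
    return [text for text, _ in chunks if needle in text.lower()][:limit]
```

In a database-backed setup the similarity ranking would normally be pushed down into the vector index rather than computed in application code.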
In active development. The backend is functional; the frontend and chat features are being built. Previously named NeuroAI; see v1 history.
| Layer | Technology |
|---|---|
| Backend | |
| AI | |
| Database | |
| Infrastructure | |
| Frontend (planned) | |
Every paper goes through a 6-step pipeline, each tracked independently so failures don't block progress:
1. uploading: Download PDF or fetch web content
2. extracting: Extract text (pdfplumber for PDFs, trafilatura for web)
3. summarizing: Claude generates summaries, key findings, and metadata
4. tagging: Claude assigns tags from the managed taxonomy
5. embedding: sentence-transformers embeds chunks into pgvector (384-dim, HNSW index)
6. crossrefing: Claude detects relations across the corpus
Each step has its own status (pending → processing → done | error | skipped) and can be retried individually. Real-time progress is streamed via Server-Sent Events (SSE).
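The per-step tracking described above can be sketched as a small state holder. This is an illustrative model only: the class and method names (`PaperPipeline`, `run_step`, `retry`) are hypothetical, and the real service presumably persists step rows in PostgreSQL rather than in memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class StepStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    DONE = "done"
    ERROR = "error"
    SKIPPED = "skipped"


STEP_ORDER = [
    "uploading", "extracting", "summarizing",
    "tagging", "embedding", "crossrefing",
]


@dataclass
class PaperPipeline:
    """In-memory status of each step for one paper."""
    steps: dict[str, StepStatus] = field(
        default_factory=lambda: {name: StepStatus.PENDING for name in STEP_ORDER}
    )

    def run_step(self, name: str, action: Callable[[], None]) -> StepStatus:
        """Run one step and record its outcome without touching other steps."""
        self.steps[name] = StepStatus.PROCESSING
        try:
            action()
        except Exception:
            self.steps[name] = StepStatus.ERROR
        else:
            self.steps[name] = StepStatus.DONE
        return self.steps[name]

    def retry(self, name: str, action: Callable[[], None]) -> StepStatus:
        """Re-run a step, but only if its last run ended in an error."""
        if self.steps[name] is not StepStatus.ERROR:
            raise ValueError(f"step {name!r} is not in the error state")
        return self.run_step(name, action)
```

Because each step stores its own status, a failure in one step leaves the others untouched and available to run or retry.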
synapseai/
├── api/                      # FastAPI backend
│   ├── app/
│   │   ├── core/             # DB engine, base models, enums, exceptions
│   │   ├── papers/           # Paper CRUD, upload, file serving
│   │   ├── processing/       # Pipeline orchestration, Claude integration, SSE
│   │   ├── tags/             # Tag taxonomy, merge, CRUD
│   │   ├── chat/             # RAG chat sessions (planned)
│   │   ├── insights/         # Research intelligence (planned)
│   │   └── utils/            # Text extraction, URL validation, DOI resolution
│   ├── alembic/              # Database migrations
│   └── tests/                # pytest-asyncio, real PostgreSQL
├── v1/                       # Legacy NeuroAI system (archived)
├── docker-compose.yml
└── .env.example
- Domain-driven design: each feature owns its models, schemas, router, service, and exceptions
- Async from day one: SQLAlchemy 2.0 async + asyncpg, no blocking I/O
- 10 database tables covering papers, steps, tags, embeddings, cross-references, chat, and insights
- SSRF protection: async DNS resolution + private IP blocking on all URL inputs
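The SSRF check in the last bullet can be approximated with the standard library alone. The sketch below is one possible shape, not the project's actual validator. The function names are hypothetical, and treating every non-global address as blocked is an assumption about the policy.

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(address: str) -> bool:
    """True only for globally routable addresses (not private or loopback)."""
    return ipaddress.ip_address(address).is_global


async def resolve_host(host: str) -> list[str]:
    """Resolve a hostname via the event loop, so DNS lookups do not block it."""
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


async def validate_url(url: str) -> str:
    """Return the URL if every resolved address is public, else raise."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only http(s) URLs with a host are allowed")
    addresses = await resolve_host(parts.hostname)
    blocked = [a for a in addresses if not is_public_ip(a)]
    if not addresses or blocked:
        raise ValueError(f"URL resolves to a non-public address: {blocked}")
    return url
```

A check like this is only effective if the later fetch connects to the address that was validated. Otherwise, a second DNS lookup could return a different, internal address.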
- Docker & Docker Compose
- Claude CLI installed and authenticated
# Clone the repository
git clone https://github.com/thibaultherve/SynapseAI.git
cd SynapseAI
# Configure environment
cp .env.example .env
# Start all services
docker-compose up -d
# Run database migrations
docker-compose exec api alembic upgrade head
# Verify everything works
curl http://localhost:8000/api/health
# → {"status":"ok","database":"connected"}

# Run the test suite
docker-compose exec api pytest -v

| Service | Port | Description |
|---|---|---|
| api | 8000 | FastAPI backend (auto-reload in dev) |
| db | 5432 | PostgreSQL 16 + pgvector |
| db-test | 5434 | Isolated test database |
POST /api/papers/upload Upload PDF (multipart)
POST /api/papers Create from URL or DOI
GET /api/papers List (paginated, filterable)
GET /api/papers/:id Full paper detail
GET /api/papers/:id/file Download original PDF
PATCH /api/papers/:id Update metadata
DELETE /api/papers/:id Delete (cascade)
GET /api/papers/:id/steps List processing steps
POST /api/papers/:id/retry/:step Retry a failed step
GET /api/papers/:id/status SSE stream (real-time progress)
GET /api/tags All tags grouped by category
GET /api/tags/:id/papers Papers with a specific tag
PATCH /api/tags/:id Rename tag
DELETE /api/tags/:id Delete tag
POST /api/tags/merge Merge source → target
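As an example of consuming the `GET /api/papers/:id/status` stream listed above, the sketch below parses Server-Sent Events using only the standard library. The parser follows the generic SSE line format. The event names and JSON payload shape of the real endpoint are not documented here, so callers should treat the `data` field as opaque text. `stream_status` and its default `base_url` are illustrative assumptions.

```python
from typing import Iterable, Iterator
from urllib.request import urlopen


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Turn raw SSE lines into {"event": ..., "data": ...} dictionaries."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)


def stream_status(paper_id: str, base_url: str = "http://localhost:8000") -> Iterator[dict]:
    """Yield progress events for one paper (assumes the API is running)."""
    with urlopen(f"{base_url}/api/papers/{paper_id}/status") as response:
        yield from parse_sse(line.decode("utf-8") for line in response)
```

With the stack running, `for event in stream_status(paper_id): print(event)` would print each progress update as it arrives.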
| Feature | Description |
|---|---|
| Semantic Search | Full-text + vector similarity search across the corpus |
| RAG Chat | Chat with a single paper or the entire library, with answers grounded in your research |
| React Frontend | SPA with PDF viewer, chat panel, and tag management |
| Insight Engine | AI-generated research gaps, hypotheses, and trend detection |
| Knowledge Graph | Visual exploration of paper relationships and cross-references |
SynapseAI is the successor to NeuroAI, a Notion-based research assistant built with Claude Code skills and Python scripts. v1 is archived in the /v1 directory with its own README.
| | v1: NeuroAI | v2: SynapseAI |
|---|---|---|
| UI | Notion database | React SPA (planned) |
| Data | JSON flat files + Notion API | PostgreSQL + pgvector |
| Processing | Claude Code skills → Python scripts | FastAPI async pipeline |
| Search | Tag-based only | Semantic + full-text |
| Chat | Comment-based Q&A in Notion | RAG with conversation history |
| Deployment | Manual skill triggers | Docker Compose, containerized |
This project is source-available under a non-commercial license. You are free to view, fork, modify, and redistribute the code, as long as it remains non-commercial with attribution.
See LICENSE for details.
In the interest of transparency: AI is used regularly throughout this project as a development tool for code generation, refactoring, debugging, and documentation. But it remains exactly that: a tool. As the sole developer, I define the architecture, enforce best practices, and maintain full control over technical direction. AI accelerates execution; it doesn't replace thinking.
Thibault Hervé, Full-Stack Developer
