thibaultherve/SynapseAI


SynapseAI

An AI-powered research paper platform that ingests, analyzes, and lets you chat with scientific literature. Upload a paper and SynapseAI extracts its content, generates expert summaries, tags it intelligently, and cross-references it against your entire corpus.

All AI features run through the Claude CLI as a subprocess, drawing on the fixed-price Pro/Max plan instead of per-token API billing, which keeps heavy LLM usage cost-effective at scale.
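A minimal sketch of what a subprocess call to the Claude CLI can look like from async Python, assuming the `claude` binary is installed, authenticated, and on `PATH`. It uses the CLI's non-interactive print mode (`-p`); the `run_claude` helper, its timeout, and its error handling are hypothetical, not SynapseAI's actual integration.

```python
import asyncio


async def run_claude(prompt: str, *, binary: str = "claude", timeout: float = 300.0) -> str:
    """Run a one-shot prompt through the CLI and return its stdout (hypothetical helper)."""
    # `-p` runs the Claude CLI in print mode: answer once, write to stdout, exit.
    proc = await asyncio.create_subprocess_exec(
        binary,
        "-p",
        prompt,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # Don't leave a hung CLI process behind when the timeout fires.
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"{binary} exited with code {proc.returncode}: {stderr.decode().strip()}")
    return stdout.decode().strip()
```

Usage: `await run_claude("Summarize this abstract: ...")` inside a running event loop. Because the prompt is passed as a command-line argument, very large inputs can hit the operating system's argument-length limit.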

  • 📄 Multi-Source Ingestion: upload PDFs, paste a URL, or enter a DOI, and SynapseAI handles acquisition and text extraction automatically
  • 🧠 AI Summaries: Claude CLI generates short and detailed summaries and extracts key findings and metadata for every paper
  • 🏷️ Smart Tagging: the AI categorizes papers into a managed taxonomy (sub-domain, technique, pathology, topic)
  • 🔍 Semantic Search: pgvector-powered similarity search across your entire corpus, with full-text fallback
  • 💬 RAG Chat: ask questions about a single paper or your whole library, with answers grounded in your actual research
  • 🔗 Cross-References: automatically detects citations, contradictions, and extensions between papers
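Before acquisition, a URL-or-DOI submission has to be told apart. A minimal sketch, assuming inputs are a bare DOI, a `doi:` or `doi.org` form, or an http(s) URL; the `classify_source` helper and `Source` type are hypothetical, and the DOI regex is a common approximation rather than a complete validator. PDF uploads go through the separate multipart endpoint and don't need this step.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


@dataclass(frozen=True)
class Source:
    kind: str   # "doi" or "url"
    value: str  # normalized DOI, or the URL as submitted


def classify_source(raw: str) -> Source:
    """Classify user input as a DOI or a web URL (hypothetical helper)."""
    cleaned = raw.strip()
    text = cleaned
    lowered = cleaned.lower()
    # Strip common DOI prefixes so "doi:10.x/y" and doi.org links reduce to the bare DOI.
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            text = cleaned[len(prefix):]
            break
    if DOI_PATTERN.match(text):
        return Source("doi", text)
    parsed = urlparse(cleaned)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return Source("url", cleaned)
    raise ValueError(f"not a DOI or http(s) URL: {raw!r}")
```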

🚧 In active development. The backend is functional; the frontend and chat features are being built. Previously named NeuroAI (see v1 history).


📸 Preview

SynapseAI preview


🛠️ Tech Stack

Layer                Technology
Backend              FastAPI, Python, SQLAlchemy, Pydantic, Alembic
AI                   Claude, pdfplumber, trafilatura
Database             PostgreSQL, pgvector
Infrastructure       Docker, Ruff, pytest
Frontend (planned)   React, TypeScript

⚙️ Processing Pipeline

Every paper goes through a six-step pipeline, with each step tracked independently so that failures don't block progress:

📥 uploading     Download PDF or fetch web content
       ↓
📝 extracting    Extract text (pdfplumber for PDFs, trafilatura for web)
       ↓
🧠 summarizing   Claude generates summaries, key findings & metadata
       ↓
🏷️  tagging       Claude assigns tags from the managed taxonomy
       ↓
📐 embedding     Sentence-transformers chunks → pgvector (384-dim HNSW)
       ↓
🔗 crossrefing   Claude detects relations across the corpus
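The embedding step has to split extracted text into chunks before each one is encoded into a 384-dimensional vector. A minimal sketch of word-window chunking with overlap; the window size, the overlap, and the `chunk_words` name are assumptions, since SynapseAI's actual chunking strategy isn't specified here.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end so the tail isn't repeated in a tiny extra chunk.
        if start + size >= len(words):
            break
    return chunks
```

Each chunk would then be encoded by a sentence-transformers model (a 384-dimensional model such as `all-MiniLM-L6-v2` is one common choice; the README doesn't name the model) and stored in a pgvector column with an HNSW index.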

Each step has its own status (pending → processing → done | error | skipped) and can be retried individually. Real-time progress is streamed via Server-Sent Events (SSE).
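The per-step lifecycle above can be modeled as a small state machine. A sketch assuming the five statuses listed and the six step names from the diagram; the `StepStatus` enum, the transition table, and the rule that only `error` steps may go back to `pending` on retry are illustrative assumptions, not SynapseAI's actual code.

```python
from enum import Enum


class StepStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    DONE = "done"
    ERROR = "error"
    SKIPPED = "skipped"


PIPELINE_STEPS = ["uploading", "extracting", "summarizing", "tagging", "embedding", "crossrefing"]

# Assumed transitions: pending -> processing -> done | error | skipped,
# plus error -> pending so a failed step can be retried on its own.
TRANSITIONS: dict[StepStatus, frozenset[StepStatus]] = {
    StepStatus.PENDING: frozenset({StepStatus.PROCESSING}),
    StepStatus.PROCESSING: frozenset({StepStatus.DONE, StepStatus.ERROR, StepStatus.SKIPPED}),
    StepStatus.ERROR: frozenset({StepStatus.PENDING}),
    StepStatus.DONE: frozenset(),
    StepStatus.SKIPPED: frozenset(),
}


def new_steps() -> dict[str, StepStatus]:
    """Every step of a freshly submitted paper starts as pending."""
    return {name: StepStatus.PENDING for name in PIPELINE_STEPS}


def transition(current: StepStatus, new: StepStatus) -> StepStatus:
    """Return the new status, or raise if the move isn't allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def retry(current: StepStatus) -> StepStatus:
    """Reset a failed step so the pipeline can pick it up again."""
    return transition(current, StepStatus.PENDING)
```

Keeping the allowed moves in one table makes it easy to reject out-of-order updates, for example a retry request for a step that already finished.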


πŸ—οΈ Architecture

synapseai/
├── api/                  # FastAPI backend
│   ├── app/
│   │   ├── core/         # DB engine, base models, enums, exceptions
│   │   ├── papers/       # Paper CRUD, upload, file serving
│   │   ├── processing/   # Pipeline orchestration, Claude integration, SSE
│   │   ├── tags/         # Tag taxonomy, merge, CRUD
│   │   ├── chat/         # RAG chat sessions (planned)
│   │   ├── insights/     # Research intelligence (planned)
│   │   └── utils/        # Text extraction, URL validation, DOI resolution
│   ├── alembic/          # Database migrations
│   └── tests/            # pytest-asyncio, real PostgreSQL
├── v1/                   # Legacy NeuroAI system (archived)
├── docker-compose.yml
└── .env.example

  • Domain-driven design: each feature owns its models, schemas, router, service, and exceptions
  • Async from day one: SQLAlchemy 2.0 async + asyncpg, with no blocking I/O
  • 10 database tables covering papers, steps, tags, embeddings, cross-references, chat, and insights
  • SSRF protection: async DNS resolution + private IP blocking on all URL inputs
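The SSRF check can be sketched with the standard library: resolve the hostname without blocking the event loop, then reject the URL if any resolved address is private, loopback, link-local, or otherwise not globally routable. This is a minimal illustration assuming only http(s) URLs are accepted; `ensure_public_url` is a hypothetical name, not SynapseAI's actual function. A check-then-fetch guard like this is still exposed to DNS rebinding unless the HTTP client connects to the address that was validated.

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlparse


async def ensure_public_url(url: str) -> str:
    """Raise ValueError unless every address the URL's host resolves to is globally routable."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    loop = asyncio.get_running_loop()
    try:
        # Async DNS lookup: getaddrinfo runs in the loop's default executor.
        infos = await loop.getaddrinfo(parsed.hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {parsed.hostname!r}") from exc
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        # is_global is False for private, loopback, link-local, reserved, and similar ranges.
        if not ip.is_global:
            raise ValueError(f"{parsed.hostname!r} resolves to non-public address {ip}")
    return url
```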

🚀 Getting Started

Prerequisites

  • Docker & Docker Compose
  • Claude CLI installed and authenticated

Setup

# Clone the repository
git clone https://github.com/thibaultherve/SynapseAI.git
cd SynapseAI

# Configure environment
cp .env.example .env

# Start all services
docker-compose up -d

# Run database migrations
docker-compose exec api alembic upgrade head

# Verify everything works
curl http://localhost:8000/api/health
# → {"status":"ok","database":"connected"}

Running Tests

docker-compose exec api pytest -v

Services

Service   Port   Description
api       8000   FastAPI backend (auto-reload in dev)
db        5432   PostgreSQL 16 + pgvector
db-test   5434   Isolated test database

📑 API Endpoints

Papers

POST   /api/papers/upload          Upload PDF (multipart)
POST   /api/papers                 Create from URL or DOI
GET    /api/papers                 List (paginated, filterable)
GET    /api/papers/:id             Full paper detail
GET    /api/papers/:id/file        Download original PDF
PATCH  /api/papers/:id             Update metadata
DELETE /api/papers/:id             Delete (cascade)
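A sketch of creating a paper from a DOI or URL with Python's standard library. The endpoint path comes from the list above, but the JSON body field names (`doi`, `url`) and the `build_create_request` / `create_paper` helpers are assumptions, since the request schema isn't documented here; FastAPI's interactive docs (served at `/docs` by default) show the real one.

```python
from __future__ import annotations

import json
import urllib.request

API_BASE = "http://localhost:8000"


def build_create_request(*, doi: str | None = None, url: str | None = None,
                         base: str = API_BASE) -> urllib.request.Request:
    """Build POST /api/papers for exactly one of a DOI or a URL (field names assumed)."""
    if (doi is None) == (url is None):
        raise ValueError("pass exactly one of doi= or url=")
    payload = {"doi": doi} if doi is not None else {"url": url}
    return urllib.request.Request(
        f"{base}/api/papers",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_paper(**kwargs) -> dict:
    """Send the request to a running API and decode its JSON response."""
    with urllib.request.urlopen(build_create_request(**kwargs), timeout=30) as resp:
        return json.load(resp)
```

With the stack running, `create_paper(doi="10.1038/nature14539")` would submit the paper; splitting request construction from sending keeps the request shape easy to inspect.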

Processing

GET    /api/papers/:id/steps       List processing steps
POST   /api/papers/:id/retry/:step Retry a failed step
GET    /api/papers/:id/status      SSE stream (real-time progress)
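Server-Sent Events arrive as `event:`/`data:` lines, with a blank line ending each event. A minimal sketch of consuming the status stream with the standard library; the parser follows the SSE wire format, while the JSON payload shape (for example a `step` and `status` field) and the `follow_status` helper are assumptions, as the event schema isn't documented here.

```python
from __future__ import annotations

import json
import urllib.request
from collections.abc import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {'event': name, 'data': decoded JSON or raw text} for each complete SSE event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the buffered event, if it carried any data.
            if data:
                text = "\n".join(data)
                try:
                    payload = json.loads(text)
                except json.JSONDecodeError:
                    payload = text
                yield {"event": event, "data": payload}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def follow_status(paper_id: str, base: str = "http://localhost:8000") -> Iterator[dict]:
    """Stream events from GET /api/papers/:id/status (requires the API to be running)."""
    with urllib.request.urlopen(f"{base}/api/papers/{paper_id}/status") as resp:
        # The HTTP response iterates line by line as bytes.
        yield from parse_sse(line.decode("utf-8") for line in resp)
```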

Tags

GET    /api/tags                   All tags grouped by category
GET    /api/tags/:id/papers        Papers with a specific tag
PATCH  /api/tags/:id               Rename tag
DELETE /api/tags/:id               Delete tag
POST   /api/tags/merge             Merge source → target
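One plausible reading of "merge source → target": every paper carrying the source tag is re-pointed to the target, papers that already had both end up with the target once, and the source tag is then removed. An in-memory sketch of those semantics; the real endpoint works on the database, and the `merge_tags` name and the paper-to-tags mapping are hypothetical.

```python
def merge_tags(paper_tags: dict[str, set[str]], source: str, target: str) -> dict[str, set[str]]:
    """Return a new paper -> tags mapping where `source` has been folded into `target`."""
    if source == target:
        raise ValueError("source and target must differ")
    merged: dict[str, set[str]] = {}
    for paper, tags in paper_tags.items():
        if source in tags:
            # Set union de-duplicates papers that were already tagged with both.
            tags = (tags - {source}) | {target}
        merged[paper] = set(tags)  # copy so the input mapping is left untouched
    return merged
```

Deleting the source tag's own record is left out here; only the re-pointing of paper associations is shown.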

🔜 Upcoming Features

Feature                Description
🔍 Semantic Search     Full-text + vector similarity search across the corpus
💬 RAG Chat            Chat with a single paper or the entire library, with answers grounded in your research
🌐 React Frontend      SPA with PDF viewer, chat panel, and tag management
📊 Insight Engine      AI-generated research gaps, hypotheses, and trend detection
🗺️ Knowledge Graph     Visual exploration of paper relationships and cross-references

📜 v1 (NeuroAI)

SynapseAI is the successor to NeuroAI, a Notion-based research assistant built with Claude Code skills and Python scripts. v1 is archived in the /v1 directory with its own README.

                v1 (NeuroAI)                          v2 (SynapseAI)
UI              Notion database                       React SPA (planned)
Data            JSON flat files + Notion API          PostgreSQL + pgvector
Processing      Claude Code skills → Python scripts   FastAPI async pipeline
Search          Tag-based only                        Semantic + full-text
Chat            Comment-based Q&A in Notion           RAG with conversation history
Deployment      Manual skill triggers                 Docker Compose, containerized

📝 License

This project is source-available under a non-commercial license. You are free to view, fork, modify, and redistribute the code, as long as it remains non-commercial and you provide attribution.

See LICENSE for details.


🤖 AI Usage

In the interest of transparency: AI is used regularly throughout this project as a development tool for code generation, refactoring, debugging, and documentation. But it remains exactly that: a tool. As the sole developer, I define the architecture, enforce best practices, and maintain full control over technical direction. AI accelerates execution; it doesn't replace thinking.


👤 Author

Thibault Hervé
Full-Stack Developer
Python FastAPI React TypeScript

LinkedIn GitHub

About

🧠 AI-powered research paper platform: import, analyze, and chat with scientific literature using Claude CLI. 🚧 [In Development]
