Under construction
Graph-augmented semantic search for academic literature
Current frontend with mock data: semantic results + graph view
- Fetch papers from the arXiv API
- Queue papers for embedding (deferred, async)
- Track paper ingestion state in a SQLite index
- Index embeddings into Qdrant
- Search Qdrant with fastembed support
- Merge vector + (planned) graph hits
- Type-safe, testable, modular pipeline
- Health checks, typed interfaces, and task runner setup
- Python 3.12+
- Node.js 20+
- Docker + Docker Compose
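The ingestion-state tracking named in the feature list can be illustrated with a small sketch. It uses only the standard-library `sqlite3` module. The table name `paper_index`, the status values (`discovered`, `queued`, `embedded`), and all function names are assumptions made for illustration, not the project's actual schema or API.

```python
import sqlite3

# Hypothetical schema: one row per paper with a coarse ingestion status.
SCHEMA = """
CREATE TABLE IF NOT EXISTS paper_index (
    paper_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'discovered'
)
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the index database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_discovered(conn: sqlite3.Connection, papers: list[tuple[str, str]]) -> None:
    """Insert (paper_id, title) pairs; papers already known keep their status."""
    conn.executemany(
        "INSERT OR IGNORE INTO paper_index (paper_id, title) VALUES (?, ?)",
        papers,
    )
    conn.commit()


def claim_for_embedding(conn: sqlite3.Connection) -> list[str]:
    """Return IDs that are neither queued nor embedded, and mark them queued."""
    rows = conn.execute(
        "SELECT paper_id FROM paper_index "
        "WHERE status = 'discovered' ORDER BY paper_id"
    ).fetchall()
    ids = [row[0] for row in rows]
    conn.executemany(
        "UPDATE paper_index SET status = 'queued' WHERE paper_id = ?",
        [(paper_id,) for paper_id in ids],
    )
    conn.commit()
    return ids


def mark_embedded(conn: sqlite3.Connection, paper_id: str) -> None:
    """Record that a paper's embedding has been written to the vector store."""
    conn.execute(
        "UPDATE paper_index SET status = 'embedded' WHERE paper_id = ?",
        (paper_id,),
    )
    conn.commit()
```

In this sketch, the IDs returned by `claim_for_embedding` are the ones a caller would push onto the embedding queue, and a worker would call `mark_embedded` after a successful upsert. Re-discovering a paper does not reset its status, so repeated searches should not enqueue the same paper twice.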
make up-full # Run full stack with CPU
make up-full USE_GPU=1 # Run with GPU (requires NVIDIA runtime)
Services:
- API → http://localhost:8000/docs
- Qdrant UI → http://localhost:6333/dashboard
- Redis → localhost:6379 (use redis-cli)
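A quick way to confirm that the services listed above are accepting connections is a plain TCP probe. The sketch below uses only the standard library. The function names and the `SERVICES` mapping are hypothetical, with hosts and ports copied from the list above. It checks only that a port is open, not that the service behind it is healthy.

```python
import socket

# Hosts and ports taken from the service list above; the names are illustrative.
SERVICES = {
    "api": ("localhost", 8000),
    "qdrant": ("localhost", 6333),
    "redis": ("localhost", 6379),
}


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(services: dict[str, tuple[str, int]] = SERVICES) -> dict[str, bool]:
    """Probe every service and report which ones accepted a connection."""
    return {name: tcp_reachable(host, port) for name, (host, port) in services.items()}
```

Calling `check_services()` after `make up-full` returns a dictionary mapping each service name to `True` or `False`.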
poe server # Run the FastAPI backend
poe pipeline # Run the end-to-end pipeline (logs steps)
poe test # Run the tests
System architecture diagram
flowchart TD
    subgraph API
        A1[GET /search] --> P["run_pipeline()"]
    end
    subgraph Pipeline
        P --> F[Discover papers from ArXiv]
        F --> I[Update PaperIndex]
        I --> Q[Enqueue papers if not embedded]
        Q --> S[Semantic Search]
        S --> G[Get related from GraphStore]
        G --> M[Merge vector + graph results]
        M --> R[Return SearchResults]
    end
    subgraph Vector Store
        V1["Qdrant (hosted/local)"]
    end
    subgraph Embedding Worker
        W1["Reads Redis queue"]
        W1 --> E[Embed papers]
        E --> V[Upsert to Qdrant]
        V --> U[Update PaperIndex status]
    end
    subgraph Graph Store
        G1["(Planned) Neo4j / in-memory graph"]
    end
    S -->|vector hits| V1
    G -->|edges| G1
    G1 -->|related| G
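The "Merge vector + graph results" step in the diagram could look roughly like the sketch below. Because the graph store is still planned, everything here is an assumption: the `Hit` type, the `merge_hits` function, and the scoring rule (vector score plus a down-weighted graph score, summed per paper) are illustrative choices, not the project's actual merge logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A single search result; `source` records which store(s) produced it."""
    paper_id: str
    score: float
    source: str


def merge_hits(
    vector_hits: list[Hit],
    graph_hits: list[Hit],
    graph_weight: float = 0.5,  # assumed weighting, chosen only for illustration
    limit: int = 10,
) -> list[Hit]:
    """Combine hits by paper_id, summing scores with graph hits scaled down."""
    scores: dict[str, float] = {}
    sources: dict[str, set[str]] = {}

    for hit in vector_hits:
        scores[hit.paper_id] = scores.get(hit.paper_id, 0.0) + hit.score
        sources.setdefault(hit.paper_id, set()).add("vector")

    for hit in graph_hits:
        scores[hit.paper_id] = scores.get(hit.paper_id, 0.0) + graph_weight * hit.score
        sources.setdefault(hit.paper_id, set()).add("graph")

    # Highest combined score first; ties broken by paper_id for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [
        Hit(paper_id, score, "+".join(sorted(sources[paper_id])))
        for paper_id, score in ranked[:limit]
    ]
```

With this rule, a paper found by both the vector search and the graph lookup ranks above one found by either alone at a similar score, which is one plausible reason to merge the two result sets in the first place.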
Stack highlights:
- Backend: FastAPI + Pydantic + Poetry (with poethepoet task runner)
- Queue: Redis (Upstash or local)
- Vector DB: Qdrant
- Frontend: React + Vite + Tailwind
- Infra: Docker Compose + Fly.io
Thank you to arXiv for use of its open access interoperability.
MIT