Upload PDF documents, ask natural-language questions grounded in their content, and generate concise summaries — all through a local, three-service stack. The React UI talks to a Node.js API gateway, which orchestrates a Python RAG (retrieval-augmented generation) service powered by Hugging Face embeddings and a configurable text-generation model.
- Features
- System Architecture
- Prerequisites
- Installation
- Running the Application
- API Reference
- Configuration
- Project Structure
- Troubleshooting
- Contributors
- Join the Community
- License
| Capability | Description |
|---|---|
| PDF upload | Multipart upload with server-side parsing, chunking, and vector indexing |
| Question answering | Semantic search over document chunks, then local HF model generation |
| Reading Modes | Choose between Standard, Tutor, Socratic, Simple, or Concise answering styles |
| Summarization | Bullet-style summaries from retrieved context |
| Multi-document UI | Upload and switch between multiple PDFs (frontend/) |
| In-browser viewer | Page-by-page PDF preview with react-pdf |
| Chat export | Export conversation history as CSV or plain text |
The application is split into three independently runnable components. Each listens on a dedicated port in development.
flowchart LR
subgraph client ["Browser — :3000"]
UI["React UI\nfrontend/"]
end
subgraph gateway ["API Gateway — :4000"]
API["Express\nserver.js"]
end
subgraph rag ["RAG Service — :5000"]
RAG["FastAPI\nrag-service/main.py"]
FAISS[("FAISS\nin-memory")]
HF["Hugging Face\nembeddings + LLM"]
end
UI -->|"POST /upload, /ask, /summarize"| API
API -->|"POST /process-pdf, /ask, /summarize"| RAG
RAG --> FAISS
RAG --> HF
- Upload: The UI sends a PDF to Express (POST /upload). Express stores the file temporarily, forwards its path to FastAPI (POST /process-pdf), then deletes the local copy.
- Index: FastAPI loads the PDF with LangChain, splits text into chunks, embeds them with sentence-transformers/all-MiniLM-L6-v2, and stores a FAISS index keyed by a new session_id.
- Ask / Summarize: The UI includes session_id on each request. FastAPI retrieves relevant chunks, builds a prompt, and runs the configured Hugging Face generation model locally.
Note: Vector stores live in process memory. Restarting the RAG service clears all sessions; users must re-upload PDFs.
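The note above follows directly from keeping indexes in a plain in-process mapping. The sketch below is a minimal illustration, not the code in rag-service/main.py: `SessionStore` is a hypothetical name, and a list of strings stands in for a FAISS index.

```python
import uuid


class SessionStore:
    """Hypothetical in-memory registry mapping session_id -> index object."""

    def __init__(self):
        self._indexes = {}

    def create(self, index):
        # A fresh UUID is issued for every processed PDF.
        session_id = str(uuid.uuid4())
        self._indexes[session_id] = index
        return session_id

    def get(self, session_id):
        # Unknown or stale session IDs yield None instead of raising.
        return self._indexes.get(session_id)


store = SessionStore()
sid = store.create(index=["chunk one", "chunk two"])  # stand-in for a FAISS index
print(store.get(sid))

# A process restart is equivalent to starting from a new, empty store.
restarted = SessionStore()
print(restarted.get(sid))
```

The second lookup returns `None`, which is why a client holding an old session_id has to upload the PDF again after the RAG service restarts.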
Security note: The FastAPI RAG service (:5000) is meant to be an internal dependency of the Express gateway (:4000). Do not expose it publicly; otherwise attackers can bypass gateway rate limiting by calling RAG endpoints directly. INTERNAL_RAG_TOKEN is required so the RAG service rejects requests missing X-Internal-Token.
Existing deployments and local environments must set INTERNAL_RAG_TOKEN before starting the Express API or RAG service. Generate a strong shared secret, put the same value in both environments, and restart both services. The RAG service fails closed when this value is missing.
The Express authentication flow also requires JWT_SECRET for both token signing and verification. Use one strong random value across the auth controller and middleware; do not hardcode or reuse a default secret.
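One way to produce a strong random value for INTERNAL_RAG_TOKEN or JWT_SECRET, and to compare a presented token without leaking timing information, is sketched below. The function names are illustrative assumptions; the project's own check may be written differently.

```python
import hmac
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of entropy."""
    return secrets.token_urlsafe(num_bytes)


def token_matches(expected, presented) -> bool:
    """Constant-time comparison that fails closed on missing values."""
    if not expected or not presented:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))


if __name__ == "__main__":
    # Paste the printed value into both .env files as INTERNAL_RAG_TOKEN.
    print(generate_secret())
```

Run it once per secret, so that INTERNAL_RAG_TOKEN and JWT_SECRET do not share a value.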
| Service | Folder | Port | URL |
|---|---|---|---|
| React frontend | frontend/ | 3000 | http://localhost:3000 |
| Express API | repository root | 4000 | http://localhost:4000 |
| FastAPI RAG | rag-service/ | 5000 | http://localhost:5000 |
| Tool | Version | Purpose |
|---|---|---|
| Node.js | LTS (18+) recommended | Express gateway and React dev server |
| npm | Bundled with Node.js | JavaScript dependencies |
| Python | 3.10 or newer | RAG service |
| pip | Current | Python dependencies |
Optional but recommended:
- Git — clone and contribute
- CUDA-capable GPU — faster Hugging Face inference (CPU works, slower)
- 8 GB+ RAM — model loading and FAISS indexing
Install dependencies in all three locations. Use three separate terminal sessions when running locally.
cd rag-service
python -m venv venv
Windows (PowerShell)
.\venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt
macOS / Linux
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
Copy environment configuration from the repository root:
# From rag-service/, after activating the venv
# macOS / Linux
cp ../.env.example .env
# Windows (PowerShell)
Copy-Item ..\.env.example .env
REM Windows (cmd)
copy ..\.env.example .env
Edit .env if you want a smaller or faster generation model (see Configuration).
cd .. # repository root (parent of rag-service/)
npm install
Multer writes uploads to an uploads/ directory at runtime; it is created automatically on first upload.
cd frontend
npm install
The frontend package.json sets "proxy": "http://localhost:4000", so development requests to /upload, /ask, and /summarize are forwarded to Express without CORS configuration in the browser.
Start services in this order so the gateway and RAG layer are ready before you upload a file.
cd rag-service
# activate venv (see Installation)
uvicorn main:app --host 0.0.0.0 --port 5000 --reload
Alternative:
python main.py
# repository root
node server.js
Expected log: Backend running on http://localhost:4000
cd frontend
npm start
Open http://localhost:3000 in your browser.
On the first PDF upload or first question, Hugging Face will download:
| Asset | Model ID | Approx. size |
|---|---|---|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 | ~90 MB |
| Generation (default) | google/flan-t5-base | ~900 MB |
Downloads are cached under your user Hugging Face cache (e.g. ~/.cache/huggingface on Linux/macOS, %USERPROFILE%\.cache\huggingface on Windows). The first request may take several minutes on a slow connection — this is normal.
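Before the first run it can help to confirm that the cache location has room for the downloads. The sketch below is a simplified assumption of how the cache path is resolved (HF_HOME if set, otherwise ~/.cache/huggingface); the real Hugging Face libraries honor additional variables. The 2 GiB threshold is a rough allowance for the default models, not a measured figure.

```python
import os
import shutil
from pathlib import Path

# Assumed headroom for the two default models plus temporary files.
DEFAULT_MODELS_BYTES = 2 * 1024**3


def hf_cache_dir(env=os.environ):
    """Resolve the cache root: HF_HOME if set, else ~/.cache/huggingface."""
    override = env.get("HF_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "huggingface"


def has_room(path, needed_bytes=DEFAULT_MODELS_BYTES):
    """Check free space on the volume holding path (which may not exist yet)."""
    probe = Path(path).expanduser().absolute()
    while not probe.exists():
        probe = probe.parent  # walk up to the nearest existing directory
    return shutil.disk_usage(probe).free >= needed_bytes


if __name__ == "__main__":
    cache = hf_cache_dir()
    print(f"{cache}: enough space = {has_room(cache)}")
```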
You can run the entire multi-service application easily using Docker Compose. This ensures reproducibility and eliminates the need to install Python, Node.js, and dependencies manually on your host machine.
- Clone the repository and navigate to the project root.
- Build and start all services in detached mode:
docker-compose up -d --build
- The services will be available at:
  - Frontend UI: http://localhost:3000
  - Express API Gateway: http://localhost:4000
  - FastAPI RAG Service: http://localhost:5000
Note on Initial Startup: On the first run, the RAG service container will download the necessary Hugging Face models (~1 GB total). These models are cached in a persistent Docker volume (pdf-qa-bot-hf-cache), so they will not be re-downloaded on subsequent restarts.
To stop the running containers:
docker-compose down
To stop the containers and remove the cached models/volumes:
docker-compose down -v
Public-facing routes used by the React app. All paths are relative to the gateway origin.
| Method | Endpoint | Content-Type | Request body / form | Success response | Error responses |
|---|---|---|---|---|---|
| POST | /upload | multipart/form-data | Field name file (PDF binary) | 200: { "message": string, "session_id": string } | 400: no file; 500: RAG processing failed |
| POST | /ask | application/json | { "question": string, "session_id": string } | 200: { "answer": string } | 500: upstream or internal error |
| POST | /summarize | application/json | { "session_id": string, "pdf"?: string } | 200: { "summary": string } | 500: upstream or internal error |
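The JSON routes in the table can also be exercised from Python using only the standard library. This is a minimal sketch: the helper names and the 120-second timeout are assumptions, while the paths and field names come from the table.

```python
import json
import urllib.request

GATEWAY = "http://localhost:4000"  # development default for the Express gateway


def build_json_post(path, payload, base_url=GATEWAY):
    """Build (but do not send) a JSON POST request for a gateway route."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, session_id):
    """Send a question and return the 'answer' field of the response."""
    req = build_json_post("/ask", {"question": question, "session_id": session_id})
    with urllib.request.urlopen(req, timeout=120) as resp:  # generous: CPU inference is slow
        return json.load(resp)["answer"]


def summarize(session_id):
    """Request a summary and return the 'summary' field of the response."""
    req = build_json_post("/summarize", {"session_id": session_id})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["summary"]
```

With the stack running, `ask("What is the main topic?", session_id)` returns the answer text, where `session_id` is the value returned by /upload. A non-2xx status surfaces as `urllib.error.HTTPError`.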
Example: upload (curl)
curl -X POST http://localhost:4000/upload \
  -F "file=@/path/to/document.pdf"
Example: ask
curl -X POST http://localhost:4000/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"What is the main topic?","session_id":"<uuid-from-upload>"}'
Internal service called by Express. You can call it directly for debugging, provided the request carries the X-Internal-Token header.
| Method | Endpoint | Request body (JSON) | Success response | Notes |
|---|---|---|---|---|
| POST | /process-pdf | { "filePath": string } | { "message": string, "session_id": string } | Absolute or relative path to PDF on the machine running FastAPI |
| POST | /ask | { "question": string, "session_id": string } | { "answer": string } | Returns a friendly message if session_id is unknown |
| POST | /summarize | { "session_id": string, "pdf"?: string \| null } | { "summary": string } | pdf is accepted for API compatibility; indexing uses session_id only |
Interactive OpenAPI docs: http://localhost:5000/docs (recommended for local development only; do not expose publicly)
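When debugging the RAG service directly, each request needs the shared secret in the X-Internal-Token header. The sketch below only builds the request; the helper name is hypothetical, and it assumes the token is available in the INTERNAL_RAG_TOKEN environment variable of the shell you run it from.

```python
import json
import os
import urllib.request

RAG_URL = "http://localhost:5000"  # development default for the FastAPI service


def build_rag_request(path, payload, token, base_url=RAG_URL):
    """Build a JSON POST for a protected RAG endpoint, including the token header."""
    if not token:
        raise ValueError("INTERNAL_RAG_TOKEN is empty; the RAG service would reject this call")
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Internal-Token": token,
        },
        method="POST",
    )


if __name__ == "__main__":
    token = os.environ.get("INTERNAL_RAG_TOKEN", "")
    req = build_rag_request("/process-pdf", {"filePath": "/path/to/your.pdf"}, token)
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.load(resp))
```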
Example: process PDF (via gateway, recommended)
curl -X POST http://localhost:4000/upload \
  -F "file=@/path/to/your.pdf"
Environment variables are read from the root .env for Express and from rag-service/.env for the RAG service (create both from .env.example at the repo root).
| Variable | Default | Description |
|---|---|---|
| RATE_LIMIT_WINDOW_MS | 60000 | Sliding window, in milliseconds, for the Express request limiter on /upload, /ask, and /summarize |
| RATE_LIMIT_MAX | 60 | Maximum requests per IP within RATE_LIMIT_WINDOW_MS before a JSON 429 |
| UPLOAD_MAX_FILE_SIZE_BYTES | 20000000 | Maximum PDF size per upload in bytes |
| UPLOAD_MAX_CONCURRENT_PER_IP | 2 | Maximum in-flight /upload requests allowed per IP |
| RATE_LIMIT_SLOWDOWN_AFTER | 10 | Number of free inference requests before the slow-down delay starts |
MAX_UPLOAD_SIZE_MB is still accepted for compatibility, but UPLOAD_MAX_FILE_SIZE_BYTES is preferred.
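To make the interaction between RATE_LIMIT_WINDOW_MS and RATE_LIMIT_MAX concrete, here is a small sliding-window limiter in Python. It illustrates the semantics described in the table only; the class is hypothetical, and the gateway's actual limiter in server.js may count or expire requests differently.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the trailing window."""

    def __init__(self, window_ms=60000, max_requests=60, clock=time.monotonic):
        self.window_s = window_ms / 1000
        self.max_requests = max_requests
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._hits = defaultdict(deque)

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the gateway would answer with HTTP 429 here
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()  # defaults mirror the table: 60 requests per 60 s
print(limiter.allow("203.0.113.7"))
```

With the defaults, a single IP gets 60 accepted requests in any 60-second span; the 61st is refused until the oldest one ages out.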
| Variable | Default | Description |
|---|---|---|
| HF_GENERATION_MODEL | google/flan-t5-base | Hugging Face model ID for answer/summary generation |
| OPENAI_API_KEY | (empty) | Reserved; not used by the current local HF pipeline |
| HOST | 127.0.0.1 | Documented for optional deployment tuning |
| PORT | 5000 | Documented RAG port (uvicorn CLI flag takes precedence in dev) |
| INTERNAL_RAG_TOKEN | (required) | Shared secret required by protected RAG endpoints. Requests must include the same value in X-Internal-Token |
| PDF_PARSE_TIMEOUT_SECONDS | 20 | Hard timeout for PDF parsing/extraction (mitigates DoS-grade PDFs) |
| MAX_PDF_PAGES | 200 | Reject PDFs with too many pages |
| MAX_PDF_EXTRACT_CHARS | 400000 | Cap extracted text before chunking |
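The defaults in the table, together with the fail-closed rule for INTERNAL_RAG_TOKEN, can be expressed as a small settings loader. This is a sketch under assumptions: `RagSettings` and `load_settings` are hypothetical names, and the real service may parse or validate its environment differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RagSettings:
    internal_token: str
    generation_model: str
    parse_timeout_s: int
    max_pages: int
    max_extract_chars: int


def load_settings(env=os.environ):
    """Read RAG settings from a mapping, applying the documented defaults."""
    token = (env.get("INTERNAL_RAG_TOKEN") or "").strip()
    if not token:
        # Fail closed: refuse to start rather than run unauthenticated.
        raise RuntimeError("INTERNAL_RAG_TOKEN must be set to a non-empty value")
    return RagSettings(
        internal_token=token,
        generation_model=env.get("HF_GENERATION_MODEL") or "google/flan-t5-base",
        parse_timeout_s=int(env.get("PDF_PARSE_TIMEOUT_SECONDS") or 20),
        max_pages=int(env.get("MAX_PDF_PAGES") or 200),
        max_extract_chars=int(env.get("MAX_PDF_EXTRACT_CHARS") or 400000),
    )


settings = load_settings({"INTERNAL_RAG_TOKEN": "example-only", "MAX_PDF_PAGES": "50"})
print(settings.generation_model, settings.max_pages)
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test and makes the defaults visible in one place.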
Faster, lighter generation (recommended on CPU-only machines):
HF_GENERATION_MODEL=google/flan-t5-small
Optional frontend override (set before npm start in frontend/):
REACT_APP_API_URL=http://localhost:4000
Leave unset to use the CRA dev proxy (/upload → http://localhost:4000/upload).
pdf-qa-bot/
├── .env.example # Environment template (copy to rag-service/.env)
├── .gitignore
├── CONTRIBUTING.md # Contributor workflow
├── README.md # This file
├── package.json # Express dependencies (root)
├── package-lock.json
├── server.js # Express API gateway (:4000)
│
├── frontend/ # Primary React UI (:3000)
│ ├── package.json # proxy → http://localhost:4000
│ ├── public/
│ └── src/
│ ├── App.js # Upload, chat, summarize, PDF viewer, export
│ ├── index.js
│ └── ...
│
├── rag-service/ # FastAPI + LangChain + FAISS (:5000)
│ ├── main.py # RAG endpoints and HF inference
│ ├── requirements.txt
│ └── venv/ # Local Python env (gitignored)
│
├── uploads/ # Temporary PDF storage (created at runtime)
│
└── src/ # Legacy/simple CRA scaffold (not used by default)
└── App.js # Older MUI prototype without session_id support
The frontend/ directory is the supported UI. The root-level src/ tree is a leftover Create React App scaffold and is not wired into the root package.json scripts.
Each service binds a fixed port in development. If another process occupies it, the service fails to start.
| Port | Service | Typical conflict |
|---|---|---|
| 3000 | React (npm start) | Another CRA app, some API dev tools |
| 4000 | Express (server.js) | Custom backends, AirPlay on some systems |
| 5000 | FastAPI / uvicorn | Flask defaults, macOS AirPlay Receiver |
Find and free a port (Windows PowerShell)
netstat -ano | findstr :4000
taskkill /PID <pid> /F
macOS / Linux
lsof -i :5000
kill -9 <pid>
Workarounds
- Stop the conflicting application, or
- Change the port in code/config consistently across all layers (Express hardcodes 4000 and localhost:5000 in server.js; FastAPI defaults to 5000 in main.py; CRA uses PORT=3001 npm start for the frontend only if you also update the proxy target or REACT_APP_API_URL).
Always restart RAG → Express → Frontend after port changes.
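A quick cross-platform way to see which of the three development ports already has a listener is a short socket probe. This is a sketch: it only detects listeners reachable on 127.0.0.1 and does not identify the owning process, so the netstat and lsof commands above are still needed for that.

```python
import socket

# Development ports used by the three services in this README.
SERVICES = {"React frontend": 3000, "Express API": 4000, "FastAPI RAG": 5000}


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "in use" if port_in_use(port) else "free"
        print(f"{port} ({name}): {state}")
```

Run it before starting the stack to spot conflicts, or afterwards to confirm that all three services are listening.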
| Symptom | Likely cause | Fix |
|---|---|---|
| Immediate 500 on upload | RAG service not running | Start uvicorn in rag-service/ first |
| ECONNREFUSED in Express logs | Wrong host/port | Ensure FastAPI is on http://localhost:5000 |
| Session expired or invalid in answers | RAG process restarted | Re-upload the PDF to obtain a new session_id |
| Empty or scanned PDF | No extractable text | Use a text-based PDF, not a pure image scan |
| Cause | What to do |
|---|---|
| First-time Hugging Face model fetch | Wait for completion; verify disk space (~1–2 GB for defaults) |
| Slow or restricted network | Pre-download models (see below) or use flan-t5-small |
| CPU-only inference | Expect slower Q&A; use a smaller HF_GENERATION_MODEL |
| Large PDFs | More chunks → longer embedding and search; try smaller files first |
Pre-download models (optional)
cd rag-service
# activate venv
python -c "from langchain_community.embeddings import HuggingFaceEmbeddings; HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')"
python -c "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM; AutoTokenizer.from_pretrained('google/flan-t5-base'); AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base')"
Set a custom cache directory if needed:
# macOS / Linux
export HF_HOME=/path/to/large-disk/hf-cache
# Windows PowerShell
$env:HF_HOME = "D:\hf-cache"
| Check | Action |
|---|---|
| Express running? | curl http://localhost:4000 may fail (no GET routes) — test with upload or check Terminal 2 logs |
| Using REACT_APP_API_URL? | Must include scheme: http://localhost:4000 |
| CORS errors in browser | Use npm start with default proxy, or ensure Express cors() remains enabled |
| Mixed content | Use http:// locally, not https://, unless you terminate TLS yourself |
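The scheme requirement for REACT_APP_API_URL is easy to check mechanically. The helper below is an illustrative sketch, not part of the project: it accepts a value only when it parses with an http or https scheme and a host.

```python
from urllib.parse import urlparse


def is_valid_api_url(value):
    """True when value has an http(s) scheme and a network location."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ("http://localhost:4000", "localhost:4000", "//localhost:4000"):
    print(candidate, is_valid_api_url(candidate))
```

Only the first candidate passes; the other two lack a usable scheme, which is the misconfiguration the table warns about.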
| Error | Fix |
|---|---|
| ModuleNotFoundError | Activate rag-service/venv and pip install -r requirements.txt |
| torch install fails on Windows | Install Python 3.10–3.12 x64; use official pytorch.org wheel instructions if needed |
| faiss-cpu errors | Ensure 64-bit Python; reinstall: pip install --force-reinstall faiss-cpu |
- If script execution is blocked: Set-ExecutionPolicy -Scope CurrentUser RemoteSigned (venv activation).
- Use forward slashes or escaped paths in filePath when testing /process-pdf manually.
- Antivirus may slow first model extraction; add cache folder exclusions if appropriate.
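The filePath advice comes down to JSON escaping: a raw backslash inside a JSON string is an escape character, so Windows paths must either use forward slashes or double every backslash. A short sketch follows, with a hypothetical helper name and an example path.

```python
import json
from pathlib import PureWindowsPath


def file_path_payload(windows_path, forward_slashes=True):
    """Serialize a /process-pdf body with a safely encoded Windows path."""
    path = PureWindowsPath(windows_path)
    value = path.as_posix() if forward_slashes else str(path)
    # json.dumps doubles any remaining backslashes so the JSON stays valid.
    return json.dumps({"filePath": value})


print(file_path_payload(r"C:\docs\report.pdf"))
print(file_path_payload(r"C:\docs\report.pdf", forward_slashes=False))
```

The first line prints the path with forward slashes; the second prints it with each backslash doubled. Both decode to a usable path, whereas a hand-written body with single backslashes is rejected by a JSON parser or silently altered.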
Contributions of all kinds are welcome! Check out our CONTRIBUTING.md to get started.
Connect with other contributors, ask questions, and share feedback on Discord:
We’d love to hear from you — whether you’re setting up the project for the first time or shipping your next pull request.
See repository license files and package metadata where applicable. Third-party models are subject to their respective Hugging Face model cards and licenses.
INTERNAL_RAG_TOKEN is required for the FastAPI RAG service. The Node.js
gateway must send the same value in the X-Internal-Token header when calling
protected RAG endpoints such as /process-pdf, /ask, and /summarize.
Protected routes also include /ask/stream and /validate-session-write.
If INTERNAL_RAG_TOKEN is unset or empty, the RAG service fails startup with a
configuration error instead of allowing unauthenticated direct access.