PDF Q&A Bot

Upload PDF documents, ask natural-language questions grounded in their content, and generate concise summaries — all through a local, three-service stack. The React UI talks to a Node.js API gateway, which orchestrates a Python RAG (retrieval-augmented generation) service powered by Hugging Face embeddings and a configurable text-generation model.

Features

Capability	Description
PDF upload	Multipart upload with server-side parsing, chunking, and vector indexing
Question answering	Semantic search over document chunks, then local HF model generation
Reading modes	Choose among Standard, Tutor, Socratic, Simple, or Concise answering styles
Summarization	Bullet-style summaries from retrieved context
Multi-document UI	Upload and switch between multiple PDFs (`frontend/`)
In-browser viewer	Page-by-page PDF preview with `react-pdf`
Chat export	Export conversation history as CSV or plain text

System Architecture

The application is split into three independently runnable components. Each listens on a dedicated port in development.

flowchart LR
  subgraph client ["Browser — :3000"]
    UI["React UI\nfrontend/"]
  end

  subgraph gateway ["API Gateway — :4000"]
    API["Express\nserver.js"]
  end

  subgraph rag ["RAG Service — :5000"]
    RAG["FastAPI\nrag-service/main.py"]
    FAISS[("FAISS\nin-memory")]
    HF["Hugging Face\nembeddings + LLM"]
  end

  UI -->|"POST /upload, /ask, /summarize"| API
  API -->|"POST /process-pdf, /ask, /summarize"| RAG
  RAG --> FAISS
  RAG --> HF

Request lifecycle

Upload — The UI sends a PDF to Express (POST /upload). Express stores the file temporarily, forwards its path to FastAPI (POST /process-pdf), then deletes the local copy.
Index — FastAPI loads the PDF with LangChain, splits text into chunks, embeds them with sentence-transformers/all-MiniLM-L6-v2, and stores a FAISS index keyed by a new session_id.
Ask / Summarize — The UI includes session_id on each request. FastAPI retrieves relevant chunks, builds a prompt, and runs the configured Hugging Face generation model locally.

Note: Vector stores live in process memory. Restarting the RAG service clears all sessions; users must re-upload PDFs.
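The upload, index, and ask steps above can be illustrated with a small, dependency-free toy. This is not the service's code: the real pipeline uses LangChain's PDF loader and splitter, all-MiniLM-L6-v2 embeddings, and a FAISS index, whereas this sketch substitutes fixed-size character chunks, bag-of-words vectors, and a brute-force cosine ranking. The module-level SESSIONS dict mirrors the in-memory note: restart the process and every session_id is gone. The chunk sizes, helper names, and prompt wording are hypothetical.

```python
import math
import re
import uuid
from collections import Counter

# In-memory store, like the RAG service's per-session indexes: lost on restart.
SESSIONS: dict[str, list[tuple[str, Counter]]] = {}


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character chunks that overlap so text at a boundary stays retrievable."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
    return [c for c in chunks if c.strip()]


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def index_document(text: str, chunk_size: int = 500, overlap: int = 50) -> str:
    """Chunk and 'embed' a document, store it under a fresh session_id, return the id."""
    session_id = str(uuid.uuid4())
    SESSIONS[session_id] = [(c, embed(c)) for c in split_into_chunks(text, chunk_size, overlap)]
    return session_id


def retrieve(session_id: str, question: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question."""
    if session_id not in SESSIONS:
        raise KeyError("Session expired or invalid")  # e.g. after a restart
    q = embed(question)
    ranked = sorted(SESSIONS[session_id], key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Hypothetical prompt layout; the service's real template may differ."""
    joined = "\n\n".join(context)
    return f"Answer using only this context.\n\nContext:\n{joined}\n\nQuestion: {question}\nAnswer:"
```

The generation step would then feed build_prompt(question, retrieve(session_id, question)) to the configured model.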

Security note: The FastAPI RAG service (:5000) is meant to be an internal dependency of the Express gateway (:4000). Do not expose it publicly; anyone who can reach it directly bypasses the gateway's rate limiting. INTERNAL_RAG_TOKEN must be set, and the RAG service rejects any request whose X-Internal-Token header is missing or does not match it.

Upgrade Notes

Existing deployments and local environments must set INTERNAL_RAG_TOKEN before starting the Express API or RAG service. Generate a strong shared secret, put the same value in both environments, and restart both services. The RAG service fails closed when this value is missing.

The Express authentication flow also requires JWT_SECRET for both token signing and verification. Use one strong random value across the auth controller and middleware; do not hardcode or reuse a default secret.
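One way to produce strong values for INTERNAL_RAG_TOKEN and JWT_SECRET is Python's standard secrets module. This is a convenience sketch, not a project script; any cryptographically secure generator works. INTERNAL_RAG_TOKEN goes into both the root .env and rag-service/.env, while JWT_SECRET is, as far as this README indicates, read only by the Express side.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Use the same INTERNAL_RAG_TOKEN in the root .env and in rag-service/.env.
    print(f"INTERNAL_RAG_TOKEN={generate_secret()}")
    # JWT_SECRET is used by the Express auth controller and middleware (root .env).
    print(f"JWT_SECRET={generate_secret()}")
```

Run it once, paste the lines into the relevant .env files, and restart both services.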

Default ports

Service	Folder	Port	URL
React frontend	`frontend/`	3000	http://localhost:3000
Express API	repository root	4000	http://localhost:4000
FastAPI RAG	`rag-service/`	5000	http://localhost:5000

Prerequisites

Tool	Version	Purpose
Node.js	LTS (18+) recommended	Express gateway and React dev server
npm	Bundled with Node.js	JavaScript dependencies
Python	3.10 or newer	RAG service
pip	Current	Python dependencies

Optional but recommended:

Git — clone and contribute
CUDA-capable GPU — faster Hugging Face inference (CPU works, slower)
8 GB+ RAM — model loading and FAISS indexing

Installation

Install dependencies in all three locations. Use three separate terminal sessions when running locally.

1. RAG service (`rag-service/`)

cd rag-service
python -m venv venv

Windows (PowerShell)

.\venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt

macOS / Linux

source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

Copy environment configuration from the repository root:

# From rag-service/, after activating the venv
cp ../.env.example .env    # macOS / Linux
copy ..\.env.example .env  # Windows (cmd)
Copy-Item ..\.env.example .env  # Windows (PowerShell)

Edit .env to set INTERNAL_RAG_TOKEN (required; the RAG service will not start without it) and, optionally, to choose a smaller or faster generation model (see Configuration).

2. Express API (repository root)

cd ..          # repository root (parent of rag-service/)
npm install

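Multer writes uploads to an uploads/ directory at runtime; it is created automatically on first upload. Express reads its settings from the root .env, so create that file from .env.example as well and set INTERNAL_RAG_TOKEN to the same value as in rag-service/.env.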

3. React frontend (`frontend/`)

cd frontend
npm install

The frontend package.json sets "proxy": "http://localhost:4000", so development requests to /upload, /ask, and /summarize are forwarded to Express without CORS configuration in the browser.

Running the Application

Start services in this order so the gateway and RAG layer are ready before you upload a file.

Terminal 1 — RAG service

cd rag-service
# activate venv (see Installation)
# 127.0.0.1 keeps the internal RAG service unreachable from other machines
uvicorn main:app --host 127.0.0.1 --port 5000 --reload

Alternative:

python main.py

Terminal 2 — Express API

# repository root
node server.js

Expected log: Backend running on http://localhost:4000

Terminal 3 — Frontend

cd frontend
npm start

Open http://localhost:3000 in your browser.

First-run model download

On the first PDF upload or first question, Hugging Face will download:

Asset	Model ID	Approx. size
Embeddings	`sentence-transformers/all-MiniLM-L6-v2`	~90 MB
Generation (default)	`google/flan-t5-base`	~900 MB

Downloads are cached under your user Hugging Face cache (e.g. ~/.cache/huggingface on Linux/macOS, %USERPROFILE%\.cache\huggingface on Windows). The first request may take several minutes on a slow connection — this is normal.

Running with Docker

You can run all three services with Docker Compose instead. This gives a reproducible environment and removes the need to install Python, Node.js, and the project dependencies on your host machine.

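Prerequisites

Docker with Docker Compose
INTERNAL_RAG_TOKEN available to both the Express and RAG containers (see Upgrade Notes); the RAG service refuses to start without it

Quick Start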

Clone the repository and navigate to the project root.
Build and start all services in detached mode:

docker-compose up -d --build

The services will be available at:
- Frontend UI: http://localhost:3000
- Express API Gateway: http://localhost:4000
- FastAPI RAG Service: http://localhost:5000

Note on Initial Startup: On the first run, the RAG service container will download the necessary Hugging Face models (~1GB total). These models are cached in a persistent Docker volume (pdf-qa-bot-hf-cache) so they will not be re-downloaded on subsequent restarts.

Stopping the services

To stop the running containers:

docker-compose down

To stop the containers and remove the cached models/volumes:

docker-compose down -v

API Reference

Express API (`http://localhost:4000`)

Public-facing routes used by the React app. All paths are relative to the gateway origin.

Method	Endpoint	Content-Type	Request body / form	Success response	Error responses
`POST`	`/upload`	`multipart/form-data`	Field name `file` (PDF binary)	`200` — `{ "message": string, "session_id": string }`	`400` — no file; `500` — RAG processing failed
`POST`	`/ask`	`application/json`	`{ "question": string, "session_id": string }`	`200` — `{ "answer": string }`	`500` — upstream or internal error
`POST`	`/summarize`	`application/json`	`{ "session_id": string, "pdf"?: string }`	`200` — `{ "summary": string }`	`500` — upstream or internal error

Example — upload (curl)

curl -X POST http://localhost:4000/upload \
  -F "file=@/path/to/document.pdf"

Example — ask

curl -X POST http://localhost:4000/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"What is the main topic?","session_id":"<uuid-from-upload>"}'
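For scripting against the gateway without curl, the same calls can be made with Python's standard library. The helper names, the 300-second timeout (generous because CPU inference can be slow), and the base URL constant are assumptions for illustration; the request and response shapes follow the Express API table above. The sketch also covers /summarize.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:4000"  # default Express port from this README


def build_json_request(path: str, payload: dict, base_url: str = GATEWAY_URL) -> urllib.request.Request:
    """Prepare a JSON POST to the gateway without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = GATEWAY_URL) -> dict:
    """Send the request and decode the JSON body; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(build_json_request(path, payload, base_url), timeout=300) as resp:
        return json.load(resp)


def ask(question: str, session_id: str) -> str:
    return post_json("/ask", {"question": question, "session_id": session_id})["answer"]


def summarize(session_id: str) -> str:
    return post_json("/summarize", {"session_id": session_id})["summary"]
```

Usage, with a session_id returned by /upload: print(ask("What is the main topic?", "<uuid-from-upload>")).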

FastAPI RAG service (`http://localhost:5000`)

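Internal service called by Express. You can call it directly for local debugging, but protected endpoints reject requests unless the X-Internal-Token header carries the INTERNAL_RAG_TOKEN value.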

Method	Endpoint	Request body (JSON)	Success response	Notes
`POST`	`/process-pdf`	`{ "filePath": string }`	`{ "message": string, "session_id": string }`	Absolute or relative path to PDF on the machine running FastAPI
`POST`	`/ask`	`{ "question": string, "session_id": string }`	`{ "answer": string }`	Returns a friendly message if `session_id` is unknown
`POST`	`/summarize`	`{ "session_id": string, "pdf"?: string \| null }`	`{ "summary": string }`	`pdf` is accepted for API compatibility; indexing uses `session_id` only

Interactive OpenAPI docs: http://localhost:5000/docs (recommended for local development only; do not expose publicly)
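When calling the RAG service directly, protected endpoints expect the X-Internal-Token header to carry the INTERNAL_RAG_TOKEN value. A minimal standard-library sketch, assuming the token is exported in the current shell's environment and the service listens on the default port; the function names are hypothetical. Building the body with json.dumps also takes care of backslashes in Windows paths, which otherwise have to be escaped by hand in a curl -d string.

```python
import json
import os
import urllib.request

RAG_URL = "http://localhost:5000"  # default FastAPI port; keep it on localhost


def build_rag_request(path: str, payload: dict, token: str, base_url: str = RAG_URL) -> urllib.request.Request:
    """JSON POST to an internal RAG endpoint, authenticated with the shared token."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),  # escapes backslashes in Windows paths
        headers={"Content-Type": "application/json", "X-Internal-Token": token},
        method="POST",
    )


def process_pdf(file_path: str) -> dict:
    """Index a PDF that already exists on the FastAPI machine; returns message and session_id."""
    token = os.environ["INTERNAL_RAG_TOKEN"]  # KeyError if unset, like the fail-closed service
    req = build_rag_request("/process-pdf", {"filePath": file_path}, token)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```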

Example — process PDF (via gateway, recommended)

curl -X POST http://localhost:4000/upload \
  -F "file=@/path/to/your.pdf"

Configuration

Environment variables are read from the root .env for Express and from rag-service/.env for the RAG service (create both from .env.example at the repo root).

Express gateway security

Variable	Default	Description
`RATE_LIMIT_WINDOW_MS`	`60000`	Sliding window for the Express request limiter on `/upload`, `/ask`, and `/summarize`
`RATE_LIMIT_MAX`	`60`	Maximum requests per IP within `RATE_LIMIT_WINDOW_MS` before a JSON `429`
`UPLOAD_MAX_FILE_SIZE_BYTES`	`20000000`	Maximum PDF size per upload in bytes
`UPLOAD_MAX_CONCURRENT_PER_IP`	`2`	Maximum in-flight `/upload` requests allowed per IP
`RATE_LIMIT_SLOWDOWN_AFTER`	`10`	Number of free inference requests before the slow-down delay starts

MAX_UPLOAD_SIZE_MB is still accepted for compatibility, but UPLOAD_MAX_FILE_SIZE_BYTES is preferred.
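To make the RATE_LIMIT_WINDOW_MS, RATE_LIMIT_MAX, and UPLOAD_MAX_CONCURRENT_PER_IP semantics concrete, here is a small in-memory illustration of a per-IP sliding-window counter and a per-IP in-flight cap. It is written in Python only to match the other examples; the gateway's actual limiter lives in server.js and may rely on a middleware library with different edge-case behavior. Timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_ms span."""

    def __init__(self, window_ms: int = 60_000, max_requests: int = 60) -> None:
        self.window_ms = window_ms
        self.max_requests = max_requests
        self._hits: defaultdict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now_ms: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now_ms - hits[0] >= self.window_ms:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the point where the gateway answers with a JSON 429
        hits.append(now_ms)
        return True


class InFlightLimiter:
    """Cap concurrent requests (for example uploads) per IP."""

    def __init__(self, max_in_flight: int = 2) -> None:
        self.max_in_flight = max_in_flight
        self._active: defaultdict[str, int] = defaultdict(int)

    def try_acquire(self, ip: str) -> bool:
        if self._active[ip] >= self.max_in_flight:
            return False
        self._active[ip] += 1
        return True

    def release(self, ip: str) -> None:
        # Call once per successful try_acquire, e.g. in a finally block.
        self._active[ip] = max(0, self._active[ip] - 1)
```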

RAG service

Variable	Default	Description
`HF_GENERATION_MODEL`	`google/flan-t5-base`	Hugging Face model ID for answer/summary generation
`OPENAI_API_KEY`	(empty)	Reserved; not used by the current local HF pipeline
`HOST`	`127.0.0.1`	Documented for optional deployment tuning
`PORT`	`5000`	Documented RAG port (uvicorn CLI flag takes precedence in dev)
`INTERNAL_RAG_TOKEN`	(required)	Shared secret required by protected RAG endpoints. Requests must include the same value in `X-Internal-Token`
`PDF_PARSE_TIMEOUT_SECONDS`	`20`	Hard timeout in seconds for PDF parsing and text extraction (guards against maliciously crafted or pathologically slow PDFs)
`MAX_PDF_PAGES`	`200`	Reject PDFs with too many pages
`MAX_PDF_EXTRACT_CHARS`	`400000`	Cap extracted text before chunking
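A sketch of how the RAG service might read the variables above, with the documented defaults and the fail-closed rule for INTERNAL_RAG_TOKEN. The RagSettings class and load_rag_settings function are hypothetical; main.py may load its settings differently. Treating a whitespace-only token as missing is an extra assumption of this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RagSettings:
    generation_model: str
    internal_token: str
    parse_timeout_seconds: float
    max_pdf_pages: int
    max_extract_chars: int


def load_rag_settings(env: Mapping[str, str] = os.environ) -> RagSettings:
    """Read RAG settings using the defaults from the table; fail closed on a missing token."""
    token = env.get("INTERNAL_RAG_TOKEN", "").strip()
    if not token:
        # Refuse to start rather than serve protected endpoints without authentication.
        raise RuntimeError("INTERNAL_RAG_TOKEN must be set to a non-empty value")
    return RagSettings(
        generation_model=env.get("HF_GENERATION_MODEL", "google/flan-t5-base"),
        internal_token=token,
        parse_timeout_seconds=float(env.get("PDF_PARSE_TIMEOUT_SECONDS", "20")),
        max_pdf_pages=int(env.get("MAX_PDF_PAGES", "200")),
        max_extract_chars=int(env.get("MAX_PDF_EXTRACT_CHARS", "400000")),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.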

Faster, lighter generation (recommended on CPU-only machines):

HF_GENERATION_MODEL=google/flan-t5-small

Optional frontend override — set before npm start in frontend/:

REACT_APP_API_URL=http://localhost:4000

Leave unset to use the CRA dev proxy (/upload → http://localhost:4000/upload).

Project Structure

pdf-qa-bot/
├── .env.example              # Environment template (copy to rag-service/.env)
├── .gitignore
├── CONTRIBUTING.md           # Contributor workflow
├── README.md                 # This file
├── package.json              # Express dependencies (root)
├── package-lock.json
├── server.js                 # Express API gateway (:4000)
│
├── frontend/                 # Primary React UI (:3000)
│   ├── package.json          # proxy → http://localhost:4000
│   ├── public/
│   └── src/
│       ├── App.js            # Upload, chat, summarize, PDF viewer, export
│       ├── index.js
│       └── ...
│
├── rag-service/              # FastAPI + LangChain + FAISS (:5000)
│   ├── main.py               # RAG endpoints and HF inference
│   ├── requirements.txt
│   └── venv/                 # Local Python env (gitignored)
│
├── uploads/                  # Temporary PDF storage (created at runtime)
│
└── src/                      # Legacy/simple CRA scaffold (not used by default)
    └── App.js                # Older MUI prototype without session_id support

The frontend/ directory is the supported UI. The root-level src/ tree is a leftover Create React App scaffold and is not wired into the root package.json scripts.

Troubleshooting

Port already in use (`EADDRINUSE`)

Each service binds a fixed port in development. If another process occupies it, the service fails to start.

Port	Service	Typical conflict
3000	React (`npm start`)	Another CRA app, some API dev tools
4000	Express (`server.js`)	Custom backends, AirPlay on some systems
5000	FastAPI / uvicorn	Flask defaults, macOS AirPlay Receiver

Find and free a port (Windows PowerShell)

netstat -ano | findstr :4000
taskkill /PID <pid> /F

macOS / Linux

lsof -i :5000
kill -9 <pid>
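If lsof or netstat is unavailable, a cross-platform check with Python's socket module reports which of the three default ports already accept connections. This sketch only probes IPv4 loopback (127.0.0.1), so a process bound solely to the IPv6 address ::1 would show as free.

```python
import socket

DEFAULT_PORTS = {3000: "React frontend", 4000: "Express API", 5000: "FastAPI RAG"}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something on host:port accepts a TCP connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, service in DEFAULT_PORTS.items():
        status = "in use" if port_in_use(port) else "free"
        print(f"{port} ({service}): {status}")
```

Run it before starting the services; any port reported as "in use" is one to free with the commands above.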

Workarounds

Stop the conflicting application, or
Change the port consistently across all layers: Express hardcodes 4000 and localhost:5000 in server.js; FastAPI defaults to 5000 in main.py (the uvicorn --port flag overrides it); the React dev server moves with PORT=3001 npm start. If Express moves off 4000, also update the proxy target in frontend/package.json or set REACT_APP_API_URL.

Always restart RAG → Express → Frontend after port changes.

Upload or ask returns `500` / “PDF processing failed”

Symptom	Likely cause	Fix
Immediate 500 on upload	RAG service not running	Start `uvicorn` in `rag-service/` first
`ECONNREFUSED` in Express logs	Wrong host/port	Ensure FastAPI is on `http://localhost:5000`
`Session expired or invalid` in answers	RAG process restarted	Re-upload the PDF to obtain a new `session_id`
Empty or scanned PDF	No extractable text	Use a text-based PDF, not a pure image scan

Slow first request / long “Downloading…” pauses

Cause	What to do
First-time Hugging Face model fetch	Wait for completion; verify disk space (~1–2 GB for defaults)
Slow or restricted network	Pre-download models (see below) or use `flan-t5-small`
CPU-only inference	Expect slower Q&A; use a smaller `HF_GENERATION_MODEL`
Large PDFs	More chunks → longer embedding and search; try smaller files first

Pre-download models (optional)

cd rag-service
# activate venv
python -c "from langchain_community.embeddings import HuggingFaceEmbeddings; HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')"
python -c "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM; AutoTokenizer.from_pretrained('google/flan-t5-base'); AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base')"

Set a custom cache directory if needed:

# macOS / Linux
export HF_HOME=/path/to/large-disk/hf-cache

# Windows PowerShell
$env:HF_HOME = "D:\hf-cache"
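To check which directory the Hugging Face libraries will use and how much space the cached models occupy, the lookup can be approximated in Python. The order below (HF_HOME, then XDG_CACHE_HOME/huggingface, then ~/.cache/huggingface) follows huggingface_hub's usual defaults; other variables such as HF_HUB_CACHE can redirect the model cache, so treat the result as an approximation. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping


def hf_home(env: Mapping[str, str] = os.environ) -> Path:
    """Approximate huggingface_hub's HF_HOME resolution."""
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser()
    cache_root = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(cache_root).expanduser() / "huggingface"


def dir_size_bytes(root: Path) -> int:
    """Total size of regular files under root (0 if it does not exist)."""
    if not root.exists():
        return 0
    # Skip symlinks: the hub cache links snapshot files to blobs, which would double-count.
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file() and not p.is_symlink())
```

For example, print(hf_home(), dir_size_bytes(hf_home()) / 1e9, "GB") shows the cache location and its approximate size.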

Frontend cannot reach the API

Check	Action
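Express running?	`curl -i http://localhost:4000` should return some HTTP response (likely `404`, since there are no GET routes), while "connection refused" means Express is down; alternatively test with an upload and check the Terminal 2 logs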
Using `REACT_APP_API_URL`?	Must include scheme: `http://localhost:4000`
CORS errors in browser	Use `npm start` with default proxy, or ensure Express `cors()` remains enabled
Mixed content	Use `http://` locally, not `https://`, unless you terminate TLS yourself

Python / dependency issues

Error	Fix
`ModuleNotFoundError`	Activate `rag-service/venv` and `pip install -r requirements.txt`
`torch` install fails on Windows	Install Python 3.10–3.12 x64; use official pytorch.org wheel instructions if needed
`faiss-cpu` errors	Ensure 64-bit Python; reinstall: `pip install --force-reinstall faiss-cpu`

Windows-specific notes

If PowerShell blocks venv activation, allow local scripts for your user: Set-ExecutionPolicy -Scope CurrentUser RemoteSigned.
Use forward slashes or escaped paths in filePath when testing /process-pdf manually.
Antivirus may slow first model extraction; add cache folder exclusions if appropriate.

🤝 Contributors

Contributions of all kinds are welcome! Check out our CONTRIBUTING.md to get started.

🚀 Join the Community

Connect with other contributors, ask questions, and share feedback on Discord:

Join the pdf-qa-bot Discord →

We’d love to hear from you — whether you’re setting up the project for the first time or shipping your next pull request.

License

See repository license files and package metadata where applicable. Third-party models are subject to their respective Hugging Face model cards and licenses.

RAG internal authentication

INTERNAL_RAG_TOKEN is required for the FastAPI RAG service. The Node.js gateway must send the same value in the X-Internal-Token header when calling protected RAG endpoints, which include /process-pdf, /ask, /ask/stream, /summarize, and /validate-session-write.

If INTERNAL_RAG_TOKEN is unset or empty, the RAG service fails startup with a configuration error instead of allowing unauthenticated direct access.
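The check described above can be sketched with the standard library. hmac.compare_digest avoids content-based short-circuiting when comparing the supplied and expected tokens, which helps keep the secret from leaking through response timing. PROTECTED_PATHS lists only the routes named in this section, and the function names are hypothetical; the actual FastAPI dependency or middleware in main.py may be organized differently.

```python
import hmac
from typing import Optional

# Routes this README names as protected; the real service may protect more.
PROTECTED_PATHS = frozenset(
    {"/process-pdf", "/ask", "/ask/stream", "/summarize", "/validate-session-write"}
)


def requires_token(path: str) -> bool:
    return path in PROTECTED_PATHS


def token_is_valid(supplied: Optional[str], expected: str) -> bool:
    """True only if a non-empty expected token exists and the header value matches it."""
    if not expected or supplied is None:
        return False  # fail closed: never accept requests against an empty secret
    # Compare bytes so non-ASCII input cannot raise TypeError inside compare_digest.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```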
