moony01/chatpdf

ChatPDF

Chat with any PDF — Upload a PDF and ask questions about its content using Retrieval-Augmented Generation (RAG) with Langchain + ChromaDB + OpenAI.

🌐 Live Demo: https://chatpdf-moony.streamlit.app/


Overview

ChatPDF is a Streamlit web app that lets you upload a text-based PDF and have a conversation with it. The app uses Langchain's retrieval pipeline (PDF text extraction, embedding, vector storage in ChromaDB, and answer generation via OpenAI GPT) so that responses stay grounded in the document's actual content.

Key Features

  • 📄 Drag-Drop Upload — Upload any text-based PDF
  • 🔍 RAG Pipeline — Retrieves only the most relevant chunks before answering
  • 💬 Multi-Turn Chat — Conversation memory within a session
  • 🔒 Session-Scoped — PDFs are not persisted across sessions
  • Streaming Responses — Real-time token output
  • 💸 Cost-Efficient — Selective token usage via retrieval
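
The Multi-Turn Chat and Session-Scoped features listed above could be sketched roughly as follows. This is a minimal illustration and not the app's actual code: `SessionChat`, `max_turns`, and `build_prompt` are hypothetical names, and a plain in-memory object stands in for whatever per-session store the app really uses.

```python
from dataclasses import dataclass, field


@dataclass
class SessionChat:
    """Hypothetical chat history that lives only as long as one session."""

    max_turns: int = 3  # assumed cap on remembered question/answer pairs
    turns: list = field(default_factory=list)

    def add_turn(self, question: str, answer: str) -> None:
        """Record one exchange and drop the oldest ones beyond the cap."""
        self.turns.append((question, answer))
        self.turns = self.turns[-self.max_turns:]

    def build_prompt(self, question: str, context_chunks: list) -> str:
        """Combine retrieved chunks, prior turns, and the new question."""
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        context = "\n---\n".join(context_chunks)
        return (
            f"Context:\n{context}\n\n"
            f"History:\n{history}\n\n"
            f"Question: {question}"
        )
```

Because the history is held in memory on the object, discarding the object discards the conversation, which is one simple way to get session-scoped behavior.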

Tech Stack

| Layer | Technology |
|---|---|
| Web Framework | Streamlit |
| PDF Parsing | PyPDF |
| Embeddings | OpenAI |
| Vector Database | ChromaDB |
| Retrieval Orchestration | Langchain |
| LLM | OpenAI GPT |
| Language | Python 3.9 |

How It Works

PDF upload
    ↓
[PyPDF] Extract text → Split into chunks
    ↓
[OpenAI Embeddings] Vectorize chunks
    ↓
[ChromaDB] Store vectors + metadata
    ↓
User asks question
    ↓
[Embed query] → [Cosine similarity search] → Top-k chunks
    ↓
[OpenAI GPT] Answer using retrieved context
    ↓
Streamed response (with cited chunks)
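
The chunking and retrieval steps in the flow above can be illustrated with a small standard-library sketch. It is not the app's implementation: a bag-of-words counter (`toy_embed`) stands in for OpenAI embeddings, a plain list stands in for ChromaDB, and the function names and the `chunk_size`, `overlap`, and `k` defaults are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def toy_embed(text):
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query, chunks, k=2):
    """Return the k (score, chunk) pairs most similar to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Example: only the best-matching chunk would be passed to the LLM as context.
demo_chunks = [
    "Cats sleep most of the day.",
    "The invoice is due on the first of March.",
    "Dogs enjoy long walks.",
]
best = top_k_chunks("When is the invoice due?", demo_chunks, k=1)
print(best[0][1])  # prints the invoice chunk
```

A real embedding model captures meaning rather than word overlap, but the ranking logic (embed the query, score every stored chunk, keep the top k) has the same shape.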

Local Development

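Prerequisites

  • Python 3.9 (the version listed under Tech Stack)
  • An OpenAI API key (written to .env in the Setup step)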

Setup

git clone https://github.com/moony01/chatpdf.git
cd chatpdf

pip install -r requirements.txt

# Add your OpenAI API key
echo "OPENAI_API_KEY=sk-..." > .env

streamlit run main.py

Open the URL printed in the terminal (usually http://localhost:8501).
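
The setup step above writes the API key to a `.env` file. How `main.py` reads it is not shown here; packages such as python-dotenv are commonly used for this. As a rough standard-library sketch of the idea, with `load_env_file` being a hypothetical helper and not part of this repository:

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines into os.environ; already-set variables win."""
    loaded = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("\"'")  # drop surrounding quotes
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded
```

After calling `load_env_file()`, the key would be available as `os.environ.get("OPENAI_API_KEY")`. Keeping `.env` out of version control avoids leaking the key.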

Project Structure

chatpdf/
├── main.py              # Primary Streamlit app
├── main_streamit.py     # Alternate Streamlit entry
├── requirements.txt     # Python dependencies
├── unsu.pdf             # Sample PDF for testing
└── .devcontainer/       # VSCode dev container config

License

MIT License © 2024–2026 moony01

You are free to use, modify, and distribute this code. Attribution appreciated.

Contact
