A Retrieval-Augmented Generation (RAG) chatbot built with Streamlit, LangChain, and OpenAI GPT-4. It provides a modern, intuitive interface for document-based conversations.
- Modern UI Design: Clean, centered interface with sparkle animations and suggestion cards
- Multi-PDF Support: Upload and chat with multiple PDF documents simultaneously
- Smart Document Retrieval: FAISS vector search for accurate context retrieval
- Streaming Responses: Real-time GPT-4 responses with typing indicators
- Session Management: Persistent conversation history within sessions
- Responsive Design: Works seamlessly across different screen sizes
- Source Attribution: Responses include page numbers and filenames for verification
- app.py: Streamlit frontend with modern UI and chat interface
- brain.py: RAG processing engine (PDF parsing, chunking, vector indexing)
- .env: Environment configuration (OpenAI API key)
- Frontend: Streamlit with custom CSS styling
- Backend: Python with LangChain framework
- Vector Store: FAISS (Facebook AI Similarity Search)
- Embeddings: OpenAI text-embedding models
- LLM: OpenAI GPT-4 with streaming
- PDF Processing: PyPDF for text extraction
- Python 3.8+
- OpenAI API key
- Clone the repository
  git clone https://github.com/krishn1122/AI-Chatbot-RAG.git
  cd AI-Chatbot-RAG
- Create virtual environment
  python -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
  pip install -r requirements.txt
- Set up environment variables
  # Create .env file and add your OpenAI API key
  echo "OPENAI_API_KEY=your_api_key_here" > .env
- Run the application
  streamlit run app.py
- Document Upload: Users upload PDF files through the web interface
- Text Extraction: PyPDF extracts and cleans text content
- Document Chunking: Text is split into 4000-character chunks with metadata
- Vector Embedding: OpenAI creates vector representations of text chunks
- Index Creation: FAISS builds a searchable vector database
- Query Processing: User questions trigger similarity search (top-3 results)
- Context Augmentation: Retrieved chunks are added to the GPT-4 prompt
- Response Generation: GPT-4 generates contextual responses with source attribution
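The chunking and retrieval steps above can be sketched in plain Python. This is a minimal illustration, not the code in brain.py: the `Chunk` class and the `chunk_page`, `embed`, and `top_k` names are hypothetical, word counts stand in for OpenAI embeddings, and a sorted cosine-similarity scan stands in for the FAISS index. Only the 4000-character chunk size and the top-3 default come from the description above.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    filename: str  # metadata kept for source attribution
    page: int


def chunk_page(text, filename, page, chunk_size=4000):
    """Split one page of text into pieces of at most chunk_size characters."""
    return [
        Chunk(text[i:i + chunk_size], filename, page)
        for i in range(0, len(text), chunk_size)
    ]


def embed(text):
    """Toy embedding: lowercase word counts (assumption: stand-in for an OpenAI embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=3):
    """Return the k chunks most similar to the query (assumption: stand-in for FAISS search)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]
```

For example, `top_k("How is the vector index built?", chunks, k=1)` returns the single chunk whose words overlap most with the question, together with its filename and page number.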
- Landing Page: Modern interface with suggestion cards and PDF upload
- Document Processing: Automatic vector database creation with progress indicators
- Interactive Chat: Real-time conversations with document-aware responses
- Source Tracking: All responses include page numbers and filenames
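One way source tracking could work is to label each retrieved chunk with its filename and page number before it is placed in the prompt, so the model can cite them. The sketch below is an assumption about the approach, not the project's actual prompt: the `build_prompt` function, the dictionary keys, and the instruction wording are all hypothetical.

```python
def build_prompt(question, retrieved):
    """Build a prompt from retrieved chunks.

    retrieved: list of dicts with 'text', 'filename', and 'page' keys
    (assumed shape, chosen for illustration).
    """
    # Prefix every chunk with a numbered source label the model can cite.
    context = "\n\n".join(
        f"[{i}] {c['filename']}, page {c['page']}:\n{c['text']}"
        for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite the filename and page number of each source you use.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string would be sent as the user or system message to the chat model; the numbered labels make it straightforward to check a cited page against the original PDF.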
- "What are the main topics discussed in the uploaded documents?"
- "Summarize the key findings from page 5 of document.pdf"
- "Compare the methodologies mentioned across different documents"
- "What recommendations are provided in the conclusion?"
Create a .env file in the root directory:
OPENAI_API_KEY=your_openai_api_key_here
- Chunk Size: Modify chunk_size in brain.py (default: 4000 characters)
- Similarity Search: Adjust k parameter in app.py (default: 3 results)
- Model Selection: Change GPT model in app.py (default: gpt-4)
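To show how these settings might fit together, here is a standard-library sketch that reads the .env file and collects the defaults listed above. The `load_env` and `get_settings` functions and the settings dictionary are hypothetical; the project may instead use a library such as python-dotenv, and the real values live in app.py and brain.py.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Variables that are already set in the environment are left unchanged.
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_settings(env_path=".env"):
    """Return the API key plus the documented defaults (hypothetical layout)."""
    load_env(env_path)
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return {"api_key": api_key, "chunk_size": 4000, "k": 3, "model": "gpt-4"}
```

Failing early with a clear error when the key is missing avoids a less obvious authentication failure on the first request.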
AI-Chatbot-RAG/
├── app.py # Main Streamlit application
├── brain.py # RAG processing logic
├── requirements.txt # Python dependencies
├── .env # Environment variables
├── .gitignore # Git ignore rules
├── README.md # Project documentation
├── thumbnail.webp # Project thumbnail
└── venv/ # Virtual environment
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.