ASHWINI12S/RAG


# 📄 RAG-based PDF Question Answering App

This is a Streamlit web application that uses **Retrieval-Augmented Generation (RAG)** with LangChain to answer user queries based on the contents of a PDF document. It uses HuggingFace sentence embeddings, a FAISS vector store, and Groq's `gemma2-9b-it` model to generate responses.

---

## 🚀 Features

- Load and read content from a local PDF file.
- Split the PDF content into manageable text chunks.
- Convert text chunks into vector embeddings using a HuggingFace sentence-embedding model (`all-MiniLM-L6-v2`).
- Store and search vector embeddings with FAISS.
- Use Groq's `gemma2-9b-it` model for answering questions.
- Ask natural language questions based on PDF content.
- Simple web interface built with Streamlit.

---

## 🛠️ Tech Stack

| Tool            | Purpose                                  |
|-----------------|------------------------------------------|
| Streamlit       | Web app UI                               |
| LangChain       | RAG pipeline                             |
| HuggingFace     | Sentence embedding model                 |
| FAISS           | Vector store for semantic search         |
| PyPDF2          | PDF text extraction                      |
| Groq API        | LLM backend for answer generation        |

---

## 📂 Project Structure

```bash
rag-pdf-app/
│
├── app.py               # Main Streamlit application
├── Cheenai_LTT.pdf      # Sample PDF (optional)
└── README.md            # Project documentation
```

---

## ⚙️ Setup Instructions

### 1. Clone the Repository

```bash
git clone https://github.com/your-username/rag-pdf-app.git
cd rag-pdf-app
```

### 2. Create & Activate Virtual Environment

```bash
python -m venv venv
.\venv\Scripts\activate    # On Windows
source venv/bin/activate   # On macOS/Linux
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

Sample `requirements.txt`:

```text
streamlit
langchain
langchain-community
PyPDF2
faiss-cpu
sentence-transformers
```

### 4. Add Your Groq API Key

Replace this line in `app.py` with your own key:

```python
groqapi = 'your_groq_api_key'
```
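As an alternative to hard-coding the key, it could be read from an environment variable so that it never lands in version control. The sketch below is an assumption, not what `app.py` currently does: the helper `load_groq_api_key` and the variable name `GROQ_API_KEY` are hypothetical names chosen for illustration.

```python
import os


def load_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the Groq API key stored in an environment variable.

    Both the function name and the default variable name are hypothetical;
    adapt them to whatever app.py actually expects.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Groq API key."
        )
    return key
```

With this in place, the assignment in `app.py` would become `groqapi = load_groq_api_key()`, and the key would be set in the shell before launching the app.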

---

## 🧠 How It Works

1. **PDF Upload:** Reads content from a local PDF using PyPDF2.
2. **Text Splitting:** Splits content into chunks using LangChain's `RecursiveCharacterTextSplitter`.
3. **Embedding Generation:** Uses HuggingFace's `all-MiniLM-L6-v2` model to embed chunks.
4. **Vector Store:** Chunks are stored in a FAISS index.
5. **Retriever:** Fetches the most relevant chunks based on user queries.
6. **RAG Prompting:** Combines retrieved context with the user question and prompts Groq's LLM.
7. **Answer Display:** Outputs the generated response in Streamlit.
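The splitting, embedding, and retrieval steps above can be illustrated with a dependency-free sketch. This is not the app's code: a fixed-size character window stands in for `RecursiveCharacterTextSplitter`, a toy bag-of-words vector stands in for `all-MiniLM-L6-v2` embeddings, and a brute-force cosine ranking stands in for the FAISS index. All function names and default sizes are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Cut text into overlapping character windows (simplified splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a sentence-embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute-force search)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]
```

In the real pipeline the embedding model captures meaning rather than exact word overlap, and FAISS avoids scanning every chunk, but the data flow (text, then chunks, then vectors, then top-k chunks) is the same.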

---

## 🖼️ App Preview

App Screenshot


---

## 📌 Example Usage

1. Launch the app with `streamlit run app.py`.
2. The app will:
   - Automatically load the PDF.
   - Display success messages when processing is complete.
   - Prompt you to ask a question.
   - Return a helpful answer based only on the content of the PDF.
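Restricting answers to the PDF content is typically done in the prompt itself. The sketch below shows one way such a prompt might be assembled from the retrieved chunks. The function name `build_rag_prompt` and the exact instruction wording are assumptions for illustration; the prompt actually used in `app.py` may differ.

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt.

    Hypothetical helper: the wording of the instruction is illustrative only.
    """
    # Drop empty chunks and separate the rest with blank lines.
    context = "\n\n".join(
        chunk.strip() for chunk in context_chunks if chunk.strip()
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The resulting string would then be sent to the LLM; telling the model to admit when the context lacks the answer is a common way to reduce responses that go beyond the document.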

โ“ FAQ

Q: Can I use another PDF? Yes! Modify the uploaded_file path in the code to use any local PDF.

Q: Do I need GPU or heavy compute? No, the heavy lifting is done by Groqโ€™s cloud-hosted model.

Q: Is it secure? Keep your groqapi private. Never share your key publicly.

