BrainPain Generator 😭→🧠
A savage MCQ generator powered by RAG + Streamlit + Ollama.
This project is a local, offline, Retrieval-Augmented Generation (RAG) quiz generator. You upload any PDF and the system:
- Reads the document
- Splits it into chunks
- Builds a FAISS vector index
- Performs Hybrid Search
- Summarizes the retrieved context
- Generates MCQs using LLaMA / DeepSeek via Ollama
- Runs a toxic, fun MCQ quiz
- Gives one-line explanations per question
Features
✅ 1. PDF → Chunking → Embeddings → FAISS Index
The PDF is split using RecursiveCharacterTextSplitter. Each chunk is embedded with nomic-embed-text, and all embeddings are stored in a FAISS vector store.
✅ 2. Hybrid Search (Vector + Keyword)
Chunks relevant to your topic are retrieved using:
Vector similarity search
Exact keyword scan
Combining both improves relevance and reduces LLM hallucination.
✅ 3. Context Summarization
Retrieved chunks are compressed via:
LLM-driven summarization
Produces compact bullet points
Keeps token usage low
Keeps MCQs grounded in the retrieved context
✅ 4. MCQ Generation (RAG Prompting)
The LLM generates high-quality MCQs with strict formatting
The app includes:
✅ Retry mechanism if MCQs = 0
✅ Expanded context fallback
✅ Strict parsing engine
✅ 5. One-shot Explanation Generator
The app generates explanations for all MCQs in one LLM call, not one call per question. This reduces cost and latency and keeps the output format consistent.
✅ 6. Funny UI
Every button is savage. Every message roasts the user. The theme is intentionally brutal for fun.
🧠 Algorithms & Concepts Used
Below is a detailed breakdown of every algorithm and technique implemented behind the scenes.
- Recursive Character Text Splitting (Text Preprocessing)
Algorithm: RecursiveCharacterTextSplitter
Used to divide the PDF content into overlapping chunks.
Why?
Prevents context loss
Keeps semantic boundaries intact
Ensures embedding quality
How it works:
Splits text by large separators first (\n\n), then by sentences, then words
If a piece is still larger than the chunk size, it is split again recursively
Adds overlap to maintain continuity
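The recursive idea above can be sketched in plain Python. This is a simplified illustration, not LangChain's actual implementation: the function names are hypothetical, the separator list is LangChain's default ordering (the app may configure its own), and rejoining pieces with a single space is a simplification made for the example.

```python
def split_recursive(text, chunk_size, separators):
    """Break text into pieces of at most chunk_size chars, coarse separators first."""
    if len(text) <= chunk_size:
        return [text]
    if not separators or separators[0] == "":
        # Nothing coarser left to try: cut into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return split_recursive(text, chunk_size, finer)
    pieces = []
    for part in text.split(sep):
        if part:
            pieces.extend(split_recursive(part, chunk_size, finer))
    return pieces


def merge_with_overlap(pieces, chunk_size, chunk_overlap, joiner=" "):
    """Greedily pack pieces into chunks, carrying trailing pieces forward as overlap."""
    chunks, current = [], []

    def size(parts):
        return len(joiner.join(parts))

    for piece in pieces:
        if current and size(current + [piece]) > chunk_size:
            chunks.append(joiner.join(current))
            # Drop leading pieces until what remains fits the overlap budget
            # and still leaves room for the new piece.
            while current and (size(current) > chunk_overlap
                               or size(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(joiner.join(current))
    return chunks


def recursive_character_split(text, chunk_size=500, chunk_overlap=50,
                              separators=("\n\n", "\n", " ", "")):
    pieces = split_recursive(text, chunk_size, separators)
    return merge_with_overlap(pieces, chunk_size, chunk_overlap)


print(recursive_character_split("aaa bbb ccc ddd eee", chunk_size=11, chunk_overlap=4))
# ['aaa bbb ccc', 'ccc ddd eee']
```

Note how "ccc" appears in both chunks: that shared tail is the overlap that preserves continuity between neighbouring chunks. The chunk_size and chunk_overlap values the app actually uses are not stated in this README.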
- Embedding Generation (Semantic Vectorization)
Algorithm: nomic-embed-text embedding model (about 137M parameters)
Framework: LangChain + Ollama
Purpose:
Convert text chunks → numeric vector representations
Capture semantic meaning
Enable similarity search
Used to create dense vectors that FAISS can index.
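The real app calls nomic-embed-text through Ollama (in LangChain, typically an `OllamaEmbeddings(model="nomic-embed-text")` object with its `embed_documents` and `embed_query` methods), which needs a running Ollama server. To show the shape of the step without that dependency, the sketch below uses a toy hashed bag-of-words vectorizer as a stand-in. It is an assumption for illustration only and captures word overlap, not semantic meaning; only the "text → fixed-length vector → similarity score" flow matches the real pipeline.

```python
import math
import zlib


def toy_embed(text, dim=64):
    """Hashed bag-of-words vector, L2-normalized. A stand-in, NOT a semantic model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine_similarity(a, b):
    # Both vectors are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


chunk_vec = toy_embed("ATP stores energy in the cell")
query_vec = toy_embed("cell energy")
print(len(chunk_vec), round(cosine_similarity(chunk_vec, query_vec), 3))
```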
- FAISS – Facebook AI Similarity Search
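Algorithm Type: Exact Nearest Neighbor search (FAISS is best known for Approximate Nearest Neighbor (ANN) indexes, but a flat index compares the query with every stored vector)
Index Type: Flat L2 index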
Why FAISS?
Fast vector search
Scales to thousands of chunks
Runs locally
Used to retrieve the most relevant text chunks for your query.
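What a flat L2 index computes can be sketched without FAISS: brute-force distance to every stored vector, then keep the k smallest. Like FAISS's IndexFlatL2, this returns squared L2 distances. The function name is hypothetical, and the real index does the same work in optimized native code.

```python
import heapq


def flat_l2_search(index_vectors, query, k=3):
    """Exact k-nearest-neighbor search: compare the query with every stored vector.

    Returns (index, squared_l2_distance) pairs, closest first.
    """
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), i)
        for i, vec in enumerate(index_vectors)
    )
    return [(i, dist) for dist, i in heapq.nsmallest(k, scored)]


vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
print([i for i, _ in flat_l2_search(vectors, [0.9, 0.1], k=2)])  # [1, 0]
```

Because every vector is scanned, results are exact and cost grows linearly with the number of chunks, which is comfortable at the scale of a single PDF.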
- Hybrid Search (Vector + Keyword Matching)
Components:
Vector Similarity Search
Exact Substring Keyword Matching
Why?
Vector search handles semantic meaning
Keyword search handles exact matches
Combined → better precision and coverage
This improves RAG accuracy and reduces hallucinations.
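A minimal sketch of the merge step, assuming the vector search has already returned a ranked list of chunk indices. The function name, the `max_results` cap, and the choice to rank exact keyword matches ahead of vector-only hits are all assumptions; the README does not say how the app orders the combined results.

```python
def hybrid_search(topic, chunks, vector_hits, max_results=5):
    """Merge vector-search hits with exact, case-insensitive substring matches.

    vector_hits: chunk indices from the vector index, best first (hypothetical input).
    """
    needle = topic.lower().strip()
    keyword_hits = [i for i, chunk in enumerate(chunks)
                    if needle and needle in chunk.lower()]
    merged, seen = [], set()
    # Assumption: exact keyword matches are ranked ahead of vector-only hits.
    for i in keyword_hits + list(vector_hits):
        if i not in seen:
            seen.add(i)
            merged.append(i)
    return [chunks[i] for i in merged[:max_results]]


chunks = ["Mitochondria produce ATP.", "The cell wall is rigid.",
          "ATP synthase spins.", "Ribosomes make proteins."]
print(hybrid_search("ATP", chunks, vector_hits=[3, 0]))
```

The de-duplication step matters: chunk 0 is found by both searches but appears only once in the merged context.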
- LLM Context Summarization
Algorithm: LLM-driven abstractive summarization
Key decisions:
Compress top-k retrieved chunks
Keep only essential bullet points
Remove filler text
Reduce token usage
Why Summarize?
Keeps MCQ generation focused
Keeps the questions grounded in the PDF
Helps the prompt fit inside the LLM's context window
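The LLM call itself needs Ollama, but the two pieces around it can be sketched: assembling the top-k chunks into a summarization prompt, and keeping only bullet lines from the reply. The prompt wording, the `top_k` and `max_chars` defaults, and the character budget (a rough stand-in for a real token count) are hypothetical, not the app's actual values.

```python
def build_summary_prompt(topic, chunks, top_k=4, max_chars=4000):
    """Assemble a summarization prompt from the top-k retrieved chunks."""
    context = ""
    for chunk in chunks[:top_k]:
        text = chunk.strip()
        # Rough character budget as a stand-in for a real token count.
        if len(context) + len(text) + 2 > max_chars:
            break
        context += text + "\n\n"
    return (
        f"Summarize the context below into short bullet points about '{topic}'.\n"
        "Use only facts stated in the context and drop filler text.\n"
        "Start every bullet with '- '.\n\n"
        f"Context:\n{context.strip()}"
    )


def parse_bullets(summary):
    """Keep only the lines of the LLM reply that look like '- ' bullets."""
    return [line.strip()[2:].strip() for line in summary.splitlines()
            if line.strip().startswith("- ")]


print(parse_bullets("Sure! Here you go:\n- ATP stores energy\n  - Made in mitochondria"))
```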
- MCQ Generation (Constrained Prompting)
Algorithm Type:
Prompt engineering
Constraint-based text generation
Rules enforced:
No hallucination
Only use PDF content
Strict formatting
4 options
One correct answer
Fallback logic:
If MCQs = 0 → retry with doubled context
If still = 0 → return error
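The fallback logic above can be sketched as a small control loop. Here "doubled context" is interpreted as doubling the number of retrieved chunks; that reading, the function name, and the three injected callables (`retrieve`, `generate`, `parse`) are assumptions made so the loop can be shown without an LLM.

```python
def generate_mcqs_with_fallback(retrieve, generate, parse, k=4):
    """Run retrieve -> generate -> parse, retrying once with twice the context.

    retrieve(k) -> context string; generate(context) -> raw LLM text;
    parse(raw) -> list of MCQs. All three are hypothetical callables.
    """
    for attempt_k in (k, 2 * k):
        mcqs = parse(generate(retrieve(attempt_k)))
        if mcqs:
            return mcqs
    # Both attempts produced zero MCQs: surface an error instead of an empty quiz.
    raise RuntimeError("No MCQs could be generated from this PDF section.")
```

Passing the steps in as functions keeps the retry policy separate from the model, so the same loop works with LLaMA or DeepSeek.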
- MCQ Parsing Algorithm
Algorithm:
Regex-free, rule-based line parsing
Whitespace tolerant
Formatting variations handled
This converts LLM output → structured Python objects.
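The exact output format the app's prompt requests is not shown in this README, so the sketch below assumes a common layout: a `Q<n>.` line, four `A)`–`D)` option lines, and an `Answer: <letter>` line. It is regex-free, strips surrounding whitespace, accepts `.`, `)` or `:` after labels in any letter case, and drops incomplete questions. All names are hypothetical.

```python
def _strip_label(line):
    """Drop a leading label such as 'Q1.', 'B)' or 'Answer:'."""
    for i, ch in enumerate(line):
        if ch in ".):":
            return line[i + 1:].strip()
    return line.strip()


def parse_mcqs(raw):
    """Regex-free, line-based parser for the assumed Q / A-D / Answer layout."""
    mcqs, current = [], None
    for line in raw.splitlines():
        line = line.strip()  # tolerate stray indentation and trailing spaces
        if not line:
            continue
        lower = line.lower()
        if lower.startswith("q") and line[1:2].isdigit():
            current = {"question": _strip_label(line), "options": {}, "answer": None}
        elif current is None:
            continue  # chatter before the first question
        elif len(line) > 1 and line[0].upper() in "ABCD" and line[1] in ".):":
            current["options"][line[0].upper()] = _strip_label(line)
        elif lower.startswith(("answer", "correct answer")):
            current["answer"] = _strip_label(line)[:1].upper()
            # Keep only complete questions: four options and a valid answer letter.
            if len(current["options"]) == 4 and current["answer"] in current["options"]:
                mcqs.append(current)
            current = None
    return mcqs
```

Each parsed question is a plain dict with `question`, `options` (letter to text) and `answer` keys, which is easy to render in Streamlit.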
- One-shot Explanation Generator
Algorithm:
Batch-explanation LLM prompt
Strict per-question format
One output line per explanation
Auto-repair if format breaks
Benefits:
Faster than per-question LLM calls
More consistent explanations
Efficient token usage
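A minimal sketch of the batch approach, assuming the model is asked to reply with one `<number>: <explanation>` line per question. The prompt wording, the placeholder text and the function names are hypothetical. The "auto-repair" here fills missing numbers with a placeholder and ignores lines that are malformed or numbered beyond the question count.

```python
def build_explanation_prompt(mcqs):
    """One prompt covering every question; asks for exactly one line each."""
    lines = ["For each question, write exactly one line in the form '<number>: <explanation>'."]
    for n, q in enumerate(mcqs, start=1):
        lines.append(f"{n}. {q['question']} (correct: {q['answer']})")
    return "\n".join(lines)


def parse_explanations(raw, count, placeholder="No explanation returned."):
    """Map '<n>: text' lines back to questions, repairing gaps and extras."""
    found = {}
    for line in raw.splitlines():
        number, sep, text = line.strip().partition(":")
        if sep and number.strip().isdigit() and text.strip():
            found.setdefault(int(number), text.strip())
    # Auto-repair: fill missing numbers, ignore numbers beyond `count`.
    return [found.get(n, placeholder) for n in range(1, count + 1)]
```

Because the result always has exactly one entry per question, the UI can zip explanations with questions even when the model skips or garbles a line.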
🚀 How to Run
- Install Ollama, then pull the required models:
ollama pull llama2
ollama pull nomic-embed-text
Or, for DeepSeek:
ollama pull deepseek-r1:7b
- Install the requirements:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
✅ Tech Stack
Python
Streamlit
LangChain
Ollama
PyMuPDF
FAISS
nomic-embed-text
LLaMA / DeepSeek models