GitHub - 051821/BrainPain-Generater

BrainPain Generator 😭→🧠

A savage MCQ generator powered by RAG + Streamlit + Ollama.

This project is a local, offline, Retrieval-Augmented Generation (RAG) quiz generator. You upload any PDF, the system:

Reads the document
Splits it into chunks
Builds a FAISS vector index
Performs Hybrid Search
Summarizes the retrieved context
Generates MCQs using LLaMA / DeepSeek via Ollama
Runs a toxic, fun MCQ quiz
Gives one-line explanations per question

Features

✅ 1. PDF → Chunking → Embeddings → FAISS Index

The PDF is split using the Recursive Character Text Splitter.

Each chunk is embedded using nomic-embed-text.

All embeddings are stored in a FAISS vector store.

✅ 2. Hybrid Search (Vector + Keyword)

Your topic is retrieved using:

Vector similarity search

Exact keyword scan

Combining both improves relevance and lowers the risk of LLM hallucination.

✅ 3. Context Summarization

Retrieved chunks are compressed via:

LLM-driven summarization

Produces compact bullet points

Keeps token usage down

Ensures MCQs are strictly grounded in context

✅ 4. MCQ Generation (RAG Prompting)

The LLM generates high-quality MCQs with strict formatting

The app includes:

✅ Retry mechanism if MCQs = 0

✅ Expanded context fallback

✅ Strict parsing engine

✅ 5. One-shot Explanation Generator

The app generates explanations for all MCQs in one LLM call, not one call per question. This reduces cost and time and keeps the format consistent.

✅ 6. Funny UI

Every button is savage. Every message roasts the user. The theme is intentionally brutal for fun.

🧠 Algorithms & Concepts Used

Below is a detailed breakdown of every algorithm and technique implemented behind the scenes.

Recursive Character Text Splitting (Text Preprocessing)

Algorithm: RecursiveCharacterTextSplitter

Used to divide the PDF content into overlapping chunks.

Why?

Prevents context loss

Keeps semantic boundaries intact

Ensures embedding quality

How it works:

Splits text by large separators first (\n\n), then by sentences, then words

If still too large, recursively breaks content

Adds overlap to maintain continuity
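The steps above can be sketched in plain Python. This is a simplified illustration, not LangChain's actual implementation: the separator list, the greedy merge, and the names `recursive_split` and `split_text` are assumptions made for the example. Separators are dropped when splitting, and a chunk that carries overlap can exceed `chunk_size` by up to `overlap + 1` characters.

```python
from typing import List

# Assumed separator order: paragraphs, lines, sentences, words.
SEPARATORS = ["\n\n", "\n", ". ", " "]


def recursive_split(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Break text into pieces of at most chunk_size characters, coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    pieces: List[str] = []
    for part in text.split(separators[0]):
        pieces.extend(recursive_split(part, chunk_size, separators[1:]))
    return pieces


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Greedily merge small pieces into chunks, seeding each new chunk with the previous tail."""
    chunks: List[str] = []
    current = ""
    for piece in recursive_split(text, chunk_size, SEPARATORS):
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        chunks.append(current)
        # Carry the last `overlap` characters into the next chunk for continuity.
        tail = current[-overlap:] if overlap > 0 else ""
        current = f"{tail} {piece}" if tail else piece
    if current:
        chunks.append(current)
    return chunks
```

Calling `split_text(pdf_text, chunk_size=500, overlap=50)` on the extracted PDF text would return a list of overlapping strings ready for embedding; the default sizes here are placeholders, not the app's settings.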

Embedding Generation (Semantic Vectorization)

Algorithm: nomic-embed-text embedding model (roughly 137M parameters)

Framework: LangChain + Ollama

Purpose:

Convert text chunks → numeric vector representations

Capture semantic meaning

Enable similarity search

Used to create dense vectors that FAISS can index.

FAISS – Facebook AI Similarity Search

Algorithm Type: Nearest-neighbor vector search (a flat index performs exact, brute-force search; FAISS also offers approximate indexes)

Index Type: Flat L2 index

Why FAISS?

Fast vector search

Scales to thousands of chunks

Runs locally

Used to retrieve the most relevant text chunks for your query.
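To make the flat L2 idea concrete, here is a minimal pure-Python sketch of what such an index does conceptually: compare the query against every stored vector and keep the closest ones. It does not use the FAISS library, and the function names are hypothetical. Squared distances are used because they give the same ranking as true Euclidean distances.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors: List[Sequence[float]], query: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (index, squared distance) for the k stored vectors closest to the query."""
    scored = [(squared_l2(vec, query), i) for i, vec in enumerate(vectors)]
    scored.sort()  # smallest distance first
    return [(i, dist) for dist, i in scored[:k]]
```

With a few thousand chunks, this exhaustive comparison is cheap, which is one reason a flat index is a reasonable fit for single-document workloads.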

Hybrid Search (Vector + Keyword Matching)

Components:

Vector Similarity Search

Exact Substring Keyword Matching

Why?

Vector search handles semantic meaning

Keyword search handles exact matches

Combined → better precision and coverage

This improves RAG accuracy and reduces hallucinations.
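One way the two result sets might be combined is sketched below. The merge policy (vector hits first, then keyword hits, duplicates dropped) and the function names are assumptions for illustration; the app's own merging logic may differ.

```python
from typing import List


def keyword_hits(chunks: List[str], topic: str) -> List[int]:
    """Indices of chunks that contain the topic as a case-insensitive substring."""
    needle = topic.lower()
    return [i for i, chunk in enumerate(chunks) if needle in chunk.lower()]


def hybrid_merge(vector_hits: List[int], keyword_hit_ids: List[int], limit: int) -> List[int]:
    """Union of both hit lists, vector hits first, without duplicates."""
    merged: List[int] = []
    for idx in list(vector_hits) + list(keyword_hit_ids):
        if idx not in merged:
            merged.append(idx)
    return merged[:limit]
```

Here `vector_hits` stands for the chunk indices returned by the vector store, most similar first.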

LLM Context Summarization

Algorithm: LLM-driven abstractive summarization

Key decisions:

Compress top-k retrieved chunks

Keep only essential bullet points

Remove filler text

Reduce token usage

Why Summarize?

Keeps MCQ generation focused

Ensures strict grounding in PDF

Fits inside LLM context window
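As a rough illustration of the summarization step, the sketch below builds a prompt from the top-k retrieved chunks. The prompt wording, the default `k`, and the name `build_summary_prompt` are assumptions; the resulting string would be sent to whichever Ollama model is in use.

```python
from typing import List


def build_summary_prompt(chunks: List[str], topic: str, k: int = 4) -> str:
    """Join the first k retrieved chunks and ask for grounded bullet points."""
    context = "\n\n".join(chunk.strip() for chunk in chunks[:k])
    return (
        f"Summarize the context below about '{topic}' as short bullet points.\n"
        "Use only facts stated in the context and drop filler text.\n\n"
        f"Context:\n{context}"
    )
```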

MCQ Generation (Constrained Prompting)

Algorithm Type:

Prompt engineering

Constraint-based text generation

Rules enforced:

No hallucination

Only use PDF content

Strict formatting

4 options

One correct answer

Fallback logic:

If MCQs = 0 → retry with doubled context

If still = 0 → return error
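The fallback logic above could look roughly like this. The sketch assumes that "doubled context" means doubling the number of retrieved chunks, and `generate` is a hypothetical callable that takes that number and returns the parsed MCQs.

```python
from typing import Callable, Dict, List


def generate_with_retry(generate: Callable[[int], List[Dict]], k: int) -> List[Dict]:
    """Call generate(k); if it yields nothing, retry once with 2 * k, then give up."""
    mcqs = generate(k)
    if mcqs:
        return mcqs
    mcqs = generate(k * 2)  # retry with doubled context
    if not mcqs:
        raise ValueError("No MCQs could be generated from the document.")
    return mcqs
```

Raising an exception leaves it to the UI layer to decide how to present the failure.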

MCQ Parsing Algorithm

Algorithm:

Regex-free, rule-based line parsing

Whitespace tolerant

Formatting variations handled

This converts LLM output → structured Python objects.
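A regex-free parser along these lines might look like the sketch below. The output format it expects (`Q:` lines, options labelled `A` to `D` followed by `)`, `.` or `:`, and an `Answer:` line) is an assumption, since the README does not spell out the exact format; `parse_mcqs` is a hypothetical name.

```python
from typing import Dict, List


def parse_mcqs(raw: str) -> List[Dict]:
    """Parse LLM output into dicts with 'question', 'options', and 'answer' keys."""
    mcqs: List[Dict] = []
    current = None
    for line in raw.splitlines():
        line = line.strip()  # tolerate indentation and trailing spaces
        if not line:
            continue
        upper = line.upper()
        if upper.startswith("Q:") or upper.startswith("QUESTION:"):
            current = {"question": line.split(":", 1)[1].strip(), "options": {}, "answer": None}
            mcqs.append(current)
        elif current is not None and len(line) > 1 and upper[0] in "ABCD" and line[1] in ").:":
            current["options"][upper[0]] = line[2:].strip()
        elif current is not None and upper.startswith("ANSWER:"):
            current["answer"] = line.split(":", 1)[1].strip()[:1].upper()
    # Keep only complete questions: four options and an answer that names one of them.
    return [m for m in mcqs if len(m["options"]) == 4 and m["answer"] in m["options"]]
```

Incomplete questions are silently dropped here, so an empty result can feed directly into a retry step.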

One-shot Explanation Generator

Algorithm:

Batch-explanation LLM prompt

Strict per-question format

One output line per explanation

Auto-repair if format breaks

Benefits:

Faster than per-question LLM calls

More consistent explanations

Efficient token usage
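The batch approach can be sketched as a prompt builder plus a tolerant parser. The prompt wording, the `<number>: <explanation>` line format, and the repair strategy (filling missing lines with a fallback string) are assumptions for illustration, not necessarily what the app does.

```python
from typing import Dict, List


def build_explanation_prompt(mcqs: List[Dict]) -> str:
    """Build one prompt that asks for a one-line explanation per question."""
    lines = [
        "Explain each correct answer in one line.",
        "Reply with exactly one line per question, formatted as '<number>: <explanation>'.",
        "",
    ]
    for n, mcq in enumerate(mcqs, start=1):
        lines.append(f"{n}. {mcq['question']} (correct answer: {mcq['answer']})")
    return "\n".join(lines)


def parse_explanations(raw: str, count: int, fallback: str = "No explanation available.") -> List[str]:
    """Map numbered reply lines back to questions; fill gaps with the fallback."""
    found: Dict[int, str] = {}
    for line in raw.splitlines():
        head, sep, tail = line.strip().partition(":")
        if sep and head.strip().isdecimal() and tail.strip():
            found[int(head.strip())] = tail.strip()
    return [found.get(n, fallback) for n in range(1, count + 1)]
```

Because the parser always returns exactly `count` entries, the quiz UI can index explanations by question number even when the model skips or garbles a line.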

🚀 How to Run

Install Ollama

https://ollama.com/download

Then pull required models:

ollama pull llama2

ollama pull nomic-embed-text

Or for DeepSeek:

ollama pull deepseek-r1:7b

Install requirements

pip install -r requirements.txt

Run Streamlit

streamlit run app.py

✅ Tech Stack

Python

Streamlit

LangChain

Ollama

PyMuPDF

FAISS

nomic-embed-text

LLaMA / DeepSeek models
