Quran/Hadith question answering with retrieval-augmented generation (RAG).
- Backend: Flask API using FAISS + sentence-transformers, with optional TF‑IDF and keyword fusion.
- Frontend: HTML/CSS/JS chat UI with dark mode, font-size controls, copy-to-clipboard, and source expansion.
- Ensemble retrieval combining FAISS, sentence-transformers, TF‑IDF, and keyword overlap with dynamic per-query weighting.
- Transparent answers showing matched sources and chunks used.
- Clean UI with dark theme toggle, readability controls, toasts, and copy button.
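The ensemble retrieval with per-query weighting listed above can be sketched as a weighted fusion of normalized scores. This is only an illustration, not the code in rag_utils_inference.py: the function names (`min_max_normalize`, `query_weights`, `fuse_scores`), the weight values, and the short-query heuristic are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # chunk id -> raw score from one retriever


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so different retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every chunk as an equally strong match.
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def query_weights(query: str) -> Dict[str, float]:
    """Hypothetical heuristic: short queries lean on lexical signals,
    longer queries on dense retrieval. Each weight set sums to 1."""
    if len(query.split()) <= 3:
        return {"faiss": 0.2, "st": 0.2, "tfidf": 0.3, "keywords": 0.3}
    return {"faiss": 0.35, "st": 0.35, "tfidf": 0.2, "keywords": 0.1}


def fuse_scores(
    query: str, per_retriever: Dict[str, Scores], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Combine normalized scores from each retriever using per-query weights."""
    weights = query_weights(query)
    fused: Dict[str, float] = {}
    for name, scores in per_retriever.items():
        w = weights.get(name, 0.0)  # unknown retrievers contribute nothing
        for cid, s in min_max_normalize(scores).items():
            fused[cid] = fused.get(cid, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Normalizing before weighting matters because FAISS distances, cosine similarities, and TF‑IDF scores live on different scales; without it, one retriever could dominate regardless of its weight.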
- backend/: Flask app and retrieval logic (app.py, rag_utils_inference.py, requirements.txt).
- frontend/: UI (index.html, static/css/style.css, static/js/app.js).
- scripts/: Utility scripts (e.g., process_quran_chunks.py).
- report.txt: Project summary for submission.
- .gitignore: Excludes venv, caches, large generated artifacts, and secrets.
- Python 3.10+ recommended.
- Windows PowerShell 5.1 (commands below use PowerShell syntax).
- Packages listed in backend/requirements.txt (Flask, faiss-cpu, torch, sentence-transformers, scikit-learn, numpy, transformers).
    # From repo root
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    # Install backend dependencies
    pip install -r backend\requirements.txt

Optional environment variables:
- OPENAI_API_KEY (if enabling external LLMs in backend; not required for local retrieval).
- Create backend\.env if needed; do not commit it.
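One way the backend could pick up the optional key is sketched below. How app.py actually reads its configuration is not stated here, so the helper names (`read_env_file`, `load_openai_key`) and the simple KEY=VALUE file format are assumptions; the sketch uses only the standard library.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_openai_key(env_path: Path = Path("backend") / ".env") -> Optional[str]:
    """Prefer the process environment, fall back to the .env file, else None."""
    return os.environ.get("OPENAI_API_KEY") or read_env_file(env_path).get(
        "OPENAI_API_KEY"
    )
```

Returning `None` when the key is absent lets local retrieval keep working without any external LLM configured.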
- The repo does not include large data/embeddings.
- Expected local paths (ignored by .gitignore if generated):
  - backend\data\ for chunked Quran/Hadith JSONL.
  - backend\embeddings\ and backend\fine_tuned_embeddings\ for .npy vectors.
  - FAISS index files (e.g., .bin) stored under backend\.
- You can regenerate embeddings/indices using scripts under backend/ if required.
    # Start the backend
    Set-Location backend
    python app.py

Frontend:
- Open frontend/index.html directly or serve via a simple static server.
- Ensure app.js points to http://127.0.0.1:5000 for the backend.
- GET /api/health: Health check.
- GET /api/stats: Basic stats (counts, model info).
- POST /api/ask: Body { "query": "...", "strategy": "auto" }
  - strategy: faiss | st | tfidf | keywords | auto (default).
  - Returns { answer, sources: [{text, meta, score}], debug }.
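A small client for POST /api/ask might look like the sketch below. It uses only the standard library and the request/response shape listed above; the helper names (`build_ask_request`, `ask`), the timeout, and the `Content-Type: application/json` header are assumptions about what the Flask backend expects.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:5000"  # backend address the frontend is pointed at
STRATEGIES = {"faiss", "st", "tfidf", "keywords", "auto"}


def build_ask_request(query: str, strategy: str = "auto") -> urllib.request.Request:
    """Build the POST /api/ask request without sending it."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    body = json.dumps({"query": query, "strategy": strategy}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, strategy: str = "auto", timeout: float = 30.0) -> Dict[str, Any]:
    """Send the query and return the decoded JSON response.

    Requires the backend to be running; raises urllib.error.URLError otherwise.
    """
    request = build_ask_request(query, strategy)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `ask("patience", strategy="tfidf")` should return a dictionary whose `answer` and `sources` keys follow the response shape above; each entry in `sources` carries `text`, `meta`, and `score`.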
- Run backend from the backend directory to avoid relative path issues.
- Large artifacts (.npy, .bin, dataset folders) and secrets are ignored by .gitignore.
- For a quick demo, add small sample chunks under backend\data\ locally.
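For the quick-demo note above, sample chunks could be written as JSONL along these lines. The actual chunk schema is not documented here, so the `text` and `meta` fields (mirroring the fields returned in `sources`), the file name `sample_chunks.jsonl`, and the placeholder passages are all assumptions; check what rag_utils_inference.py expects before relying on this.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List

# Placeholder records only; replace with real chunked passages.
SAMPLE_CHUNKS: List[Dict] = [
    {"text": "Placeholder passage one.", "meta": {"source": "sample", "ref": "placeholder-1"}},
    {"text": "Placeholder passage two.", "meta": {"source": "sample", "ref": "placeholder-2"}},
]


def write_chunks(path: Path, chunks: Iterable[Dict]) -> int:
    """Write one JSON object per line and return the number of chunks written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for chunk in chunks:
            # ensure_ascii=False keeps Arabic text readable in the file.
            fh.write(json.dumps(chunk, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_chunks(path: Path) -> List[Dict]:
    """Load chunks back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Calling `write_chunks(Path("backend") / "data" / "sample_chunks.jsonl", SAMPLE_CHUNKS)` from the repo root would create the file under the data directory, which stays out of version control if .gitignore covers it as described.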
- Hadith: data sourced via public Hadith API projects such as https://github.com/swmohammad/hadith-api and similar community-maintained resources.
- Quran: text/metadata from public Quran JSON repositories such as https://github.com/semarketir/quranjson.