Amogh76/RepoQ-A
Repo QA Assistant (Browser-Only)

🚀 A repo-aware AI assistant that can answer questions about your codebase with citations, built to run entirely in the browser — no servers, no API keys, zero cost.


✨ Features

  • Retrieval-Augmented Generation (RAG): combines semantic + exact code search with a lightweight text generator.
  • Embeddings & Search: semantic search over all-MiniLM-L6-v2 embeddings, scored by cosine similarity; exact code search via ripgrep.
  • Generation: LaMini-Flan-T5-248M produces concise, citation-backed answers.
  • Client-Side Deployment:
    • All models run in the browser using Transformers.js with WebGPU/WebAssembly.
    • No backend, no API costs.
  • Memory:
    • Short-term memory of recent turns for continuity.
    • Persistent chat history saved in localStorage.
    • Ability to pin notes (embedded and retrieved like code chunks).
    • Optional rolling summaries of the dialogue.
  • UI:
    • Built with Next.js + React + Tailwind CSS.
    • Chat interface with sources listed per answer.
    • Adjustable Top-K slider.
    • Reset chat, add notes, see retrieval scores.
  • Deployment:
    • Deployed serverlessly on Vercel.
    • First load downloads models (cached in IndexedDB for reuse).

🏗️ Architecture

Repo → Python ingestion (build_index.py) → index.json (chunks + embeddings)
        ↓
Next.js App (Vercel) ←→ Transformers.js (browser)
        ↓
User Q → Embedder → Cosine Similarity (top-K chunks) → Generator
        ↓
Answer + Citations


⚡ How It Works

  1. Preprocessing

    • A Python script (build_index.py) walks through repo files.
    • Splits code/docs into overlapping chunks.
    • Generates embeddings with all-MiniLM-L6-v2.
    • Saves everything into public/index.json.
  2. Retrieval

    • On each query, the question is embedded in-browser.
    • Chunks are scored by cosine similarity.
    • Top-K most relevant chunks (and pinned notes) are selected.
  3. Generation

    • A prompt is built with:
      • The user’s question,
      • Recent chat history,
      • Retrieved context chunks,
      • Guidance for handling subjective questions.
    • LaMini-Flan-T5-248M generates a concise answer.
    • Sources are displayed with file + line spans.
  4. Deployment

    • Everything runs client-side.
    • The first model download is heavy (~100–250 MB), but it is cached for subsequent visits.
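
The preprocessing in step 1 can be sketched in Python. This is a minimal illustration, not the exact logic of build_index.py: the function name, chunk size, and overlap are illustrative choices.

```python
def chunk_file(path: str, text: str, chunk_lines: int = 40, overlap: int = 10):
    """Split a file into overlapping line-based chunks, keeping line spans for citations."""
    lines = text.splitlines()
    chunks = []
    step = chunk_lines - overlap
    for start in range(0, max(len(lines), 1), step):
        end = min(start + chunk_lines, len(lines))
        chunks.append({
            "file": path,
            "start_line": start + 1,  # 1-indexed span, later shown as the citation
            "end_line": end,
            "text": "\n".join(lines[start:end]),
        })
        if end == len(lines):
            break
    return chunks
```

Each chunk carries its file and line span, which is what lets answers cite "file + line spans" later in the pipeline.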
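The retrieval in step 2 reduces to cosine similarity over the precomputed vectors. In the app this runs in the browser via Transformers.js; the math is the same as this pure-Python sketch (function names are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunks, k=4):
    """Score every chunk (and pinned note) against the query embedding, keep the best k."""
    scored = [(cosine(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Returning the score alongside each chunk is what allows the UI to display retrieval scores per source.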
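The prompt assembly in step 3 can be sketched as follows. The template wording and field names are illustrative, not the project's exact prompt:

```python
def build_prompt(question, history, chunks):
    """Combine question, recent turns, and retrieved chunks into one generator prompt."""
    context = "\n\n".join(
        f'[{c["file"]}:{c["start_line"]}-{c["end_line"]}]\n{c["text"]}' for c in chunks
    )
    # Short-term memory: only the last few turns are kept for continuity.
    turns = "\n".join(f'{t["role"]}: {t["text"]}' for t in history[-4:])
    return (
        "Answer using only the context below, and cite the files you used. "
        "If the question is subjective, say so and give a brief, grounded answer.\n\n"
        f"Context:\n{context}\n\nHistory:\n{turns}\n\nQuestion: {question}\nAnswer:"
    )
```

The bracketed `file:start-end` prefixes are what the UI can echo back as per-answer source citations.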

🧪 Tech Stack

  • Frontend: Next.js, React, Tailwind CSS
  • AI/ML:
    • Transformers.js
    • Hugging Face Models (MiniLM, LaMini-Flan-T5-248M)
    • Chroma DB (for semantic search, via MCP)
    • ripgrep (for exact code search, via MCP)
  • Ingestion: Python, sentence-transformers
  • Deployment: Vercel (serverless, static hosting)

📚 Skills Learned

  • Retrieval-Augmented Generation (RAG)
  • Embeddings & semantic search
  • Vector databases (Chroma)
  • Exact search (ripgrep)
  • Browser-only ML deployment (Transformers.js, WebGPU/WASM)
  • Next.js/React/Tailwind frontend development
  • Python scripting for data preprocessing
  • Serverless deployment (Vercel)
  • Trade-off analysis: cost vs performance vs usability
  • Communicating model constraints in UI/UX

🚧 Limitations

  • Small model (248M params) → short context window (~512–1024 tokens), limited reasoning.
  • First load is slow → large model files (~100–250 MB) must download to the browser.
  • Answers are factual extracts → no deep reasoning across many files.
  • Best for demo / proof-of-concept, not production-level repo QA.

🚀 Try It

Deployed on Vercel:
👉 Live Demo Link
