AhmedElbashier/ai_rag_system


🧠 DocuMind: AI Document Hub

A high-end, production-ready RAG SaaS for document management, semantic search, and contextual AI chat.

Stack: Next.js · Supabase · TypeScript · Tailwind CSS · Stripe · Vercel


Overview

DocuMind is a Retrieval-Augmented Generation (RAG) platform for PDF documents. Upload PDFs, extract and chunk their text with LangChain, generate embeddings with Google Gemini, and store the vectors in Supabase pgvector. A dark-themed split-view chat interface streams answers grounded in the retrieved chunks, with page citations.

The platform includes a complete SaaS monetization layer with Stripe webhooks, subscription-tier enforcement in Next.js middleware, and a full document management library.


Tech Stack

| Layer | Technology |
|---|---|
| Framework | Next.js 15 (App Router, Turbopack) |
| UI | Tailwind CSS v4, Framer Motion, shadcn/ui, OKLCH colors |
| AI Embeddings | Google Gemini gemini-embedding-001 (768-dim) via @google/genai |
| AI Chat | Google Gemini gemini-2.5-flash-8b via Vercel AI SDK |
| Chat Hook | @ai-sdk/react useChat (AI SDK v6+) |
| Vector DB | Supabase pgvector with HNSW index (cosine similarity) |
| File Storage | Supabase Storage (auto-created public bucket) |
| PDF Parsing | LangChain WebPDFLoader + pdf-parse@1 |
| Text Chunking | LangChain RecursiveCharacterTextSplitter (1000 chars / 200 overlap) |
| Payments | Stripe Checkout + Webhooks |
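The Text Chunking row uses 1000-character chunks with 200 characters of overlap. The sketch below shows only that windowing arithmetic with a plain fixed-size splitter; it is a simplification, not LangChain's RecursiveCharacterTextSplitter, which additionally prefers to break on paragraph, line, and word boundaries. The Page and Chunk types and the chunkPages helper are hypothetical, and splitting each page separately is an assumption that keeps exactly one page number per chunk.

```typescript
// Hypothetical shapes: one entry per parsed PDF page, one per chunk.
interface Page {
  pageNumber: number;
  text: string;
}
interface Chunk {
  pageNumber: number;
  content: string;
}

// Fixed-window splitter (NOT LangChain's recursive algorithm): windows of at
// most `chunkSize` characters, each starting `chunkSize - overlap` characters
// after the previous one, so neighbouring chunks share `overlap` characters.
// Pages are split independently, so every chunk keeps its page number.
function chunkPages(pages: Page[], chunkSize = 1000, overlap = 200): Chunk[] {
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const step = chunkSize - overlap;
  const chunks: Chunk[] = [];
  for (const { pageNumber, text } of pages) {
    for (let start = 0; start < text.length; start += step) {
      chunks.push({ pageNumber, content: text.slice(start, start + chunkSize) });
      // This window already reaches the end of the page; stop so the tail is
      // not emitted again as a shorter, fully overlapped chunk.
      if (start + chunkSize >= text.length) break;
    }
  }
  return chunks;
}
```

With the default parameters each window starts 800 characters after the previous one, so a 2,000-character page yields three chunks starting at offsets 0, 800, and 1,600.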

Project Structure

src/
├── app/
│   ├── page.tsx                  # Upload hub + live document library + Recent Insights
│   ├── actions.ts                # Server actions: processPDF, fetchDocuments, deleteDocument
│   ├── layout.tsx                # Root layout + Sonner toaster
│   ├── globals.css               # Tailwind v4 + OKLCH design tokens (dark mode)
│   ├── hub/[id]/
│   │   ├── page.tsx              # Document chat page (server component)
│   │   └── ChatWorkspace.tsx     # Split-screen: PDF iframe + Gemini chat (client)
│   ├── documents/page.tsx        # Full document library grid (server)
│   ├── chat/page.tsx             # Document picker → chat workspace (server)
│   ├── api-reference/page.tsx    # REST API documentation (mock)
│   ├── settings/page.tsx         # Settings: Profile, API Keys, Storage, Security (mock)
│   ├── pricing/page.tsx          # Stripe subscription tiers
│   └── api/
│       ├── chat/route.ts         # Streaming RAG chat endpoint
│       └── stripe/               # Stripe webhook + checkout handlers
├── components/
│   ├── TopNav.tsx                # Shared navigation bar (active-route aware)
│   └── ui/                       # shadcn/ui primitives
├── lib/
│   └── supabase.ts               # Supabase anon client + service-role admin client
└── middleware.ts                 # Route guard (protects /api/* except Stripe webhook)

Pages

| Route | Description |
|---|---|
| / | Upload zone + live document library + Recent Insights panel |
| /documents | Full document grid with status badges, word count, timestamps |
| /chat | Document picker list; click a document to open the chat workspace |
| /hub/[id] | Split-screen: PDF viewer (left, 60%) + Gemini AI chat (right, 40%) |
| /api-reference | Mock REST API docs with expandable endpoints and copyable curl examples |
| /settings | Mock settings: Profile, API Keys, Storage usage bar, Security toggles, Notifications |
| /pricing | Stripe-powered Free vs Pro subscription page |

Database Schema

Run supabase_schema.sql in the Supabase SQL Editor before first use.

-- Required extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Source documents
CREATE TABLE documents (
  id          UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  file_name   TEXT NOT NULL,
  file_url    TEXT NOT NULL,
  content     TEXT NOT NULL,
  created_at  TIMESTAMPTZ DEFAULT timezone('utc', now()) NOT NULL
);

-- Vector chunks
CREATE TABLE embeddings (
  id           UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  document_id  UUID REFERENCES documents(id) ON DELETE CASCADE NOT NULL,
  content      TEXT NOT NULL,
  metadata     JSONB DEFAULT '{}',    -- { loc: { pageNumber: N } }
  embedding    VECTOR(768) NOT NULL,
  created_at   TIMESTAMPTZ DEFAULT timezone('utc', now()) NOT NULL
);

-- HNSW index for fast cosine similarity search
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops);
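Because the embedding column is declared VECTOR(768), pgvector rejects inserts whose vectors have any other length. A minimal TypeScript sketch of an insertable row with an early dimension check follows; the EmbeddingRow type and toEmbeddingRow helper are hypothetical names, not taken from the repository, and the metadata shape follows the { loc: { pageNumber: N } } comment in the schema.

```typescript
// Hypothetical row shape mirroring the `embeddings` table; the metadata
// layout follows the `{ loc: { pageNumber: N } }` comment in the schema.
interface EmbeddingRow {
  document_id: string;
  content: string;
  metadata: { loc: { pageNumber: number } };
  embedding: number[];
}

// Must equal the N in VECTOR(N); pgvector rejects vectors of any other length.
const EMBEDDING_DIM = 768;

// Builds an insertable row, failing before the database round trip if the
// embedding has the wrong dimension (e.g. an unreduced 3072-dim vector).
function toEmbeddingRow(
  documentId: string,
  content: string,
  pageNumber: number,
  embedding: number[],
): EmbeddingRow {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  return {
    document_id: documentId,
    content,
    metadata: { loc: { pageNumber } },
    embedding,
  };
}
```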

Also run supabase_schema_stripe.sql if using the Stripe monetization layer.


Environment Variables

Create .env.local from .env.template:

NEXT_PUBLIC_SUPABASE_URL=https://<project>.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
SUPABASE_SERVICE_ROLE_KEY=eyJ...
GEMINI_API_KEY=AIza...

# Stripe (optional; required for /pricing)
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRO_PRICE_ID=price_...
NEXT_PUBLIC_SITE_URL=http://localhost:3000

Note: SUPABASE_SERVICE_ROLE_KEY bypasses Row Level Security. It is used only in server-side actions and API routes and is never exposed to the client.
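A startup check makes missing configuration fail loudly instead of surfacing later as an error on the first upload or checkout. The sketch below is an assumption, not code from the repository: missingEnvVars and exposedToBrowser are hypothetical helpers, and treating the Stripe block as all-or-nothing mirrors the "optional" comment in the template. It relies on one Next.js rule: only variables prefixed with NEXT_PUBLIC_ are inlined into client bundles, which is why the service-role key carries no such prefix.

```typescript
// Names taken from the .env.template listing above.
const REQUIRED_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
  "GEMINI_API_KEY",
];

// Only needed when the Stripe layer (/pricing) is enabled.
const STRIPE_VARS = [
  "STRIPE_SECRET_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "STRIPE_PRO_PRICE_ID",
  "NEXT_PUBLIC_SITE_URL",
];

type Env = Record<string, string | undefined>;

// Returns the names of variables that are unset or empty.
function missingEnvVars(env: Env, stripeEnabled: boolean): string[] {
  const names = stripeEnabled ? [...REQUIRED_VARS, ...STRIPE_VARS] : REQUIRED_VARS;
  return names.filter((name) => !env[name]);
}

// Next.js inlines only NEXT_PUBLIC_* variables into client bundles, so a
// secret must stay unprefixed to remain server-side.
function exposedToBrowser(name: string): boolean {
  return name.startsWith("NEXT_PUBLIC_");
}
```

On the server one might call missingEnvVars(process.env, Boolean(process.env.STRIPE_SECRET_KEY)) at startup and throw if the result is non-empty.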


Getting Started

npm install
npm run dev

Open http://localhost:3000.

First-time Supabase Setup

  1. Create a project at supabase.com
  2. Enable the vector extension: Database → Extensions → vector
  3. Run supabase_schema.sql in the SQL Editor
  4. Copy your project URL and keys into .env.local
  5. The documents storage bucket is created automatically on first PDF upload

RAG Pipeline

PDF Upload
  │
  ├─ 1. Auto-create storage bucket if missing (supabaseAdmin)
  ├─ 2. Upload file → Supabase Storage → get public URL
  ├─ 3. Parse pages → LangChain WebPDFLoader (page-level metadata)
  ├─ 4. Chunk text → RecursiveCharacterTextSplitter (1000 chars / 200 overlap)
  ├─ 5. Embed chunks → Gemini gemini-embedding-001 (768-dim vectors)
  └─ 6. Store vectors → Supabase pgvector embeddings table

Chat Query
  │
  ├─ 1. Embed user question → Gemini gemini-embedding-001
  ├─ 2. Cosine similarity search → match_embeddings() RPC (top 5 chunks)
  ├─ 3. Filter chunks by documentId, build context with page citations
  └─ 4. Stream response → Gemini gemini-2.5-flash-8b (page-cited Markdown)
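Steps 1 to 3 of the Chat Query pipeline rank stored chunks by cosine similarity and label them with page numbers. The in-memory sketch below mirrors that logic for illustration only; in the database, pgvector's <=> operator with vector_cosine_ops returns cosine distance (1 minus cosine similarity), so ordering by it ascending gives the same ranking. StoredChunk, topKForDocument, and buildContext are hypothetical names, and the [Page N] label format is an assumption. One design point: this sketch filters by document before taking the top k. Filtering after a global top-5 search, as step 3 is worded, can return fewer than five chunks, or none, when other documents hold the closest matches.

```typescript
// Hypothetical in-memory stand-in for rows of the `embeddings` table.
interface StoredChunk {
  documentId: string;
  pageNumber: number;
  content: string;
  embedding: number[];
}

// cos(a, b) = (a . b) / (|a| |b|); pgvector's `<=>` returns 1 minus this.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Restricts candidates to one document BEFORE ranking, so up to k chunks
// from that document are returned even if other documents score higher.
function topKForDocument(
  query: number[],
  chunks: StoredChunk[],
  documentId: string,
  k = 5,
): StoredChunk[] {
  return chunks
    .filter((c) => c.documentId === documentId)
    .map((c) => ({ c, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ c }) => c);
}

// Prefixes each chunk with its page so the model can cite "Page N".
function buildContext(chunks: StoredChunk[]): string {
  return chunks.map((c) => `[Page ${c.pageNumber}]\n${c.content}`).join("\n\n");
}
```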

Chat Page Navigation

The /hub/[id] split-screen chat extracts Page N citations from AI responses and renders them as clickable chips. Clicking a chip updates the PDF iframe URL to #page=N, jumping the viewer to the referenced page. The viewer also auto-jumps on each new AI response.
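The citation handling can be sketched as two small functions: one collects distinct page numbers from text such as "Page 3", the other rewrites the PDF URL fragment. The #page=N fragment is a PDF open parameter honored by common browser PDF viewers, including those in Chrome and Firefox. The function names and the exact regular expression are assumptions, not the code in ChatWorkspace.tsx.

```typescript
// Collects distinct page numbers cited as "Page N" (case-insensitive),
// in order of first appearance.
function extractPageCitations(answer: string): number[] {
  const pattern = /\bpage\s+(\d+)/gi;
  const pages: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const page = Number(match[1]);
    if (page > 0 && !pages.includes(page)) pages.push(page);
  }
  return pages;
}

// Replaces any existing fragment with #page=N so the viewer jumps there.
function pdfUrlForPage(fileUrl: string, page: number): string {
  const hashIndex = fileUrl.indexOf("#");
  const base = hashIndex === -1 ? fileUrl : fileUrl.slice(0, hashIndex);
  return `${base}#page=${page}`;
}
```

For the auto-jump on each new response, one option (an assumption, not the documented behavior) is to navigate to the first page returned by extractPageCitations.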


Important Implementation Notes

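  • Embedding model: text-embedding-004 is not available on this project's API key, so gemini-embedding-001 is used instead. That model returns 3072-dim vectors by default, so its output must be reduced to 768 dimensions (for example via the outputDimensionality option) to fit the existing VECTOR(768) column without a schema change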
  • AI SDK v6: useChat moved from ai/react to @ai-sdk/react β€” install @ai-sdk/react separately
  • pdf-parse version: WebPDFLoader requires pdf-parse@^1 (not v2)
  • Server components: Pages that load Supabase data are server components, which cannot attach JS event handlers, so hover effects use CSS classes instead
  • Middleware: All app pages (/, /documents, /chat, /hub/*, /api-reference, /settings) are public. Only raw /api/* routes require authentication (Stripe webhook is exempted)
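The routing rule in the Middleware note reduces to a pure predicate, sketched below. The webhook path /api/stripe/webhook is a hypothetical placeholder (the project structure only shows an app/api/stripe/ folder), and requiresAuth is an illustrative name, not the contents of middleware.ts. The webhook is exempt because Stripe's servers send no user session; such handlers are normally authenticated by verifying the Stripe-Signature header against STRIPE_WEBHOOK_SECRET instead.

```typescript
// Hypothetical placeholder; the real handler lives somewhere under app/api/stripe/.
const STRIPE_WEBHOOK_PATH = "/api/stripe/webhook";

// True when a request path needs an authenticated session: every /api/*
// route except the Stripe webhook. All app pages are public.
function requiresAuth(pathname: string): boolean {
  // Match whole path segments so e.g. "/apiary" is not treated as an API route.
  const isApiRoute = pathname === "/api" || pathname.startsWith("/api/");
  if (!isApiRoute) return false;
  const isWebhook =
    pathname === STRIPE_WEBHOOK_PATH ||
    pathname.startsWith(`${STRIPE_WEBHOOK_PATH}/`);
  return !isWebhook;
}
```

In middleware.ts, a check like this would run against request.nextUrl.pathname.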

Deployment (Vercel)

  1. Import repo into Vercel
  2. Add all environment variables from .env.template
  3. Deploy (vercel.json sets a 60s function timeout to accommodate AI embedding time)

License

MIT License. Copyright © 2026 Ahmed ELBASHIER. All Rights Reserved.

About

High-performance Retrieval-Augmented Generation (RAG) engine for enterprise contact centers. Built with TypeScript and LangChain to provide context-aware AI responses grounded in retrieved documents, using multi-tenant vector indexing.
