IMRANDIL/local-rag-system-phi

🧠 Local RAG System with ChromaDB + Ollama

A fully local Retrieval-Augmented Generation (RAG) system built using:

  • Ollama
  • Phi3
  • Nomic Embeddings
  • ChromaDB
  • Streamlit

This project demonstrates the core building blocks of a retrieval-augmented AI system:

  • Local LLM inference
  • Embedding generation
  • Semantic search
  • Vector databases
  • Retrieval pipelines
  • External memory systems
  • Context-augmented generation

🚀 Features

  • Fully local AI stack
  • Semantic document retrieval
  • Vector similarity search
  • Local embedding generation
  • Retrieval-Augmented Generation (RAG)
  • Streamlit-based UI
  • ChromaDB vector storage
  • Ollama local inference runtime

🧠 Architecture

Documents
↓
Embedding Model (nomic-embed-text)
↓
Vector Embeddings
↓
ChromaDB Vector Store
↓
User Query
↓
Query Embedding
↓
Semantic Similarity Search
↓
Retrieved Context
↓
Phi3 LLM
↓
Generated Response

📦 Tech Stack

| Component        | Technology       |
| ---------------- | ---------------- |
| LLM Runtime      | Ollama           |
| Generation Model | phi3             |
| Embedding Model  | nomic-embed-text |
| Vector Database  | ChromaDB         |
| Frontend         | Streamlit        |
| Language         | Python           |

📁 Project Structure

local-rag-system-phi/
│
├── app.py
├── knowledge.txt
├── requirements.txt
├── README.md
└── .gitignore

⚙️ Installation

1. Clone Repository

git clone https://github.com/IMRANDIL/local-rag-system-phi
cd local-rag-system-phi

2. Create Virtual Environment

Windows

python -m venv venv
venv\Scripts\activate

Git Bash

source venv/Scripts/activate

3. Install Dependencies

pip install -r requirements.txt

🤖 Install Ollama Models

Pull Phi3

ollama pull phi3

Pull Embedding Model

ollama pull nomic-embed-text

▶️ Run Application

streamlit run app.py

Application runs on:

http://localhost:8501

🧩 How It Works

Step 1 — Document Ingestion

Documents from knowledge.txt are loaded and converted into embeddings using:

nomic-embed-text
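
The ingestion step can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: it assumes knowledge.txt separates documents with blank lines, and it calls Ollama's HTTP embeddings endpoint at its default local address using only the standard library. The helper names `split_documents` and `embed` are hypothetical.

```python
import json
import re
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"


def split_documents(raw_text: str) -> list[str]:
    """Split raw text into documents on blank lines, dropping empty pieces."""
    pieces = re.split(r"\n\s*\n", raw_text)
    return [piece.strip() for piece in pieces if piece.strip()]


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Request one embedding vector from the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_EMBED_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]
```

With Ollama running and the model pulled, something like `vectors = [embed(doc) for doc in split_documents(open("knowledge.txt").read())]` would produce one vector per document.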

Step 2 — Vector Storage

Embeddings are stored in ChromaDB for semantic retrieval.
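
As a rough sketch of what gets stored, the helper below pairs each document with its embedding and an id, in the shape that ChromaDB's `collection.add(...)` accepts as keyword arguments. The function name and the `doc-N` id scheme are assumptions made for illustration; app.py may organise this differently.

```python
def to_chroma_records(documents: list[str], embeddings: list[list[float]]) -> dict:
    """Build keyword arguments suitable for a ChromaDB `collection.add(...)` call."""
    if len(documents) != len(embeddings):
        raise ValueError("each document needs exactly one embedding")
    return {
        # Hypothetical id scheme: ChromaDB only requires ids to be unique strings.
        "ids": [f"doc-{index}" for index in range(len(documents))],
        "documents": documents,
        "embeddings": embeddings,
    }
```

With the chromadb package installed, the records could be stored with `collection = chromadb.Client().get_or_create_collection("knowledge")` followed by `collection.add(**to_chroma_records(docs, vectors))`. The collection name "knowledge" is a placeholder, and `chromadb.Client()` keeps data in memory only.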


Step 3 — Query Embedding

The user's question is converted into an embedding vector.


Step 4 — Semantic Retrieval

ChromaDB performs vector similarity search to retrieve the most relevant context.
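
To make "vector similarity search" concrete, here is a brute-force sketch using cosine similarity in plain Python. It is a stand-in for illustration only: ChromaDB uses its own index and whichever distance metric the collection is configured with, so its ranking may differ. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], stored: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k documents whose vectors are most similar to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [document for document, _ in ranked[:k]]
```

In the real pipeline the equivalent step would be a call such as `collection.query(query_embeddings=[query_vector], n_results=k)` on the ChromaDB collection.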


Step 5 — Context Injection

Retrieved documents are injected into the final prompt.
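
Context injection is essentially string assembly. The sketch below shows one plausible prompt layout; the wording of the instruction and the numbered `[1]`, `[2]` labels are assumptions, not the template used in app.py.

```python
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Combine retrieved documents and the user question into a single prompt."""
    # Number each retrieved document so the model can tell them apart.
    context = "\n\n".join(
        f"[{number}] {doc}" for number, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Telling the model to rely only on the supplied context is a common way to reduce answers that drift away from the retrieved documents.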


Step 6 — Local LLM Generation

Phi3 generates the final response using the retrieved context.
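
A minimal sketch of the generation call, assuming Ollama is running at its default local address and using its HTTP generate endpoint with streaming disabled. app.py may instead use the `ollama` Python package; the helper names here are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_payload(prompt: str, model: str = "phi3") -> dict:
    """Request body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "phi3") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_generate_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Setting `"stream": False` returns the whole answer in one JSON object, which keeps the example short; streaming responses are listed under Future Improvements.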


🧠 Core AI Concepts Demonstrated

This project demonstrates:

  • Transformer inference
  • Embeddings
  • Semantic search
  • Vector similarity
  • Retrieval-Augmented Generation
  • External memory systems
  • Local AI infrastructure
  • Vector databases
  • Context injection

📌 Example Questions

What is semantic search?
Explain RAG architecture
What is vector similarity?
How does Redis caching work?

🔥 Future Improvements

  • PDF ingestion
  • Chunking pipeline
  • Persistent vector database
  • Streaming responses
  • Chat memory
  • Hybrid retrieval
  • Metadata filtering
  • Reranking
  • Multi-document support
  • Agent workflows

🧠 Learning Goals

This project is designed to teach:

  • AI systems engineering
  • Retrieval pipelines
  • Semantic memory architecture
  • Local inference systems
  • Vector retrieval
  • Modern RAG architecture

📜 License

MIT License

About

local semantic retrieval + RAG system.
