🧠 Local RAG System with ChromaDB + Ollama

A fully local Retrieval-Augmented Generation (RAG) system built using:

Ollama
Phi3
Nomic Embeddings
ChromaDB
Streamlit

This project demonstrates the core building blocks of a retrieval-augmented AI system:

Local LLM inference
Embedding generation
Semantic search
Vector databases
Retrieval pipelines
External memory systems
Context-augmented generation

🚀 Features

Fully local AI stack
Semantic document retrieval
Vector similarity search
Local embedding generation
Retrieval-Augmented Generation (RAG)
Streamlit-based UI
ChromaDB vector storage
Ollama local inference runtime

🧠 Architecture

Documents
↓
Embedding Model (nomic-embed-text)
↓
Vector Embeddings
↓
ChromaDB Vector Store
↓
User Query
↓
Query Embedding
↓
Semantic Similarity Search
↓
Retrieved Context
↓
Phi3 LLM
↓
Generated Response

📦 Tech Stack

Component	Technology
LLM Runtime	Ollama
Generation Model	phi3
Embedding Model	nomic-embed-text
Vector Database	ChromaDB
Frontend	Streamlit
Language	Python

📁 Project Structure

local-rag-system/
│
├── app.py
├── knowledge.txt
├── requirements.txt
├── README.md
└── .gitignore

⚙️ Installation

1. Clone Repository

git clone https://github.com/IMRANDIL/local-rag-system-phi
cd local-rag-system-phi

2. Create Virtual Environment

Windows

python -m venv venv
venv\Scripts\activate

Git Bash

source venv/Scripts/activate

3. Install Dependencies

pip install -r requirements.txt

🤖 Install Ollama Models

Pull Phi3

ollama pull phi3

Pull Embedding Model

ollama pull nomic-embed-text
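
To confirm that both models are available before starting the app, one option is to query the local Ollama server's /api/tags endpoint, which lists installed models (running `ollama list` in a terminal shows the same information). The sketch below assumes Ollama is listening on its default address, http://localhost:11434; the helper names are hypothetical and are not taken from app.py.

```python
import json
import urllib.request

REQUIRED_MODELS = ("phi3", "nomic-embed-text")


def missing_models(tags: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models that are absent from an /api/tags response."""
    # Installed names carry a tag suffix such as "phi3:latest",
    # so only the part before the colon is compared.
    installed = {m["name"].split(":")[0] for m in tags.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server which models are installed locally."""
    # base_url is an assumption: Ollama's default local address.
    with urllib.request.urlopen(f"{base_url}/api/tags") as response:
        return json.loads(response.read())
```

With the server running, `missing_models(fetch_tags())` returns an empty list once both pulls have completed.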

▶️ Run Application

streamlit run app.py

By default, the application is served at:

http://localhost:8501

🧩 How It Works

Step 1 — Document Ingestion

Documents from knowledge.txt are loaded and converted into embeddings using:

nomic-embed-text
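
A minimal sketch of this step, calling Ollama's REST API directly with the standard library. It assumes one document per non-empty line of knowledge.txt (the README does not say how the file is split) and the default Ollama address; the function names are hypothetical and app.py may do this differently, for example through a client library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address
EMBED_MODEL = "nomic-embed-text"


def split_documents(text: str) -> list[str]:
    """Assumption: each non-empty line of knowledge.txt is one document."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def embedding_request(text: str, model: str = EMBED_MODEL) -> urllib.request.Request:
    """Build the POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def embed(text: str) -> list[float]:
    """Send the text to a running Ollama server and return its embedding."""
    with urllib.request.urlopen(embedding_request(text)) as response:
        return json.loads(response.read())["embedding"]


def embed_file(path: str = "knowledge.txt") -> tuple[list[str], list[list[float]]]:
    """Load the knowledge file and embed every document in it."""
    with open(path, encoding="utf-8") as f:
        documents = split_documents(f.read())
    return documents, [embed(doc) for doc in documents]
```

Calling `embed_file()` requires a running Ollama server with nomic-embed-text pulled; `split_documents` and `embedding_request` can be exercised without one.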

Step 2 — Vector Storage

Embeddings are stored in ChromaDB for semantic retrieval.
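
As an illustration, storing the vectors might look like the sketch below. It assumes an in-memory ChromaDB client (persistence is listed under Future Improvements) and a hypothetical collection name, "knowledge"; the id scheme is also an assumption.

```python
def build_records(documents: list[str], embeddings: list[list[float]]) -> dict:
    """Pair each document with its embedding and a stable string id."""
    if len(documents) != len(embeddings):
        raise ValueError("documents and embeddings must have the same length")
    return {
        "ids": [f"doc-{i}" for i in range(len(documents))],  # hypothetical id scheme
        "documents": documents,
        "embeddings": embeddings,
    }


def store(documents, embeddings, collection_name="knowledge"):
    """Add the records to an in-memory ChromaDB collection and return it."""
    import chromadb  # imported here so build_records works without chromadb installed

    client = chromadb.Client()  # in-memory: data is lost when the process exits
    collection = client.get_or_create_collection(name=collection_name)
    collection.add(**build_records(documents, embeddings))
    return collection
```

Passing precomputed embeddings to `collection.add` keeps the embedding step under Ollama's control instead of relying on an embedding function configured on the ChromaDB side.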

Step 3 — Query Embedding

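The user's question is converted into an embedding vector with the same embedding model (nomic-embed-text), so that it can be compared with the stored document vectors.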

Step 4 — Semantic Retrieval

ChromaDB performs vector similarity search to retrieve the most relevant context.
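
To make "vector similarity search" concrete, here is a brute-force sketch in plain Python that ranks documents by cosine similarity to the query vector. This is not how ChromaDB is implemented (it uses an index and a configurable distance metric); it only illustrates the idea, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], embeddings: list[list[float]],
          documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    scored = sorted(
        zip(documents, embeddings),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:k]]
```

With a ChromaDB collection, the corresponding call is `collection.query(query_embeddings=[query_vector], n_results=k)`, which returns the matching documents alongside their ids and distances.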

Step 5 — Context Injection

Retrieved documents are injected into the final prompt.
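
One way to do the injection is a plain string template. The wording of the instructions and the numbered-passage layout below are assumptions for illustration; the actual prompt in app.py may differ.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Join retrieved passages and the question into a single prompt string."""
    # Number the passages so the model (and a human reader) can tell them apart.
    context_block = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Telling the model to rely only on the supplied context is a common way to reduce answers that are not grounded in the retrieved documents, although it does not guarantee it.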

Step 6 — Local LLM Generation

Phi3 generates the final response from the prompt that contains the retrieved context.
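
A hedged sketch of the generation call, using Ollama's /api/generate endpoint with streaming turned off so the whole answer arrives in one JSON object. The default address and the function names are assumptions; app.py may use a client library instead.

```python
import json
import urllib.request


def generate_request(prompt: str, model: str = "phi3",
                     base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build the POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for a single JSON reply instead of a token stream.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(generate_request(prompt)) as response:
        return json.loads(response.read())["response"]
```

Here `prompt` is the full string produced in Step 5, that is, the retrieved context plus the user's question.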

🧠 Core AI Concepts Demonstrated

This project demonstrates:

Transformer inference
Embeddings
Semantic search
Vector similarity
Retrieval-Augmented Generation
External memory systems
Local AI infrastructure
Vector databases
Context injection

📌 Example Questions

What is semantic search?

Explain RAG architecture

What is vector similarity?

How does Redis caching work?

🔥 Future Improvements

PDF ingestion
Chunking pipeline
Persistent vector database
Streaming responses
Chat memory
Hybrid retrieval
Metadata filtering
Reranking
Multi-document support
Agent workflows

🧠 Learning Goals

This project is designed to teach:

AI systems engineering
Retrieval pipelines
Semantic memory architecture
Local inference systems
Vector retrieval
Modern RAG architecture

📜 License

MIT License
