YouTube video - https://www.youtube.com/watch?v=1IXw0J-AWng
Zen-AI is a RAG-enabled voice AI agent for calm, clarity, and everyday emotional guidance.
It runs on top of LiveKit for low-latency voice, uses Docling to build a vector database from psychology and philosophy PDFs, and exposes a React-based web client with real-time transcription. Zen-AI is designed as a philosophical and psychological companion (not a therapist): you talk to it about your day, your thoughts, or a specific situation, and it responds with validation, gentle cognitive checks, and small, concrete suggestions grounded in the books it has read.
Key features
- 🎙️ Live voice conversations via LiveKit (speech → LLM → speech)
- 📚 RAG over book PDFs using Docling + LanceDB
- 🧠 Thought & ideology checks – helps you examine whether your interpretation of a situation is balanced and useful
- 📝 Real-time transcript in a React frontend
- 🛠️ Function tools for RAG and reminders – the LLM calls a `rag_psych_advice` tool to pull context from the vector DB when deeper guidance is needed, and a `set_meditation_reminder` tool to remind the user to meditate
Suggestions for books to add
- "Meditations" by Marcus Aurelius
- "The Art of War" by Sun Tzu
- "Tao Te Ching" by Lao Tzu
- "Crime and Punishment" by Fyodor Dostoevsky
⚠️ Zen-AI is a philosophical and psychological companion only. It does not provide medical, diagnostic, or crisis support and should not be used as a replacement for professional care.
Fill in the environment variables in `frontend/.env` and `backend/.env` as shown in `.env.example`
- Set up the conda environment:

```bash
conda create -n zenai python=3.10 -y
conda activate zenai
conda install -c conda-forge numpy pandas pyarrow -y
pip install "livekit-agents[openai]~=1.2" python-dotenv fastapi "uvicorn[standard]" docling lancedb "lancedb[embeddings]" tiktoken python-dateutil openai
pip install livekit-plugins-deepgram livekit-plugins-silero
```

- Put book PDFs inside `backend/data/pdfs/`
- Create the RAG DB:

```bash
cd backend/
python rag_index.py
```

- Run the token server:

```bash
uvicorn token_server:app --reload --port 8000
```

- Run the agent in another terminal:

```bash
conda activate zenai
cd backend/
python agent.py dev
```

- Install the frontend dependencies:

```bash
cd frontend/
npm install
```

- Run the application:

```bash
npm run dev
```
Vector Database: LanceDB (local file-based storage)
- Chosen for simplicity and local-first development
- Stored at `backend/data/lancedb/` as a directory of Lance files
- Not cloud-hosted or distributed; suitable for single-instance deployments
Chunking Strategy: Docling's `HybridChunker` with `merge_peers=True`
- Hybrid approach combines semantic and structural chunking
- `merge_peers=True` merges adjacent chunks that are semantically related
- Preserves document structure (headings, page numbers) in metadata
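As a rough intuition for what `merge_peers=True` does, here is a simplified sketch. This is not Docling's actual implementation (the real chunker operates on tokenized document elements with a tokenizer-based budget); it only illustrates the idea of merging adjacent, related chunks:

```python
# Simplified illustration (NOT Docling's code) of a merge_peers-style pass:
# adjacent chunks that share the same heading are merged until a size budget
# is reached.
MAX_CHARS = 200  # stand-in for the tokenizer-based budget Docling uses

def merge_peers(chunks):
    """Merge adjacent (heading, text) chunks that share a heading."""
    merged = []
    for heading, text in chunks:
        if (merged
                and merged[-1][0] == heading
                and len(merged[-1][1]) + len(text) <= MAX_CHARS):
            prev_heading, prev_text = merged[-1]
            merged[-1] = (prev_heading, prev_text + " " + text)
        else:
            merged.append((heading, text))
    return merged

chunks = [
    ("Stoicism", "You have power over your mind."),
    ("Stoicism", "Not outside events."),
    ("Flow", "Concentration is the key to flow."),
]
print(merge_peers(chunks))  # two chunks: the Stoicism pair is merged
```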
Embeddings: OpenAI `text-embedding-3-large`
- 3072-dimensional vectors
- Requires `OPENAI_API_KEY` for both indexing and querying
- High-quality embeddings at the cost of API calls
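Since every chunk costs an embedding API call at index time, a back-of-the-envelope estimate can be useful. The price below is an assumption; check OpenAI's current pricing page before relying on it:

```python
# Back-of-the-envelope indexing cost estimate. The price is an ASSUMPTION,
# not a guaranteed figure -- verify against OpenAI's pricing page.
PRICE_PER_1M_TOKENS = 0.13  # assumed USD price for text-embedding-3-large

def indexing_cost(num_chunks: int, avg_tokens_per_chunk: int) -> float:
    """Estimated one-off embedding cost for building the LanceDB index."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS

# e.g. a few books -> ~2,000 chunks of ~400 tokens each
print(round(indexing_cost(2_000, 400), 4))  # → 0.104
```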
Query Strategy:
- Returns top `k=4` chunks per query
- Uses LanceDB's `query_type="auto"` for automatic query optimization
- Chunks are combined with `\n\n---\n\n` separators before being passed to the LLM
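The assembly step described above can be sketched as follows. Here `chunks` stands in for the texts returned by the LanceDB search; the real code obtains them via the vector query rather than a hardcoded list:

```python
# Sketch of the context-assembly step: take the top k=4 retrieved chunks
# and join them with "\n\n---\n\n" separators before handing them to the LLM.
SEPARATOR = "\n\n---\n\n"
TOP_K = 4

def assemble_context(chunks: list[str], k: int = TOP_K) -> str:
    """Combine the top-k chunk texts into a single context string."""
    return SEPARATOR.join(chunks[:k])

chunks = [f"chunk {i}" for i in range(6)]  # pretend these came from LanceDB
context = assemble_context(chunks)
print(context.count("---"))  # → 3 (three separators between four chunks)
```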
Document Processing: Docling DocumentConverter
- Handles PDF parsing and extraction
- Preserves document structure and metadata (page numbers, headings)
- Assumes PDFs are placed in `backend/data/pdfs/` before indexing
Voice Model: OpenAI RealtimeModel (`gpt-4o-realtime-preview`)
- Handles speech-to-speech directly (no separate STT/TTS pipeline for agent responses)
- Low-latency streaming responses
- Voice: "alloy" (configurable)
Transcription: Deepgram STT (`nova-2-conversationalai`)
- Separate from RealtimeModel; used only for displaying user transcriptions in the UI
- Streaming interim results enabled for real-time display
- Publishes transcriptions to LiveKit room for frontend consumption
Function Tools:
- `rag_psych_advice`: on-demand RAG queries when deeper context is needed
- `set_meditation_reminder`: in-call reminders (in-memory, session-scoped)
Architecture Patterns:
- Lazy RAG loading to avoid multiprocessing import issues with LiveKit agents
- Background reminder checker task runs every 5 seconds
- Separate audio track processing for transcription display
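The background reminder-checker pattern can be sketched with asyncio. `REMINDER_STORE` mirrors the in-memory store described above; the delivery step (speaking into the LiveKit session) is stubbed with a print, since that part depends on the agent runtime:

```python
# Sketch of the background reminder-checker pattern: an in-memory list of
# (due_time, message) entries polled on a fixed interval.
import asyncio
import time

REMINDER_STORE: list[tuple[float, str]] = []  # (due unix time, message)

def due_reminders(now: float) -> list[str]:
    """Pop and return every reminder whose due time has passed."""
    due = [msg for t, msg in REMINDER_STORE if t <= now]
    REMINDER_STORE[:] = [(t, m) for t, m in REMINDER_STORE if t > now]
    return due

async def reminder_checker(interval: float = 5.0):
    """Poll the store on an interval; the real agent speaks the message."""
    while True:
        for msg in due_reminders(time.time()):
            print(f"Reminder: {msg}")  # stub for speaking into the session
        await asyncio.sleep(interval)

REMINDER_STORE.append((time.time() - 1, "time to meditate"))
print(due_reminders(time.time()))  # → ['time to meditate']
```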
Token Server:
- FastAPI server on port 8000 (configurable)
- CORS currently allows all origins (`allow_origins=["*"]`); should be tightened in production
- No authentication/authorization on the token endpoint (assumes a trusted frontend)
- Requires the `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` environment variables
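One way to tighten the wide-open CORS policy without hardcoding origins is to read an allowlist from the environment. The `ALLOWED_ORIGINS` variable name and the Vite-style default port are assumptions, not part of the repo:

```python
# Sketch: build the CORS allowlist from an ALLOWED_ORIGINS env var
# (comma-separated), falling back to a local dev frontend. Both the
# variable name and the default URL are assumptions for illustration.
import os

def allowed_origins(default: str = "http://localhost:5173") -> list[str]:
    """Parse a comma-separated origin allowlist from the environment."""
    raw = os.environ.get("ALLOWED_ORIGINS", default)
    return [o.strip() for o in raw.split(",") if o.strip()]

# In token_server.py this list would replace allow_origins=["*"], e.g.:
# app.add_middleware(CORSMiddleware, allow_origins=allowed_origins(), ...)
print(allowed_origins())
```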
Agent Process:
- Runs as a separate process from token server
- Connects to LiveKit cloud service (requires LiveKit account)
- Assumes stable network connection for real-time audio streaming
Data Storage:
- LanceDB is local file-based (not cloud-hosted)
- RAG index must be rebuilt if PDFs change
- No database migrations or versioning for the vector store
Environment:
- Python 3.10+ with conda environment
- Node.js for frontend
- Assumes `.env` files are properly configured in both frontend and backend
RAG trade-offs
- Embedding costs: using OpenAI `text-embedding-3-large` requires API calls for every chunk during indexing, which can be expensive for large document collections
- Fixed retrieval size: always returns the top `k=4` chunks, which may miss relevant context or include irrelevant chunks depending on query quality
- Static index: the entire index must be rebuilt when PDFs change (no incremental updates or versioning)
- Chunking limitations: `HybridChunker` with `merge_peers=True` may create variable-sized chunks that could split important concepts or merge unrelated content
- Local-only storage: LanceDB is file-based and local, preventing shared index access across multiple agent instances
- Limited metadata filtering: Cannot filter search results by source document, date, or other metadata attributes during query time
- Basic context assembly: Simple concatenation of chunks with separators; no sophisticated context window management or overlap handling
- Semantic-only search: Pure vector search without hybrid keyword/BM25 approach, which may miss exact term matches or technical terminology
- Query-time cost: every RAG query also requires an embedding API call before retrieval, making each tool call more expensive
- Increased latency: searching the vector database adds time to any response that uses RAG
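One common remedy for the semantic-only limitation is to fuse a keyword ranking with the vector ranking, e.g. via reciprocal rank fusion (RRF). This is an illustration of the idea, not the repo's code (LanceDB also ships its own hybrid search that could be used instead):

```python
# Sketch of reciprocal rank fusion (RRF): combine multiple rankings of the
# same chunk IDs by summing 1 / (k + rank) per ranking. The input lists are
# hypothetical chunk IDs, not data from this project.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several rankings into one, best-scoring IDs first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7"]   # semantic nearest neighbours
keyword_hits = ["c1", "c9", "c3"]  # e.g. BM25 over the same chunks
print(rrf([vector_hits, keyword_hits]))  # → ['c1', 'c3', 'c9', 'c7']
```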
Persistence:
- Reminder store is in-memory only (a `REMINDER_STORE` list); lost on agent restart
- No conversation history persistence
- LanceDB index is static after creation (requires a manual rebuild to update)
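If restart-safe reminders were ever needed, one minimal approach is to serialize the store to disk. The file path and JSON format here are assumptions for illustration, not something the current codebase does:

```python
# Sketch: persist reminders as JSON so they survive agent restarts.
# STORE_PATH is a hypothetical location, not part of the repo.
import json
from pathlib import Path

STORE_PATH = Path("reminders.json")  # hypothetical location

def save_reminders(reminders: list[dict]) -> None:
    """Write the reminder list to disk as JSON."""
    STORE_PATH.write_text(json.dumps(reminders))

def load_reminders() -> list[dict]:
    """Read reminders back, or return an empty list if none were saved."""
    if STORE_PATH.exists():
        return json.loads(STORE_PATH.read_text())
    return []

save_reminders([{"due": 1700000000, "message": "time to meditate"}])
print(load_reminders())  # → [{'due': 1700000000, 'message': 'time to meditate'}]
```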
Scalability:
- Single-instance deployment (local LanceDB not distributed)
- Agent processes one conversation at a time per instance
- No load balancing or horizontal scaling built-in
Language Support:
- Hardcoded English-only responses (system prompt enforces this)
- Deepgram STT configured for English only
- No multi-language RAG support
Security:
- Token server has no authentication at the moment (assumes trusted network/frontend)
- CORS is wide open (needs restriction in production)
- API keys are stored in `.env` files (use secure secret management in production)
Cost Considerations:
- OpenAI API calls for both embeddings and RealtimeModel (can be expensive at scale)
- Deepgram STT API calls for transcription
- LiveKit cloud service usage
Error Handling:
- Basic error handling using structured logging
- No retry logic for API failures
- RAG failures return graceful fallback messages
Development vs Production:
- Current setup optimized for development/local use
- Production deployment would need: persistent storage, authentication, monitoring, logging, and proper secret management
I used Cursor, GPT-5.1 and Claude-Sonnet-4.5 to make this.