YouTube video - https://www.youtube.com/watch?v=1IXw0J-AWng
Zen-AI is a RAG-enabled voice AI agent for calm, clarity, and everyday emotional guidance.
It runs on top of LiveKit for low-latency voice, uses Docling to build a vector database from psychology and philosophy PDFs, and exposes a React-based web client with real-time transcription. Zen-AI is designed as a philosophical and psychological companion (not a therapist): you talk to it about your day, your thoughts, or a specific situation, and it responds with validation, gentle cognitive checks, and small, concrete suggestions grounded in the books it has read.
Key features
- 🎙️ Live voice conversations via LiveKit (speech → LLM → speech)
- 📚 RAG over book PDFs using Docling + LanceDB
- 🧠 Thought & ideology checks – helps you examine whether your interpretation of a situation is balanced and useful
- 📝 Real-time transcript in a React frontend
- 🛠️ Function tools for RAG and reminders – the LLM calls a `rag_psych_advice` tool to pull context from the vector DB when deeper guidance is needed, and a `set_meditation_reminder` tool to remind the user to meditate
Suggestions for books to add
- "Meditations" by Marcus Aurelius
- "The Art of War" by Sun Tzu
- "Tao Te Ching" by Lao Tzu
- "Crime and Punishment" by Fyodor Dostoevsky
⚠️ Zen-AI is a philosophical and psychological companion only. It does not provide medical, diagnostic, or crisis support and should not be used as a replacement for professional care.
Fill in the environment variables in `frontend/.env` and `backend/.env` as shown in `.env.example`
- Set up the conda environment:

```bash
conda create -n zenai python=3.10 -y
conda activate zenai
conda install -c conda-forge numpy pandas pyarrow -y
pip install "livekit-agents[openai]~=1.2" python-dotenv fastapi "uvicorn[standard]" docling lancedb "lancedb[embeddings]" tiktoken python-dateutil openai
pip install livekit-plugins-deepgram livekit-plugins-silero
```

- Put book PDFs inside `backend/data/pdfs/`
- Create the RAG DB:

```bash
cd backend/
python rag_index.py
```

- Run the token server:

```bash
uvicorn token_server:app --reload --port 8000
```

- Run the agent in another terminal:

```bash
conda activate zenai
cd backend/
python agent.py dev
```

- Install the frontend dependencies:

```bash
cd frontend/
npm install
```

- Run the application:

```bash
npm run dev
```
Vector Database: LanceDB (local file-based storage)
- Chosen for simplicity and local-first development
- Stored at `backend/data/lancedb/` as a directory of Lance files
- Not cloud-hosted or distributed; suitable for single-instance deployments
Chunking Strategy: Docling's `HybridChunker` with `merge_peers=True`
- Hybrid approach combines semantic and structural chunking
- `merge_peers=True` merges adjacent chunks that are semantically related
- Preserves document structure (headings, page numbers) in metadata
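As a rough intuition for what `merge_peers=True` does, here is a simplified sketch. This is not Docling's actual implementation (the real chunker operates on tokenized document elements with a tokenizer-based budget); it only illustrates the idea of merging adjacent, related chunks:

```python
# Simplified illustration (NOT Docling's code) of a merge_peers-style pass:
# adjacent chunks that share the same heading are merged until a size budget
# is reached.
MAX_CHARS = 200  # stand-in for the tokenizer-based budget Docling uses

def merge_peers(chunks):
    """Merge adjacent (heading, text) chunks that share a heading."""
    merged = []
    for heading, text in chunks:
        if (merged
                and merged[-1][0] == heading
                and len(merged[-1][1]) + len(text) <= MAX_CHARS):
            prev_heading, prev_text = merged[-1]
            merged[-1] = (prev_heading, prev_text + " " + text)
        else:
            merged.append((heading, text))
    return merged

chunks = [
    ("Stoicism", "You have power over your mind."),
    ("Stoicism", "Not outside events."),
    ("Flow", "Concentration is the key to flow."),
]
print(merge_peers(chunks))  # two chunks: the Stoicism pair is merged
```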
Embeddings: OpenAI `text-embedding-3-large`
- 3072-dimensional vectors
- Requires `OPENAI_API_KEY` for both indexing and querying
- High-quality embeddings at the cost of API calls
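Since every chunk costs an embedding API call at index time, a back-of-the-envelope estimate can be useful. The price below is an assumption; check OpenAI's current pricing page before relying on it:

```python
# Back-of-the-envelope indexing cost estimate. The price is an ASSUMPTION,
# not a guaranteed figure -- verify against OpenAI's pricing page.
PRICE_PER_1M_TOKENS = 0.13  # assumed USD price for text-embedding-3-large

def indexing_cost(num_chunks: int, avg_tokens_per_chunk: int) -> float:
    """Estimated one-off embedding cost for building the LanceDB index."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS

# e.g. a few books -> ~2,000 chunks of ~400 tokens each
print(round(indexing_cost(2_000, 400), 4))  # → 0.104
```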
Query Strategy:
- Returns top `k=4` chunks per query
- Uses LanceDB's `query_type="auto"` for automatic query optimization
- Chunks are combined with `\n\n---\n\n` separators before being passed to the LLM
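The assembly step described above can be sketched as follows. Here `chunks` stands in for the texts returned by the LanceDB search; the real code obtains them via the vector query rather than a hardcoded list:

```python
# Sketch of the context-assembly step: take the top k=4 retrieved chunks
# and join them with "\n\n---\n\n" separators before handing them to the LLM.
SEPARATOR = "\n\n---\n\n"
TOP_K = 4

def assemble_context(chunks: list[str], k: int = TOP_K) -> str:
    """Combine the top-k chunk texts into a single context string."""
    return SEPARATOR.join(chunks[:k])

chunks = [f"chunk {i}" for i in range(6)]  # pretend these came from LanceDB
context = assemble_context(chunks)
print(context.count("---"))  # → 3 (three separators between four chunks)
```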
Document Processing: Docling DocumentConverter
- Handles PDF parsing and extraction
- Preserves document structure and metadata (page numbers, headings)
- Assumes PDFs are placed in `backend/data/pdfs/` before indexing
Voice Model: OpenAI RealtimeModel (`gpt-4o-realtime-preview`)
- Handles speech-to-speech directly (no separate STT/TTS pipeline for agent responses)
- Low-latency streaming responses
- Voice: "alloy" (configurable)
Transcription: Deepgram STT (`nova-2-conversationalai`)
- Separate from RealtimeModel; used only for displaying user transcriptions in the UI
- Streaming interim results enabled for real-time display
- Publishes transcriptions to LiveKit room for frontend consumption
Function Tools:
- `rag_psych_advice`: on-demand RAG queries when deeper context is needed
- `set_meditation_reminder`: in-call reminders (in-memory, session-scoped)
Architecture Patterns:
- Lazy RAG loading to avoid multiprocessing import issues with LiveKit agents
- Background reminder checker task runs every 5 seconds
- Separate audio track processing for transcription display
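The background reminder-checker pattern can be sketched with asyncio. `REMINDER_STORE` mirrors the in-memory store described above; the delivery step (speaking into the LiveKit session) is stubbed with a print, since that part depends on the agent runtime:

```python
# Sketch of the background reminder-checker pattern: an in-memory list of
# (due_time, message) entries polled on a fixed interval.
import asyncio
import time

REMINDER_STORE: list[tuple[float, str]] = []  # (due unix time, message)

def due_reminders(now: float) -> list[str]:
    """Pop and return every reminder whose due time has passed."""
    due = [msg for t, msg in REMINDER_STORE if t <= now]
    REMINDER_STORE[:] = [(t, m) for t, m in REMINDER_STORE if t > now]
    return due

async def reminder_checker(interval: float = 5.0):
    """Poll the store on an interval; the real agent speaks the message."""
    while True:
        for msg in due_reminders(time.time()):
            print(f"Reminder: {msg}")  # stub for speaking into the session
        await asyncio.sleep(interval)

REMINDER_STORE.append((time.time() - 1, "time to meditate"))
print(due_reminders(time.time()))  # → ['time to meditate']
```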
Token Server:
- FastAPI server on port 8000 (configurable)
- CORS currently allows all origins (`allow_origins=["*"]`); should be tightened in production
- No authentication/authorization on the token endpoint (assumes a trusted frontend)
- Requires the `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` environment variables
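One way to tighten the wide-open CORS policy without hardcoding origins is to read an allowlist from the environment. The `ALLOWED_ORIGINS` variable name and the Vite-style default port are assumptions, not part of the repo:

```python
# Sketch: build the CORS allowlist from an ALLOWED_ORIGINS env var
# (comma-separated), falling back to a local dev frontend. Both the
# variable name and the default URL are assumptions for illustration.
import os

def allowed_origins(default: str = "http://localhost:5173") -> list[str]:
    """Parse a comma-separated origin allowlist from the environment."""
    raw = os.environ.get("ALLOWED_ORIGINS", default)
    return [o.strip() for o in raw.split(",") if o.strip()]

# In token_server.py this list would replace allow_origins=["*"], e.g.:
# app.add_middleware(CORSMiddleware, allow_origins=allowed_origins(), ...)
print(allowed_origins())
```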
Agent Process:
- Runs as a separate process from token server
- Connects to LiveKit cloud service (requires LiveKit account)
- Assumes stable network connection for real-time audio streaming
Data Storage:
- LanceDB is local file-based (not cloud-hosted)
- RAG index must be rebuilt if PDFs change
- No database migrations or versioning for the vector store
Environment:
- Python 3.10+ with conda environment
- Node.js for frontend
- Assumes `.env` files are properly configured in both frontend and backend
RAG trade-offs
- Embedding costs: using OpenAI `text-embedding-3-large` requires API calls for every chunk during indexing, which can be expensive for large document collections
- Fixed retrieval size: always returns the top `k=4` chunks, which may miss relevant context or include irrelevant chunks depending on query quality
- Static index: the entire index must be rebuilt when PDFs change (no incremental updates or versioning)
- Chunking limitations: `HybridChunker` with `merge_peers=True` may create variable-sized chunks that could split important concepts or merge unrelated content
- Local-only storage: LanceDB is file-based and local, preventing shared index access across multiple agent instances
- Limited metadata filtering: Cannot filter search results by source document, date, or other metadata attributes during query time
- Basic context assembly: Simple concatenation of chunks with separators; no sophisticated context window management or overlap handling
- Semantic-only search: Pure vector search without hybrid keyword/BM25 approach, which may miss exact term matches or technical terminology
- Query-time cost: every RAG query also requires an embedding API call before retrieval, making each tool call more expensive
- Increased latency: searching the vector database adds time to any response that uses RAG
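One common remedy for the semantic-only limitation is to fuse a keyword ranking with the vector ranking, e.g. via reciprocal rank fusion (RRF). This is an illustration of the idea, not the repo's code (LanceDB also ships its own hybrid search that could be used instead):

```python
# Sketch of reciprocal rank fusion (RRF): combine multiple rankings of the
# same chunk IDs by summing 1 / (k + rank) per ranking. The input lists are
# hypothetical chunk IDs, not data from this project.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several rankings into one, best-scoring IDs first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7"]   # semantic nearest neighbours
keyword_hits = ["c1", "c9", "c3"]  # e.g. BM25 over the same chunks
print(rrf([vector_hits, keyword_hits]))  # → ['c1', 'c3', 'c9', 'c7']
```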
Persistence:
- Reminder store is in-memory only (a `REMINDER_STORE` list); lost on agent restart
- No conversation history persistence
- LanceDB index is static after creation (requires a manual rebuild to update)
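If restart-safe reminders were ever needed, one minimal approach is to serialize the store to disk. The file path and JSON format here are assumptions for illustration, not something the current codebase does:

```python
# Sketch: persist reminders as JSON so they survive agent restarts.
# STORE_PATH is a hypothetical location, not part of the repo.
import json
from pathlib import Path

STORE_PATH = Path("reminders.json")  # hypothetical location

def save_reminders(reminders: list[dict]) -> None:
    """Write the reminder list to disk as JSON."""
    STORE_PATH.write_text(json.dumps(reminders))

def load_reminders() -> list[dict]:
    """Read reminders back, or return an empty list if none were saved."""
    if STORE_PATH.exists():
        return json.loads(STORE_PATH.read_text())
    return []

save_reminders([{"due": 1700000000, "message": "time to meditate"}])
print(load_reminders())  # → [{'due': 1700000000, 'message': 'time to meditate'}]
```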
Scalability:
- Single-instance deployment (local LanceDB not distributed)
- Agent processes one conversation at a time per instance
- No load balancing or horizontal scaling built-in
Language Support:
- Hardcoded English-only responses (system prompt enforces this)
- Deepgram STT configured for English only
- No multi-language RAG support
Security:
- Token server has no authentication at the moment (assumes trusted network/frontend)
- CORS is wide open (needs restriction in production)
- API keys are stored in `.env` files (use secure secret management in production)
Cost Considerations:
- OpenAI API calls for both embeddings and RealtimeModel (can be expensive at scale)
- Deepgram STT API calls for transcription
- LiveKit cloud service usage
Error Handling:
- Basic error handling using structured logging
- No retry logic for API failures
- RAG failures return graceful fallback messages
Development vs Production:
- Current setup optimized for development/local use
- Production deployment would need: persistent storage, authentication, monitoring, logging, and proper secret management
I used Cursor, GPT-5.1 and Claude-Sonnet-4.5 to make this.