Zen-AI

YouTube video - https://www.youtube.com/watch?v=1IXw0J-AWng

Zen-AI is a RAG-enabled voice AI agent for calm, clarity, and everyday emotional guidance.

It runs on top of LiveKit for low-latency voice, uses Docling to build a vector database from psychology and philosophy PDFs, and exposes a React-based web client with real-time transcription. Zen-AI is designed as a philosophical and psychological companion (not a therapist): you talk to it about your day, your thoughts, or a specific situation, and it responds with validation, gentle cognitive checks, and small, concrete suggestions grounded in the books it has read.

Key features

  • 🎙️ Live voice conversations via LiveKit (speech → LLM → speech)
  • 📚 RAG over book PDFs using Docling + LanceDB
  • 🧠 Thought & ideology checks – helps you examine whether your interpretation of a situation is balanced and useful
  • 📝 Real-time transcript in a React frontend
  • 🛠️ Tool calls – the LLM calls a rag_psych_advice tool to pull context from the vector DB when deeper guidance is needed, and a set_meditation_reminder tool to schedule in-call meditation reminders

Suggestions for books to add

  • "Meditations" by Marcus Aurelius
  • "The Art of War" by Sun Tzu
  • "Tao Te Ching" by Lao Tzu
  • "Crime and Punishment" by Fyodor Dostoevsky

⚠️ Zen-AI is a philosophical and psychological companion only. It does not provide medical, diagnostic, or crisis support and should not be used as a replacement for professional care.

Setup

Fill in the environment variables in frontend/.env and backend/.env as shown in .env.example.
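As a rough sketch, backend/.env needs at least the keys referenced elsewhere in this README (OPENAI_API_KEY, LIVEKIT_API_KEY, LIVEKIT_API_SECRET); check .env.example for the authoritative list, since extra variables (e.g. a Deepgram key or LiveKit URL, assumed here) may also be required:

```shell
# backend/.env — illustrative only; .env.example is authoritative.
OPENAI_API_KEY=sk-...
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
# Likely also needed for transcription and connecting to LiveKit cloud
# (names assumed, verify against .env.example):
# DEEPGRAM_API_KEY=...
# LIVEKIT_URL=wss://...
```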

Backend Setup

  1. Set up the conda environment
conda create -n zenai python=3.10 -y
conda activate zenai
conda install -c conda-forge numpy pandas pyarrow -y
pip install "livekit-agents[openai]~=1.2" python-dotenv fastapi "uvicorn[standard]" docling lancedb "lancedb[embeddings]" tiktoken python-dateutil openai
pip install livekit-plugins-deepgram livekit-plugins-silero
  2. Put books inside backend/data/pdfs/
  3. Create the RAG DB
cd backend/
python rag_index.py
  4. Run the token server
uvicorn token_server:app --reload --port 8000
  5. Run the agent in another terminal
conda activate zenai
cd backend/
python agent.py dev

Frontend Setup

  1. Install the dependencies
cd frontend/
npm install
  2. Run the application
npm run dev

Design Decisions

RAG Assumptions

Vector Database: LanceDB (local file-based storage)

  • Chosen for simplicity and local-first development
  • Stored at backend/data/lancedb/ as a directory of Lance files
  • Not cloud-hosted or distributed; suitable for single-instance deployments

Chunking Strategy: Docling's HybridChunker with merge_peers=True

  • Hybrid approach combines semantic and structural chunking
  • merge_peers=True merges adjacent chunks that are semantically related
  • Preserves document structure (headings, page numbers) in metadata
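To make the merge-peers idea concrete, here is a toy illustration: adjacent chunks that share a heading are merged until a size budget is hit. This is not Docling's HybridChunker implementation, only a sketch of the concept.

```python
# Toy illustration of the merge-peers idea: adjacent chunks under the
# same heading are merged, respecting a size cap. NOT Docling's actual
# HybridChunker algorithm.

def merge_peers(chunks, max_chars=200):
    """Merge adjacent (heading, text) chunks with the same heading."""
    merged = []
    for heading, text in chunks:
        if (merged and merged[-1][0] == heading
                and len(merged[-1][1]) + len(text) <= max_chars):
            merged[-1] = (heading, merged[-1][1] + " " + text)
        else:
            merged.append((heading, text))
    return merged

chunks = [
    ("Stoicism", "You have power over your mind."),
    ("Stoicism", "Not outside events."),
    ("Virtue", "Waste no more time arguing."),
]
print(merge_peers(chunks))  # the two Stoicism chunks merge into one
```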

Embeddings: OpenAI text-embedding-3-large

  • 3072-dimensional vectors
  • Requires OPENAI_API_KEY for both indexing and querying
  • High-quality embeddings at the cost of API calls
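Vector search over these embeddings boils down to comparing directions, typically via cosine similarity. A minimal stdlib-only sketch (the real vectors are 3072-dimensional; short hand-written vectors are used here for illustration):

```python
# Cosine similarity between two embedding vectors (illustrative only;
# LanceDB performs this comparison internally during search).
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.3, 0.6]
chunk = [0.1, 0.3, 0.6]
print(cosine_similarity(query, chunk))  # ~1.0 for identical directions
```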

Query Strategy:

  • Returns top k=4 chunks per query
  • Uses LanceDB's query_type="auto" for automatic query optimization
  • Chunks are combined with \n\n---\n\n separators before being passed to the LLM
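The context-assembly step described above can be sketched in a few lines; the chunk texts are placeholders, not real retrieval output:

```python
# Join the top-k retrieved chunks with the documented separator before
# handing them to the LLM as context.
SEPARATOR = "\n\n---\n\n"

def assemble_context(chunks):
    return SEPARATOR.join(chunk.strip() for chunk in chunks)

retrieved = ["First relevant passage.", "Second relevant passage."]
print(assemble_context(retrieved))
```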

Document Processing: Docling DocumentConverter

  • Handles PDF parsing and extraction
  • Preserves document structure and metadata (page numbers, headings)
  • Assumes PDFs are placed in backend/data/pdfs/ before indexing

LiveKit Agent Design

Voice Model: OpenAI RealtimeModel (gpt-4o-realtime-preview)

  • Handles speech-to-speech directly (no separate STT/TTS pipeline for agent responses)
  • Low-latency streaming responses
  • Voice: "alloy" (configurable)

Transcription: Deepgram STT (nova-2-conversationalai)

  • Separate from RealtimeModel; used only for displaying user transcriptions in the UI
  • Streaming interim results enabled for real-time display
  • Publishes transcriptions to LiveKit room for frontend consumption

Function Tools:

  • rag_psych_advice: On-demand RAG queries when deeper context is needed
  • set_meditation_reminder: In-call reminders (in-memory, session-scoped)
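A session-scoped, in-memory reminder store like the one behind set_meditation_reminder can be sketched as follows; the names (REMINDER_STORE, add_reminder, due_reminders) are illustrative, not the actual code:

```python
# In-memory reminder store sketch. Contents are lost on agent restart,
# matching the persistence trade-off documented below.
import time

REMINDER_STORE = []

def add_reminder(message, delay_seconds, now=None):
    now = time.time() if now is None else now
    REMINDER_STORE.append({"message": message, "due_at": now + delay_seconds})

def due_reminders(now=None):
    """Pop and return every reminder whose due time has passed."""
    now = time.time() if now is None else now
    due = [r for r in REMINDER_STORE if r["due_at"] <= now]
    for r in due:
        REMINDER_STORE.remove(r)
    return due

add_reminder("Time to meditate", delay_seconds=0, now=100.0)
print(due_reminders(now=101.0))
```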

Architecture Patterns:

  • Lazy RAG loading to avoid multiprocessing import issues with LiveKit agents
  • Background reminder checker task runs every 5 seconds
  • Separate audio track processing for transcription display
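The background checker pattern, a task that polls on a fixed interval the way the agent does every 5 seconds, can be sketched with plain asyncio. The 0.01 s interval and tick counter here are for illustration only:

```python
# Periodic background task sketch: sleep, then run a callback, repeated.
# The real agent's callback would scan the reminder store instead.
import asyncio

async def periodic_checker(interval, ticks, on_tick):
    for _ in range(ticks):
        await asyncio.sleep(interval)
        on_tick()

async def main():
    fired = []
    await periodic_checker(interval=0.01, ticks=3,
                           on_tick=lambda: fired.append(1))
    return len(fired)

print(asyncio.run(main()))  # → 3
```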

Hosting Assumptions

Token Server:

  • FastAPI server on port 8000 (configurable)
  • CORS currently allows all origins (allow_origins=["*"]) — should be tightened in production
  • No authentication/authorization on token endpoint (assumes trusted frontend)
  • Requires LIVEKIT_API_KEY and LIVEKIT_API_SECRET environment variables
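Tightening the wide-open CORS policy for production might look like the following fragment, using FastAPI's standard CORSMiddleware; the origin list is an example, not the project's actual configuration:

```python
# Production sketch: replace allow_origins=["*"] with the deployed
# frontend's origin(s). Example origin only.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://zen-ai.example.com"],  # your frontend's URL
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
```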

Agent Process:

  • Runs as a separate process from token server
  • Connects to LiveKit cloud service (requires LiveKit account)
  • Assumes stable network connection for real-time audio streaming

Data Storage:

  • LanceDB is local file-based (not cloud-hosted)
  • RAG index must be rebuilt if PDFs change
  • No database migrations or versioning for the vector store

Environment:

  • Python 3.10+ with conda environment
  • Node.js for frontend
  • Assumes .env files are properly configured in both frontend and backend

Trade-offs & Limitations

RAG trade-offs

  • Embedding costs: Using OpenAI text-embedding-3-large requires API calls for every chunk during indexing, which can be expensive for large document collections
  • Fixed retrieval size: Always returns top k=4 chunks, which may miss relevant context or include irrelevant chunks depending on query quality
  • Static index: Entire index must be rebuilt when PDFs change (no incremental updates or versioning)
  • Chunking limitations: HybridChunker with merge_peers=True may create variable-sized chunks that could split important concepts or merge unrelated content
  • Local-only storage: LanceDB is file-based and local, preventing shared index access across multiple agent instances
  • Limited metadata filtering: Cannot filter search results by source document, date, or other metadata attributes during query time
  • Basic context assembly: Simple concatenation of chunks with separators; no sophisticated context window management or overlap handling
  • Semantic-only search: Pure vector search without hybrid keyword/BM25 approach, which may miss exact term matches or technical terminology
  • Per-query cost: each rag_psych_advice call adds an embedding request and extra LLM context tokens, making every RAG-backed turn more expensive
  • Increased latency: vector-database search adds time before the agent can respond

Persistence:

  • Reminder store is in-memory only (REMINDER_STORE list) — lost on agent restart
  • No conversation history persistence
  • LanceDB index is static after creation (requires manual rebuild to update)

Scalability:

  • Single-instance deployment (local LanceDB not distributed)
  • Agent processes one conversation at a time per instance
  • No load balancing or horizontal scaling built-in

Language Support:

  • Hardcoded English-only responses (system prompt enforces this)
  • Deepgram STT configured for English only
  • No multi-language RAG support

Security:

  • Token server has no authentication at the moment (assumes trusted network/frontend)
  • CORS is wide open (needs restriction in production)
  • API keys stored in .env files (use secure secret management in production)

Cost Considerations:

  • OpenAI API calls for both embeddings and RealtimeModel (can be expensive at scale)
  • Deepgram STT API calls for transcription
  • LiveKit cloud service usage

Error Handling:

  • Basic error handling using structured logging
  • No retry logic for API failures
  • RAG failures return graceful fallback messages

Development vs Production:

  • Current setup optimized for development/local use
  • Production deployment would need: persistent storage, authentication, monitoring, logging, and proper secret management

AI Tool Usage

I used Cursor, GPT-5.1, and Claude-Sonnet-4.5 to build this project.
