VoiceCompanionAI is a production-architected conversational AI system deployed on Raspberry Pi hardware, designed to function as a long-term voice companion.
Inspired by Portal-style AI personalities, Companion is not a basic voice assistant. It is built to feel present, emotionally aware, memory-driven, and adaptive over time.
This project demonstrates full-stack AI systems engineering across edge devices, cloud orchestration, conversational memory, and personality modeling.
Build an AI companion that can:
- Capture voice conversations in real time
- Transcribe speech via cloud STT
- Generate contextual LLM responses
- Speak replies through TTS
- Maintain long-term memory
- Learn user preferences
- Detect emotional tone
- Adapt personality traits
- Initiate proactive interactions
Think: smart toy + conversational agent + memory system.
Raspberry Pi Device Agent
↓
FastAPI Orchestrator API
↓
Background Worker
↓
PostgreSQL + pgvector Memory Layer
↓
OpenAI Services (STT, LLM, TTS, Embeddings)

The Raspberry Pi device agent handles:
- Microphone capture
- Push-to-talk or wake trigger
- Audio buffering
- Network retry handling
- TTS playback via speaker
- Device authentication
- Interaction ingestion
- Prompt assembly
- Memory retrieval
- Job queue insertion
- Response storage
- Audio serving endpoints

The background worker handles:
- Speech-to-text processing
- LLM completion
- Text-to-speech synthesis
- Memory extraction
- Embedding generation
- Profile summarization
- Emotion detection
- Observability logging

The system models long-term familiarity through:
- Conversational history tracking
- Preference learning
- Relationship memory
- Emotional context awareness
- Personality adaptation
- Mode-based behaviors
Companion personality is configurable and evolves over time.
{
"warmth": 0.9,
"humor": 0.7,
"curiosity": 0.8,
"energy": 0.6,
"verbosity": 0.4
}

Voice commands can dynamically update traits:
- "Be funnier"
- "Talk shorter"
- "Switch to bedtime mode"
All changes persist in the bot_profiles table.
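
One way a trait-changing voice command could be applied is sketched below. The phrase table, the step size of 0.1, and the assumption that traits live in the range 0.0 to 1.0 are illustrative guesses, not taken from the project. Mode switches such as "Switch to bedtime mode" would need separate handling, and writing the result to bot_profiles is left out.

```python
from typing import Dict, Tuple

# Hypothetical mapping from recognized phrases to (trait, delta) adjustments.
COMMAND_DELTAS: Dict[str, Tuple[str, float]] = {
    "be funnier": ("humor", 0.1),
    "talk shorter": ("verbosity", -0.1),
}


def apply_command(traits: Dict[str, float], command: str) -> Dict[str, float]:
    """Return a new traits dict with the command's adjustment applied."""
    key = command.strip().lower()
    if key not in COMMAND_DELTAS:
        return dict(traits)  # unknown command: leave traits unchanged
    trait, delta = COMMAND_DELTAS[key]
    updated = dict(traits)
    # Clamp to the assumed [0, 1] range and round to avoid float drift.
    value = updated.get(trait, 0.5) + delta
    updated[trait] = round(min(1.0, max(0.0, value)), 2)
    return updated
```

Returning a copy rather than mutating the input makes it simple to compare old and new profiles before persisting the change.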
Behavior overlays include:
- Storytelling Mode
- Quiz Mode
- Bedtime Mode
- Encouragement Mode
- Passive Check-ins
Modes influence prompt tone, pacing, and response style.
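
As a rough illustration of how a mode overlay might be folded into prompt assembly, consider the sketch below. The overlay wording, the `build_system_prompt` function, and the prompt layout are all hypothetical; the README does not show the real prompts.

```python
from typing import Dict, List

# Hypothetical overlay text per mode; the real prompts are not given here.
MODE_OVERLAYS: Dict[str, str] = {
    "bedtime": "Speak softly, keep replies short, and wind the conversation down.",
    "quiz": "Ask one question at a time and give brief feedback on answers.",
    "storytelling": "Narrate vividly and pause to let the listener react.",
}


def build_system_prompt(traits: Dict[str, float], mode: str, memories: List[str]) -> str:
    """Combine personality traits, an optional mode overlay, and memories."""
    trait_line = ", ".join(
        f"{name}={value:.1f}" for name, value in sorted(traits.items())
    )
    parts = [f"You are Companion. Personality traits: {trait_line}."]
    overlay = MODE_OVERLAYS.get(mode)
    if overlay:
        parts.append(f"Current mode: {mode}. {overlay}")
    if memories:
        parts.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in memories))
    return "\n\n".join(parts)
```

Keeping overlays as plain text appended to a base prompt means a mode can be switched without touching the stored personality traits.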
The system analyzes emotional signals from interactions.
- Voice tone analysis (audio sentiment)
- Transcript sentiment fallback
- Emotion label
- Confidence score
- Emotional context memory

Emotion influences:
- Memory salience scoring
- Prompt tone injection
- Companion response style
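
The transcript fallback and the effect of emotion on memory salience could look something like this. The label set, the weights, the 0.6 confidence threshold, and the function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-label weights: stronger emotions make a memory more salient.
EMOTION_WEIGHTS = {"joy": 0.3, "sadness": 0.4, "anger": 0.4, "fear": 0.5, "neutral": 0.0}


@dataclass
class EmotionResult:
    label: str
    confidence: float  # 0.0 to 1.0
    source: str        # "audio" or "transcript"


def pick_emotion(
    audio: Optional[EmotionResult],
    transcript: EmotionResult,
    min_confidence: float = 0.6,
) -> EmotionResult:
    """Prefer the audio reading; fall back to transcript sentiment."""
    if audio is not None and audio.confidence >= min_confidence:
        return audio
    return transcript


def salience(base: float, emotion: EmotionResult) -> float:
    """Boost a base salience score by emotion weight times confidence, capped at 1.0."""
    boost = EMOTION_WEIGHTS.get(emotion.label, 0.0) * emotion.confidence
    return min(1.0, base + boost)
```

Scaling the boost by confidence means a weakly detected emotion nudges the score only slightly, while a confident reading has a larger effect.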
Built on PostgreSQL + pgvector.
Tables: users, devices, conversations, interactions, memories, memory_embeddings, user_profiles, bot_profiles, jobs, events

Memory features include:
- Vector embeddings
- Salience scoring
- Emotional tagging
- Retrieval ranking
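
Retrieval ranking that blends vector similarity with salience might be sketched as below. The pure-Python cosine similarity and the 0.3 salience weight are assumptions for illustration; in a pgvector deployment the distance would more likely be computed in SQL (pgvector provides the `<=>` cosine distance operator).

```python
import math
from typing import List, Sequence, Tuple

Memory = Tuple[str, Sequence[float], float]  # (text, embedding, salience)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_memories(
    query: Sequence[float],
    memories: List[Memory],
    salience_weight: float = 0.3,
    top_k: int = 3,
) -> List[str]:
    """Return the top_k memory texts by blended similarity and salience."""
    scored = [
        (
            (1 - salience_weight) * cosine_similarity(query, embedding)
            + salience_weight * sal,
            text,
        )
        for text, embedding, sal in memories
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

With this weighting, a highly salient but off-topic memory does not outrank a closely matching one, yet salience still breaks ties between similar candidates.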
Database-backed async worker pipeline.
- SELECT … FOR UPDATE SKIP LOCKED job claiming
- Retry tracking
- Exponential backoff
- Status transitions
- Observability events

Job types:
- PROCESS_VOICE_INTERACTION
- SUMMARIZE_PROFILE
- PROACTIVE_CHECKIN
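
A minimal sketch of SKIP LOCKED job claiming with exponential backoff follows. The column names (`status`, `run_after`, `attempts`, `job_type`, `payload`), the status values, and the 5 second base with a 600 second cap are assumptions; the actual jobs schema is not shown in this README.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical query: claim one due job atomically. SKIP LOCKED lets
# concurrent workers pass over rows another worker has already locked.
CLAIM_JOB_SQL = """
UPDATE jobs
SET status = 'running', attempts = attempts + 1
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending' AND run_after <= now()
    ORDER BY run_after
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, job_type, payload;
"""


def backoff_seconds(attempts: int, base: float = 5.0, cap: float = 600.0) -> float:
    """Delay before the next retry: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * (2 ** max(0, attempts - 1)))


def next_run_after(attempts: int, now: Optional[datetime] = None) -> datetime:
    """Timestamp at which a failed job becomes eligible to run again."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=backoff_seconds(attempts))
```

On failure, a worker would set the job back to a pending state with `run_after` moved forward by the computed delay, so the same claim query naturally skips it until it is due.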

Audio Capture → Upload → STT → LLM → TTS → Playback
Latency and processing metrics are stored per interaction for observability.
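
Per-stage latency capture could be done with a small timing helper like the one below. `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls stand in for the real STT and LLM requests.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Collects wall-clock durations (in milliseconds) per pipeline stage."""

    def __init__(self) -> None:
        self.durations_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised an exception.
            self.durations_ms[name] = (time.perf_counter() - start) * 1000.0

    def total_ms(self) -> float:
        return sum(self.durations_ms.values())


timer = StageTimer()
with timer.stage("stt"):
    time.sleep(0.01)  # stand-in for the speech-to-text call
with timer.stage("llm"):
    time.sleep(0.01)  # stand-in for the LLM completion call
```

The resulting `durations_ms` dict could then be stored alongside the interaction record.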
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/voice-interactions | Upload voice input |
| GET | /v1/interactions/latest | Retrieve latest interaction |
| GET | /v1/audio/{id}.wav | Fetch generated audio |
| GET | /health | Service health check |
Optional extensions include bot profile retrieval and configuration endpoints.
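
From the device side, requests against these endpoints might be built as in the sketch below. The raw WAV request body, the Bearer token header, and the helper names are assumptions; the README does not specify the upload format or the authentication scheme.

```python
import urllib.request


def build_upload_request(
    base_url: str, wav_bytes: bytes, device_token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/voice-interactions."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/voice-interactions",
        data=wav_bytes,
        method="POST",
        headers={
            "Content-Type": "audio/wav",                # assumed body format
            "Authorization": f"Bearer {device_token}",  # assumed auth scheme
        },
    )


def audio_url(base_url: str, interaction_id: str) -> str:
    """URL of the generated reply audio for one interaction."""
    return f"{base_url.rstrip('/')}/v1/audio/{interaction_id}.wav"
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen`, ideally wrapped in the device's retry logic.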
Dockerized multi-service environment:
- FastAPI API container
- Background worker container
- PostgreSQL + pgvector
- Shared audio storage volume

Run locally:

    docker-compose up --build

The test suite includes validation for:
- Prompt assembly
- Job locking behavior
- Memory extraction
- Emotion detection
- Personality parsing

Tech stack:
- Python 3.11+
- FastAPI
- Async SQLAlchemy
- Alembic
- OpenAI Speech-to-Text
- OpenAI LLM
- OpenAI Text-to-Speech
- OpenAI Embeddings
- PostgreSQL
- pgvector
- Raspberry Pi
- PyAudio or sounddevice
- Docker Compose

This project demonstrates:
- Edge + cloud orchestration
- Conversational AI systems
- Voice pipelines
- Long-term memory modeling
- Personality systems
- Emotion-aware agents
- Async worker architecture

Planned roadmap items:
- Vision recognition (camera integration)
- Growth timeline summaries
- Parent or admin dashboard
- Offline inference fallback
- Multi-user household support

Licensed under MIT; intended for educational, research, and portfolio demonstration use.
VoiceCompanionAI represents the convergence of conversational AI, edge computing, and emotionally intelligent agent systems deployed on consumer hardware.