Speaker

A voice assistant application with multi-container support, session persistence, and AI integration.

Features

Multi-container voice chat with separate AI contexts
Session persistence with auto-save
Task queue system with cancellation
Git worktree support for multi-branch development
Voice command parsing and execution
WebSocket-based real-time communication

Getting Started

Prerequisites

Node.js >= 16
Python 3.8+
Ollama (for local LLM)
Google Cloud account (for Gemini API)

Installation

Clone the repository

Install backend dependencies:

cd backend
pip install -r requirements.txt

Install frontend dependencies:
```
cd frontend
npm install
```

Running the Application

Start the backend:
```
./start.sh backend
```
Start the frontend:
```
./start.sh frontend
```

Voice Commands

See SPEAKER.md for the full list of supported voice commands and natural language patterns.

Project Structure

backend/
  main.py                    # FastAPI app, WebSocket /ws, REST endpoints
  services/
    whisper_service.py       # STT: faster-whisper large-v3
    tts_service.py           # TTS: Chatterbox with voice cloning
    claude_service.py        # Claude CLI wrapper (chat + planning modes)
    session_service.py       # Session persistence (JSON files)
    task_queue.py            # Per-container FIFO task queues
    git_service.py           # Git worktree management
    ai/
      base.py                # AIProvider abstract base
      gemini.py              # Google Gemini
      local.py               # Ollama + DuckDuckGo search
  data/
    sessions/                # Session JSON files (auto-created)

frontend/src/
  context/ContainerContext.tsx   # State: 5 containers (main, a-d), session integration
  hooks/
    useSharedVoice.ts            # Core: VAD, WebSocket, TTS, commands
    useSession.ts                # Session persistence management
    useTaskQueue.ts              # Task queue management
    useGitWorktree.ts            # Git worktree management
    useVoiceChat.ts              # Legacy hook
  utils/
    voiceCommands.ts             # Command parsing
    sessionStorage.ts            # localStorage wrapper
  components/
    ChatView.tsx                 # Messages with TTS playback
    ContainerTabs.tsx            # Tab navigation
    StatusIndicator.tsx          # Status display

Development

Adding New Voice Commands

To add a new voice command:

Add the command pattern to frontend/src/utils/voiceCommands.ts
Implement the command handler in frontend/src/hooks/useSharedVoice.ts
Update the README documentation

Adding New AI Providers

To add a new AI provider:

Create a new class that inherits from AIProvider in backend/services/ai/
Implement the required methods: get_response, reset_chat, and set_history
Register the provider in backend/main.py

Session Management

Sessions are automatically saved to backend/data/sessions/ with UUID tokens. The frontend uses localStorage for temporary session data.

Task Queue

Each container has its own task queue. Tasks are processed in FIFO order with support for cancellation.

Git Worktree Support

The application supports multi-branch development using git worktrees. Commands include:

GET /git/branches - List available branches
POST /git/worktree - Create a new worktree
DELETE /git/worktree/{branch} - Remove a worktree

API Endpoints

GET / - Main page
GET /ws - WebSocket connection for voice chat
GET /sessions - List all sessions
GET /sessions/{id} - Get a specific session
POST /sessions - Create a new session
PUT /sessions/{id} - Update a session
DELETE /sessions/{id} - Delete a session
GET /git/branches - List git branches
POST /git/worktree - Create git worktree
DELETE /git/worktree/{branch} - Delete git worktree

Environment Variables

Create a .env file in the root directory with the following variables:

GEMINI_API_KEY=your_gemini_api_key_here
LOCAL_LLM_MODEL=qwen3-coder-256k
CLAUDE_WORK_DIR=/path/to/claudeworkdir

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
backend		backend
frontend		frontend
logs		logs
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
SPEAKER.md		SPEAKER.md
start.sh		start.sh
stop.sh		stop.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speaker

Features

Getting Started

Prerequisites

Installation

Running the Application

Voice Commands

Project Structure

Development

Adding New Voice Commands

Adding New AI Providers

Session Management

Task Queue

Git Worktree Support

API Endpoints

Environment Variables

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Speaker

Features

Getting Started

Prerequisites

Installation

Running the Application

Voice Commands

Project Structure

Development

Adding New Voice Commands

Adding New AI Providers

Session Management

Task Queue

Git Worktree Support

API Endpoints

Environment Variables

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages