Balaka

Local-only text-to-speech (TTS) application with a FastAPI backend and a separate static frontend.

Balaka runs the speech model on your machine. There is no remote inference fallback, no Hugging Face Space dependency at request time, and no hidden service layer behind the API. The current default model is k2-fsa/OmniVoice.

What you get

Local FastAPI API for speech synthesis
Separate build-free frontend in frontend/
English-first UI with optional Ukrainian interface localization
Two working modes: TTS with voice design, and voice cloning from a reference recording
Runtime metadata endpoint for languages and voice attributes
Single-venv setup based on Python 3.13
Tests for the public API surface

Stack

Python 3.13
FastAPI
PyTorch 2.8
omnivoice==0.1.2
Plain HTML, CSS, and JavaScript

Requirements

macOS, Linux, or Windows with Python 3.13
One project virtualenv named .venv
Enough RAM or VRAM for the selected TTS model

torch 2.8 is not available for Python 3.14, so this project intentionally targets Python 3.13.

Quick start

Create a clean environment:

rm -rf .venv
python3.13 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip

Install runtime dependencies.

Apple Silicon:

pip install torch==2.8.0 torchaudio==2.8.0
pip install omnivoice==0.1.2
pip install -e ".[dev]"

NVIDIA CUDA example:

pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
pip install omnivoice==0.1.2
pip install -e ".[dev]"

Create local configuration:

cp .env.example .env

Run the app:

source .venv/bin/activate
balaka

Open http://127.0.0.1:8000/tts/.

Alternative:

python main.py

Convenience commands:

make run
make test
make smoke

Default user flow

Open /tts/
Leave advanced settings collapsed
Enter text
Click Generate speech
Listen in the built-in player or download the generated WAV

The interface defaults to English. Use the EN / UA toggle in the header to switch the frontend locale; the selection is stored in the browser.

The frontend ships with the following synthesis defaults, so a first request works without opening the advanced settings. These are independent of the interface locale:

Language: Ukrainian
Voice: female, young adult, moderate pitch
Steps: 32
Guidance scale: 2.0
Speed: 1.0
Denoise: enabled
Prompt preprocessing: enabled
Output postprocessing: enabled

Run from PyCharm

Set the project interpreter to .venv/bin/python
If .venv was previously created with Python 3.14, delete it and recreate it with Python 3.13
Run main.py or the balaka console command

Configuration

Recommended .env values:

BALAKA_DEBUG=false
BALAKA_TTS_MODEL=k2-fsa/OmniVoice
BALAKA_TTS_DEVICE=auto
BALAKA_TTS_DTYPE=auto
BALAKA_TTS_PRELOAD_RUNTIME=true
BALAKA_TTS_LOAD_ASR=false

What each variable does:

Variable	Default	Meaning
`BALAKA_DEBUG`	`false`	When `false`, runs the app without autoreload. This avoids reloading the large model on every source change.
`BALAKA_TTS_MODEL`	`k2-fsa/OmniVoice`	Hugging Face model ID or a local path to a compatible model directory.
`BALAKA_TTS_DEVICE`	`auto`	Picks `cuda`, then `mps`, then `cpu`, unless you force a specific device.
`BALAKA_TTS_DTYPE`	`auto`	Uses `float16` on `cuda` or `mps` and `float32` on `cpu`.
`BALAKA_TTS_PRELOAD_RUNTIME`	`true`	Loads the model during startup so the first synthesis request is not delayed by model initialization.
`BALAKA_TTS_LOAD_ASR`	`false`	Keeps ASR disabled. Voice cloning in this project already requires explicit `reference_text`, so the reference recording does not need to be transcribed automatically.
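
The `auto` rules for `BALAKA_TTS_DEVICE` and `BALAKA_TTS_DTYPE` can be illustrated with a small sketch. This is not Balaka's actual code: the function names `resolve_device` and `resolve_dtype` are hypothetical, and the availability flags are passed in as plain booleans so the logic can be read and tested without PyTorch installed.

```python
SUPPORTED_DEVICES = ("cuda", "mps", "cpu")


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Turn a BALAKA_TTS_DEVICE-style value into a concrete device name."""
    if requested != "auto":
        if requested not in SUPPORTED_DEVICES:
            raise ValueError(f"unsupported device: {requested!r}")
        return requested  # an explicit choice wins over detection
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def resolve_dtype(requested: str, device: str) -> str:
    """Turn a BALAKA_TTS_DTYPE-style value into a dtype name."""
    if requested != "auto":
        return requested
    # Half precision on accelerators, full precision on CPU.
    return "float32" if device == "cpu" else "float16"
```

In a real runtime the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.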

Optional:

HF_TOKEN is not required for normal local use after the model is cached
HF_TOKEN can still help on the first model download by raising Hugging Face rate limits

First startup behavior

The first startup downloads the configured model into the local Hugging Face cache if it is not already present.

After that:

repeated app starts reuse the local cache
the backend should not fetch model files again unless the cache is missing
model weights are still loaded into RAM or VRAM on every new process start, which is expected
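
To check whether the model is already cached before starting the app, something like the following sketch may help. It assumes the standard Hugging Face Hub cache layout (`HF_HUB_CACHE`, else `HF_HOME/hub`, else `~/.cache/huggingface/hub`, with one `models--<org>--<name>` folder per model) and ignores `XDG_CACHE_HOME`. The helper names are hypothetical, and the check does not apply when `BALAKA_TTS_MODEL` points to a local directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Locate the Hugging Face Hub cache directory from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


def model_cache_dir(model_id: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the cache folder the Hub client would use for a model ID."""
    return hub_cache_dir(env) / ("models--" + model_id.replace("/", "--"))


def is_probably_cached(model_id: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic: the model counts as cached if it has at least one snapshot."""
    snapshots = model_cache_dir(model_id, env) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

For example, `is_probably_cached("k2-fsa/OmniVoice")` returning `False` suggests that the next startup will download the model.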

API

Available endpoints:

GET /health
GET /api/v1/tts/meta
POST /api/v1/tts/design
POST /api/v1/tts/clone

Both synthesis endpoints return raw WAV bytes in the response body.
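
Because the response body is a plain WAV file, a client can sanity-check it with the Python standard library. The sketch below assumes the server returns PCM-encoded WAV; the `wave` module raises `wave.Error` for formats it does not understand (for example some float encodings). The function name `describe_wav` is hypothetical.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Parse WAV bytes and return basic format information."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration follows from frame count and sample rate.
            "duration_s": frames / rate if rate else 0.0,
        }
```

An error page or a truncated download fails at `wave.open`, which makes this a cheap check before playing or storing the result.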

Example: voice design

curl -X POST http://127.0.0.1:8000/api/v1/tts/design \
  -F 'text=Привіт. Це локальна перевірка синтезу.' \
  -F 'language=Ukrainian' \
  -F 'gender=female' \
  -F 'age=young adult' \
  -F 'pitch=moderate pitch' \
  -F 'style=Auto' \
  -F 'accent=Auto' \
  -F 'dialect=Auto' \
  -F 'speed=1.0' \
  -F 'num_steps=32' \
  -F 'guidance_scale=2.0' \
  -F 'denoise=true' \
  -F 'preprocess_prompt=true' \
  -F 'postprocess_output=true' \
  --output design.wav
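
The same request can be sent from Python without third-party packages. This is a minimal sketch that mirrors the curl call above by encoding the fields as `multipart/form-data`. The helper names are hypothetical, and since it is not stated here which fields are optional, the example sends the same set as the curl command.

```python
import urllib.request
import uuid


def encode_multipart(fields: dict[str, str]) -> tuple[bytes, str]:
    """Encode text fields as multipart/form-data, similar to `curl -F`."""
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")  # blank line separates part headers from the value
        lines.append(value.encode("utf-8"))
    lines.append(f"--{boundary}--".encode())  # closing boundary
    lines.append(b"")
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def synthesize_design(fields: dict[str, str], base_url: str = "http://127.0.0.1:8000") -> bytes:
    """POST the fields to the voice design endpoint and return the WAV bytes."""
    body, content_type = encode_multipart(fields)
    request = urllib.request.Request(
        f"{base_url}/api/v1/tts/design",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Same values as the curl example.
DESIGN_FIELDS = {
    "text": "Привіт. Це локальна перевірка синтезу.",
    "language": "Ukrainian",
    "gender": "female",
    "age": "young adult",
    "pitch": "moderate pitch",
    "style": "Auto",
    "accent": "Auto",
    "dialect": "Auto",
    "speed": "1.0",
    "num_steps": "32",
    "guidance_scale": "2.0",
    "denoise": "true",
    "preprocess_prompt": "true",
    "postprocess_output": "true",
}

# Usage, with the app running locally:
#   with open("design.wav", "wb") as f:
#       f.write(synthesize_design(DESIGN_FIELDS))
```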

Example: voice clone

curl -X POST http://127.0.0.1:8000/api/v1/tts/clone \
  -F 'reference_audio=@voice.wav' \
  -F 'reference_text=Hello, this is the source voice.' \
  -F 'text=Привіт. Це клонований голос.' \
  -F 'language=Ukrainian' \
  -F 'speed=1.0' \
  -F 'num_steps=32' \
  -F 'guidance_scale=2.0' \
  -F 'denoise=true' \
  -F 'preprocess_prompt=true' \
  -F 'postprocess_output=true' \
  --output clone.wav

Project layout

.
├── frontend/              # Static UI mounted by FastAPI
├── src/balaka/api/        # HTTP routes
├── src/balaka/core/       # Settings and voice metadata helpers
├── src/balaka/schemas/    # Pydantic request and response models
├── src/balaka/services/   # Local TTS runtime integration
├── tests/                 # API tests
├── main.py                # Local entrypoint
└── pyproject.toml

Development

Run tests:

pytest

Or use:

make test
make smoke

The frontend is intentionally build-free. Static files are served directly by FastAPI from frontend/.

Troubleshooting

App fails on Python 3.14

This is expected: torch 2.8 is not available for Python 3.14. Recreate .venv with Python 3.13.

Startup mentions Hugging Face

This is normal on the first run when the model is not cached yet.

Startup is slow

This is expected when BALAKA_TTS_PRELOAD_RUNTIME=true, because the app loads the model before serving requests. The trade-off is that the first synthesis request is not delayed by model initialization.

Voice cloning does not start

Make sure both of these are provided:

reference_audio
reference_text
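
A client can catch this before sending the request. The following sketch uses a hypothetical helper, `missing_clone_inputs`, and treats empty or whitespace-only values as missing; how the server itself validates these fields is not described here.

```python
REQUIRED_CLONE_INPUTS = ("reference_audio", "reference_text")


def missing_clone_inputs(fields: dict) -> list:
    """Return the names of required clone inputs that are absent or blank."""
    missing = []
    for name in REQUIRED_CLONE_INPUTS:
        value = fields.get(name)
        blank = isinstance(value, (str, bytes)) and not value.strip()
        if value is None or blank:
            missing.append(name)
    return missing
```

An empty result means both inputs are present and the clone request can be sent.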

Notes

This project is local-only by design
There is no remote Space fallback
The backend keeps model-specific logic inside the runtime service instead of leaking it into the API layer
