πŸŽ™οΈ Supertonic TTS API


OpenAI-compatible Text-to-Speech API

Features • Quick Start • API Reference • Deployment • Configuration


## ✨ Features

- 🚀 **OpenAI-Compatible API** — Drop-in replacement for OpenAI's TTS API, no auth required
- ⚡ **High Performance** — Dedicated thread pool, async synthesis, semaphore-based concurrency control
- 🎵 **Multiple Formats** — MP3, WAV, FLAC, Opus, AAC, and PCM via PyAV
- 🗣️ **Multiple Voices** — OpenAI voice names, native Supertonic styles, and custom/mixed voice upload
- 🐳 **Docker Ready** — Production containerization with an nginx load balancer and a persistent model cache
- 📊 **GPU Acceleration** — CUDA, CoreML, and Metal backends via ONNX Runtime
- 🔊 **Smart Text Processing** — Unicode normalization, emoji removal, auto-chunking, and pause tags
- 🌍 **31 Languages** — Full multilingual support via supertonic-3
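As a rough illustration of the text-processing bullet above, Unicode normalization plus emoji stripping might be sketched as follows. This is a guess at the behavior, not the repository's actual implementation (the real `clean_text()` lives in `app/utils/text.py` and may differ):

```python
import re
import unicodedata

# Hypothetical sketch of the normalization pipeline: NFKC
# normalization, emoji removal, and whitespace collapsing.
EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]")

def clean_text(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # fold ligatures, widths, etc.
    text = EMOJI.sub("", text)                  # drop emoji code points
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
```

NFKC also folds compatibility characters (for example the `ﬁ` ligature becomes `fi`), which keeps downstream grapheme-to-phoneme handling simpler.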

## 📋 Requirements

- Python 3.10+
- ONNX Runtime (CPU/CUDA/CoreML)
- Supertonic TTS library

## 🚀 Quick Start

### Using Docker (Recommended)

```bash
git clone https://github.com/confused-ai/supertonic-api.git
cd supertonic-api

# Start the API + nginx load balancer
docker compose up -d

# The API is now available at http://localhost:8800
# The model downloads once and is cached in a Docker volume
```

### Manual Installation

```bash
git clone https://github.com/confused-ai/supertonic-api.git
cd supertonic-api

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8800
```

### Quick Test

```bash
curl -X POST "http://localhost:8800/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from Supertonic!", "voice": "alloy", "response_format": "mp3"}' \
  --output speech.mp3
```

## 📖 API Reference

### Generate Speech

`POST /v1/audio/speech`

```bash
curl -X POST "http://localhost:8800/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "input": "Your text here...", "voice": "alloy", "response_format": "mp3", "speed": 1.0}' \
  --output output.mp3
```
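For Python clients, the same call needs only the standard library. This sketch just constructs the request object (the function name is my own, and actually sending it requires a running server):

```python
import json
import urllib.request

def make_speech_request(text: str, voice: str = "alloy",
                        base_url: str = "http://localhost:8800") -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    payload = {"model": "tts-1", "input": text, "voice": voice,
               "response_format": "mp3", "speed": 1.0}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_speech_request("Your text here...")
# With the server running:
#   audio = urllib.request.urlopen(req).read()
#   open("output.mp3", "wb").write(audio)
```

Because the endpoint mirrors OpenAI's, the official `openai` client should also work by pointing its `base_url` at `http://localhost:8800/v1` with any placeholder API key.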

### Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `tts-1` | Accepted for OpenAI compatibility; the actual model is supertonic-3 |
| `input` | string | — | Text to synthesize (1–4096 chars) |
| `voice` | string | `alloy` | Preset voice ID (see table below) or a custom/mixed voice ID |
| `response_format` | string | `mp3` | `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` |
| `speed` | float | `1.0` | Speed multiplier (0.5–2.0) |
| `normalize` | boolean | `true` | Unicode normalization, emoji removal, punctuation fixes |
| `lang` | string | `en` | BCP-47 language code (31 languages supported, plus `na`) |
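The ranges above can be mirrored client-side before making a request. This helper is purely illustrative (the name and error messages are my own; the server enforces its own validation):

```python
# Client-side mirror of the documented parameter constraints.
VALID_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}

def build_speech_payload(text: str, voice: str = "alloy",
                         response_format: str = "mp3", speed: float = 1.0,
                         normalize: bool = True, lang: str = "en") -> dict:
    if not 1 <= len(text) <= 4096:
        raise ValueError("input must be 1-4096 characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if response_format not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    return {"model": "tts-1", "input": text, "voice": voice,
            "response_format": response_format, "speed": speed,
            "normalize": normalize, "lang": lang}
```

Failing fast on the client avoids a round trip for requests the server would reject anyway.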

### List Models

`GET /v1/models`

```bash
curl "http://localhost:8800/v1/models"
```

### Voices

10 native Supertonic styles exposed via 13 OpenAI-compatible voice IDs:

| Voice ID | Style | Character |
|---|---|---|
| `alloy` | F1 | Calm, clear female |
| `nova` | F2 | Bright, professional female |
| `shimmer` | F3 | Soft, expressive female |
| `ash` | F4 | Energetic, versatile female |
| `ballad` | F4 | Melodic, smooth female (shares a style with `ash`) |
| `coral` | F5 | Airy, warm female |
| `marin` | F5 | Gentle, natural female (shares a style with `coral`) |
| `echo` | M1 | Lively, upbeat male |
| `fable` | M2 | Warm, narrative male |
| `onyx` | M3 | Deep, authoritative male |
| `cedar` | M4 | Measured, resonant male |
| `sage` | M4 | Calm, steady male (shares a style with `cedar`) |
| `verse` | M5 | Dynamic, dramatic male |
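Since the table above is a plain name-to-style mapping, the voice map in `app/core/voices.py` presumably reduces to something like this sketch (written from the table, not from the source):

```python
# OpenAI voice ID -> native Supertonic style, per the table above:
# 13 IDs covering 10 native styles.
OPENAI_TO_STYLE = {
    "alloy": "F1", "nova": "F2", "shimmer": "F3", "ash": "F4",
    "ballad": "F4", "coral": "F5", "marin": "F5", "echo": "M1",
    "fable": "M2", "onyx": "M3", "cedar": "M4", "sage": "M4",
    "verse": "M5",
}

def resolve_voice(voice_id: str) -> str:
    """Map an OpenAI-style name to a native style; pass native style
    names (F1-F5, M1-M5) and custom/mixed voice IDs through unchanged."""
    return OPENAI_TO_STYLE.get(voice_id, voice_id)
```

The pass-through default is what lets clients send either `"alloy"` or `"F1"` (or a custom voice ID) in the same `voice` field.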

- `GET /v1/voices` — full voice list with types (preset / custom / mixed)
- `GET /voices` — legacy alias

```bash
curl "http://localhost:8800/v1/voices"
```

### Upload Custom Voice

`POST /v1/voices/upload`

```bash
curl -X POST "http://localhost:8800/v1/voices/upload" \
  -F "file=@my_voice.json" \
  -F "name=my-voice"
```

### Mix Two Voices

`POST /v1/voices/mix`

```bash
curl -X POST "http://localhost:8800/v1/voices/mix" \
  -H "Content-Type: application/json" \
  -d '{"voice_a": "alloy", "voice_b": "echo", "weight": 0.5, "name": "alloy-echo"}'
```
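Mixing presumably interpolates the two voices' style embeddings. A plain weighted blend would look like the sketch below; this is an assumption about the math, not the server's actual implementation (and whether `weight` favors `voice_a` or `voice_b` is also a guess):

```python
def mix_embeddings(voice_a: list[float], voice_b: list[float],
                   weight: float = 0.5) -> list[float]:
    """Blend two style-embedding vectors elementwise:
    weight scales voice_a, (1 - weight) scales voice_b."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if len(voice_a) != len(voice_b):
        raise ValueError("embeddings must have the same dimension")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(voice_a, voice_b)]
```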

### Delete Custom Voice

`DELETE /v1/voices/{voice_id}`

```bash
curl -X DELETE "http://localhost:8800/v1/voices/mix:alloy-echo"
```

### Health Check

`GET /health`

```bash
curl "http://localhost:8800/health"
```

## 🎭 Available Voices

| Voice | Style | Description |
|---|---|---|
| `alloy` | F1 | Sarah — calm female |
| `echo` | M1 | Alex — lively, upbeat male |
| `fable` | F2 | Lily — bright, cheerful female |
| `onyx` | M2 | James — deep, robust male |
| `nova` | F3 | Jessica — professional announcer |
| `shimmer` | M3 | Robert — polished, authoritative male |

You can also use any native Supertonic style name directly (e.g. `F4`, `M5`) or a custom/mixed voice ID.

βš™οΈ Configuration

Environment variables can be set in a `.env` file:

```bash
# Server
HOST=0.0.0.0
PORT=8800
LOG_LEVEL=INFO

# Model Performance
MODEL_THREADS=12        # ONNX intra-op threads
MODEL_INTER_THREADS=4   # ONNX inter-op threads
MAX_WORKERS=8           # Concurrent synthesis workers + semaphore limit

# GPU Acceleration
FORCE_PROVIDERS=auto    # auto | cuda | coreml | metal | cpu

# Audio
SAMPLE_RATE=44100
MAX_CHUNK_LENGTH=300    # Max chars per synthesis chunk

# HuggingFace model cache (mounted as a Docker volume)
HF_HOME=/root/.cache/huggingface
```

### GPU Acceleration

Set `FORCE_PROVIDERS` based on your hardware:

| Value | Description |
|---|---|
| `auto` | Auto-detect the best available provider |
| `cuda` | NVIDIA GPU acceleration |
| `coreml` | Apple CoreML (M-series chips) |
| `metal` | Apple Metal (maps to CoreML) |
| `cpu` | CPU only |
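In ONNX Runtime terms, the values above likely translate into an execution-provider list along these lines. This is a sketch of the idea, not the server's actual selection code; the function and its `available` parameter are my own:

```python
def select_providers(force: str, available: list[str]) -> list[str]:
    """Translate a FORCE_PROVIDERS value into an ONNX Runtime
    execution-provider list, always keeping CPU as a fallback."""
    names = {
        "cuda": "CUDAExecutionProvider",
        "coreml": "CoreMLExecutionProvider",
        "metal": "CoreMLExecutionProvider",  # metal maps to CoreML
        "cpu": "CPUExecutionProvider",
    }
    if force == "auto":
        # First available accelerator wins; otherwise CPU only.
        for p in ("CUDAExecutionProvider", "CoreMLExecutionProvider"):
            if p in available:
                return [p, "CPUExecutionProvider"]
        return ["CPUExecutionProvider"]
    provider = names[force]
    if provider == "CPUExecutionProvider":
        return [provider]
    return [provider, "CPUExecutionProvider"]
```

With `onnxruntime` installed, the `available` list would come from `onnxruntime.get_available_providers()`.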

## 🐳 Deployment

### Docker Compose (Production)

```bash
docker compose up -d --build
```

Services:

- **api** — FastAPI + uvicorn on port 8801 (internal)
- **lb** — nginx reverse proxy on port 8800 (public)
- **hf_cache** — named Docker volume; the model downloads once and is reused on every restart

To scale API workers:

```bash
docker compose up -d --scale api=2
```

## 📊 Performance

- **Dedicated thread pool** — synthesis runs in an isolated ThreadPoolExecutor and never blocks the event loop
- **Thread-safe model init** — double-checked locking; the model loads once across all workers
- **Semaphore-bounded concurrency** — the MAX_WORKERS cap prevents memory exhaustion under load
- **PyAV streaming encoder** — chunks are encoded on the fly, with no full-audio buffering
- **Pre-compiled regex** — text-normalization patterns are compiled at startup
- **Smart chunking** — long text is split at sentence/paragraph boundaries, preserving [pause:N] tags
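The smart-chunking bullet can be sketched roughly as below. This is a hypothetical reconstruction; the real `smart_split()` in `app/utils/text.py` may use different boundaries and tag syntax:

```python
import re

# Sketch of "smart chunking": never cut a [pause:N] tag in half,
# and pack whole sentences greedily into chunks of <= max_len chars.
PAUSE_TAG = re.compile(r"(\[pause:\d+(?:\.\d+)?\])")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def smart_split(text: str, max_len: int = 300) -> list[str]:
    chunks: list[str] = []
    current = ""
    for part in PAUSE_TAG.split(text):
        # A pause tag is an indivisible piece; prose splits on sentences.
        pieces = [part] if PAUSE_TAG.fullmatch(part) else SENTENCE_END.split(part)
        for piece in pieces:
            if not piece.strip():
                continue
            if current and len(current) + 1 + len(piece) > max_len:
                chunks.append(current)
                current = piece.strip()
            else:
                current = (current + " " + piece).strip()
    if current:
        chunks.append(current)
    return chunks
```

Chunking at sentence boundaries (rather than fixed offsets) keeps prosody natural at chunk joins, which matters once text exceeds `MAX_CHUNK_LENGTH`.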

## 🔧 Development

```bash
pip install -r requirements.txt

# Dev server with auto-reload
uvicorn app.main:app --reload --port 8800

# Run all tests (unit + integration + eval)
python tests/run_all.py

# Unit tests only (no server needed)
python tests/run_all.py --unit-only

# With stress test
python tests/run_all.py --stress --concurrency 20 --requests 200

# Against a custom server
python tests/run_all.py --url http://localhost:8801
```

πŸ“ Project Structure

```
supertonic-api/
├── app/
│   ├── api/
│   │   ├── routes/            # Endpoint modules (speech, voices, models)
│   │   └── schemas.py         # Pydantic I/O models
│   ├── core/
│   │   ├── config.py          # pydantic-settings (.env)
│   │   ├── constants.py       # Model name
│   │   └── voices.py          # OpenAI → Supertonic voice map
│   ├── services/
│   │   ├── tts.py             # Singleton TTS service + async generation
│   │   ├── audio.py           # AudioNormalizer, AudioService
│   │   └── audio_encoder.py   # PyAV streaming encoder (mp3/wav/flac/opus/aac/pcm)
│   ├── utils/
│   │   └── text.py            # clean_text(), smart_split()
│   ├── inference/
│   │   └── base.py            # AudioChunk dataclass
│   └── main.py                # FastAPI app + lifespan
├── tests/
│   ├── run_all.py             # Unified test runner
│   └── output/                # Saved test audio files
├── Dockerfile
├── docker-compose.yml         # api + nginx lb + hf_cache volume
├── nginx.conf
└── requirements.txt
```

## 🤝 Contributing

Contributions welcome. See CONTRIBUTING.md.

1. Fork → branch → commit → PR
2. Run `python tests/run_all.py --unit-only` before submitting

πŸ™ Acknowledgments


⬆ Back to Top

Made with ❤️ by the community
