Audio Clone Studio is a local-first voice cloning and speech production workspace built on top of the F5-TTS stack. It combines fast reference-based synthesis, project-aware asset management, trained voice checkpoints, reusable voice profiles, batch rendering, aligned speech editing, diagnostics, and a polished browser studio in one repository.
This repository is designed for practical production work:
- clone and render from curated reference clips
- manage reusable voice profiles and style prompts
- switch between shipped base models and local trained checkpoints
- queue preview and final renders safely on shared hardware
- perform transcript-aligned speech edits instead of destructive full rerenders
- review takes with diagnostics, comparison views, and export bundles
- serve the studio locally or mount it behind a small FastAPI service
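The "one job at a time" queueing model above can be sketched with a single worker thread; `render` here is a hypothetical stand-in for the real synthesis call, not the studio's actual scheduler:

```python
import queue
import threading

def run_render_queue(jobs, render):
    """Process render jobs strictly one at a time on a single worker thread."""
    q = queue.Queue()
    results = []

    def worker():
        while True:
            job = q.get()
            if job is None:  # sentinel: shut the worker down
                q.task_done()
                break
            results.append(render(job))  # only ever one render in flight
            q.task_done()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    for job in jobs:
        q.put(job)
    q.put(None)
    q.join()
    return results
```

Serializing renders this way keeps GPU or unified memory pressure bounded on shared hardware, at the cost of throughput.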
- Voice Studio: project-based browser workspace for references, styles, profiles, takes, and exports
- Trained Voice Workflow: route renders through a local finetuned checkpoint with profile-based reference selection
- Voice Profiles: group multiple clean references so the app can choose the best clip per render
- Editing Tools: alignment-first replace, insert, and delete actions with localized rerendering
- Batch and Queueing: render long scripts one job at a time on constrained machines
- Diagnostics: transcript drift checks, silence checks, reference scoring, and take review helpers
- Apple Silicon Support: Apple-friendly transcription extras and an optional MLX backend for shipped base-model inference
- API Mode: optional FastAPI server with mounted studio UI and local REST endpoints
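A transcript drift check like the one mentioned under Diagnostics can be approximated by comparing the intended script against a re-transcription of the rendered audio. This is a minimal sketch using a word-level similarity ratio, not the studio's actual implementation:

```python
import difflib

def transcript_drift(expected: str, transcribed: str) -> float:
    """Return drift in [0, 1]: 0.0 means the transcripts match word for word."""
    a = expected.lower().split()
    b = transcribed.lower().split()
    similarity = difflib.SequenceMatcher(None, a, b).ratio()
    return 1.0 - similarity

# A take whose re-transcription matches the script has zero drift.
drift = transcript_drift("hello world today", "hello world today")
```

A threshold on this value (say, flagging takes with drift above 0.1) is one simple way to surface renders that wandered from the script.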
- `f5-tts_voice-studio`: the main browser studio for cloning, rendering, editing, and export.
- `f5-tts_infer-gradio`: the broader inference demo with multi-style and multi-speaker utilities.
- `f5-tts_studio-server`: FastAPI server that exposes the studio UI under `/app` and API routes under `/api/v1`.
- short-form and long-form voice cloning
- trained checkpoint auditioning
- project-scoped pronunciation dictionaries
- style prompt reuse
- batch render pipelines
- take comparison and export packaging
- transcript-aligned speech editing
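Transcript-aligned editing depends on word-level timings so that only the affected region is rerendered. The sketch below computes such a rerender window from an assumed `(word, start, end)` alignment format; the studio's internal representation may differ:

```python
def rerender_window(alignment, start_word, end_word, pad=0.15):
    """Given word-level timings [(word, start_s, end_s), ...], return the
    audio span in seconds to rerender for an edit covering the words at
    indices start_word..end_word inclusive, with a small crossfade pad."""
    start = alignment[start_word][1]
    end = alignment[end_word][2]
    return max(0.0, start - pad), end + pad

align = [("the", 0.0, 0.2), ("quick", 0.2, 0.5), ("fox", 0.5, 0.9)]
window = rerender_window(align, 1, 1)  # replace just "quick"
```

Rerendering only this window, then crossfading it back into the original take, is what makes the edit non-destructive compared with a full rerender.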
```bash
python -m venv .venv
source .venv/bin/activate
```

If you prefer Conda:

```bash
conda create -n audio-clone python=3.11
conda activate audio-clone
```

FFmpeg is required for reference preparation, audio decoding, and several export paths.
macOS:

```bash
brew install ffmpeg
```

Ubuntu / Debian:

```bash
sudo apt-get update
sudo apt-get install ffmpeg
```

Apple Silicon:

```bash
pip install torch torchaudio
```

NVIDIA:

```bash
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
```

Studio install:

```bash
pip install -e ".[studio]"
```

Apple-Silicon studio install:

```bash
pip install -e ".[studio,apple_audio]"
```

Training extras:

```bash
pip install -e ".[train]"
```
Development extras:

```bash
pip install -e ".[dev]"
```

Installing the package registers these console entrypoints:

- `f5-tts_voice-studio`
- `f5-tts_studio-server`
- `f5-tts_infer-gradio`
- `f5-tts_infer-cli`

Example CLI render:

```bash
f5-tts_infer-cli \
  --model F5TTS_v1_Base \
  --ref_audio "ref_audio.wav" \
  --ref_text "Reference transcript." \
  --gen_text "Text to render."
```

Start with a project in the studio and keep related references, styles, renders, and exports together.
Add short, clean clips with confirmed transcripts. The studio can analyze and score them for later reuse.
Group several strong references from the same speaker. The studio can then pick the most suitable clip automatically for each render.
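Automatic clip selection like this could be as simple as scoring each reference on a few measurable properties. The heuristic below is a hypothetical illustration; the studio's real selection criteria are not documented here:

```python
def score_reference(duration_s, snr_db, transcript_confirmed):
    """Toy heuristic: prefer clips near 8 s, with high SNR and confirmed text."""
    length_score = max(0.0, 1.0 - abs(duration_s - 8.0) / 8.0)
    noise_score = min(1.0, max(0.0, snr_db / 40.0))
    return length_score + noise_score + (0.5 if transcript_confirmed else 0.0)

def pick_best(references):
    """references: dicts with duration_s, snr_db, transcript_confirmed keys."""
    return max(references, key=lambda r: score_reference(
        r["duration_s"], r["snr_db"], r["transcript_confirmed"]))
```

The point of grouping references into a profile is that this choice happens per render, so a script-matched or cleaner clip can win for each job.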
Save alternate delivery examples for pacing, tone, or mood so they can be reused across scripts.
Use:
- Quick Preview for faster iteration
- Final Render for higher-quality delivery
- Voices for direct text-to-speech with a saved profile or single reference
- Trained Voice for checkpoint-based rendering
Compare takes, run diagnostics, and export WAV, MP3, transcripts, or packaged bundles.
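An export bundle of the kind described above is essentially an archive of audio plus metadata. A minimal sketch using only the standard library; the file layout is illustrative, not the studio's actual bundle format:

```python
import json
import zipfile
from pathlib import Path

def export_bundle(out_path, wav_bytes, transcript, metadata):
    """Package one take as a zip containing audio, transcript, and metadata."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("take.wav", wav_bytes)
        zf.writestr("transcript.txt", transcript)
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return Path(out_path)
```

Keeping transcript and render settings next to the audio makes a bundle self-describing when it leaves the project.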
On Apple Silicon, the studio supports:
- `mlx-whisper` for local transcription
- optional Apple MLX F5-TTS base-model inference from the Runtime panel
- automatic fallback to PyTorch when a local finetuned checkpoint is selected
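The fallback behavior above amounts to a small decision rule: local finetuned checkpoints always take the PyTorch path, and MLX is used only for shipped base models. A hypothetical sketch of that rule, not the studio's actual code:

```python
def select_backend(is_apple_silicon, mlx_enabled, finetuned_checkpoint=None):
    """Return which inference backend a render should use."""
    if finetuned_checkpoint is not None:
        return "pytorch"  # local checkpoints always run on the PyTorch path
    if is_apple_silicon and mlx_enabled:
        return "mlx"      # shipped base models can use the MLX backend
    return "pytorch"
```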
Recommended install:
```bash
pip install -e ".[studio,apple_audio]"
```

The studio stores project data outside the repository.
macOS:
- support data: `~/Library/Application Support/F5-TTS-Studio`
- cache: `~/Library/Caches/F5-TTS-Studio`
This keeps references, profiles, takes, exports, and the local library out of the working tree.
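The macOS locations above follow the standard per-user directory conventions. Resolving them might look like the sketch below; the macOS paths match those documented here, while the non-macOS fallback is an assumption:

```python
import sys
from pathlib import Path

APP_NAME = "F5-TTS-Studio"

def storage_dirs():
    """Return (support_dir, cache_dir) for per-user studio data."""
    home = Path.home()
    if sys.platform == "darwin":
        return (home / "Library" / "Application Support" / APP_NAME,
                home / "Library" / "Caches" / APP_NAME)
    # Non-macOS fallback (assumption: XDG-style layout)
    return (home / ".local" / "share" / APP_NAME, home / ".cache" / APP_NAME)
```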
For temporary remote access, the repository includes helper scripts for local tunnel workflows:
```bash
./scripts/voice_studio_quick_tunnel.sh
./scripts/voice_studio_zrok.sh
```

The studio can also enforce optional access controls through environment variables:

```bash
export F5_TTS_STUDIO_USERNAME=demo
export F5_TTS_STUDIO_PASSWORD=change-me
export F5_TTS_STUDIO_TOKEN=optional-api-token
export F5_TTS_MAX_UPLOAD_MB=64
```

Repository layout:

```
src/f5_tts/
    api.py       Public Python API
    infer/       Inference entrypoints and demos
    studio/      Voice Studio app, runtime, storage, API, editing
    model/       Model and trainer code
    train/       Finetuning entrypoints
    runtime/     Deployment integrations
scripts/         Local launch and sharing helpers
testsuite/       Test coverage for studio and service flows
ckpts/           Optional local checkpoints
```
Syntax check:
```bash
python -m py_compile src/f5_tts/api.py
```

Run targeted tests:

```bash
python -m unittest -v testsuite.test_studio_service
```

Run the full test suite:

```bash
python -m unittest discover -s testsuite
```

Finetuning entrypoints are available when training extras are installed:

```bash
f5-tts_finetune-cli
f5-tts_finetune-gradio
```

For a deeper walkthrough of inference usage, examples, and editing tools, see: