Small Swift package for text-to-speech with Hugging Face models running through mlx-audio-swift.
TTSMLX is intentionally TTS-only. Upstream mlx-audio also supports STT, STS, quantization utilities, and server paths, but those surfaces are not wrapped by this package yet.
- Simple actor-based API
- Streaming playback API for long-form speech
- Built-in catalog of supported MLX TTS models
- Hugging Face model search
- Lazy model download and cache management
- Voice and language selection when the model supports them
- Reference-audio voice cloning hooks
These models are included in the built-in TTSMLX.supportedModels catalog:
- Marvis-AI/marvis-tts-250m-v0.2-MLX-8bit - best default choice for a balanced quality/size tradeoff.
- mlx-community/pocket-tts - smallest and simplest option when startup speed and low memory matter most.
- mlx-community/Soprano-80M-bf16 - compact model that is still easy to run locally on Apple Silicon.
- mlx-community/VyvoTTS-EN-Beta-4bit - English Qwen3-based option when you want a smaller alternative to full Qwen3-TTS.
- mlx-community/orpheus-3b-0.1-ft-bf16 - larger multi-voice LlamaTTS model with expressive built-in speakers.
- mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit - strongest general-purpose option here when you want better quality, multilingual support, and voice-cloning style inputs.
For a fuller support matrix, including non-runnable tracked models such as MOSS-TTS-Nano, see Docs/ModelSupport.md or inspect TTSMLX.modelCatalog at runtime.
Quick picking guide:
-
Start with
Marvisif you want the safest default. -
Use
Pocket TTSfor fast, lightweight local generation. -
Use
Orpheuswhen you want more built-in English voice choices. -
Use
Qwen3-TTSwhen voice options, multilingual output, or cloning features matter more than model size. -
Try
VyvoTTSif you want a smaller English Qwen3-style model. Current upstream note: -
TTSMLXkeeps the local path dependency on../mlx-audio-swiftduring active fork development. This package does not pin a standalonemlx-audiobackend version by itself. -
The wrapper catalog only lists model families that the current local
mlx-audio-swiftruntime can synthesize with end to end. -
New upstream
mlx-audio v0.4.2TTS families such asIrodori-TTS,HumeAI TADA,KugelAudio TTS, andVoxtral-4B-TTS-2603may still appear in model search as discovery-only results, but they are intentionally marked unsupported until the local Swift backend gains loaders for them. -
Kitten TTSmay also appear in search as discovery-only for now. The local backend can parse its model assets, but audio generation and streaming are still not wired through yet, soTTSMLXdoes not advertise it as runnable. -
MOSS-TTS-Nanois tracked as planned support. As of April 13, 2026, it still requires the upstream OpenMOSS runtime rather than the localmlx-audio-swiftbackend used byTTSMLX. -
Upstream additions outside the TTS wrapper scope, such as
Cohere Transcribe ASR,Qwen2-Audio-7B-Instruct,Moshi STS, and Distil-Whisper documentation updates, are not exposed throughTTSMLXyet.
TTSMLX exposes a public support catalog so app code can distinguish between validated, implemented, and planned models:
let validated = TTSMLX.validatedModels
let planned = TTSMLX.plannedModels
for entry in TTSMLX.modelCatalog {
print(entry.displayName, entry.supportStage.rawValue, entry.modelURL?.absoluteString ?? "n/a")
}Nothing in this catalog is downloaded automatically.
Models are downloaded only when you explicitly call ensureDownloaded(...), or when you synthesize using a specific selected model.
import TTSMLX
let synthesizer = TTSSpeechSynthesizer()
let result = try await synthesizer.synthesize(
"Hello from TTSMLX",
using: TTSMLX.supportedModels[0],
options: .init(
language: .english,
outputURL: URL.documentsDirectory.appending(path: "hello.wav")
)
)
print(result.url)Add TTSMLX to your Swift package dependencies:
.package(url: "https://github.com/fil-technology/TTSMLX.git", from: "0.3.0")Then depend on the TTSMLX product in your target.
For longer passages, use synthesizeStream(...) to receive AVAudioPCMBuffer chunks as they are generated instead of waiting for a final file:
import AVFoundation
import TTSMLX
let synthesizer = TTSSpeechSynthesizer()
let stream = try await synthesizer.synthesizeStream(
"Read this out progressively.",
using: TTSMLX.supportedModels[0],
options: .init(streamingInterval: 1.0)
)
for try await chunk in stream {
print("chunk sample rate:", chunk.sampleRate)
print("frames:", chunk.buffer.frameLength)
}Use synthesize(...) when you want a finished WAV file.
Use synthesizeStream(...) when you want lower-latency playback and chunk-by-chunk delivery.
The current wrapper treats streaming as a buffer-delivery API: it does not surface a persisted file artifact for streamed runs, even if future upstream runtimes save one internally.
A small SwiftUI demo app is included at DemoApp.
Open it with:
cd DemoApp
./open-xcodelet store = TTSModelStore()
let models = try await store.searchModels(query: "mlx tts")
let installed = try await store.installedModels()
if let model = models.first {
_ = try await store.ensureDownloaded(model)
}
// Downloads happen one model at a time and only for the descriptor you pass in.You can ask the framework for model metadata sourced from Hugging Face tags and config.json:
let store = TTSModelStore()
let metadata = try await store.fetchMetadata(for: "mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit")
print(metadata.languageIdentifiers)
print(metadata.modelType ?? "unknown")
print(metadata.sampleRate ?? 0)
print(metadata.storageSizeBytes ?? 0)storageSizeBytes is the remote repository size reported by the Hugging Face model API.
let pocketTTS = TTSMLX.supportedModels.first { $0.id == "mlx-community/pocket-tts" }!
let audio = try await TTSSpeechSynthesizer().synthesize(
"A different voice",
using: pocketTTS,
options: .init(voice: .alba)
)TTSMLX uses Semantic Versioning.
- Use tagged releases like
v0.1.0,v0.2.0, andv1.0.0. - Consume stable versions from Swift Package Manager tags.
- Use GitHub Releases for release notes and downloadable source snapshots.
For this project, GitHub Releases are the right default distribution mechanism. GitHub Packages is not required for normal Swift package consumption, because SwiftPM already resolves packages directly from git tags.