Swift library for speaker embedding extraction using NVIDIA NeMo TitaNet-Small model converted to CoreML.
- Extract 192-dimensional speaker embeddings from audio
- Speaker verification (same/different speaker classification)
- Speaker profiles with embedding aggregation
- Optimized for iOS 16+ and macOS 13+
- Multiple model variants: FP32, FP16, Int8
- iOS 16.0+ / macOS 13.0+
- Swift 5.9+
```swift
dependencies: [
    .package(url: "https://github.com/Otosaku/NeMoSpeaker-iOS.git", from: "1.1.0")
]
```

The library does not include the model; you need to download it separately and add it to your app bundle.
Download models: Google Drive
| Model | Size | Quality | Recommended for |
|---|---|---|---|
| TitaNetSmall.mlmodelc | ~27 MB | Best | Development, high accuracy |
| TitaNetSmall_fp16.mlmodelc | ~14 MB | Great | Production (recommended) |
| TitaNetSmall_int8.mlmodelc | ~7 MB | Good | Size-constrained apps |
- Download and unzip the archive
- Choose the model variant you need
- Rename it to `TitaNetSmall.mlmodelc` and add it to your Xcode project
- Ensure it's included in the "Copy Bundle Resources" build phase
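If you ship more than one variant, you can resolve the bundled model at runtime. This is a sketch: `bundledModelURL` is an illustrative helper (not part of the library), and the file names assume you kept the names from the table above rather than renaming.

```swift
import Foundation

// Sketch: pick whichever model variant is present in the bundle,
// preferring the smaller FP16 build when available.
// File names are assumptions based on the downloadable archive.
func bundledModelURL() -> URL? {
    for name in ["TitaNetSmall_fp16", "TitaNetSmall", "TitaNetSmall_int8"] {
        if let url = Bundle.main.url(forResource: name, withExtension: "mlmodelc") {
            return url
        }
    }
    return nil
}
```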
```swift
import NeMoSpeaker

// Get model URL from app bundle
guard let modelURL = Bundle.main.url(forResource: "TitaNetSmall", withExtension: "mlmodelc") else {
    fatalError("Model not found in bundle")
}

// Initialize with model path
let speaker = try NeMoSpeaker(modelURL: modelURL)

// Extract embedding from audio samples (mono, 16 kHz, Float32)
let embedding = try speaker.extractEmbedding(samples: audioSamples)

// Embedding is 192-dimensional, L2-normalized
print("Embedding dimension: \(embedding.vector.count)") // 192
```

```swift
// Compare two audio samples
let result = try speaker.verify(
    samples1: audioSamples1,
    samples2: audioSamples2,
    threshold: 0.5
)
print("Similarity: \(result.similarity)")      // -1.0 to 1.0
print("Same speaker: \(result.isSameSpeaker)") // true/false
```

```swift
let embedding1 = try speaker.extractEmbedding(samples: samples1)
let embedding2 = try speaker.extractEmbedding(samples: samples2)

// Cosine similarity
let similarity = embedding1.cosineSimilarity(with: embedding2)

// Or use convenience method
let isSame = embedding1.isSameSpeaker(as: embedding2, threshold: 0.5)
```

```swift
// Create a speaker profile
var profile = SpeakerProfile(id: "user_1", embedding: embedding1)

// Add more samples to improve accuracy
profile.addEmbedding(embedding2)
profile.addEmbedding(embedding3)

print("Profile sample count: \(profile.sampleCount)")

// Verify against profile
let result = profile.verify(unknownEmbedding, threshold: 0.5)
```

```swift
// Use a specific input duration for better control
let embedding = try speaker.extractEmbedding(
    samples: audioSamples,
    duration: .threeSeconds // 1s, 3s, 5s, or 10s
)
```

| Duration | Audio Samples | Mel Frames |
|---|---|---|
| 1 sec | 16,000 | 112 |
| 3 sec | 48,000 | 304 |
| 5 sec | 80,000 | 512 |
| 10 sec | 160,000 | 1,008 |
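The sample counts above follow directly from the 16 kHz sample rate (duration × 16,000). If your buffers don't match a fixed duration exactly, you can pad or truncate them yourself; `fitToDuration` below is a hypothetical helper, not part of the library API.

```swift
// Sketch: fit a sample buffer to one of the fixed input lengths.
// At 16 kHz, the target sample count is seconds * 16_000.
func fitToDuration(_ samples: [Float], seconds: Int) -> [Float] {
    let target = seconds * 16_000
    if samples.count >= target {
        return Array(samples.prefix(target))  // truncate extra samples
    }
    // Zero-pad short buffers up to the target length
    return samples + [Float](repeating: 0, count: target - samples.count)
}
```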
- Sample rate: 16,000 Hz
- Channels: Mono
- Format: Float32
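If your capture pipeline produces 16-bit integer PCM, it must be converted to Float32 before being passed to the model. A minimal sketch of that conversion (assuming the audio is already mono at 16 kHz):

```swift
// Sketch: scale 16-bit PCM samples into the Float32 range [-1, 1].
// Divides by 32,768 (the magnitude of Int16.min).
func int16ToFloat(_ pcm: [Int16]) -> [Float] {
    pcm.map { Float($0) / 32_768.0 }
}
```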
- Model: TitaNet-Small (NVIDIA NeMo)
- Embedding dimension: 192
- Variants: FP32 (~27 MB), FP16 (~14 MB), Int8 (~7 MB)
- Original source: NVIDIA NeMo
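Since the embeddings are L2-normalized, cosine similarity reduces to a plain dot product. The sketch below shows that, plus one common way to aggregate several embeddings into a profile (element-wise mean, re-normalized); the aggregation method is an assumption for illustration and may differ from what the library does internally.

```swift
import Foundation

// Cosine similarity between two L2-normalized embeddings:
// for unit-length vectors this is just the dot product.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count)
    return zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
}

// Illustrative aggregation: element-wise mean of the embeddings,
// re-normalized back to unit length.
func aggregate(_ embeddings: [[Float]]) -> [Float] {
    let dim = embeddings[0].count
    var mean = [Float](repeating: 0, count: dim)
    for e in embeddings {
        for i in 0..<dim { mean[i] += e[i] }
    }
    for i in 0..<dim { mean[i] /= Float(embeddings.count) }
    let norm = sqrt(mean.reduce(0) { $0 + $1 * $1 })
    return mean.map { $0 / norm }
}
```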
| Threshold | Use Case |
|---|---|
| 0.4 | Lenient (fewer false rejections) |
| 0.5 | Balanced (default) |
| 0.6 | Strict (fewer false accepts) |
| 0.7+ | High security |
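The threshold is a straight cutoff on the similarity score: raising it rejects more impostors at the cost of rejecting more genuine matches. A minimal sketch of the decision rule (`isSameSpeaker` here is an illustrative free function, not the library's method):

```swift
// Sketch: apply a verification threshold to a cosine-similarity score.
// 0.5 matches the library's documented default.
func isSameSpeaker(similarity: Float, threshold: Float = 0.5) -> Bool {
    similarity >= threshold
}
```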
The SpeakerExample folder contains a demo iOS app with:
- Speaker enrollment
- Speaker verification
- Audio comparison
- Live diarization (real-time speaker detection)
To run the example:
- Open `SpeakerExample/SpeakerExample.xcodeproj` in Xcode
- Download the model from Google Drive
- Rename it to `TitaNetSmall.mlmodelc` and drag it into the Xcode project
- Build and run on device or simulator
- NeMoFeatureExtractor-iOS (>= 1.0.5)
MIT License