Fluens is an open-source framework for low-latency, real-time speech analysis with an emphasis on on-device / edge-friendly deployment.
- Backend core for speech practice or communication coaching apps
- Real-time speech analytics for accessibility, captioning, and note-taking tools
- Research and prototyping framework for speech ML systems (streaming inference, post-processing, evaluation)
- Real-time ASR using a NeMo Conformer-based architecture (streaming, partial hypotheses, commit policy)
- Support for phonemic ASR variants using a NeMo Conformer-based architecture
- Real-time fluency event classification (frame-wise logits via an ONNX model interface)
- Optional text continuation / phrase-starter suggestions using GPT-like models (pluggable)
- Support for specialized ASR variants (e.g., robust fine-tuned ASR models)
- A C++ streaming core intended to integrate with multiple targets (e.g., iOS, macOS, Android) via connectors/wrappers. Status: the only fully implemented comm interface at the moment is macOS.
Model signatures and some pre-trained weights are described in MODELS.md.
- We reference a NeMo Conformer-Large fine-tuned on the TORGO dataset of English dysarthric speech.
- We reference a NeMo Conformer-Large fine-tuned on the TEDLIUM dataset of English speech for phonemic speech recognition.
- We do not provide weights or references for dysfluency detection and phrase completion.
This repository is provided for research and developer use. It is not intended to be used as a medical device or for diagnosis, treatment, monitoring, or clinical decision-making. Anyone integrating Fluens into a product is solely responsible for validation, regulatory compliance, safety, and appropriate use.
This quickstart shows how to build and run the current macOS demo for:
- Streaming ASR
- ASR + fluency/event logits (optional)
- ASR + fluency/event logits + phrase-starter generation (optional)
Status: the only fully implemented comm interface at the moment is macOS (asr/comm/macos).
- macOS with a C++ toolchain (Xcode Command Line Tools)
- CMake (>= 3.20 recommended)
- ONNX Runtime dependencies as required by the project
- ASR module specifications: asr/core/README.md
- Fluency module specifications: flu/FLU.md
- GPT phrase completion specifications: flu/GPT.md
- ASR model package: asr/packages/en_conformer_small (place your detector .onnx export results in this folder)
- Optional LLM (phrase-starters) ONNX model directory: ml/LLM/distilgpt2_onnx
Note: Some features require additional model weights that are not distributed with this repo. See MODELS.md / ASR_Contracts.md for expected ONNX signatures and export guidance.
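
Because the weights are not shipped with the repo, it can help to confirm that the model directories are populated before building and running the demo. The following is a minimal sketch, not part of Fluens: the function name `check_model_dirs` is hypothetical, and it assumes that an ASR package directory holds a `package.json` plus at least one `.onnx` file and that the LLM directory holds at least one `.onnx` file somewhere beneath it.

```python
from pathlib import Path
from typing import List, Optional


def check_model_dirs(package_dir: str, llm_dir: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks usable.

    Assumption (not confirmed by the project docs): a usable ASR package
    contains package.json and at least one .onnx file at its top level.
    """
    problems: List[str] = []

    pkg = Path(package_dir)
    if not pkg.is_dir():
        problems.append(f"ASR package directory not found: {pkg}")
    else:
        if not (pkg / "package.json").is_file():
            problems.append(f"package.json missing in {pkg}")
        if not any(pkg.glob("*.onnx")):
            problems.append(f"no .onnx files found in {pkg}")

    if llm_dir is not None:
        llm = Path(llm_dir)
        # The LLM export may be nested, so search recursively.
        if not llm.is_dir() or not any(llm.rglob("*.onnx")):
            problems.append(f"no ONNX model found under {llm}")

    return problems


if __name__ == "__main__":
    for problem in check_model_dirs("asr/packages/en_conformer_small",
                                    "ml/LLM/distilgpt2_onnx"):
        print("warning:", problem)
```

Run it from the repository root; pass only the first argument if you do not use phrase-starter generation.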

Build the demo:

    cd asr/comm/macos
    mkdir -p build
    cd build
    cmake ..
    cmake --build . -j

The resulting demo binary is built as:
- asr/comm/macos/build/minimal_sasr_core

Example invocations (paths are relative to the repository root):

    ./asr/comm/macos/build/minimal_sasr_core --asr asr/config/asr.global.json asr/packages/en_conformer_small

    ./asr/comm/macos/build/minimal_sasr_core --asr asr/config/asr.flu.json asr/packages/en_conformer_small

    ./asr/comm/macos/build/minimal_sasr_core --asr --flu flu/config/flu.global.json asr/config/asr.flu.json asr/packages/en_conformer_small

    ./asr/comm/macos/build/minimal_sasr_core --flu flu/config/flu.global.json asr/config/asr.flu.json asr/packages/en_conformer_small ml/LLM/distilgpt2_onnx

- If you see missing model / signature errors, confirm:
  - the model path exists,
  - the ONNX model matches the expected input/output contract described in ASR_Contracts.md / MODELS.md,
  - your config JSON points to the correct model names and sample rate settings.
- If the binary can't access the microphone, check macOS privacy permissions for Terminal / your IDE.
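
One way to check an exported model against a contract is to read its input and output signatures with ONNX Runtime's Python API and compare them to what you expect. The sketch below is illustrative only: `load_signature` and `compare_signature` are hypothetical helpers, and the tensor name `audio_signal` and its shape are placeholders, not the actual Fluens contract (take the real values from ASR_Contracts.md / MODELS.md).

```python
from typing import Dict, List, Sequence, Tuple

# A signature maps a tensor name to (element type, shape). Dynamic dimensions
# appear as None or as a symbolic string such as "time".
Signature = Dict[str, Tuple[str, Sequence[object]]]


def load_signature(model_path: str) -> Tuple[Signature, Signature]:
    """Read (inputs, outputs) signatures from an ONNX file via ONNX Runtime."""
    import onnxruntime as ort  # lazy import: the comparison below needs no dependencies

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inputs = {a.name: (a.type, list(a.shape)) for a in session.get_inputs()}
    outputs = {a.name: (a.type, list(a.shape)) for a in session.get_outputs()}
    return inputs, outputs


def compare_signature(expected: Signature, actual: Signature) -> List[str]:
    """Return human-readable mismatches; an empty list means the contract holds."""
    problems: List[str] = []
    for name, (exp_type, exp_shape) in expected.items():
        if name not in actual:
            problems.append(f"missing tensor '{name}'")
            continue
        act_type, act_shape = actual[name]
        if act_type != exp_type:
            problems.append(f"'{name}': type {act_type}, expected {exp_type}")
        if len(act_shape) != len(exp_shape):
            problems.append(f"'{name}': rank {len(act_shape)}, expected {len(exp_shape)}")
            continue
        for i, (exp_dim, act_dim) in enumerate(zip(exp_shape, act_shape)):
            # Only fixed integer dimensions in the expectation are enforced;
            # None or symbolic entries accept any actual dimension.
            if isinstance(exp_dim, int) and exp_dim != act_dim:
                problems.append(f"'{name}': dim {i} is {act_dim}, expected {exp_dim}")
    return problems


# Placeholder contract for illustration only.
EXPECTED_INPUTS: Signature = {"audio_signal": ("tensor(float)", [None, 80, None])}
```

Typical use would be `inputs, _ = load_signature(path)` followed by `compare_signature(EXPECTED_INPUTS, inputs)`, printing any returned messages before launching the demo.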
- See CORE.md for core architecture notes.
- See ASR_Contracts.md and MODELS.md for model contracts and export guidance.
- Streaming ASR built around a NeMo Conformer family model
- Generates partial hypotheses with an explicit commit policy for stable outputs
- Supports multiple configurations and model swapping (including fine-tuned variants)
Note: model weights are not bundled by default. See MODELS.md for ONNX signatures and export guidance.
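
To make the idea of a commit policy concrete, here is a minimal sketch of one common approach: a token is committed once the same prefix has appeared in several consecutive partial hypotheses, and committed tokens are never retracted. This is an assumption chosen for illustration, not necessarily the policy implemented in the Fluens core (see CORE.md for that); the class name `StablePrefixCommitter` and the `agreement` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, List


def common_prefix_len(hypotheses: List[List[str]]) -> int:
    """Length of the longest token prefix shared by all hypotheses."""
    n = 0
    for tokens in zip(*hypotheses):
        if any(t != tokens[0] for t in tokens):
            break
        n += 1
    return n


class StablePrefixCommitter:
    """Commit tokens agreed on by `agreement` consecutive partial hypotheses."""

    def __init__(self, agreement: int = 2) -> None:
        if agreement < 1:
            raise ValueError("agreement must be >= 1")
        self.agreement = agreement
        self.history: Deque[List[str]] = deque(maxlen=agreement)
        self.committed: List[str] = []

    def update(self, partial: str) -> List[str]:
        """Feed one partial hypothesis; return the newly committed tokens."""
        self.history.append(partial.split())
        if len(self.history) < self.agreement:
            return []
        stable = common_prefix_len(list(self.history))
        # Tokens beyond what is already committed, up to the stable prefix.
        # If the stable prefix is shorter than the committed text, this slice
        # is empty, so earlier output is never withdrawn.
        newly = self.history[-1][len(self.committed):stable]
        self.committed.extend(newly)
        return newly


if __name__ == "__main__":
    committer = StablePrefixCommitter(agreement=2)
    for partial in ["i", "i want", "i want two", "i want to go", "i want to go home"]:
        print(repr(partial), "->", committer.update(partial))
```

A larger `agreement` value trades latency for stability: tokens appear later but are less likely to have been revised by the recognizer.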
The fluency pipeline is designed as a set of pluggable components:
- Streaming ASR context (can be a verbatim/low-latency ASR configuration)
- Streaming frame-wise event logits from a fluency/event classifier
- Optional suggestion module
  - When an event trigger fires, the current ASR context can be fed into a text-generation model to produce phrase-starters / continuation candidates.
  - The suggestion module is intentionally optional and can be disabled by default.
  - Important: suggestion quality depends heavily on the model and prompt constraints. Treat suggestions as optional UX hints, not as authoritative outputs.
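
The flow described above (frame-wise logits, an event trigger, then an optional suggestion step) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Fluens implementation: it assumes a single event logit per frame, a trigger that fires when the sigmoid probability stays above a threshold for a few consecutive frames, and a suggester that is any callable from context text to candidate strings. `EventTrigger`, `process_frames`, and the default parameter values are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Suggester = Callable[[str], List[str]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class EventTrigger:
    """Fire once when the event probability stays high for `min_frames` frames."""

    def __init__(self, threshold: float = 0.7, min_frames: int = 3) -> None:
        self.threshold = threshold
        self.min_frames = min_frames
        self.run = 0  # number of consecutive frames at or above the threshold

    def step(self, logit: float) -> bool:
        if sigmoid(logit) >= self.threshold:
            self.run += 1
        else:
            self.run = 0
        # True only on the frame that completes the run, so a long event
        # produces a single trigger rather than one per frame.
        return self.run == self.min_frames


def process_frames(
    logits: Sequence[float],
    asr_context: str,
    trigger: EventTrigger,
    suggester: Optional[Suggester] = None,
) -> List[Tuple[int, List[str]]]:
    """Return (frame_index, suggestions) for every frame where the trigger fires."""
    fired: List[Tuple[int, List[str]]] = []
    for i, logit in enumerate(logits):
        if trigger.step(logit):
            # With no suggester configured, the event is still reported,
            # just without phrase-starter candidates.
            suggestions = suggester(asr_context) if suggester is not None else []
            fired.append((i, suggestions))
    return fired
```

Passing `suggester=None` corresponds to running with the suggestion module disabled; a real suggester would wrap a text-generation model and apply its own prompt constraints.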
.
├── ASR_Contracts.md
├── CORE.md
├── README.md
├── .gitignore
├── asr
│   ├── comm
│   │   ├── android
│   │   ├── ios
│   │   └── macos
│   │       ├── CMakeLists.txt
│   │       └── main.mm
│   ├── config
│   │   └── asr.global.json
│   ├── core
│   │   ├── include
│   │   │   ├── asr
│   │   │   │   ├── AsrConfig.hpp
│   │   │   │   ├── AsrEngine.hpp
│   │   │   │   ├── JsonConfig.hpp
│   │   │   │   └── ModelPackage.hpp
│   │   │   └── non-asr
│   │   └── src
│   │       ├── AsrEngine.cpp
│   │       ├── JsonConfig.cpp
│   │       └── ModelPackage.cpp
│   └── packages
│       └── en_conformer_small
│           └── package.json
└── ml