miosipof/fluens

Fluens

Fluens is an open-source framework for low-latency, real-time speech analysis with an emphasis on on-device / edge-friendly deployment.

Use cases

  • Backend core for speech practice or communication coaching apps

  • Real-time speech analytics for accessibility, captioning, and note-taking tools

  • Research and prototyping framework for speech ML systems (streaming inference, post-processing, evaluation)

Features

  • Real-time ASR using a NeMo Conformer-based architecture (streaming, partial hypotheses, commit policy)

  • Support for phonemic ASR variants using a NeMo Conformer-based architecture

  • Real-time fluency event classification (frame-wise logits via an ONNX model interface)

  • Optional text continuation / phrase-starter suggestions using GPT-like models (pluggable)

  • Support for specialized ASR variants (e.g., robust fine-tuned ASR models)

  • A C++ streaming core intended to integrate with multiple targets (e.g., iOS, macOS, Android) via connectors/wrappers. Status: macOS is currently the only fully implemented comm interface.

Model signatures and some pre-trained weights are described in MODELS.md.

  • We reference a NeMo Conformer-Large model fine-tuned on the TORGO dataset of English dysarthric speech.
  • We reference a NeMo Conformer-Large model fine-tuned on the TEDLIUM dataset of English speech for phonemic speech recognition.
  • We do not provide weights or references for dysfluency detection and phrase completion.

Disclaimer

This repository is provided for research and developer use. It is not intended to be used as a medical device or for diagnosis, treatment, monitoring, or clinical decision-making. Anyone integrating Fluens into a product is solely responsible for validation, regulatory compliance, safety, and appropriate use.

Quickstart

This quickstart shows how to build and run the current macOS demo for:

  • Streaming ASR
  • ASR + fluency/event logits (optional)
  • ASR + fluency/event logits + phrase-starter generation (optional)

Status: macOS (asr/comm/macos) is currently the only fully implemented comm interface.


Prerequisites

  • macOS with a C++ toolchain (Xcode Command Line Tools)
  • CMake (>= 3.20 recommended)
  • ONNX Runtime dependencies as required by the project
    • ASR module specifications: asr/core/README.md
    • Fluency module specifications: flu/FLU.md
    • GPT phrase completion specifications: flu/GPT.md

Model locations (current defaults)

  • ASR model package:

    • asr/packages/en_conformer_small
    • Place the results of your detector .onnx export in this folder.
  • Optional LLM (phrase-starters) ONNX model directory:

    • ml/LLM/distilgpt2_onnx

Note: Some features require additional model weights that are not distributed with this repo. See MODELS.md / ASR_Contracts.md for expected ONNX signatures and export guidance.


Build (macOS)

cd asr/comm/macos
mkdir -p build
cd build
cmake ..
cmake --build . -j

The build produces the demo binary at the following path (relative to the repository root):

  • asr/comm/macos/build/minimal_sasr_core

Run demos

Run the commands below from the repository root; all paths in them are relative to it.

1) Test ASR only (default config)

./asr/comm/macos/build/minimal_sasr_core --asr asr/config/asr.global.json asr/packages/en_conformer_small

2) Test ASR with a different configuration (e.g., low-latency / fluency-friendly)

./asr/comm/macos/build/minimal_sasr_core --asr asr/config/asr.flu.json asr/packages/en_conformer_small

3) Test ASR + fluency/event logits (requires a fluency ONNX model)

./asr/comm/macos/build/minimal_sasr_core --asr --flu flu/config/flu.global.json asr/config/asr.flu.json asr/packages/en_conformer_small

4) Test ASR + fluency/event logits + phrase-starter suggestions (requires fluency model + LLM ONNX)

./asr/comm/macos/build/minimal_sasr_core --flu flu/config/flu.global.json asr/config/asr.flu.json asr/packages/en_conformer_small ml/LLM/distilgpt2_onnx

Troubleshooting

  • If you see missing model / signature errors, confirm:

    • the model path exists,

    • the ONNX model matches the expected input/output contract described in ASR_Contracts.md / MODELS.md,

    • your config JSON points to the correct model names and sample rate settings.

  • If the binary can’t access the microphone, check macOS privacy permissions for Terminal / your IDE.
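
The first check above (confirming that every model and config path exists) is easy to automate before launching the demo. The following is a minimal sketch, assuming C++17 and `std::filesystem`; `missing_paths` is a hypothetical helper written for this illustration and is not part of Fluens.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper, not part of Fluens: returns the subset of `paths`
// that does not exist on disk, preserving the input order.
std::vector<std::string> missing_paths(const std::vector<std::string>& paths) {
    std::vector<std::string> missing;
    for (const auto& p : paths) {
        std::error_code ec;  // non-throwing overload; a lookup error counts as "missing"
        if (!std::filesystem::exists(p, ec) || ec) {
            missing.push_back(p);
        }
    }
    return missing;
}
```

Passing the arguments of a run command (for example asr/config/asr.global.json and asr/packages/en_conformer_small) from the repository root and printing the returned list shows quickly whether a failure is a path problem or a model-contract problem.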

Next steps

  • See CORE.md for core architecture notes.

  • See ASR_Contracts.md and MODELS.md for model contracts and export guidance.

ASR module

  • Streaming ASR built around a NeMo Conformer family model

  • Generates partial hypotheses with an explicit commit policy for stable outputs

  • Supports multiple configurations and model swapping (including fine-tuned variants)

Note: model weights are not bundled by default. See MODELS.md for ONNX signatures and export guidance.
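
To make the idea of a commit policy for partial hypotheses concrete, here is a minimal sketch of one common approach: a token is committed once it has kept the same value at the same position for a fixed number of consecutive partial hypotheses. This is an assumption chosen for illustration, not necessarily the policy implemented in AsrEngine. The class name `StablePrefixCommitter` and the `stable_updates` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not the Fluens API: commits a token once it has been
// unchanged for `stable_updates` consecutive partial hypotheses.
class StablePrefixCommitter {
public:
    explicit StablePrefixCommitter(std::size_t stable_updates)
        : stable_updates_(stable_updates) {}

    // `partial` is the full token sequence of the latest partial hypothesis.
    // Returns the tokens newly committed by this update (possibly none).
    std::vector<std::string> update(const std::vector<std::string>& partial) {
        // Committed positions are frozen, so comparison starts after them.
        std::size_t common = std::min({committed_, prev_.size(), partial.size()});
        while (common < prev_.size() && common < partial.size() &&
               prev_[common] == partial[common]) {
            ++common;
        }

        // seen_[i] counts consecutive updates in which token i kept its value.
        seen_.resize(partial.size(), 0);
        for (std::size_t i = 0; i < partial.size(); ++i) {
            seen_[i] = (i < common) ? seen_[i] + 1 : 1;
        }

        // Commit strictly left to right so the output stays a stable prefix.
        std::vector<std::string> newly_committed;
        while (committed_ < partial.size() && seen_[committed_] >= stable_updates_) {
            newly_committed.push_back(partial[committed_]);
            ++committed_;
        }
        prev_ = partial;
        return newly_committed;
    }

    std::size_t committed_count() const { return committed_; }

private:
    std::size_t stable_updates_;
    std::size_t committed_ = 0;
    std::vector<std::string> prev_;
    std::vector<std::size_t> seen_;
};
```

A larger `stable_updates` value trades latency for stability: tokens reach the user later but are less likely to have been revised by the recognizer. In this sketch, committed tokens are never retracted, even if a later hypothesis changes them.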

Fluency module (event detection + optional suggestions)

The fluency pipeline is designed as a set of pluggable components:

  1. Streaming ASR context (can be a verbatim/low-latency ASR configuration)

  2. Streaming frame-wise event logits from a fluency/event classifier

  3. Optional suggestion module

    • When an event trigger fires, the current ASR context can be fed into a text-generation model to produce phrase-starters / continuation candidates.

    • The suggestion module is optional by design and can be left disabled.

Important: suggestion quality depends heavily on the model and prompt constraints. Treat suggestions as optional UX hints, not as authoritative outputs.
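
The step from frame-wise event logits to an event trigger can be sketched as follows. This is a simplified illustration under stated assumptions: a single binary event logit per frame, a sigmoid to turn it into a probability, and a debounce that requires several consecutive frames above a threshold. The class `EventTrigger` and its parameters `threshold` and `min_frames` are hypothetical; the actual model outputs and trigger logic are specified in flu/FLU.md.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch, not the Fluens API: debounced trigger over a stream
// of per-frame binary event logits.
class EventTrigger {
public:
    // threshold: probability in [0, 1]; min_frames: required run length (>= 1).
    EventTrigger(float threshold, std::size_t min_frames)
        : threshold_(threshold), min_frames_(min_frames) {}

    // Feed one frame's logit. Returns true exactly once per event, on the
    // frame where the run of above-threshold frames first reaches min_frames.
    bool push(float logit) {
        const float prob = 1.0f / (1.0f + std::exp(-logit));  // sigmoid
        if (prob >= threshold_) {
            ++run_;
            return run_ == min_frames_;
        }
        run_ = 0;  // run broken: re-arm the trigger
        return false;
    }

private:
    float threshold_;
    std::size_t min_frames_;
    std::size_t run_ = 0;
};
```

When `push` returns true, an application could pass the current ASR context to the optional suggestion module. Requiring a run of frames rather than a single frame reduces spurious triggers from isolated noisy logits, at the cost of a delay of `min_frames` frames.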

Repo structure

.
├── ASR_Contracts.md
├── CORE.md
├── README.md
├── .gitignore
├── asr
│   ├── comm
│   │   ├── android
│   │   ├── ios
│   │   └── macos
│   │       ├── CMakeLists.txt
│   │       └── main.mm
│   ├── config
│   │   └── asr.global.json
│   ├── core
│   │   ├── include
│   │   │   ├── asr
│   │   │   │   ├── AsrConfig.hpp
│   │   │   │   ├── AsrEngine.hpp
│   │   │   │   ├── JsonConfig.hpp
│   │   │   │   └── ModelPackage.hpp
│   │   │   └── non-asr
│   │   └── src
│   │       ├── AsrEngine.cpp
│   │       ├── JsonConfig.cpp
│   │       └── ModelPackage.cpp
│   └── packages
│       └── en_conformer_small
│           └── package.json
└── ml

About

Low-latency edge speech framework for streaming ASR, fluency event detection, and assistive phrase completion
