
Audio-domain consumer feasibility question: 0.21.4 → 0.21.12 upgrade (context from DeepFilterNet revert) #2152

@czoli1976

First — thank you for tract. The fact that a full DeepFilterNet 3
speech-enhancement graph runs end-to-end in WebAssembly inside a
browser worker is a direct testament to tract's design: fast, small,
and actually practical for real-world deployment. The investment in
ONNX coverage + NNEF + the new facade layer is genuinely appreciated
from the consumer side.

Context

We're evaluating a browser/WASM noise-suppression library built on
DeepFilterNet 3 (libDF). libDF has been pinned to tract 0.21.4
since Rikorose/DeepFilterNet#558
(May 2024) and we're looking at whether to upgrade further within
the 0.21 line — specifically toward 0.21.12, which looks attractive
for WASM targets.

One data point we'd appreciate your perspective on: in June 2023,
DeepFilterNet reverted from tract 0.20.x back to 0.19.4 after
observing "audio artifacts" in enhanced output
(Rikorose/DeepFilterNet#405).
They later moved to 0.21 (Jan 2024) without incident and have been
stable on 0.21.4 for nearly two years.

We've also opened a question on the DeepFilterNet side about their
plans: Rikorose/DeepFilterNet#682.

Questions

  1. The 2023 revert: are you aware of the 0.19 → 0.20 audio-
    artifact issue, and if so, is there any context you can share
    about what changed in that transition that might have produced
    it? (No pressure if it's been too long — understandable.)

  2. The 0.21 line since 0.21.4: releases that look particularly
    relevant for a WASM speech-enhancement workload include:

    • 0.21.6: "WASM f32 4x4 kernel" and multithreaded matmul
      runner
    • 0.21.8: MMM kits + element-wise binary op optimizations
    • 0.21.10: Reduce-op optimizations affecting modern
      normalization layers
    • 0.21.12: multithread-mm feature flag in tract-linalg

    Have these been exercised against audio / RNN-heavy ONNX graphs
    in production by other consumers you're aware of? Any known
    regressions or gotchas?

  3. Upgrade feasibility: from the tract side, are there specific
    changes within 0.21.4 → 0.21.12 you'd call out to a consumer
    doing audio speech-enhancement inference, either as concerns to
    validate against or as wins worth capturing?

The goal isn't to pressure anyone — we're collecting signal to
decide whether to drive the upgrade ourselves (with DNSMOS parity
validation on a clip corpus) or park the idea. Your perspective
would be really helpful regardless of which direction it points us.
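
Before running DNSMOS on the corpus, a cheap numeric gate can catch gross regressions early: run each clip through a 0.21.4 build and a 0.21.12 build and compare the raw enhanced samples. The comparison should be tolerance-based rather than bit-exact, because kernel changes (new WASM kernels, multithreaded matmul) can reorder floating-point accumulation. A minimal sketch, assuming both outputs are already available as equal-length `f32` buffers; `compare_outputs`, `ParityReport`, `within_tolerance`, and the threshold values are hypothetical names and numbers for illustration, not part of tract or libDF.

```rust
/// Hypothetical report comparing enhanced output from two tract builds
/// (e.g. 0.21.4 as reference, 0.21.12 as candidate) for the same clip.
#[derive(Debug)]
pub struct ParityReport {
    /// Largest absolute per-sample difference.
    pub max_abs_diff: f32,
    /// Reference energy over difference energy, in dB (higher = closer).
    pub sdr_db: f64,
}

/// Compares two sample buffers; errors out if their lengths differ.
pub fn compare_outputs(reference: &[f32], candidate: &[f32]) -> Result<ParityReport, String> {
    if reference.len() != candidate.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            reference.len(),
            candidate.len()
        ));
    }
    let mut max_abs_diff = 0.0f32;
    let mut ref_energy = 0.0f64;
    let mut diff_energy = 0.0f64;
    for (&r, &c) in reference.iter().zip(candidate) {
        let d = r - c;
        max_abs_diff = max_abs_diff.max(d.abs());
        // Accumulate energies in f64 to limit rounding on long clips.
        ref_energy += (r as f64) * (r as f64);
        diff_energy += (d as f64) * (d as f64);
    }
    let sdr_db = if diff_energy == 0.0 {
        // Identical outputs: report an infinite ratio.
        f64::INFINITY
    } else {
        10.0 * (ref_energy / diff_energy).log10()
    };
    Ok(ParityReport { max_abs_diff, sdr_db })
}

/// Hypothetical gate: thresholds are placeholders to tune on a real corpus.
pub fn within_tolerance(report: &ParityReport, max_abs_tol: f32, min_sdr_db: f64) -> bool {
    report.max_abs_diff <= max_abs_tol && report.sdr_db >= min_sdr_db
}

fn main() {
    // Stand-in buffers; in practice these come from the two builds.
    let reference = vec![0.5f32, -0.25, 0.125, 0.0];
    let candidate = vec![0.5f32, -0.25, 0.125, 0.0];
    let report = compare_outputs(&reference, &candidate).unwrap();
    println!("{:?} pass={}", report, within_tolerance(&report, 1e-4, 60.0));
}
```

Clips that fail this gate are the ones worth listening to and scoring first; clips that pass would still go through DNSMOS, since small numeric drift can be perceptually relevant in a recurrent model.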

Thank you again for what you've built.
