Audio-domain consumer feasibility question: 0.21.4 → 0.21.12 upgrade (context from DeepFilterNet revert) #2152

First, thank you for tract. That a full DeepFilterNet 3
speech-enhancement graph runs end-to-end in WebAssembly inside a
browser worker is a direct testament to tract's design: fast, small,
and practical for real-world deployment. The investment in ONNX
coverage, NNEF, and the new facade layer is genuinely appreciated
from the consumer side.

### Context

We're evaluating a browser/WASM noise-suppression library built on
DeepFilterNet 3 (libDF). libDF has been pinned to tract `0.21.4`
since [Rikorose/DeepFilterNet#558](https://github.com/Rikorose/DeepFilterNet/pull/558)
(May 2024) and we're looking at whether to upgrade further within
the 0.21 line — specifically toward `0.21.12`, which looks attractive
for WASM targets.

One data point we'd appreciate your perspective on: in June 2023,
DeepFilterNet reverted from tract `0.20.x` back to `0.19.4` after
observing "audio artifacts" in enhanced output
([Rikorose/DeepFilterNet#405](https://github.com/Rikorose/DeepFilterNet/pull/405)).
They later moved to `0.21` (Jan 2024) without incident and have been
stable on `0.21.4` for nearly two years.

We've also opened a question on the DeepFilterNet side about their
plans: [Rikorose/DeepFilterNet#682](https://github.com/Rikorose/DeepFilterNet/issues/682).

### Questions

1. **The 2023 revert**: are you aware of the `0.19 → 0.20` audio-
   artifact issue, and if so, is there any context you can share
   about what changed in that transition that might have produced
   it? (No pressure if it's been too long — understandable.)

2. **The 0.21 line since 0.21.4**: releases that look particularly
   relevant for a WASM speech-enhancement workload include:
   - **0.21.6** — "WASM f32 4x4 kernel" and multithreaded matmul
     runner
   - **0.21.8** — MMM kits + element-wise binary op optimizations
   - **0.21.10** — reduce optimizations impacting modern
     normalization layers
   - **0.21.12** — `multithread-mm` feature flag in tract-linalg

   Have these been exercised against audio / RNN-heavy ONNX graphs
   in production by other consumers you're aware of? Any known
   regressions or gotchas?

3. **Upgrade feasibility**: from the tract side, are there specific
   changes within `0.21.4 → 0.21.12` you'd call out to a consumer
   doing audio speech-enhancement inference, either as concerns to
   validate against or as wins worth capturing?

The goal isn't to pressure anyone — we're collecting signal to
decide whether to drive the upgrade ourselves (with DNSMOS parity
validation on a clip corpus) or park the idea. Your perspective
would be really helpful regardless of which direction it points us.
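
For concreteness, a sample-level comparison could sit alongside the
DNSMOS check as a cheap first gate. The sketch below is only an
illustration under our own assumptions: `max_abs_diff` and `snr_db`
are hypothetical helper names, not part of tract or libDF, and the
inputs are assumed to be the enhanced output of the same clip
produced by the two tract versions, already decoded to `f32` samples
of equal length.

```rust
/// Largest absolute per-sample difference between two buffers.
/// Hypothetical helper; both buffers must have the same length.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "buffers must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f32::max)
}

/// Signal-to-noise ratio in dB of `candidate` against `reference`,
/// treating the sample-wise difference as noise.
/// Returns +infinity when the two buffers are identical.
fn snr_db(reference: &[f32], candidate: &[f32]) -> f32 {
    assert_eq!(reference.len(), candidate.len(), "buffers must have the same length");
    // Accumulate in f64 to limit rounding error on long clips.
    let signal: f64 = reference.iter().map(|x| (*x as f64).powi(2)).sum();
    let noise: f64 = reference
        .iter()
        .zip(candidate)
        .map(|(r, c)| ((*r - *c) as f64).powi(2))
        .sum();
    if noise == 0.0 {
        return f32::INFINITY;
    }
    (10.0 * (signal / noise).log10()) as f32
}
```

A clip whose `snr_db` falls below some agreed threshold would then be
flagged for listening and for the DNSMOS comparison; the threshold
itself is something we would have to choose empirically.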

Thank you again for what you've built.
