First — thank you for tract. The fact that a full DeepFilterNet 3
speech-enhancement graph runs end-to-end in WebAssembly inside a
browser worker is a direct testament to tract's design: fast, small,
and actually practical for real-world deployment. The investment in
ONNX coverage + NNEF + the new facade layer is genuinely appreciated
from the consumer side.
Context
We're evaluating a browser/WASM noise-suppression library built on
DeepFilterNet 3 (libDF). libDF has been pinned to tract 0.21.4
since Rikorose/DeepFilterNet#558
(May 2024) and we're looking at whether to upgrade further within
the 0.21 line — specifically toward 0.21.12, which looks attractive
for WASM targets.
One data point we'd appreciate your perspective on: in June 2023,
DeepFilterNet reverted from tract 0.20.x back to 0.19.4 after
observing "audio artifacts" in enhanced output
(Rikorose/DeepFilterNet#405).
They later moved to 0.21 (Jan 2024) without incident and have been
stable on 0.21.4 for nearly two years.
We've also opened a question on the DeepFilterNet side about their
plans: Rikorose/DeepFilterNet#682.
Questions
- The 2023 revert: are you aware of the 0.19 → 0.20 audio-artifact
  issue, and if so, is there any context you can share about what
  changed in that transition that might have produced it? (No
  pressure if it's been too long; that's understandable.)
- The 0.21 line since 0.21.4: releases that look particularly
  relevant for a WASM speech-enhancement workload include:
  - 0.21.6: "WASM f32 4x4 kernel" and a multithreaded matmul runner
  - 0.21.8: MMM kits + element-wise binary op optimizations
  - 0.21.10: reduce-operator optimizations affecting modern
    normalization layers
  - 0.21.12: `multithread-mm` feature flag in tract-linalg

  Have these been exercised against audio / RNN-heavy ONNX graphs
  in production by other consumers you're aware of? Any known
  regressions or gotchas?
- Upgrade feasibility: from the tract side, are there specific
  changes within 0.21.4 → 0.21.12 you'd call out to a consumer
  doing audio speech-enhancement inference, either as concerns to
  validate against or as wins worth capturing?
The goal isn't to pressure anyone — we're collecting signal to
decide whether to drive the upgrade ourselves (with DNSMOS parity
validation on a clip corpus) or park the idea. Your perspective
would be really helpful regardless of which direction it points us.
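As a rough illustration of the parity step, here is a minimal Rust sketch of a cheap numerical pre-check we might run before DNSMOS scoring. It assumes each clip has already been enhanced by both builds (0.21.4 and 0.21.12) and decoded into equal-length f32 sample buffers; the `parity` helper and the 60 dB threshold are our own hypothetical choices, not anything provided by tract or libDF. It reports the maximum absolute sample difference and the SNR of the new output measured against the old one.

```rust
/// Hypothetical parity pre-check (not a tract or libDF API): compares the
/// enhanced output of one clip from a reference build (e.g. tract 0.21.4)
/// with a candidate build (e.g. tract 0.21.12).
/// Returns (max absolute sample difference, SNR of candidate vs. reference in dB).
fn parity(reference: &[f32], candidate: &[f32]) -> (f32, f64) {
    assert_eq!(reference.len(), candidate.len(), "clips must have equal length");
    let mut max_abs = 0.0f32;
    let mut signal = 0.0f64; // energy of the reference output
    let mut noise = 0.0f64; // energy of the sample-wise difference
    for (&r, &c) in reference.iter().zip(candidate) {
        let d = r - c;
        max_abs = max_abs.max(d.abs());
        signal += (r as f64) * (r as f64);
        noise += (d as f64) * (d as f64);
    }
    // Identical outputs give infinite SNR rather than a division by zero.
    let snr_db = if noise == 0.0 {
        f64::INFINITY
    } else {
        10.0 * (signal / noise).log10()
    };
    (max_abs, snr_db)
}

fn main() {
    // Toy buffers standing in for decoded enhanced clips from the two builds.
    let reference: Vec<f32> = (0..480).map(|i| (i as f32 * 0.05).sin()).collect();
    let candidate: Vec<f32> = reference.iter().map(|x| x + 1e-4).collect();

    let (max_abs, snr_db) = parity(&reference, &candidate);
    // Hypothetical threshold: clips below it go on to listening + DNSMOS review.
    let flagged = snr_db < 60.0;
    println!("max |diff| = {max_abs:e}, SNR = {snr_db:.1} dB, flagged = {flagged}");
}
```

Bit-exact output isn't a realistic expectation across versions, since kernel changes (such as a new WASM matmul kernel) can alter floating-point accumulation order, so the threshold is a judgment call; the point is only to surface clips that deserve a closer listen before the DNSMOS comparison.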
Thank you again for what you've built.