# SESAME: Speech Enhancement with Sparse Adaptive Mixture of Experts
SESAME is a TF-domain monaural speech enhancement model that simultaneously denoises magnitude and phase spectra. Built on the MP-SENet architecture, it replaces standard FFN layers with a Sparse Mixture-of-Experts (MoE) design for improved capacity without a proportional increase in compute.
```
Audio -> STFT -> DenseEncoder -> TSTransformerBlocks (with MoE FFN) -> MaskDecoder + PhaseDecoder -> ISTFT
```
The generator (MPNet) processes noisy magnitude and phase in parallel:
- DenseEncoder: dilated Conv2d blocks compress the TF representation
- TSTransformerBlocks: alternating time and frequency self-attention with BiGRU-based FFN
- MaskDecoder: predicts a multiplicative magnitude mask via learnable sigmoid
- PhaseDecoder: directly estimates the clean phase via atan2
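As a rough illustration of the two decoder heads described above, the sketch below shows a learnable-sigmoid magnitude mask and an atan2 phase estimate combined into a complex spectrum for the ISTFT. The class/function names and the exact slope parameterization are assumptions for illustration, not the repo's actual API:

```python
import torch


class LearnableSigmoid(torch.nn.Module):
    """Sigmoid with a learnable per-bin slope, scaled into (0, beta). Illustrative sketch."""

    def __init__(self, freq_bins, beta=2.0):
        super().__init__()
        self.beta = beta
        self.slope = torch.nn.Parameter(torch.ones(freq_bins, 1))

    def forward(self, x):
        # mask values lie in the open interval (0, beta)
        return self.beta * torch.sigmoid(self.slope * x)


def decode(noisy_mag, mask_logits, phase_real, phase_imag, mask_act):
    # MaskDecoder path: multiplicative magnitude mask
    mag = noisy_mag * mask_act(mask_logits)
    # PhaseDecoder path: direct phase estimate via atan2 of two conv outputs
    phase = torch.atan2(phase_imag, phase_real)
    # combine into a complex spectrum ready for the ISTFT
    return torch.polar(mag, phase)
```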
A MetricDiscriminator provides adversarial training signal by predicting a PESQ-proxy quality score.
The FFN in selected Transformer blocks is replaced by MoEFFN:
- Token Choice Top-2: each token selects its best 2 out of N experts via softmax gating
- Shared BiGRU backbone: all experts share a bidirectional GRU; only the projection heads are per-expert
- Switch Transformer balance loss: gradient-based load balancing (f_i * P_i penalty)
- DeepSeek-V3 adaptive bias: non-gradient bias term on routing logits, updated by load imbalance
- Router z-loss: stabilizes gating logit magnitudes (ST-MoE)
- Noise-conditioned routing: spectral magnitude is projected to a noise embedding that conditions the gate
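The routing machinery listed above can be sketched as follows. This is a minimal, self-contained illustration of Top-2 token-choice gating with the Switch-style balance loss, the ST-MoE router z-loss, and a DeepSeek-V3-style selection-only bias; function names and tensor shapes are assumptions, not the repo's code:

```python
import torch
import torch.nn.functional as F


def route_top2(logits, bias=None):
    """Token-choice Top-2 routing. logits: (tokens, experts)."""
    probs = F.softmax(logits, dim=-1)
    # DeepSeek-V3-style adaptive bias affects which experts are *selected*,
    # but the gate weights themselves come from the unbiased probabilities.
    sel_scores = probs if bias is None else probs + bias
    _, topk_idx = sel_scores.topk(2, dim=-1)
    gates = probs.gather(-1, topk_idx)
    gates = gates / gates.sum(-1, keepdim=True)  # renormalize over the 2 chosen experts
    return gates, topk_idx, probs


def aux_losses(probs, topk_idx, logits, num_experts):
    """Switch-style balance loss (f_i * P_i) and ST-MoE router z-loss."""
    # f_i: mean number of top-2 assignments per token going to expert i
    assigned = F.one_hot(topk_idx, num_experts).float().sum(dim=1)  # (tokens, experts)
    f = assigned.mean(dim=0)
    # P_i: mean gate probability mass on expert i
    P = probs.mean(dim=0)
    balance = num_experts * (f * P).sum()
    # z-loss penalizes large gating logit magnitudes for numerical stability
    z = torch.logsumexp(logits, dim=-1).pow(2).mean()
    return balance, z
```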
- Python 3.13+
- uv package manager
- CUDA-capable GPU
```
git clone https://github.com/<your-org>/sesame.git
cd sesame
uv sync
```

This project uses the VoiceBank+DEMAND dataset.
- Download and extract the dataset
- Resample all wav files to 16 kHz
- Organize the files into `data/clean/` and `data/noisy/` directories
- Create pipe-delimited file lists `data/train.txt` and `data/test.txt` (only the second field, the filename, is used)
Update the paths in `config.yaml` accordingly.
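A minimal parser matching the pipe-delimited list format described above (the helper name is illustrative, not the repo's actual loader):

```python
def read_filelist(path):
    """Read a pipe-delimited file list; only the second field (the filename) is used."""
    names = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            fields = line.split("|")
            names.append(fields[1])  # second field: the wav filename
    return names
```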
Single GPU:

```
uv run python train.py --config config.yaml
```

Multi-GPU (DDP via `torchrun`):

```
uv run torchrun --nproc-per-node=gpu train.py --config config.yaml
```

Checkpoints and training logs are saved to the `checkpoint_path` directory (default: `cp_model/`). A copy of the config is saved alongside the checkpoints.
```
uv run python inference.py --checkpoint_file cp_model/g_best
```

Options:

- `--input_noisy_wavs_dir`: override the noisy input directory
- `--input_clean_wavs_dir`: provide clean references to compute metrics (PESQ, CSIG, CBAK, COVL, SSNR, STOI)
- `--output_dir`: output directory for enhanced wav files (default: `../generated_files`)
Key sections in `config.yaml`:

| Section | Key parameters |
|---|---|
| `model` | `dense_channel`, `num_tsblocks`, `n_heads`, `compress_factor`, `beta` |
| `model.moe` | `apply_to` (layer indices), `num_experts`, `top_k`, `expert_ffn_dim`, `noise_ctx_dim` |
| `training` | `learning_rate`, `batch_size`, `epochs`, `warmup_steps`, `loss_weights` |
| `data` | `sampling_rate`, `segment_size`, `n_fft`, `hop_size`, `win_size` |
| `paths` | `checkpoint_path`, `input_clean_wavs_dir`, `input_noisy_wavs_dir` |
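A minimal `config.yaml` sketch consistent with the sections above. All values are illustrative placeholders, not the repo's shipped defaults:

```yaml
model:
  dense_channel: 64
  num_tsblocks: 4
  n_heads: 4
  compress_factor: 0.3
  beta: 2.0
  moe:
    apply_to: [1, 3]       # layer indices whose FFN is replaced by MoEFFN
    num_experts: 4
    top_k: 2
    expert_ffn_dim: 64
    noise_ctx_dim: 16
training:
  learning_rate: 0.0005
  batch_size: 4
  epochs: 100
data:
  sampling_rate: 16000
  segment_size: 32000
  n_fft: 400
  hop_size: 100
  win_size: 400
paths:
  checkpoint_path: cp_model/
  input_clean_wavs_dir: data/clean/
  input_noisy_wavs_dir: data/noisy/
```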
Set `moe.apply_to: []` or remove the `moe` section entirely to train a baseline model without MoE.
```bibtex
@article{sesame2026,
  title={{SESAME}: Speech Enhancement with Sparse Adaptive Mixture of Experts},
  author={},
  year={2026}
}
```
```bibtex
@inproceedings{lu2023mp,
  title={{MP-SENet}: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra},
  author={Lu, Ye-Xin and Ai, Yang and Ling, Zhen-Hua},
  booktitle={Proc. Interspeech},
  pages={3834--3838},
  year={2023}
}
```

- MP-SENet — base architecture
- HiFi-GAN — training utilities
- NSPP — phase estimation
- CMGAN — composite metrics implementation
- Switch Transformers — balance loss
- DeepSeek-V3 — adaptive bias balancing