Coarse-unit based Speech LM

This repository provides code for reproducing the experiments in the following paper:

Kando, S., Miyao, Y., Takamichi, S. (2025) Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models. Proc. Interspeech 2025, 5728-5732, doi: 10.21437/Interspeech.2025-310

Setup

Python environment

We use two environment management tools:

pyenv for managing the Python version
poetry for managing Python packages

To reproduce the experimental environment, install Python 3.10.13 with pyenv and then run poetry install.
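To confirm that poetry picked up the pyenv-installed interpreter, you can run a small check such as poetry run python check_python.py. The script name check_python.py and the helper matches_required_version are hypothetical and not part of the repository. This is only a sketch that compares the running interpreter against the required version.

```python
import sys

# Python version this README asks for (installed via pyenv).
REQUIRED = (3, 10, 13)


def matches_required_version(version_info=None, required=REQUIRED):
    """Return True if (major, minor, micro) of `version_info` equals `required`.

    Hypothetical helper; defaults to the running interpreter's sys.version_info.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == tuple(required)


if __name__ == "__main__":
    if matches_required_version():
        print("OK: running Python %d.%d.%d" % REQUIRED)
    else:
        current = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: expected Python 3.10.13, found {current}")
```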

You may use other tools such as venv, conda, or rye, but there is no guarantee that the implementation will work as expected with them.

Download dataset

Training dataset for K-means and SpeechLM

Download the train and dev sets of LibriSpeech from the official website. Extract the archives and arrange them into the following directory structure.

LibriSpeech
├── BOOKS.TXT
├── CHAPTERS.TXT
├── LICENSE.TXT
├── README.TXT
├── SPEAKERS.TXT
├── dev
│   ├── dev-clean
│   └── dev-other
└── train
    ├── train-clean-100
    ├── train-clean-360
    └── train-other-500
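Before training, it can help to verify that every split sits where the tree above expects it. If your archives extract the splits directly under LibriSpeech/ (for example LibriSpeech/train-clean-100), you may need to move them into dev/ and train/ first. The sketch assumes the root directory is passed on the command line; the script and the function missing_librispeech_dirs are hypothetical, not part of the repository.

```python
import sys
from pathlib import Path

# Subdirectories expected under the LibriSpeech root, following the tree above.
EXPECTED_SUBDIRS = [
    "dev/dev-clean",
    "dev/dev-other",
    "train/train-clean-100",
    "train/train-clean-360",
    "train/train-other-500",
]


def missing_librispeech_dirs(root):
    """Return the expected subdirectories that are not present under `root`."""
    root = Path(root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "LibriSpeech"
    missing = missing_librispeech_dirs(root)
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("LibriSpeech layout looks complete.")
```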

Benchmarks for SpeechLM

Download the following benchmarks:

sBLIMP and sWUGGY: included in sLM21-dataset
prosaudit: included in prosaudit-dataset
tSC (Topic StoryCloze)

Experimental Steps

Example scripts are provided in the example directory. Please also refer to the comments in each script file.

1. Train K-means model

Run example/kmeans_train.sh.
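This README does not describe what example/kmeans_train.sh does internally. As a rough illustration only, the sketch below runs plain Lloyd's k-means with NumPy on a (num_frames, dim) matrix of frame-level features. It assumes the features come from a self-supervised speech encoder (an assumption, not stated here), and the function train_kmeans and its parameters are hypothetical. The number of clusters determines the vocabulary size of the resulting speech units.

```python
import numpy as np


def train_kmeans(features, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a (num_frames, dim) feature matrix.

    Hypothetical helper, not the repository's implementation.
    Returns a (n_clusters, dim) array of centroids.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids with distinct, randomly chosen frames.
    init_idx = rng.choice(len(features), size=n_clusters, replace=False)
    centroids = features[init_idx].astype(np.float64)
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:  # keep the previous centroid if a cluster empties
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids
```

For LibriSpeech-scale data, the full frame-by-centroid distance matrix built here would be too large to hold in memory; a mini-batch variant (for example scikit-learn's MiniBatchKMeans) is the usual practical choice.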

2. Infer with K-means model

Run example/kmeans_infer.sh.
Note that inference must be run on every prepared dataset, i.e., LibriSpeech (train and dev) and the benchmarks (sBLIMP, sWUGGY, etc.).
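Conceptually, inference with a k-means model maps each feature frame to the index of its nearest centroid, turning an utterance into a sequence of discrete units. The sketch below assumes a NumPy array of centroids. The helpers assign_units and collapse_repeats are hypothetical, and this README does not state whether the repository merges consecutive duplicate units, as some unit-based speech LMs do.

```python
import numpy as np


def assign_units(features, centroids):
    """Map each frame of a (num_frames, dim) array to its nearest centroid index."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def collapse_repeats(units):
    """Merge runs of identical consecutive units, e.g. [3, 3, 7, 7, 3] -> [3, 7, 3]."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(int(u))
    return out
```

Whatever the exact procedure, the same centroids should be applied to every dataset (LibriSpeech train and dev as well as each benchmark) so that all unit sequences share one vocabulary.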

Citation

@inproceedings{kando25_interspeech,
  title     = {{Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models}},
  author    = {Shunsuke Kando and Yusuke Miyao and Shinnosuke Takamichi},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {5728--5732},
  doi       = {10.21437/Interspeech.2025-310},
  issn      = {2958-1796},
}
