
mynlp/speechLM


Coarse-unit based Speech LM

This repository provides code for reproducing the experiments in the following paper:

Kando, S., Miyao, Y., Takamichi, S. (2025) Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models. Proc. Interspeech 2025, 5728-5732, doi: 10.21437/Interspeech.2025-310

Setup

Python environment

We use two environment management tools:

  • pyenv for managing the Python version
  • Poetry for managing Python packages

To reproduce the experimental environment, install Python 3.10.13 with pyenv and then run poetry install.
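Before running poetry install, it can help to confirm that the pyenv-selected interpreter is the one actually in use. The following is a minimal sketch (a hypothetical helper, not part of this repository) that compares the running interpreter against the version stated above.

```python
import sys

# Python version used for the experiments, as stated in this README.
EXPECTED = (3, 10, 13)


def check_python_version(expected=EXPECTED, actual=None):
    """Return True if the interpreter version matches `expected`.

    `actual` may be passed explicitly (e.g. for testing); by default the
    running interpreter's (major, minor, micro) version is used.
    """
    if actual is None:
        actual = tuple(sys.version_info[:3])
    return tuple(actual) == tuple(expected)


if __name__ == "__main__":
    if not check_python_version():
        found = ".".join(map(str, sys.version_info[:3]))
        print(f"Warning: expected Python 3.10.13, found {found}", file=sys.stderr)
```

Saved under a hypothetical name such as check_python.py, it could be run with poetry run python check_python.py to confirm that Poetry's virtual environment was built on the pyenv interpreter.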

You may use other tools such as venv, conda, or rye, but there is no guarantee that the implementation will work as expected with them.

Download dataset

Training dataset for K-means and SpeechLM

Download the train and dev sets of LibriSpeech from the website. After extracting the archives, arrange the subsets into the following directory structure.

LibriSpeech
├── BOOKS.TXT
├── CHAPTERS.TXT
├── LICENSE.TXT
├── README.TXT
├── SPEAKERS.TXT
├── dev
│   ├── dev-clean
│   └── dev-other
└── train
    ├── train-clean-100
    ├── train-clean-360
    └── train-other-500
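Because the train and dev subsets are grouped under train/ and dev/, a quick check that every subset is in the right place may save a failed run later. The sketch below assumes only the tree shown above; the function name is hypothetical, and the repository's scripts may perform their own checks.

```python
from pathlib import Path

# Expected layout from the tree above: subsets grouped by split.
EXPECTED_SUBSETS = {
    "dev": ["dev-clean", "dev-other"],
    "train": ["train-clean-100", "train-clean-360", "train-other-500"],
}


def missing_librispeech_dirs(root):
    """Return the expected subset directories (relative to `root`) that are missing."""
    root = Path(root)
    missing = []
    for split, subsets in EXPECTED_SUBSETS.items():
        for subset in subsets:
            path = root / split / subset
            if not path.is_dir():
                missing.append(str(path.relative_to(root)))
    return missing


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "LibriSpeech"
    missing = missing_librispeech_dirs(root)
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("LibriSpeech layout looks complete.")
```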

Benchmarks for SpeechLM

Download the following benchmarks:

Experimental Steps

Example scripts are provided in the example directory. Please also refer to the comments in the script files.

1. Train K-means model

Run example/kmeans_train.sh.
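For readers unfamiliar with this step: K-means clusters feature vectors into K centroids, and K sets the size of the resulting discrete unit inventory. The features, the value of K, and the actual training code are defined by example/kmeans_train.sh and are not reproduced here. The sketch below only illustrates the standard Lloyd's algorithm on toy vectors, with hypothetical function names.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's-algorithm K-means on a list of feature vectors.

    Initial centroids are K points sampled without replacement from the data.
    Returns the list of K centroids.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (keep the old centroid if its cluster is empty).
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                dim = len(members[0])
                new_centroids.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centroids.append(c)
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids
```

Real pipelines typically train on a large number of frame-level features with an optimized library implementation rather than pure Python; this version is meant only to make the algorithm explicit.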

2. Infer with K-means model

Run example/kmeans_infer.sh.
Note that this must be done for every prepared dataset, i.e., LibriSpeech (train and dev) and the benchmarks (sBLIMP, sWUGGY, etc.).
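Conceptually, inference assigns each feature frame to its nearest K-means centroid, turning an utterance into a sequence of discrete unit IDs. The sketch below shows that mapping on toy vectors; the function names are hypothetical. The optional deduplicate step (collapsing repeated consecutive units, as done in several speech LM pipelines) is an assumption here, since this README does not say whether the repository applies it.

```python
from itertools import groupby


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, centroids):
    """Map each feature frame to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(f, centroids[k]))
        for f in frames
    ]


def deduplicate(units):
    """Collapse runs of identical consecutive units, e.g. [3, 3, 1, 1, 3] -> [3, 1, 3]."""
    return [u for u, _ in groupby(units)]
```

The same centroids must be used for every dataset, so that LibriSpeech and the benchmarks share one unit vocabulary.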

Citation

@inproceedings{kando25_interspeech,
  title     = {{Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models}},
  author    = {Shunsuke Kando and Yusuke Miyao and Shinnosuke Takamichi},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {5728--5732},
  doi       = {10.21437/Interspeech.2025-310},
  issn      = {2958-1796},
}
