Text-to-speech with The Massively Multilingual Speech (MMS) project (facebookresearch)
This project is a TTS-MMS testing lab for studying Facebook's MMS TTS project, specifically for the Shan language (lang_code: shn).
Clone this project and install the requirements:
git clone https://github.com/haohaaorg/ttsmms_lab.git
cd ttsmms_lab
pip install -r requirements.txt
Download the TTS model from facebookresearch and extract it to model/:
mkdir -p model/shn/ && wget -qO- https://dl.fbaipublicfiles.com/mms/tts/shn.tar.gz | tar -xz -C model/shn/ --strip-components 1
Use the shn_tts.py file, or create a new one, to run the Shan language model:
from ttsmms import TTS
tts = TTS("./model/shn")
tts.synthesis("ၼုမ်ႇသိုၵ်းႁၢၼ် ႁဵတ်းၵၢၼ်ၵွၼ်းၶေႃၸိုင်ႈတႆး", wav_path="output/example_shn.wav")
# output: output/example_shn.wav file
Use the eng_tts.py file, or create a new one, to run the English language model:
from ttsmms import TTS
tts = TTS("./model/eng")
tts.synthesis("speech", wav_path="output/example_eng.wav")
# output: output/example_eng.wav file
Or combine the two models. Because these are pre-trained models that may not support every text or word, the idea is to use both the English and the Shan model to handle multilingual text in a single line.
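One way to picture the combined idea is to split a mixed line into runs by script and send each run to the matching model. The following is only a minimal sketch under stated assumptions, not the project's own shn_tts_combined.py: it assumes Shan text falls in the Myanmar Unicode blocks and English text is ASCII letters. The names `split_by_script`, `synthesize_segments`, and the `combined_NN_lang.wav` file names are hypothetical. The only model call used is `synthesis(text, wav_path=...)`, as in the scripts above.

```python
from pathlib import Path

# Unicode blocks assumed to hold Shan letters (an assumption of this sketch).
MYANMAR_RANGES = (
    (0x1000, 0x109F),  # Myanmar
    (0xA9E0, 0xA9FF),  # Myanmar Extended-B
    (0xAA60, 0xAA7F),  # Myanmar Extended-A
)


def char_lang(ch):
    """Return "shn", "eng", or None for characters that belong to neither."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in MYANMAR_RANGES):
        return "shn"
    if ch.isascii() and ch.isalpha():
        return "eng"
    return None  # spaces, digits, punctuation


def split_by_script(text):
    """Split text into (lang, segment) runs.

    Neutral characters (spaces, digits, punctuation) stay with the current run.
    """
    segments = []
    lang, buf = None, ""
    for ch in text:
        ch_lang = char_lang(ch)
        # A letter from the other script closes the current run.
        if ch_lang is not None and lang is not None and ch_lang != lang:
            segments.append((lang, buf.strip()))
            buf = ""
        if ch_lang is not None:
            lang = ch_lang
        buf += ch
    if lang is not None and buf.strip():
        segments.append((lang, buf.strip()))
    return segments


def synthesize_segments(segments, models, out_dir="output"):
    """Write one WAV file per segment with the model for its language.

    `models` maps "shn" / "eng" to objects exposing
    synthesis(text, wav_path=...), the call used in the scripts above.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (lang, segment) in enumerate(segments):
        wav_path = str(Path(out_dir) / f"combined_{i:02d}_{lang}.wav")
        models[lang].synthesis(segment, wav_path=wav_path)
        paths.append(wav_path)
    return paths


# Possible usage, once both models are downloaded:
# from ttsmms import TTS
# models = {"shn": TTS("./model/shn"), "eng": TTS("./model/eng")}
# synthesize_segments(split_by_script("ၼုမ်ႇသိုၵ်း speech"), models)
```

This sketch writes one file per run and leaves joining the audio into a single file as a separate step. Text with no Shan or English letters produces no segments.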
Don't forget to download the English model first:
mkdir -p model/eng/ && wget -qO- https://dl.fbaipublicfiles.com/mms/tts/eng.tar.gz | tar -xz -C model/eng/ --strip-components 1
Run the script:
python shn_tts.py
# python eng_tts.py
# python shn_tts_combined.py
Running on a GPU (NVIDIA CUDA) makes a difference in terms of precision and accuracy compared with running on a CPU, for example:
- Run with local CPU (Intel Core i5) - example_CPU_local_run.wav
- Run with GPU (Google Colab with CUDA) - example_GPU_google_colab_run.wav
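When comparing samples like the two above, it can help to record which device PyTorch sees on the machine that produced each file. This is a minimal sketch, assuming the model runs on PyTorch; `detect_device` is a hypothetical helper name and not part of this project.

```python
import importlib.util


def detect_device():
    """Describe the compute device PyTorch would see on this machine."""
    # Avoid an ImportError on machines where torch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "cpu (torch not installed)"
    import torch

    if torch.cuda.is_available():
        # Report the name of the first CUDA device.
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


print(detect_device())
```

Keeping this string next to each generated WAV file makes it easier to tell later which run produced which sample.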
Google Colab: fairseq_lab.ipynb (requires Python 3.8)
MIT