low-resource-language

Here are 11 public repositories matching this topic...

🇧🇮 The first large-scale, open-source speech and text dataset for Kirundi language. Building AI models for 12M+ Kirundi speakers through community collaboration. Includes ASR, TTS, and MT capabilities.

  • Updated May 7, 2026
  • Jupyter Notebook

How subword tokenization fractures Catalan morphology, and whether a universal morphemic segmentation recovers its geometry. Catalan is fragmented ~1.7× more than English (the punt volat, or middle dot, up to ~4×); an audit of 11 tokenizers and of geometry in 5 small models.

  • Updated May 24, 2026
  • Python
