TextGenerator downloads articles from Wikipedia and converts them to dialectological text format.
Linguists at NaRFU go on dialectological expeditions to different villages of Arkhangelsk region. The dialogs with locals are transcribed into notebooks and the examples of a dialect words and an example of its usage is written on cards. The dialectological text is a text that conveys linguistic features using special symbols like acutes, apostrophes etc.
Dialectological text: [ста̄ну́шку к‿руба́х'и пр'ишыва́л'и]
- Create an virtual environment (Optional)
python -m venv venv
# for windows:
./venv/Scripts/activate.ps1
# for linux:
source ./venv/bin/activate- Install package from GitHub
pip install git+https://github.com/DialecticalHTR/AnnotationExporter.gitCreate an TextGenerator object and call generate_text_chunks to generate text:
import time
from text_generator import TextGenerator
g = TextGenerator()
total = 200_000
sentences = g.generate_text_chunks(total)