DialecticalHTR/TextGenerator

Dialectological text generator

TextGenerator downloads articles from Wikipedia and converts them to dialectological text format.

What is a dialectological text?

Linguists at NaRFU go on dialectological expeditions to villages in the Arkhangelsk region. Conversations with locals are transcribed into notebooks, and each dialect word is written on a card together with an example of its usage. A dialectological text is a text that conveys linguistic features using special symbols such as acute accents, apostrophes, etc.

Dialectological text: [ста̄ну́шку к‿руба́х'и пр'ишыва́л'и]
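To make the "special symbols" concrete, here is a minimal sketch that lists every non-letter character in a transcription by its Unicode name. It is not part of TextGenerator; the helper name `special_symbols` is hypothetical, and it assumes accents are stored as combining marks (as in the example above) rather than as precomposed letters.

```python
import unicodedata
from collections import Counter


def special_symbols(text: str) -> Counter:
    """Count characters that are neither letters nor whitespace, keyed by Unicode name.

    Hypothetical helper for illustration only; not part of the TextGenerator API.
    """
    counts = Counter()
    for ch in text:
        # Letters (categories Lu, Ll, Lo, ...) and whitespace are ordinary text.
        if unicodedata.category(ch).startswith("L") or ch.isspace():
            continue
        # Combining marks, apostrophes, ties and brackets end up here.
        counts[unicodedata.name(ch, f"U+{ord(ch):04X}")] += 1
    return counts


if __name__ == "__main__":
    sample = "[ста̄ну́шку к‿руба́х'и пр'ишыва́л'и]"
    for name, count in special_symbols(sample).most_common():
        print(f"{count:2d}  {name}")
```

The text is deliberately not normalized before counting: decomposing it would also split ordinary letters such as й into a base letter plus a combining breve and report them as special symbols.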

Example of a card

Installation

  1. Create a virtual environment (optional)
python -m venv venv
# for windows:
./venv/Scripts/activate.ps1
# for linux:
source ./venv/bin/activate
  2. Install the package from GitHub
pip install git+https://github.com/DialecticalHTR/TextGenerator.git

Usage

Create a TextGenerator object and call generate_text_chunks to generate text:

from text_generator import TextGenerator

g = TextGenerator()

# Amount of text to request from the generator
total = 200_000
sentences = g.generate_text_chunks(total)
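
The generated chunks will usually be saved for later use in a dataset. The sketch below shows one way to do that, assuming generate_text_chunks returns an iterable of strings (the README does not state the return type). The helper `write_chunks` is hypothetical and not part of the package, and the demo uses stand-in data so that it runs without network access.

```python
import io
from typing import Iterable, TextIO


def write_chunks(chunks: Iterable[str], stream: TextIO) -> int:
    """Write one chunk per line to `stream` and return how many were written.

    Hypothetical helper; assumes every chunk is a plain string.
    """
    written = 0
    for chunk in chunks:
        # Collapse internal whitespace and newlines so each chunk stays on one line.
        stream.write(" ".join(chunk.split()) + "\n")
        written += 1
    return written


# Stand-in data. With the real package this would be
# g.generate_text_chunks(total), assuming it yields strings.
demo_chunks = ["[ста̄ну́шку к‿руба́х'и]", "[пр'ишыва́л'и]"]
buffer = io.StringIO()
count = write_chunks(demo_chunks, buffer)
print(count, "chunks written")
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("chunks.txt", "w", encoding="utf-8")`. The explicit UTF-8 encoding matters because the text contains combining marks and other non-ASCII symbols.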

About

Generator of dialectological text for synthetic datasets
