This project explores language modeling using LSTM-based architectures trained on the WikiText-2 dataset. Two models are implemented: a standard LSTM language model and an advanced AWD-LSTM variant with regularization techniques such as weight dropout and locked dropout. Given a text prompt, both models generate coherent sentence continuations.
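Below is a minimal PyTorch sketch of the two regularizers that distinguish the AWD-LSTM variant: weight dropout (DropConnect on the recurrent hidden-to-hidden weights) and locked dropout (one dropout mask shared across all time steps). The class names, layer sizes, and dropout rates are illustrative, not the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LockedDropout(nn.Module):
    """Variational dropout: one mask sampled per sequence, shared across time steps."""

    def forward(self, x, p=0.5):
        # x: (seq_len, batch, features)
        if not self.training or p == 0.0:
            return x
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - p) / (1 - p)
        return x * mask


class WeightDrop(nn.Module):
    """DropConnect on selected weights, e.g. the LSTM's hidden-to-hidden matrix."""

    def __init__(self, module, weight_names, dropout=0.5):
        super().__init__()
        self.module = module
        self.weight_names = weight_names
        self.dropout = dropout
        for name in weight_names:
            raw = getattr(module, name)
            del module._parameters[name]  # detach the original parameter
            module.register_parameter(name + "_raw", nn.Parameter(raw.data))

    def forward(self, *args, **kwargs):
        # Re-mask the raw weights on every forward pass, then run the wrapped module.
        for name in self.weight_names:
            raw = getattr(self.module, name + "_raw")
            setattr(self.module, name,
                    F.dropout(raw, p=self.dropout, training=self.training))
        return self.module(*args, **kwargs)


# Usage sketch: recurrent weight dropout plus locked dropout on the LSTM outputs.
lstm = WeightDrop(nn.LSTM(256, 512), ["weight_hh_l0"], dropout=0.5)
locked = LockedDropout()
x = torch.randn(35, 20, 256)          # (seq_len, batch, emb_dim) dummy batch
out, _ = lstm(x)
out = locked(out, p=0.4)
```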
A custom BPE tokenizer is built from scratch on WikiText-2 (30k vocabulary). The pipeline covers data cleaning, deduplication, training with the HuggingFace `tokenizers` library, evaluation (compression ratio, UNK-free coverage, consistency), and saving/reloading the result as a `PreTrainedTokenizerFast`.
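The following sketch shows the shape of that pipeline with the HuggingFace `tokenizers` and `transformers` APIs. The stand-in corpus, special tokens, and output directory are placeholders; in the real pipeline the trainer would consume the cleaned, deduplicated WikiText-2 split.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers
from transformers import PreTrainedTokenizerFast

# Byte-pair-encoding model with a whitespace pre-tokenizer and a 30k target vocab.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(
    vocab_size=30_000,
    special_tokens=["<unk>", "<pad>", "<bos>", "<eos>"],
)

# Stand-in corpus; the real pipeline trains on the cleaned WikiText-2 training split.
corpus = [
    "Senjō no Valkyria 3 is a tactical role playing video game .",
    "The game was released in January 2011 in Japan .",
]
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Quick evaluation: compression ratio = characters per produced token.
sample = corpus[0]
encoding = tokenizer.encode(sample)
print(len(sample) / len(encoding.tokens), encoding.tokens)

# Wrap for the transformers ecosystem, save, reload, and check consistency.
fast = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="<unk>",
    pad_token="<pad>",
    bos_token="<bos>",
    eos_token="<eos>",
)
fast.save_pretrained("bpe-wikitext2-30k")
reloaded = PreTrainedTokenizerFast.from_pretrained("bpe-wikitext2-30k")
assert reloaded.tokenize(sample) == encoding.tokens  # round-trip consistency check
```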