# wikitext2

Here are 2 public repositories matching this topic...

This project explores language modeling using LSTM-based architectures trained on the WikiText-2 dataset. Two models are implemented: a standard LSTM language model and an advanced AWD-LSTM variant with regularization techniques such as weight dropout and locked dropout. Given a text prompt, both models generate coherent sentence continuations.

  • Updated Jul 10, 2025
  • Jupyter Notebook
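
The description above mentions locked dropout, one of the AWD-LSTM regularization techniques. As a minimal sketch of the idea (not the repo's actual code): a single dropout mask is sampled per sequence and reused across every time step, rather than resampling at each step. The class name, dropout rate, and `(seq_len, batch, hidden)` tensor layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LockedDropout(nn.Module):
    """Variational ("locked") dropout: one mask per sequence, shared over time."""
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, hidden)
        if not self.training or self.p == 0.0:
            return x
        # Sample one mask per (batch, hidden) position, broadcast over time steps.
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        mask = mask / (1 - self.p)  # inverted-dropout scaling keeps expectations unchanged
        return x * mask

# Hypothetical usage between stacked LSTM layers (sizes are illustrative):
lstm = nn.LSTM(input_size=400, hidden_size=1150)
drop = LockedDropout(p=0.4)
x = torch.randn(35, 20, 400)  # (seq_len, batch, emb)
out, _ = lstm(x)
out = drop(out)
```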

Custom BPE tokenizer built from scratch on WikiText-2 (30k vocab). Covers data cleaning, deduplication, HuggingFace tokenizers training, evaluation (compression ratio, UNK-free coverage, consistency), and save/reload as PreTrainedTokenizerFast.

  • Updated Apr 14, 2026
  • Jupyter Notebook
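
Since this description names HuggingFace tokenizers training and save/reload as `PreTrainedTokenizerFast`, here is a minimal sketch of that pipeline. The corpus filename, output directory, and special tokens are assumptions, not the repo's actual choices.

```python
from tokenizers import Tokenizer, models, trainers, pre_tokenizers
from transformers import PreTrainedTokenizerFast

# Train a 30k-vocab BPE tokenizer on a (hypothetical) cleaned WikiText-2 dump.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(
    vocab_size=30_000,
    special_tokens=["<unk>", "<pad>"],
)
tokenizer.train(["wikitext2_train.txt"], trainer)  # assumed corpus path

# Wrap for the transformers ecosystem, then save and reload.
fast = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="<unk>",
    pad_token="<pad>",
)
fast.save_pretrained("wikitext2-bpe-30k")
reloaded = PreTrainedTokenizerFast.from_pretrained("wikitext2-bpe-30k")
print(reloaded.tokenize("Valkyria Chronicles is a tactical role-playing game."))
```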
