An educational Python project for learning tokenization step by step: build character-level, byte-level, and BPE tokenizers from scratch.
python nlp tokenizer text-processing tokenization educational-project bpe byte-pair-encoding regextokenizer llm subword-tokenization bpe-tokenizer byte-pair-tokenizer char-tokeneizer
Updated May 4, 2026 - Python
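
For readers unfamiliar with the three tokenizer types named in the description, here is a minimal sketch of how they differ. It is not taken from the project: the function names (`char_tokenize`, `byte_tokenize`, `merge_pair`, `train_bpe`, `decode`) are hypothetical, and the BPE variant shown assumes a byte-level base vocabulary of 256 entries with no regex pre-splitting.

```python
from collections import Counter


def char_tokenize(text):
    """Character-level: one token per Unicode code point."""
    return list(text)


def byte_tokenize(text):
    """Byte-level: one token per UTF-8 byte (integers 0-255)."""
    return list(text.encode("utf-8"))


def merge_pair(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Assumption: new token ids start at 256, right after the byte range.
    """
    ids = byte_tokenize(text)
    merges = {}
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))  # count adjacent pairs
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)    # most frequent pair
        new_id = 256 + step
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Rebuild the byte vocabulary from the merges, then decode to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():   # dict keeps merge order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")
```

Calling `train_bpe("aaabdaaabac", 3)` should yield a shorter id sequence than `byte_tokenize` gives for the same string, and `decode` should recover the original text from those ids and the learned merges.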