Minimal PyTorch implementation of the Transformer architecture from Vaswani et al.'s 2017 paper, *Attention Is All You Need*. This repository serves as a deep dive into key architectures and their nuances, from design details to training techniques.
Repository layout:

- `Tokenizer.py`
- `Modules/Attention.py`: `MultiHeadAttentionBase`, `MultiHeadSelfAttention`, `MultiHeadCrossAttention` (see the sketch below)
- `Modules/AddNorm.py`
- `Modules/MLP.py`
- `Modules/Encoder.py`: `EncoderLayer`, `Encoder`
- `Modules/Decoder.py`: `DecoderLayer`, `Decoder`
- `Modules/Transformer.py`: `Transformer`
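At the heart of `Modules/Attention.py` is scaled dot-product attention, `softmax(QK^T / sqrt(d_k)) V`, computed over several heads in parallel. Below is a minimal sketch of what a `MultiHeadSelfAttention` module can look like; the constructor arguments and internal names are assumptions for illustration, not the repository's actual API:

```python
# Minimal sketch of multi-head self-attention; names and signatures are
# illustrative assumptions, not the repository's actual interface.
import math
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection each for queries, keys, and values, plus an output projection.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        batch, seq_len, d_model = x.shape

        # Project, then split into heads: (batch, num_heads, seq_len, d_head).
        def split(t: torch.Tensor) -> torch.Tensor:
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))

        # Scaled dot-product attention: softmax(QK^T / sqrt(d_head)) V.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v

        # Merge heads back into (batch, seq_len, d_model) and project.
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, d_model)
        return self.w_o(out)
```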
Additional components:

- `Config/Config.py`
- `BERT/BERT.py`: `Bert`
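To show how these modules compose, here is a hedged sketch of an `EncoderLayer` built from an `AddNorm`-style residual connection and an `MLP` feed-forward block, in the spirit of `Modules/AddNorm.py`, `Modules/MLP.py`, and `Modules/Encoder.py`. PyTorch's built-in `nn.MultiheadAttention` stands in for the repository's attention module, and all signatures and defaults are assumptions:

```python
# Sketch of how AddNorm, MLP, and attention might compose into one encoder
# block; signatures are assumptions, not the repository's actual interfaces.
import torch
import torch.nn as nn


class AddNorm(nn.Module):
    # Residual connection followed by LayerNorm (post-norm, as in the 2017 paper).
    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        return self.norm(x + self.dropout(sublayer_out))


class MLP(nn.Module):
    # Position-wise feed-forward network: Linear -> ReLU -> Linear.
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class EncoderLayer(nn.Module):
    # One encoder block: self-attention and MLP, each wrapped in AddNorm.
    # nn.MultiheadAttention is a stand-in for the repo's MultiHeadSelfAttention.
    def __init__(self, d_model: int, num_heads: int, d_ff: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.mlp = MLP(d_model, d_ff)
        self.addnorm1 = AddNorm(d_model)
        self.addnorm2 = AddNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.addnorm1(x, attn_out)
        return self.addnorm2(x, self.mlp(x))


layer = EncoderLayer(d_model=512, num_heads=8, d_ff=2048)
x = torch.randn(2, 16, 512)   # (batch, seq_len, d_model)
print(layer(x).shape)         # torch.Size([2, 16, 512])
```

Note that this sketch follows the paper's post-norm convention, where LayerNorm is applied after the residual sum; the repository's `AddNorm` may differ.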