This is a PyTorch implementation of the Transformer architecture described in the paper Attention Is All You Need.
The implemented architecture is intended specifically for tackling Sequence to Sequence tasks.
To improve the readability of matrix products inside the Transformer architecture, this implementation makes use of the Pytorch Named Tensor API to manipulate Tensor dimensions.