Hey, really appreciate your nice work!
I noticed that in trme.py the global cls position embedding is defined as self.type_embeds = nn.Embedding(100, self.dim) (line 33). However, its later use (line 155), pos = self.type_embeds(torch.arange(0, 3, device=device)), only looks up three positions. So why does type_embeds use nn.Embedding(100, self.dim) instead of nn.Embedding(3, self.dim)? Would this make a difference?
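To make the question concrete, here is a minimal sketch of what I would expect, written outside the repository. The value dim = 8 and the names large and small are hypothetical stand-ins, not taken from trme.py. As far as I can tell, the lookup result is the same for both table sizes once the first three rows match, and the 97 unused rows receive no gradient:

```python
import torch
import torch.nn as nn

dim = 8  # hypothetical width; the real value is self.dim in trme.py

torch.manual_seed(0)
large = nn.Embedding(100, dim)  # table size used in the repository (line 33)
small = nn.Embedding(3, dim)    # the smaller table asked about

# Copy the first three rows so both tables start from identical values.
with torch.no_grad():
    small.weight.copy_(large.weight[:3])

# Same lookup as on line 155, without the device argument.
idx = torch.arange(0, 3)
pos_large = large(idx)
pos_small = small(idx)
same_output = torch.equal(pos_large, pos_small)

# Backpropagate a dummy loss: only the rows that were looked up get gradient.
pos_large.sum().backward()
unused_rows_grad_is_zero = bool((large.weight.grad[3:] == 0).all())

# Extra parameters carried by the larger table: (100 - 3) * dim.
extra_params = large.weight.numel() - small.weight.numel()

print(same_output, unused_rows_grad_is_zero, extra_params)
```

If this reasoning holds, the only differences I can think of are the extra (100 - 3) * self.dim parameters stored in the checkpoint and a possibly different random initialization of later layers under the same seed, since the larger table draws more random numbers. Please correct me if the extra rows are used somewhere else.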
Looking forward to your reply! Thanks a lot!