Thank you for your work!
Have you tried other methods for span representation besides $CAT[h_i, h_j, D(j-i)]$? I would be very interested to know :)
I tried to implement something like $CAT[h_i, h_j, MEAN(h_{i+1}, h_{i+2}, ..., h_{j-1})]$, and training became very slow (as expected).
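To make the question concrete, here is a minimal sketch of the MEAN variant. It is not the code I actually ran, and not anything from this repository. It assumes the hidden states are plain lists of floats, uses prefix sums so the interior mean costs one subtraction per span, and falls back to a zero vector when the span has no interior tokens. The names `prefix_sums` and `span_repr` are hypothetical, and the width embedding $D(j-i)$ is left out.

```python
def prefix_sums(h):
    """Return P where P[k] is the element-wise sum of h[0..k-1]; P[0] is all zeros."""
    dim = len(h[0])
    P = [[0.0] * dim]
    for vec in h:
        P.append([p + v for p, v in zip(P[-1], vec)])
    return P


def span_repr(h, P, i, j):
    """CAT[h_i, h_j, MEAN(h_{i+1}, ..., h_{j-1})] for 0-based indices i < j."""
    n_inner = j - i - 1
    if n_inner > 0:
        # Sum of h[i+1..j-1] equals P[j] - P[i+1].
        mean = [(a - b) / n_inner for a, b in zip(P[j], P[i + 1])]
    else:
        # Assumption: use a zero vector when there are no interior tokens.
        mean = [0.0] * len(h[0])
    return h[i] + h[j] + mean  # list concatenation plays the role of CAT


h = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
P = prefix_sums(h)
print(span_repr(h, P, 0, 3))
```

With prefix sums computed once per sequence, each span needs only a subtraction and a division instead of a loop over its interior tokens. The same idea should carry over to a cumulative sum along the sequence axis in a tensor library.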