I am puzzled by the implementation of masked attention, especially for grouped attention. Aren't we supposed to completely disable interactions between time series that do not belong to the same group? How is that achieved by adding mask values? Shouldn't we multiply by 0 instead?

Replies: 1 comment

- Additive attention masking it is.
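For context on why addition works: the mask is applied to the attention logits before the softmax, not to the attention weights. Adding a large negative value (conceptually negative infinity) to every cross-group logit makes those positions come out of the softmax with exactly zero weight, so cross-group interaction is fully disabled. Multiplying the logits by 0 would not do this, because a zero logit still contributes exp(0) = 1 inside the softmax. A minimal NumPy sketch of the idea (not the project's actual code; the `group_ids` vector is a hypothetical input marking which series belong together):

```python
import numpy as np

def grouped_attention_weights(scores: np.ndarray, group_ids: np.ndarray) -> np.ndarray:
    """Softmax over attention logits, restricted to same-group pairs.

    scores:    (T, T) raw attention logits
    group_ids: (T,)   group label of each position/series (hypothetical input)
    """
    # True where query i and key j belong to the same group.
    same_group = group_ids[:, None] == group_ids[None, :]

    # Additive masking: push cross-group logits to -inf. After the softmax,
    # exp(-inf) = 0, so those pairs receive exactly zero attention weight.
    masked = np.where(same_group, scores, -np.inf)

    # Numerically stable softmax along the key dimension. Every row has at
    # least one unmasked entry (a position is in its own group), so the row
    # max is finite.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.random.randn(4, 4)
group_ids = np.array([0, 0, 1, 1])
print(np.round(grouped_attention_weights(scores, group_ids), 3))
# Cross-group entries are exactly 0.0; each row sums to 1 over its own group.
```

In practice many implementations use a large finite negative number such as -1e9 instead of -inf to avoid NaNs if an entire row happens to be masked; either way, the effect after the softmax is multiplicative zeroing of the forbidden attention weights, which is exactly the behavior the question asks about.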