I have a few questions about the `CausalSelfAttention` module in the codebase.
- In lines https://github.com/songweige/TATS/blob/main/tats/modules/gpt.py#L100C13-L102C86:
```python
mask[:, :config.n_unmasked+1] = 1
mask[:, -config.n_unmasked+1:] = 1
mask[-config.n_unmasked+1:, config.n_unmasked+1:-config.n_unmasked+1] = 0
```
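The masking seems to be incorrect. Because unary minus binds more tightly than `+`, `-config.n_unmasked+1` evaluates to `(-config.n_unmasked) + 1`, i.e. `-(config.n_unmasked - 1)`, so these slices start at index `-(n_unmasked - 1)` instead of `-(n_unmasked + 1)`, which is what the `:config.n_unmasked+1` slice in the first line suggests. I believe the corrected code should be: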
```python
mask[:, :config.n_unmasked+1] = 1
mask[:, -(config.n_unmasked+1):] = 1
mask[-(config.n_unmasked+1):, config.n_unmasked+1:-(config.n_unmasked+1)] = 0
```
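To make the difference concrete, here is a minimal pure-Python sketch that applies both versions of the three lines to a small mask. It emulates tensor slicing with `range` objects, which follow the same rules as basic PyTorch slices with a positive step (negative indices count from the end, out-of-range bounds are clamped, `start >= stop` is empty). The lower-triangular starting mask is an assumption about what the code builds before these lines, and `build_mask` is a hypothetical helper, not part of TATS.

```python
def build_mask(block_size, n_unmasked, parenthesized):
    """Apply the three masking lines from the issue to a block_size x block_size mask."""
    # ASSUMPTION: start from a lower-triangular (causal) mask, like torch.tril(ones).
    mask = [[1 if j <= i else 0 for j in range(block_size)] for i in range(block_size)]
    rows = range(block_size)
    cols = range(block_size)
    head = n_unmasked + 1
    if parenthesized:
        tail = -(n_unmasked + 1)  # proposed fix
    else:
        tail = -n_unmasked + 1    # as written: evaluates to (-n_unmasked) + 1

    # mask[:, :head] = 1
    for i in rows:
        for j in cols[:head]:
            mask[i][j] = 1
    # mask[:, tail:] = 1
    for i in rows:
        for j in cols[tail:]:
            mask[i][j] = 1
    # mask[tail:, head:tail] = 0
    for i in rows[tail:]:
        for j in cols[head:tail]:
            mask[i][j] = 0
    return mask


print(-2 + 1, -(2 + 1))  # -1 -3

for label, flag in [("as written", False), ("parenthesized", True)]:
    print(label)
    for row in build_mask(8, 2, parenthesized=flag):
        print(row)

# With n_unmasked=1 the unparenthesized start index is 0, so mask[:, 0:] = 1
# unmasks every position.
print(all(v == 1 for row in build_mask(6, 1, parenthesized=False) for v in row))  # True
```

With `n_unmasked=2` and a block size of 8, the unparenthesized version only touches the last column and the last row, while the parenthesized one touches the last three of each.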
- In lines https://github.com/songweige/TATS/blob/main/tats/modules/gpt.py#L122C9-L123C76:
```python
if layer_past is None:
    att = att.masked_fill(self.mask[:,:,:T,:T] == 0, float('-inf'))
```
The mask is only applied when `layer_past` is `None`. When it is not, no masking is applied at all, which would imply that the attention is no longer causal. Why is that the case?
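Whether skipping the mask is harmless depends on how many new tokens are fed alongside `layer_past`. A minimal sketch, assuming (not verified against TATS) that `layer_past` holds cached keys/values for earlier positions and the new query tokens are appended after them, as in the usual minGPT-style KV cache. `causal_rows` and `masking_is_noop` are hypothetical helpers that compute the causal mask the new queries would need.

```python
def causal_rows(past_len, new_len):
    """Causal mask rows for new_len query tokens appended after past_len cached tokens.

    Query i sits at absolute position past_len + i and may attend to key j
    only if j <= past_len + i.
    """
    total = past_len + new_len
    return [[1 if j <= past_len + i else 0 for j in range(total)]
            for i in range(new_len)]


def masking_is_noop(past_len, new_len):
    # Skipping the mask is only safe if every entry would be 1 anyway.
    return all(v == 1 for row in causal_rows(past_len, new_len) for v in row)


print(masking_is_noop(past_len=5, new_len=1))  # True
print(masking_is_noop(past_len=5, new_len=3))  # False
```

Under this assumption, feeding exactly one new token per step gives an all-ones mask row, so skipping the mask changes nothing; feeding several new tokens at once without a mask would let earlier new tokens attend to later ones, which is the concern raised above.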