Hello, we are implementing our own multivariate Chronos model. While pre-training on both the synthetic and real-world data described in the paper, we found that the ratio of gradients between time attention and group attention becomes increasingly imbalanced as training progresses (fig 1), and the group attention gradients gradually vanish. At inference time, we also observed that group attention has no effect (fig 2). Have you encountered this issue before, and if so, how did you resolve it?
fig1:
fig2:
The attention weights of group attention in the first layer and the tenth layer.
Top: ours; bottom: the open-source model.
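For anyone who wants to reproduce the measurement, here is a minimal sketch of how a per-group gradient-norm ratio like the one in fig 1 could be computed. It is not our actual training code: the substrings `time_attn` and `group_attn`, and the helper names `grad_norms_by_group` and `grad_ratio`, are hypothetical and would need to match the parameter naming of your own implementation.

```python
import math
from collections import defaultdict


def grad_norms_by_group(named_grads, groups):
    """Return the L2 gradient norm of each parameter group.

    named_grads: iterable of (parameter_name, flat iterable of gradient values)
    groups: dict mapping a group label to a substring expected in the
            parameter name (the substrings are assumptions, see above)
    """
    sq_sums = defaultdict(float)
    for name, grad in named_grads:
        for label, marker in groups.items():
            if marker in name:
                # Accumulate squared entries so the result is one norm per group.
                sq_sums[label] += sum(float(g) ** 2 for g in grad)
    return {label: math.sqrt(sq_sums[label]) for label in groups}


def grad_ratio(norms, numerator, denominator, eps=1e-12):
    """Ratio of two group norms; eps guards against division by zero."""
    return norms[numerator] / (norms[denominator] + eps)


# Toy gradients standing in for one training step (made-up values).
toy_grads = [
    ("layers.0.time_attn.q.weight", [3.0, 4.0]),
    ("layers.0.group_attn.q.weight", [0.3, 0.4]),
    ("layers.0.mlp.weight", [1.0]),  # matches no group, so it is ignored
]
groups = {"time": "time_attn", "group": "group_attn"}
norms = grad_norms_by_group(toy_grads, groups)
print(norms, grad_ratio(norms, "time", "group"))
```

In a PyTorch training loop, the same function could be fed after `loss.backward()` with `((n, p.grad.flatten().tolist()) for n, p in model.named_parameters() if p.grad is not None)`. Logging the ratio every few hundred steps would show whether the imbalance grows steadily or appears at a particular point in training.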