A 390M-parameter Mamba2 + Differential Attention hybrid language model
nlp deep-learning language-model hybrid-architecture mamba2 differential-attention pytorch-state-space-model
Updated May 8, 2026 - Python