You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Mechanistic Interpretability (MI) is a subfield of AI alignment and safety research focused on reverse-engineering neural networks to understand their internal computational mechanisms by discovering the actual algorithms and circuits they learn.