Graph neural networks (GNNs) excel at learning from graph-structured data, yet whether their internal representations align with known topological motifs remains unclear. We apply sparse autoencoders (SAEs) to decompose the hidden activations of GNNs trained on synthetic graphs with ground-truth motif annotations, including feedback loops, cascades, and fan-out structures. Using point-biserial correlation with permutation testing, we find that GNNs spontaneously learn monosemantic features corresponding to specific graph motifs. Causal ablation experiments confirm that the identified features are functionally necessary: removing feedback-loop features selectively degrades performance only on graphs containing those structures. Interestingly, single-input-module motifs were also causally linked to feedback loops, suggesting that these two motifs may not be mutually exclusive. This work establishes that mechanistic interpretability of graph representations is achievable and shows that topological inductive biases critically shape the structure of learned motif encodings.
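As a minimal sketch of the feature–motif matching step described above, the snippet below computes a point-biserial correlation between one SAE feature's activations and a binary motif-presence label, with a permutation-based p-value. The function name and array layout are illustrative assumptions, not the repository's actual API; point-biserial r is computed as Pearson r against the 0/1 labels, which is equivalent.

```python
import numpy as np

def point_biserial_perm(feature_acts, motif_labels, n_perm=10000, seed=0):
    """Point-biserial correlation between a continuous SAE feature and a
    binary motif label, with a two-sided permutation p-value.

    feature_acts : (n_graphs,) float array of per-graph feature activations
    motif_labels : (n_graphs,) array of 0/1 motif-presence annotations
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(feature_acts, dtype=float)
    y = np.asarray(motif_labels, dtype=float)
    # Point-biserial r equals Pearson r when one variable is binary.
    r_obs = np.corrcoef(x, y)[0, 1]
    # Null distribution: shuffle the motif labels and recompute r.
    exceed = 0
    for _ in range(n_perm):
        r_null = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_null) >= abs(r_obs):
            exceed += 1
    # Add-one correction keeps the estimate valid for finite n_perm.
    p_value = (exceed + 1) / (n_perm + 1)
    return r_obs, p_value
```

A feature would then be labeled as motif-selective when its correlation survives the permutation test after multiple-comparison correction across all SAE features.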
Sh1384/GNN_SAE