NeurIPS 25 Poster:
- Python ≥ 3.8.16
- See virtual environment setup scripts:
- Development:
./.ci/install-dev.sh - Production:
./.ci/install.sh
- Development:
- Dependency lists:
Run each IGNN variant across all datasets using the following scripts on:
- Public splits:
./scripts/00-best-racIGNN-public.sh - Our splits:
| Variant | Script |
|---|---|
| c-IGNN | ./scripts/01-best-cIGNN.sh |
| r-IGNN | ./scripts/02-best-rIGNN.sh |
| a-IGNN | ./scripts/03-best-aIGNN.sh |
- Results of the public and our splits are documentd in
./results/table_pub.csvand./results/table_our.csv.
Important
Experimental setups for our reported results:
- [Setting 1] Tesla V100, with Python 3.9.15, PyTorch 2.0.1, and Cuda 11.7.
We observed performance discrepancies when using identical hyperparameters across different PyTorch/CUDA versions and GPU architectures, e.g., V100 (Setting 1) vs. RTX 3090 (Setting 2).
- [Setting 2] RTX 3090, with Python 3.8.16, PyTorch 2.1.2, and Cuda 12.1.
For instance, on Chameleon, the same config in ./scripts/01-best-cIGNN.sh yielded 50.79 ± 4.92 (Setting 1) vs. 47.53 ± 3.36 (Setting 2).
Although SOTA performance can be achieved under all environments with proper tuning, optimal hyperparameters may differ across setups.
Perform hyperparameter searches for each variant using:
| Split | Search Script |
|---|---|
| Ours | ./scripts/00-search-ours-split.sh |
| Public | ./scripts/00-search-public-split.sh |
| Variant | Script |
|---|---|
| c-IGNN | ./scripts/01cignn_search.py |
| r-IGNN | ./scripts02rignn_search.py |
| a-IGNN | ./scripts/03aignn_search.py |
Tip
We strongly recommend performing your own hyperparameter search to achieve the best performance in your environment using the above provided search scripts.
We use the open-source pip package graph_datasets for unified data loading:
$ python -m pip install graph_datasetsExample usage:
from graph_datasets import load_data
from configs import DataConf
from utils import read_configs
DATA_INFO = DataConf(**read_configs("data"))
data = load_data(
dataset_name='squirrel',
source='critical',
directory=DATA_INFO.DATA_DIR,
row_normalize=True,
rm_self_loop=False,
add_self_loop=True,
verbosity=1,
return_type="pyg",
)To minimize variance from inconsistent split policies across datasets, we use a unified 10× random split scheme with a 48%/32%/20% train/validation/test ratio.
- For medium-size datasets, the splits are stored in
./data/random_splits/fixed_splits/. - For large datasets,
OGB-arxivandOGB_productsare using the public splits withpokecusing the splits from this work. - All splits can be loaded via:
from utils import get_splits
# `repeat` is the number of our/public splits
# `i` is the index of the selected split
train_mask, val_mask, test_mask = get_splits(
data=data,
name=data.name,
n_nodes=data.num_nodes,
i=1,
repeat=10,
TRAIN_RATIO=48,
VALID_RATIO=32,
DATA=DATA_INFO,
public=False,
)The code for all 30 baselines is in ./benchmark/baselines:
- If a baseline has its own folder, a
search.pyscript is included for hyperparameter tuning withoptuna. See theREADME.mdin the folder for details. - If a baseline does not have its own folder, it can be run with a script like
./baselines.py, which can conveniently derive the correspondingsearch.pyscript. - All search spaces used in the experiments are documented in
./configs/search_grid.py.
- The code for the empirical analysis is documented in
./scripts/dml.sh. - Run the analysis and draw the analysis plots via:
$ bash scripts/dml.sh squirrel critical False 10
$ python -u -m scripts.mergeIf you find this work useful, please cite our paper:
@inproceedings{ignn,
title={Making Classic {GNN}s Strong Baselines Across Varying Homophily: A Smoothness{\textendash}Generalization Perspective},
author={Ming Gu and Zhuonan Zheng and Sheng Zhou and Meihan Liu and Jiawei Chen and Qiaoyu Tan and Liangcheng Li and Jiajun Bu},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=IAGbhDARZd}
}