Official implementation for:
Leveraging Graph Geometry for Cold-Start Node Prediction
SPARC is a framework for strict cold-start node prediction, where test nodes arrive with features but without observed edges. The method learns an inductive feature-to-spectral mapping that places unseen nodes in a graph-geometry-aware space, enabling pseudo-neighborhood retrieval without test-time adjacency.
The retrieved pseudo-neighborhoods can be used as a plug-in structural context for downstream graph models, including GraphSAGE, Spectral-GCN, and NAGphormer-style graph transformers.
Most graph neural networks require neighborhood information at inference time. This assumption breaks in the strict cold-start setting, where newly arriving nodes have no observed incident edges.
SPARC addresses this by learning a parametric encoder that maps node features to a low-dimensional spectral representation approximating the graph Laplacian eigenspace. At inference time, a cold-start node is embedded from its features alone, and its pseudo-neighborhood is retrieved using k-nearest neighbors in the learned SPARC space.
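The retrieval step described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function name `retrieve_pseudo_neighbors`, the toy embeddings, and the choice of Euclidean distance are assumptions made for illustration, and the cold-start embedding is hard-coded where the trained encoder's output would normally go.

```python
import heapq
import math
from typing import List, Sequence


def retrieve_pseudo_neighbors(
    query: Sequence[float],
    train_embeddings: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k training nodes closest to `query`.

    Euclidean distance is an assumption; the README does not state the metric.
    """
    if not 1 <= k <= len(train_embeddings):
        raise ValueError("k must be between 1 and the number of training nodes")
    # Pair every training embedding with its distance to the query, then keep
    # the k smallest distances (ties are broken by the lower node index).
    distances = (
        (math.dist(query, emb), idx) for idx, emb in enumerate(train_embeddings)
    )
    return [idx for _, idx in heapq.nsmallest(k, distances)]


# Toy 2-D embeddings for four training nodes (made-up values).
train_embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
# Stand-in for the encoder output of a cold-start node (features only, no edges).
cold_start_embedding = (0.9, 0.1)

neighbors = retrieve_pseudo_neighbors(cold_start_embedding, train_embeddings, k=2)
print(neighbors)  # [1, 0]
```

The returned indices play the role of the pseudo-neighborhood: a downstream model can aggregate over these training nodes as if they were observed neighbors of the cold-start node.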
- Strict cold-start setting: test nodes are unseen during training and have no observed edges at inference time.
- Inductive spectral representation: SPARC learns a feature-to-spectral encoder, avoiding full-graph eigendecomposition at inference.
- Pseudo-neighborhood retrieval: cold-start nodes retrieve structurally meaningful neighbors in SPARC space.
- Backbone-agnostic design: SPARC modifies the neighborhood construction step, not the downstream architecture.
- Supported downstream models: SPARC-GCN, SPARC-SAGE, SPARCphormer.
- Benchmarked datasets: Cora, Citeseer, Pubmed, Chameleon, Squirrel, Wiki-CS, Reddit, and ogbn-products.
We recommend using the provided Conda environment.
conda env create -f environment.yml
conda activate SPARC
pip install -r requirements.txt
The pipeline consists of three stages:
- Download and preprocess the dataset.
- Train the SPARC encoder.
- Run a downstream cold-start classifier.
cd data
python download_data.py
cd ..
Edit data/download_data.py to select the desired dataset.
cd SPARC/src
python main.py \
--dataset cora \
--seed 42 \
--test_ratio 0.10 \
--val_ratio 0.10 \
--split_name random
The learned embeddings, labels, features, and masks are saved under:
SPARC/sparc_results/cora/random_test0.10_seed42/
cd ../implementations/SPARCphormer
python train.py \
--dataset cora \
--space spectral \
--hops 5 \
--split_name random \
--test_ratio 0.10 \
--sparc_seed 42
cd ../SPARC-SAGE
python -m graphsage.supervised_train \
--cli_dataset cora \
--cli_seed 42 \
--cli_test_ratio 0.10 \
--cli_split_name random \
--sparc_topk 10
cd ../SPARC-GCN
python train.py \
--dataset cora \
--use_sparc_only True \
--epochs 75
Run each script from its corresponding directory so that relative paths are resolved correctly.
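The downstream commands repeat the dataset, split name, test ratio, and seed used when training the encoder, and the example results directory `random_test0.10_seed42` appears to be built from the same values. The sketch below reconstructs that path so a mismatch is easy to spot before launching a downstream run. The helper `sparc_run_dir` is hypothetical, and the naming pattern (including the two-decimal ratio) is inferred from that single example rather than taken from the code.

```python
from pathlib import PurePosixPath


def sparc_run_dir(
    dataset: str, split_name: str, test_ratio: float, seed: int
) -> PurePosixPath:
    """Build the results directory for one SPARC run (repo-root relative).

    Assumed pattern: <split_name>_test<ratio with 2 decimals>_seed<seed>,
    inferred from the example 'random_test0.10_seed42'.
    """
    run_name = f"{split_name}_test{test_ratio:.2f}_seed{seed}"
    return PurePosixPath("SPARC/sparc_results") / dataset / run_name


run_dir = sparc_run_dir("cora", "random", 0.10, 42)
print(run_dir)  # SPARC/sparc_results/cora/random_test0.10_seed42
```

If a downstream script cannot find the SPARC outputs, comparing its flags against the directory that actually exists under `SPARC/sparc_results/` is a reasonable first check.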
SPARC/
├── data/ # Dataset download, splitting, loading
│ ├── download_data.py # PyG/OGB → GraphSAGE format + canonical split
│ ├── split_data.py # Random / original cold-start splits
│ ├── load_data.py # Loaders + inductive adjacency builder
│ └── README.md
│
├── SPARC/
│ ├── src/ # SPARC spectral encoder (training entry point)
│ │ ├── main.py # CLI: data → partition → train → metrics → save
│ │ ├── SpectralNet.py # MLP wrapper around SpectralTrainer
│ │ ├── SpectralTrainer.py # Model, losses, training loop
│ │ ├── neighborhood_prediction.py
│ │ ├── eigenspace_diagnostics.py
│ │ ├── partition_utils.py / partition_cache.py
│ │ ├── metrics.py / utils.py
│ │ ├── config/<dataset>.json # Per-dataset hyperparameters
│ │ └── README.md
│ │
│ ├── implementations/
│ │ ├── SPARC-GCN/ # GCN on real graph + SPARC channel (TF 1.x compat)
│ │ ├── SPARC-SAGE/ # GraphSAGE on SPARC-kNN synthetic graph (TF 2.x compat)
│ │ └── SPARCphormer/ # Transformer on multi-hop token sequences (PyTorch)
│ │
│ └── sparc_results/<dataset>/<run>/ # Trained embeddings + masks + metrics (gitignored)
│
├── environment.yml # Conda env (Python 3.9 + METIS)
├── requirements.txt # Pinned pip dependencies
└── README.md # (this file)
Each component has its own README with full CLI flags and details:
- data/README.md
- SPARC/src/README.md
- SPARC/implementations/SPARC-GCN/README.md
- SPARC/implementations/SPARC-SAGE/README.md
- SPARC/implementations/SPARCphormer/README.md
To add a new dataset:
- Add a downloader branch in data/download_data.py (or place GraphSAGE files manually under data/data/<name>/).
- Create SPARC/src/config/<name>.json (copy an existing config and tune n_clusters and the spectral block).
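Copying and adjusting a config can also be scripted. The sketch below is illustrative only: the helper `derive_config` is hypothetical, it assumes `n_clusters` is a top-level key in the JSON file, and the stand-in config written to a temporary directory does not reflect the real schema or real hyperparameter values. Check an actual file under SPARC/src/config/ before adapting it.

```python
import json
import tempfile
from pathlib import Path


def derive_config(src: Path, dst: Path, n_clusters: int) -> dict:
    """Copy the JSON config at `src` to `dst`, overriding n_clusters.

    Assumes n_clusters is a top-level key; all other entries (such as the
    spectral block) are copied unchanged and still need manual tuning.
    """
    config = json.loads(src.read_text())
    config["n_clusters"] = n_clusters
    dst.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demonstration in a temporary directory with a made-up stand-in config.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    existing = config_dir / "cora.json"
    existing.write_text(json.dumps({"n_clusters": 8, "spectral": {}}))

    derive_config(existing, config_dir / "mydataset.json", n_clusters=16)
    written = json.loads((config_dir / "mydataset.json").read_text())

print(written["n_clusters"])  # 16
```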
If you use this code, please cite:
@article{sparc2026,
title = {Leveraging Graph Geometry for Cold-Start Node Prediction},
author = {<authors>},
journal = {<venue>},
year = {2026}
}
SPARC builds on prior work in spectral graph learning and inductive representation learning, including SpectralNet, GraphSAGE, Cluster-GCN (METIS partitioning), Cold-Brew, and Graphormer-style transformers. Dataset loaders rely on PyTorch Geometric and OGB.