SPARC: Leveraging Graph Geometry for Cold-Start Node Prediction

Official implementation for:

Leveraging Graph Geometry for Cold-Start Node Prediction

SPARC is a framework for strict cold-start node prediction, where test nodes arrive with features but without observed edges. The method learns an inductive feature-to-spectral mapping that places unseen nodes in a graph-geometry-aware space, enabling pseudo-neighborhood retrieval without test-time adjacency.

The retrieved pseudo-neighborhoods can be used as a plug-in structural context for downstream graph models, including GraphSAGE, Spectral-GCN, and NAGphormer-style graph transformers.

Overview

Most graph neural networks require neighborhood information at inference time. This assumption breaks in the strict cold-start setting, where newly arriving nodes have no observed incident edges.

SPARC addresses this by learning a parametric encoder that maps node features to a low-dimensional spectral representation approximating the graph Laplacian eigenspace. At inference time, a cold-start node is embedded from its features alone, and its pseudo-neighborhood is retrieved using k-nearest neighbors in the learned SPARC space.
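The retrieval step described above can be illustrated with a minimal sketch. It assumes the encoder has already produced embeddings as NumPy arrays and that neighbors are ranked by Euclidean distance; the function name `retrieve_pseudo_neighbors`, the toy values, and the distance metric are illustrative assumptions, not the repository's API.

```python
import numpy as np

def retrieve_pseudo_neighbors(train_emb, query_emb, k=10):
    """Return, for each query node, the indices of its k nearest training nodes."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diff = query_emb[:, None, :] - train_emb[None, :, :]
    dist2 = np.einsum("qtd,qtd->qt", diff, diff)
    # Sort each row by distance and keep the k closest training indices.
    return np.argsort(dist2, axis=1)[:, :k]

# Toy embeddings standing in for encoder outputs (hypothetical values).
train_emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]])
cold_emb = np.array([[0.02, 0.01], [0.95, 1.0]])  # cold-start nodes, embedded from features only

neighbors = retrieve_pseudo_neighbors(train_emb, cold_emb, k=2)
print(neighbors)
```

Each row of `neighbors` lists the training nodes that would serve as the pseudo-neighborhood of one cold-start node. For large graphs, an approximate nearest-neighbor index would likely replace the dense distance matrix used here.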

Key features

- Strict cold-start setting: test nodes are unseen during training and have no observed edges at inference time.
- Inductive spectral representation: SPARC learns a feature-to-spectral encoder, avoiding full-graph eigendecomposition at inference.
- Pseudo-neighborhood retrieval: cold-start nodes retrieve structurally meaningful neighbors in SPARC space.
- Backbone-agnostic design: SPARC modifies the neighborhood construction step, not the downstream architecture.
- Supported downstream models: SPARC-GCN, SPARC-SAGE, and SPARCphormer.
- Benchmarked datasets: Cora, Citeseer, Pubmed, Chameleon, Squirrel, Wiki-CS, Reddit, and ogbn-products.
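For context on the full-graph eigendecomposition that the inductive encoder avoids at inference, the sketch below computes a spectral embedding directly from a small adjacency matrix. The symmetric normalized Laplacian is an assumption made for illustration (the normalization SPARC uses is not stated here), and `spectral_embedding` is a hypothetical helper, not part of the repository.

```python
import numpy as np

def spectral_embedding(adj, dim):
    """Eigenpairs of the symmetric normalized Laplacian for the `dim` smallest eigenvalues."""
    deg = adj.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)  # assumes every node has at least one edge
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigh returns eigenvalues in ascending order
    return eigvals[:dim], eigvecs[:, :dim]

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0

eigvals, emb = spectral_embedding(adj, dim=2)
print(eigvals)    # the smallest eigenvalue is (numerically) zero
print(emb[:, 1])  # the sign of the second eigenvector separates the two triangles
```

This computation needs the adjacency of every node, which a strict cold-start node does not have. A parametric encoder trained to reproduce such coordinates from features can instead be evaluated on unseen nodes.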

Installation

We recommend using the provided Conda environment.

conda env create -f environment.yml
conda activate SPARC
pip install -r requirements.txt

Quick Start

The pipeline consists of three stages:

1. Download and preprocess the dataset.
2. Train the SPARC encoder.
3. Run a downstream cold-start classifier.

1. Download Data

cd data
python download_data.py
cd ..

To choose a dataset, edit data/download_data.py before running the script.

2. Train SPARC Embeddings

cd SPARC/src

python main.py \
    --dataset cora \
    --seed 42 \
    --test_ratio 0.10 \
    --val_ratio 0.10 \
    --split_name random

The learned embeddings, labels, features, and masks are saved under:

SPARC/sparc_results/cora/random_test0.10_seed42/
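When scripting several runs, it can help to build the expected output directory from the training arguments. The helper below is a sketch: the naming pattern is inferred from the single example path above and is an assumption, and `run_dir` is a hypothetical function rather than one provided by the repository.

```python
from pathlib import Path

def run_dir(dataset, split_name, test_ratio, seed, root="SPARC/sparc_results"):
    """Guess the results directory for a run (pattern inferred from one example path)."""
    # Assumed layout: <root>/<dataset>/<split_name>_test<ratio, 2 decimals>_seed<seed>
    return Path(root) / dataset / f"{split_name}_test{test_ratio:.2f}_seed{seed}"

path = run_dir("cora", "random", 0.10, 42)
print(path.as_posix())
```

If the directory produced by an actual run differs, the run's own output is authoritative.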

3. Run a Downstream Model

SPARCphormer

cd ../implementations/SPARCphormer

python train.py \
    --dataset cora \
    --space spectral \
    --hops 5 \
    --split_name random \
    --test_ratio 0.10 \
    --sparc_seed 42

SPARC-SAGE

cd ../SPARC-SAGE

python -m graphsage.supervised_train \
    --cli_dataset cora \
    --cli_seed 42 \
    --cli_test_ratio 0.10 \
    --cli_split_name random \
    --sparc_topk 10

SPARC-GCN

cd ../SPARC-GCN

python train.py \
    --dataset cora \
    --use_sparc_only True \
    --epochs 75

Run each script from its corresponding directory so that relative paths are resolved correctly.
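One way to respect the working-directory requirement without chaining `cd` commands is a small driver that launches each stage with an explicit working directory. The sketch below reuses the Quick Start commands for Cora and SPARCphormer. The driver itself (`STAGES`, `run_pipeline`) is hypothetical, and it assumes `python` on the PATH is the interpreter from the activated environment.

```python
import subprocess
from pathlib import Path

# (working directory relative to the repository root, command) pairs,
# copied from the Quick Start commands.
STAGES = [
    ("data", ["python", "download_data.py"]),
    ("SPARC/src", ["python", "main.py", "--dataset", "cora", "--seed", "42",
                   "--test_ratio", "0.10", "--val_ratio", "0.10",
                   "--split_name", "random"]),
    ("SPARC/implementations/SPARCphormer",
     ["python", "train.py", "--dataset", "cora", "--space", "spectral",
      "--hops", "5", "--split_name", "random", "--test_ratio", "0.10",
      "--sparc_seed", "42"]),
]

def run_pipeline(repo_root, dry_run=True):
    """Run each stage from its own directory; with dry_run=True, only return the plan."""
    plan = []
    for rel_dir, cmd in STAGES:
        cwd = Path(repo_root) / rel_dir
        plan.append((cwd, cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one stage fails.
            subprocess.run(cmd, cwd=cwd, check=True)
    return plan

plan = run_pipeline(".", dry_run=True)
for cwd, cmd in plan:
    print(cwd.as_posix(), "->", " ".join(cmd))
```

Calling `run_pipeline(".", dry_run=False)` from the repository root would execute the three stages in order.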

Repository layout

SPARC/
├── data/                            # Dataset download, splitting, loading
│   ├── download_data.py             # PyG/OGB → GraphSAGE format + canonical split
│   ├── split_data.py                # Random / original cold-start splits
│   ├── load_data.py                 # Loaders + inductive adjacency builder
│   └── README.md
│
├── SPARC/
│   ├── src/                         # SPARC spectral encoder (training entry point)
│   │   ├── main.py                  # CLI: data → partition → train → metrics → save
│   │   ├── SpectralNet.py           # MLP wrapper around SpectralTrainer
│   │   ├── SpectralTrainer.py       # Model, losses, training loop
│   │   ├── neighborhood_prediction.py
│   │   ├── eigenspace_diagnostics.py
│   │   ├── partition_utils.py / partition_cache.py
│   │   ├── metrics.py / utils.py
│   │   ├── config/<dataset>.json    # Per-dataset hyperparameters
│   │   └── README.md
│   │
│   ├── implementations/
│   │   ├── SPARC-GCN/               # GCN on real graph + SPARC channel  (TF 1.x compat)
│   │   ├── SPARC-SAGE/              # GraphSAGE on SPARC-kNN synthetic graph (TF 2.x compat)
│   │   └── SPARCphormer/            # Transformer on multi-hop token sequences (PyTorch)
│   │
│   └── sparc_results/<dataset>/<run>/   # Trained embeddings + masks + metrics (gitignored)
│
├── environment.yml                  # Conda env (Python 3.9 + METIS)
├── requirements.txt                 # Pinned pip dependencies
└── README.md                        # (this file)

Each component has its own README with full CLI flags and details (for example, data/README.md and SPARC/src/README.md).

Adding a new dataset

1. Add a downloader branch in data/download_data.py (or place GraphSAGE files manually under data/data/<name>/).
2. Create SPARC/src/config/<name>.json (copy an existing config and tune n_clusters and the spectral block).
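The config step can be scripted. The sketch below copies an existing config and overrides `n_clusters`. It assumes `n_clusters` is a top-level key in the JSON file, which is not confirmed here, and `clone_config` is a hypothetical helper. The spectral block is copied unchanged and would still need manual tuning.

```python
import json
from pathlib import Path

def clone_config(config_dir, source, target, n_clusters):
    """Copy <source>.json to <target>.json in config_dir, overriding n_clusters."""
    config_dir = Path(config_dir)
    cfg = json.loads((config_dir / f"{source}.json").read_text())
    cfg["n_clusters"] = n_clusters  # assumed to be a top-level key
    out = config_dir / f"{target}.json"
    out.write_text(json.dumps(cfg, indent=2))
    return out

# Example call from the repository root (dataset name and value are placeholders):
# clone_config("SPARC/src/config", "cora", "mydataset", n_clusters=20)
```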

Citation

If you use this code, please cite:

@article{sparc2026,
  title   = {Leveraging Graph Geometry for Cold-Start Node Prediction},
  author  = {<authors>},
  journal = {<venue>},
  year    = {2026}
}

Acknowledgments

SPARC builds on prior work in spectral graph learning and inductive representation learning, including SpectralNet, GraphSAGE, Cluster-GCN (METIS partitioning), Cold-Brew, and Graphormer-style transformers. Dataset loaders rely on PyTorch Geometric and OGB.
