StageBridge

Stochastic transition modeling for cell-state progression
in spatial and single-cell omics

Overview

StageBridge is a method for learning cell-state transitions under spatial and multimodal constraints. The framework models progression at the level of individual cells and their niches rather than as a patient-classification task.

The primary application is lung adenocarcinoma (LUAD) progression:

Normal  ──>  AAH  ──>  AIS  ──>  MIA  ──>  LUAD

The framework integrates three data modalities (10x Visium spatial transcriptomics, snRNA-seq, and whole-exome sequencing) to learn how cells transition between states, conditioned on their local microenvironment (niche) and constrained by evolutionary compatibility.

Core principles

Cell-level learning: The scientific object is cell-state transition, not patient classification
Niche conditioning: Transitions depend on local neighborhood context
Dual-reference geometry: Cells are embedded relative to healthy (HLCA) and tumor (LuCA) atlases using model-based scArches surgery
Evolutionary constraints: WES-derived features enforce biologically plausible transitions
Spatial backend agnostic: Benchmarked across Tangram, TACCO, and DestVI

Architecture

StageBridge uses a layered architecture:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         StageBridge V1 Pipeline                             │
│                                                                             │
│  ┌─────────────┐   ┌──────────────────┐   ┌────────────────────┐           │
│  │   Layer A   │   │     Layer B      │   │      Layer C       │           │
│  │  Dual-Ref   │──>│  Local Niche     │──>│  Set Transformer   │           │
│  │   Latent    │   │  Encoder (9-tok) │   │  (ISAB/SAB/PMA)    │           │
│  └─────────────┘   └──────────────────┘   └────────────────────┘           │
│        │                                            │                       │
│        v                                            v                       │
│  ┌─────────────┐                          ┌────────────────────┐           │
│  │ HLCA + LuCA │                          │     Layer D        │           │
│  │  Reference  │                          │  Flow Matching     │           │
│  │  Alignment  │                          │  (OT-CFM)          │           │
│  └─────────────┘                          └────────────────────┘           │
│                                                     │                       │
│                    WES Features ───────────────────>│                       │
│                    (Evolutionary Constraint)        v                       │
│                                           ┌────────────────────┐           │
│                                           │  Cell Transition   │           │
│                                           │  Trajectories      │           │
│                                           └────────────────────┘           │
└─────────────────────────────────────────────────────────────────────────────┘

Local niche encoding (Layer B)

Each spatial niche is encoded as a 9-token sequence:

Token	Source	Description
Receiver	Cell identity	Target (focal) cell expression + learned state embedding
Ring 1-4	Spatial neighborhood	Cell-type composition at increasing radii (one token per ring, four in total)
HLCA	Reference atlas	Embedding similarity to healthy lung (HLCA) reference
LuCA	Tumor atlas	Embedding similarity to disease-aware (LuCA) reference
Pathway	Gene programs	Ligand-receptor and pathway activity summary
Stats	Neighborhood	Local density, entropy, and composition statistics
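The ring and stats tokens in the table above can be illustrated with a small standard-library sketch. It assumes each neighbor is given as an (x, y, cell_type) tuple and that ring k covers distances in (radii[k-1], radii[k]]. The function names, radii, and cell-type labels are hypothetical and chosen for illustration only; how StageBridge defines ring boundaries or embeds these vectors is not specified here.

```python
import math
from collections import Counter


def ring_composition(receiver_xy, neighbors, radii, cell_types):
    """Return one cell-type fraction vector per ring around the receiver.

    neighbors: iterable of (x, y, cell_type) tuples (assumed input format).
    radii: increasing outer radii; ring k covers (radii[k-1], radii[k]].
    """
    rx, ry = receiver_xy
    counts = [Counter() for _ in radii]
    for x, y, cell_type in neighbors:
        dist = math.hypot(x - rx, y - ry)
        inner = 0.0
        for k, outer in enumerate(radii):
            # A neighbor at distance 0 (the receiver itself) falls in no ring.
            if inner < dist <= outer:
                counts[k][cell_type] += 1
                break
            inner = outer
    tokens = []
    for ring in counts:
        total = sum(ring.values())
        # Empty rings yield an all-zero vector instead of dividing by zero.
        tokens.append([ring[t] / total if total else 0.0 for t in cell_types])
    return tokens


def shannon_entropy(fractions):
    """Shannon entropy (in nats) of a composition vector."""
    return -sum(p * math.log(p) for p in fractions if p > 0)


# Toy niche: five neighbors around a receiver at the origin.
cell_types = ["epithelial", "fibroblast", "T cell"]
neighbors = [
    (1.0, 0.0, "epithelial"),
    (0.0, 1.5, "fibroblast"),
    (2.5, 0.0, "T cell"),
    (0.0, -2.8, "T cell"),
    (3.5, 0.0, "fibroblast"),
]
ring_tokens = ring_composition(
    (0.0, 0.0), neighbors, radii=[1.0, 2.0, 3.0, 4.0], cell_types=cell_types
)
ring_entropies = [shannon_entropy(token) for token in ring_tokens]
```

In this toy niche every ring contains a single cell type, so each ring token is one-hot and each ring entropy is zero. Mixed rings would give fractional vectors and positive entropy.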

Stochastic transition model (Layer D)

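V1 uses flow matching, specifically optimal-transport conditional flow matching (OT-CFM), with Sinkhorn coupling:

Learns continuous trajectories between cell states
Pairs source and target cells through an optimal-transport coupling rather than at random
Conditions the flow field on niche context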
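The interplay between the Sinkhorn coupling and the flow-matching target can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in flow_matching.py: it uses uniform marginals, a squared-Euclidean cost, and the noise-free linear interpolant x_t = (1 - t) x0 + t x1 with target velocity u_t = x1 - x0. All function names and the epsilon value are hypothetical. In a real model, a velocity network taking (x_t, t, niche context) would be regressed onto u_t; that network is omitted here.

```python
import math
import random


def sinkhorn_plan(cost, epsilon=0.5, n_iters=200):
    """Entropic OT plan between two uniformly weighted point sets."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m  # uniform marginals (assumption)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate row and column scaling to match both marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def squared_distance(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def cfm_training_tuples(source, target, rng, epsilon=0.5):
    """Sample (x_t, t, u_t) regression targets from a Sinkhorn coupling."""
    cost = [[squared_distance(x0, x1) for x1 in target] for x0 in source]
    plan = sinkhorn_plan(cost, epsilon)
    pairs = [(i, j) for i in range(len(source)) for j in range(len(target))]
    weights = [plan[i][j] for i, j in pairs]
    tuples = []
    # Draw source/target pairs in proportion to their transport mass.
    for i, j in rng.choices(pairs, weights=weights, k=len(source)):
        x0, x1 = source[i], target[j]
        t = rng.random()
        x_t = [(1 - t) * p + t * q for p, q in zip(x0, x1)]  # point on the path
        u_t = [q - p for p, q in zip(x0, x1)]                # constant velocity
        tuples.append((x_t, t, u_t))
    return plan, tuples


# Toy latent states for an earlier and a later stage.
source = [[0.0, 0.0], [0.0, 1.0]]
target = [[2.0, 0.0], [2.0, 1.0]]
plan, tuples = cfm_training_tuples(source, target, random.Random(0))
```

In this toy case the plan puts most of its mass on the two nearest pairs, so sampled trajectories rarely cross. A smaller epsilon sharpens the coupling further, at the cost of slower and less stable Sinkhorn iterations.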

Project scope

V1-Minimal (Current)

Scope of the first publication:

Component	Status	Description
Raw Data Pipeline	Complete	`stagebridge data-prep` orchestration
Spatial Backend Benchmark	Complete	Tangram/DestVI/TACCO/Cell2Location comparison
Dual-Reference Latent	Complete	HLCA + LuCA alignment via scArches surgery
Local Niche Encoder	Complete	Receiver-centered niche transformer
Set Transformer	Complete	ISAB/SAB/PMA hierarchy
Flow Matching	Complete	OT-CFM with Sinkhorn coupling
Evolutionary Compatibility	Complete	WES-derived constraints
Donor-Held-Out Evaluation	Complete	With uncertainty quantification
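Donor-held-out evaluation means that all cells from a given donor land on the same side of each split, so no donor contributes to both training and testing. The following standard-library sketch shows one way to build such folds. The function name, the greedy balancing rule, and the donor labels are illustrative assumptions and are not taken from the StageBridge code.

```python
from collections import defaultdict


def donor_held_out_folds(cell_donors, n_folds):
    """Yield (train_idx, test_idx) with each donor confined to one test fold."""
    cells_by_donor = defaultdict(list)
    for idx, donor in enumerate(cell_donors):
        cells_by_donor[donor].append(idx)

    # Greedy balancing: place the largest donors first into the smallest fold.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(cells_by_donor, key=lambda d: (-len(cells_by_donor[d]), d))
    for donor in ordered:
        min(folds, key=len).extend(cells_by_donor[donor])

    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train_idx), sorted(test_idx)


# Toy example: seven cells from four hypothetical donors, two folds.
cell_donors = ["P1", "P1", "P2", "P3", "P3", "P3", "P4"]
splits = list(donor_held_out_folds(cell_donors, n_folds=2))
for train_idx, test_idx in splits:
    train_donors = {cell_donors[i] for i in train_idx}
    test_donors = {cell_donors[i] for i in test_idx}
    assert train_donors.isdisjoint(test_donors)  # no donor leakage
```

Splitting by donor rather than by cell avoids the optimistic bias that arises when cells from the same tissue sample appear in both training and test sets.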

V2/V3 Roadmap (Deferred)

Non-Euclidean geometry (hyperbolic/spherical latents)
Neural SDE backend
Phase portrait / attractor decoder
Cohort transport layer
Destination-conditioned transitions (brain metastasis)

See AGENTS.md for detailed implementation plans.

Data

StageBridge integrates multimodal data from public GEO series:

Dataset	Modality	GEO Accession	Role
Early LUAD snRNA-seq	Single-cell transcriptomics	GSE308103	Cell-level expression
10x Visium	Spatial transcriptomics	GSE307534	Tissue architecture
Whole-exome sequencing	WES	GSE307529	Evolutionary features

Reference atlases:

Human Lung Cell Atlas (HLCA) — healthy reference anchor
LuCA extended atlas — tumor-aware cell state reference

Spatial mapping backends:

Tangram — deep learning-based spatial mapping
TACCO — optimal transport-based annotation transfer
DestVI — variational inference deconvolution

Installation

# Clone the repository
git clone https://github.com/SecondBook5/StageBridge.git
cd StageBridge

# Create conda environment
micromamba env create -f environment.yml
micromamba activate stagebridge

# Install in development mode
pip install -e ".[all]"

# Set data root (external data directory)
export STAGEBRIDGE_DATA_ROOT=/path/to/your/data

Requirements: Python 3.11+, PyTorch 2.2+, CUDA 12.4 (cu124 recommended for HPC compatibility)

Quick start

Step 0: Data preparation

Download raw data from GEO and run the data preparation pipeline:

# Set data root
export STAGEBRIDGE_DATA_ROOT=/path/to/your/data

# Run data preparation (extracts, merges, QC filters)
stagebridge data-prep

This creates:

processed/luad_evo/snrna_merged.h5ad — merged snRNA-seq (798k cells × 18k genes)
processed/luad_evo/spatial_merged.h5ad — merged Visium spatial
processed/luad_evo/wes_features.parquet — WES-derived features
processed/luad_evo/data_prep_audit.json — processing audit report

Python API

from stagebridge.notebook_api import compose_config, run_data_prep

# Data preparation
result = run_data_prep()

# Configure training
cfg = compose_config(overrides=["model=flow_matching"])

Command line

# Data preparation
stagebridge data-prep --data-root /path/to/data

# With options
stagebridge data-prep --skip-qc --skip-normalization

Repository structure

stagebridge/
├── context_model/          # Niche encoding and set transformers
│   ├── local_niche_encoder.py       # 9-token niche transformer (Layer B)
│   ├── set_encoder.py               # ISAB, SAB, PMA (Layer C)
│   ├── lesion_set_transformer.py    # Hierarchical aggregation
│   └── prototype_bottleneck.py      # Optional compression
├── transition_model/       # Stochastic dynamics (Layer D)
│   ├── flow_matching.py             # OT-CFM implementation
│   ├── stochastic_dynamics.py       # Neural SDE (V2)
│   └── schrodinger_bridge.py        # Sinkhorn coupling
├── data/                   # Data loading and preprocessing
│   └── luad_evo/                    # LUAD progression datasets
├── pipelines/              # End-to-end workflow orchestration
│   └── run_data_prep.py             # Step 0 data pipeline
├── reference/              # HLCA/LuCA atlas alignment
├── spatial_mapping/        # Tangram, TACCO, DestVI backends
├── evaluation/             # Metrics and ablations
└── viz/                    # Publication figures

configs/                    # Hydra YAML configuration
tests/                      # Test suite
docs/                       # Documentation

HPC Deployment (Snakemake)

StageBridge uses Snakemake for HPC orchestration. Do NOT use raw sbatch scripts.

Quick Start

# Dry run (see what would execute)
snakemake -n --profile workflow/slurm

# Full run on HPC with SLURM
snakemake --profile workflow/slurm --jobs 20

# Generate DAG visualization
snakemake --dag | dot -Tpdf > dag.pdf

Configuration

Edit workflow/config.yaml or override via command line:

snakemake --profile workflow/slurm --config data_root=/your/data/path

Default paths (configured for HPC):

data_root: "/scratch/chaunzt1/stagebridge"

Required Input Files

$DATA/
├── processed/luad_evo/
│   ├── snrna_qc_normalized_with_ensg.h5ad   # snRNA with ENSG IDs
│   ├── spatial_merged.h5ad                   # Merged Visium data
│   └── wes_features.parquet                  # WES features
└── references/
    ├── hlca/
    │   ├── hlca_reference.h5ad
    │   └── hub_cache/                        # scANVI model from HuggingFace
    └── luca/
        ├── luca_core_atlas.h5ad              # Use CORE, not Extended
        └── retrained_model/scanvi_model/
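Before launching the workflow, it can help to confirm that the inputs listed above are in place. The sketch below is a hypothetical helper, not part of the StageBridge CLI. It assumes that $DATA in the tree above refers to the same directory as STAGEBRIDGE_DATA_ROOT, and it checks only that each path exists, not that its contents are valid.

```python
import os
from pathlib import Path

# Relative paths copied from the required-input tree above.
REQUIRED_INPUTS = [
    "processed/luad_evo/snrna_qc_normalized_with_ensg.h5ad",
    "processed/luad_evo/spatial_merged.h5ad",
    "processed/luad_evo/wes_features.parquet",
    "references/hlca/hlca_reference.h5ad",
    "references/hlca/hub_cache",
    "references/luca/luca_core_atlas.h5ad",
    "references/luca/retrained_model/scanvi_model",
]


def missing_inputs(data_root):
    """Return the required relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in REQUIRED_INPUTS if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumption: the data root is exported as STAGEBRIDGE_DATA_ROOT.
    data_root = os.environ.get("STAGEBRIDGE_DATA_ROOT", ".")
    for rel in missing_inputs(data_root):
        print(f"missing: {rel}")
```

Running the script prints one line per missing path and nothing when all inputs are present.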

Pipeline DAG

hlca_mapping ──┬──→ merge_cell_types ──→ validate_markers ──→ spatial_backend (4x)
               │                                                       │
               └──→ fuse_embeddings ←── luca_mapping                   │
                           │                                           │
                           └─────────────────────┬──────────────────────┘
                                                 ▼
                                        data_preparation
                                                 │
                                     ┌───────────┴───────────┐
                                     ▼                       ▼
                             semi_synthetic            validate_splits
                                     │                       │
                                     └───────────┬───────────┘
                                                 ▼
                                               hpo
                                                 │
                         ┌───────────────────────┼───────────────────────┐
                         ▼                       ▼                       ▼
             training (5×3=15)          baseline (4×5×3=60)          (wait)
                         │                       │                       │
                         ▼                       ▼                       │
                aggregate_cv_results      aggregate_baselines            │
                         │                       │                       │
                         └───────────┬───────────┴───────────────────────┘
                                     ▼
                         ┌───────────┴───────────┐
                         ▼                       ▼
                ablation (14x)         publication_figures

Monitoring

# Check job status
squeue -u $USER

# Watch progress
watch -n 30 'squeue -u $USER'

# Check logs ($DATA is the data root shown in the required-input tree)
tail -f $DATA/runs/logs/*.log

See workflow/README.md for detailed documentation.

Testing

# Full test suite
pytest tests/

# Data pipeline tests
pytest tests/test_data_prep.py

# Model tests
pytest tests/test_eamist_model.py
pytest tests/test_flow_matching.py

Citation

If you use StageBridge in your research, please cite:

@article{book2026stagebridge,
  author = {Book, AJ and others},
  title = {StageBridge: Receiver-Centered Niche Modeling for Cell-State Progression in Spatial and Single-Cell Omics},
  journal = {[Journal TBD]},
  year = {2026},
  note = {Manuscript in preparation}
}

Note: Author order and citation details will be finalized upon publication.

License

MIT
