Mediation analysis is widely used in neuroscience to investigate the role of brain imaging phenotypes in the neurological pathways from genetic exposures to clinical outcomes. However, mediation analysis with genome-wide exposures and brain subcortical shape mediators remains difficult because of several challenges: (i) large-scale genetic exposures, i.e., millions of single-nucleotide polymorphisms (SNPs); (ii) the nonlinear Hilbert space in which shape mediators lie; and (iii) statistical inference on the direct and indirect effects. To tackle these challenges, this paper proposes a genome-wide mediation analysis framework with brain subcortical shape mediators. First, to address the high dimensionality of the genetic exposures, a fast genome-wide association analysis is conducted to discover candidate genetic variants with significant effects on the clinical outcome. Second, square-root velocity function representations are extracted from the brain subcortical shapes; these representations fall in an unconstrained linear Hilbert subspace. Third, to identify the underlying causal pathways from the detected SNPs to the clinical outcome implicitly through the shape mediators, we use a shape mediation analysis framework consisting of a shape-on-scalar model and a scalar-on-shape model. Furthermore, bootstrap resampling is adopted to test for both globally and spatially significant mediation effects. Finally, our framework is applied to the corpus callosum shape data from the Alzheimer’s Disease Neuroimaging Initiative.
The code consists of five main R and MATLAB scripts:
- Step1_GWAS.R - Fast genome-wide association study (GWAS) on cognitive outcomes
- Step2_MVCM_screening.m - Multivariate varying coefficient model (MVCM) screening on shape data
- Step3_Mediation.R - Mediation analysis with bootstrap inference
- MFSDA.R - MVCM estimators adapted from MATLAB code
- utilities_functions.R - Utility functions used across the analysis
Step1_GWAS.R:
- Performs fast GWAS on cognitive outcomes
- Analyzes multiple cognitive measures including ADAS11, ADAS13, CDRSB, FAQ, MMSE, and RAVLT scores
- Required packages:
  - statgenGWAS
  - R.matlab
  - HardyWeinberg
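The screening idea behind Step 1 can be illustrated with a minimal base-R sketch. This is not the `statgenGWAS` pipeline used by `Step1_GWAS.R`; it simply fits one linear model per SNP on simulated data. The helper `marginal_gwas`, the simulated genotypes, the covariates, and the Bonferroni threshold are all assumptions made for illustration.

```r
# Hypothetical helper: marginal association screening, one linear model per SNP.
# geno:  n x p matrix of allele counts (0/1/2), with column names
# y:     length-n outcome vector
# covar: n x q matrix of covariates
marginal_gwas <- function(geno, y, covar) {
  apply(geno, 2, function(snp) {
    fit <- lm(y ~ snp + covar)
    summary(fit)$coefficients["snp", "Pr(>|t|)"]
  })
}

# Simulated data (assumption): 300 subjects, 50 SNPs, SNP 7 affects the outcome
set.seed(1)
n <- 300
p <- 50
geno <- matrix(rbinom(n * p, size = 2, prob = 0.3), n, p)
colnames(geno) <- paste0("snp", seq_len(p))
covar <- cbind(age = rnorm(n, 70, 5), sex = rbinom(n, 1, 0.5))
y <- 1.0 * geno[, 7] + 0.02 * covar[, "age"] + rnorm(n)

pvals <- marginal_gwas(geno, y, covar)
hits <- names(pvals)[pvals < 0.05 / p]  # Bonferroni-corrected threshold
print(hits)
```

With millions of SNPs a per-SNP `lm` loop like this would be far too slow, which is presumably why the script relies on a dedicated fast GWAS implementation.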
Step2_MVCM_screening.m:
- Implements multivariate varying coefficient model screening
- Processes shape data and coordinates
- Required software:
  - MATLAB
Step3_Mediation.R:
- Performs mediation analysis using bootstrap methods
- Analyzes significant SNPs from previous steps
- Required packages:
  - R.matlab
  - coxed
  - mgcv
  - refund
  - pracma
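As a rough illustration of bootstrap inference on a mediation effect, the sketch below estimates a product-of-coefficients indirect effect for a single SNP and a single scalar mediator, then forms a percentile bootstrap interval. It is a simplification under stated assumptions: the actual analysis uses shape (functional) mediators and lists `coxed` for bias-corrected and accelerated intervals, whereas this sketch uses a scalar mediator and plain percentile intervals. The helpers `indirect_effect` and `boot_indirect` and the simulated data are hypothetical.

```r
# Hypothetical helper: product-of-coefficients indirect effect for one
# exposure x, one scalar mediator m, and one outcome y.
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]      # exposure -> mediator
  b <- coef(lm(y ~ x + m, data = d))["m"]  # mediator -> outcome, given exposure
  unname(a * b)
}

# Hypothetical helper: nonparametric bootstrap with a percentile interval.
boot_indirect <- function(d, n_boot = 500) {
  est <- replicate(n_boot, {
    idx <- sample(nrow(d), replace = TRUE)  # resample subjects with replacement
    indirect_effect(d[idx, ])
  })
  list(estimate = indirect_effect(d),
       ci = quantile(est, c(0.025, 0.975), names = FALSE))
}

# Simulated data (assumption): true indirect effect = 0.5 * 0.8 = 0.4
set.seed(2)
n <- 400
x <- rbinom(n, 2, 0.3)
m <- 0.5 * x + rnorm(n)
y <- 0.8 * m + 0.2 * x + rnorm(n)

res <- boot_indirect(data.frame(x, m, y))
print(res)
```

An interval that excludes zero suggests a significant indirect effect for that SNP in this toy setting.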
R package dependencies:
- statgenGWAS - For GWAS analysis
- R.matlab - For MATLAB file interface
- HardyWeinberg - For genetic analysis
- coxed - For bias-corrected and accelerated confidence intervals
- pracma - For numerical analysis and matrix operations
- mgcv - For generalized additive models
- refund - For functional data analysis
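Before running the R scripts, it can help to check which of these packages are already installed. The snippet below is a small convenience sketch, not part of the repository; the helper name `missing_packages` is hypothetical, and whether every package is available from your configured repositories is an assumption you should verify.

```r
# Packages listed in this README
required <- c("statgenGWAS", "R.matlab", "HardyWeinberg",
              "coxed", "pracma", "mgcv", "refund")

# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from your repositories
}
```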
MATLAB requirements:
- Base MATLAB installation
- Statistics and Machine Learning Toolbox (recommended)
Required input data:
- SNP data
- Cognitive outcome measures
- Aligned corpus callosum shape data and their functional representations
- Covariates
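The functional representations of the shapes refer to the square-root velocity function (SRVF), which maps a curve β(t) to q(t) = β'(t) / sqrt(|β'(t)|). The sketch below computes a discrete SRVF with finite differences for an open curve sampled on a grid. The helper name `srvf` and the toy straight-line curve are assumptions for illustration; the preprocessing behind the repository's input files (alignment, scaling, handling of closed curves) may differ.

```r
# Hypothetical helper: discrete SRVF of a curve sampled at times t.
# beta: k x d matrix of points, rows ordered along the curve
# t:    increasing grid of length k
srvf <- function(beta, t) {
  dbeta <- diff(beta) / diff(t)    # finite-difference derivative per segment
  speed <- sqrt(rowSums(dbeta^2))  # |beta'(t)| on each segment
  q <- dbeta / sqrt(pmax(speed, .Machine$double.eps))
  q[speed == 0, ] <- 0             # define q = 0 where the curve is stationary
  q
}

# Toy example (assumption): a straight segment from (0, 0) to (2, 0)
t <- seq(0, 1, length.out = 101)
beta <- cbind(2 * t, 0 * t)
q <- srvf(beta, t)

# The squared L2 norm of the SRVF equals the length of the curve
curve_length <- sum(rowSums(q^2) * diff(t))
print(curve_length)
```

Working with q rather than β is convenient because the SRVFs live in a linear space, so standard functional regression tools can be applied to them.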
# Step 1: GWAS
Rscript Step1_GWAS.R <working_directory> [outcome_list]
# Step 2: MVCM screening (MATLAB)
matlab -nodisplay -nosplash -r "Step2_MVCM_screening('<data_path>', <num_bootstrap_step2>); quit"
# Step 3: Mediation analysis
Rscript Step3_Mediation.R <working_directory> <num_bootstrap_step3> [outcome_list]

Arguments:
- <working_directory>: Path containing input data files
- <data_path>: Path to the shape data directory (used in Step 2)
- <num_bootstrap_step2>: Number of bootstrap iterations for MVCM screening (default = 500)
- <num_bootstrap_step3>: Number of bootstrap samples for mediation analysis
- [outcome_list]: Optional, comma-separated list of outcomes to analyze
Results are organized in the following directories:
- results/step1_GWAS/ - GWAS results
- results/step2_MVCM/ - MVCM screening results
- results/step3_mediation/ - Final mediation analysis results
