GenCPM

A Toolbox for Generalized Connectome-based Predictive Modeling

Abstract

Understanding brain–behavior relationships and predicting cognitive and clinical outcomes from neuromarkers are central tasks in neuroscience. Connectome-based Predictive Modeling (CPM) has been widely adopted to predict behavioral traits from brain connectivity data; however, existing implementations are largely restricted to continuous outcomes, often overlook essential non-imaging covariates, and are difficult to apply in clinical or disease cohort settings. To address these limitations, we present GenCPM, a generalized CPM framework implemented in open-source R software. GenCPM extends traditional CPM by supporting binary, categorical, and time-to-event outcomes and allows the integration of covariates such as demographic and genetic information, thereby improving predictive accuracy and interpretability. To handle high-dimensional data, GenCPM incorporates marginal screening and regularized regression techniques, including LASSO, ridge, and elastic net, for efficient selection of informative brain connections. We demonstrate the utility of GenCPM through analyses of the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease (A4) Study and the Alzheimer’s Disease Neuroimaging Initiative (ADNI), showing enhanced predictive performance and improved signal attribution compared to standard methods. GenCPM offers a flexible, scalable, and interpretable solution for predictive modeling in brain connectivity research, supporting broader applications in cognitive and clinical neuroscience.

Overview of the GenCPM toolbox

Fig. 1 Each subject provides a connectivity matrix and an outcome variable (e.g., a behavior or clinical measure). (a). In the original GenCPM framework, marginal screening is applied to select the top $K$ significant edges based on a predefined threshold $p$. The selected edges are separated into positively and negatively correlated sets, and summary measures, computed as sums of the average connectivity strength within each set, are derived for each subject. These connectivity-derived predictors are then combined with optional non-imaging covariates and entered into downstream models, including linear, logistic, multinomial, and CoxPH regression. (b). In the penalized GenCPM variant, the full set of selected edge features is retained without aggregation, allowing the model to capture fine-grained individual edge-level contributions. The resulting feature matrix, combined with non-imaging covariates, is then input into regularized regression models such as LASSO, ridge, and elastic net, where penalization is applied only to edge features, for joint modeling and feature selection while preserving the contribution of non-imaging covariates.
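To make panel (a) more concrete, the base-R sketch below screens the upper-triangle edges of an M*M*N array by their marginal correlation with the outcome. It keeps edges with p < thresh, splits them by sign, and sums each subject's connectivity over the positive and negative sets. The helper `screen_edges` is hypothetical and only illustrates the idea; it is not GenCPM's internal code. It uses plain sums, as in classic CPM, and GenCPM's exact aggregation may differ.

```r
# Hypothetical helper sketching marginal screening and CPM summary scores.
# connectome: M x M x N array; behavior: numeric vector of length N.
screen_edges <- function(connectome, behavior, thresh = 0.01,
                         method = "pearson") {
  M <- dim(connectome)[1]
  ut <- which(upper.tri(matrix(0, M, M)))                 # upper-triangle edge positions
  edges <- t(apply(connectome, 3, function(m) m[ut]))     # N x E matrix of edge values
  # Marginal test of each edge against the outcome: estimate and p-value
  tests <- apply(edges, 2, function(e) {
    ct <- suppressWarnings(cor.test(e, behavior, method = method))
    c(r = unname(ct$estimate), p = ct$p.value)
  })
  pos <- tests["p", ] < thresh & tests["r", ] > 0          # positively associated edges
  neg <- tests["p", ] < thresh & tests["r", ] < 0          # negatively associated edges
  list(pos = pos, neg = neg,
       pos_score = rowSums(edges[, pos, drop = FALSE]),    # per-subject positive-set strength
       neg_score = rowSums(edges[, neg, drop = FALSE]))    # per-subject negative-set strength
}

# Small simulated example: edge (1,2) raises and edge (3,4) lowers the outcome.
set.seed(1)
M <- 10; N <- 60
conn <- array(rnorm(M * M * N), dim = c(M, M, N))
y <- conn[1, 2, ] - conn[3, 4, ] + rnorm(N, sd = 0.3)
res <- screen_edges(conn, y)
```

Edge (1,2) is the first upper-triangle position and edge (3,4) the sixth, so `res$pos[1]` and `res$neg[6]` should be `TRUE`.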

Installation

You can install GenCPM from GitHub with:

# install.packages("devtools")  # if devtools is not yet installed
library(devtools)
install_github("BXU69/GenCPM")
library(GenCPM)

Tutorial

1. `train.GenCPM`

train.GenCPM is an internal helper that trains the models used by linear.GenCPM, logit.GenCPM, and multinom.GenCPM. It does not produce predictions on test data, so it should not be called directly to make predictions.

2. `.GenCPM`

linear.GenCPM, logit.GenCPM, multinom.GenCPM, and cox.GenCPM are the four main functions. They fit models and output predictions for continuous, binary, categorical, and survival responses, respectively, using connectome-based predictive modeling.
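Conceptually, each of these functions passes the connectivity summary score(s), plus any covariates in x, to a standard regression for its outcome type. The sketch below shows that mapping using base R and the recommended packages nnet and survival. The variables `score` and `age` and the simulated outcomes are hypothetical, and GenCPM's own fitting code may differ in detail.

```r
library(nnet)      # multinom(): categorical outcomes
library(survival)  # coxph(), Surv(): time-to-event outcomes

set.seed(2)
n <- 200
d <- data.frame(score = rnorm(n),                    # hypothetical connectivity summary score
                age   = rnorm(n, mean = 70, sd = 5)) # hypothetical non-imaging covariate
d$cont   <- 2 * d$score + rnorm(n)                   # continuous outcome
d$binary <- rbinom(n, 1, plogis(d$score))            # binary outcome
d$categ  <- cut(d$score + rnorm(n), c(-Inf, -0.5, 0.5, Inf),
                labels = c("low", "mid", "high"))    # categorical outcome
d$time   <- rexp(n, rate = exp(0.5 * d$score))       # survival time
d$status <- rbinom(n, 1, 0.8)                        # 1 = event, 0 = censored

# One downstream model per outcome type, each adjusting for the covariate
fits <- list(
  linear   = lm(cont ~ score + age, data = d),
  logit    = glm(binary ~ score + age, family = binomial, data = d),
  multinom = multinom(categ ~ score + age, data = d, trace = FALSE),
  cox      = coxph(Surv(time, status) ~ score + age, data = d)
)
model_classes <- sapply(fits, function(f) class(f)[1])
```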

`linear.GenCPM`, `logit.GenCPM`, `multinom.GenCPM`

linear.GenCPM(
    connectome, behavior, x=NULL, 
    external.connectome = NULL, external.x = NULL,
    cv="leave-one-out", k = dim(connectome)[3], correlation = "pearson", 
    thresh = .01, edge = "separate", seed = 1220
)

logit.GenCPM(
    connectome, behavior, x=NULL, 
    external.connectome = NULL, external.x = NULL,
    cv="leave-one-out", k = dim(connectome)[3], correlation = "pearson", 
    thresh = .01, edge = "separate", seed = 1220
)

multinom.GenCPM(
    connectome, behavior, x=NULL, 
    external.connectome = NULL, external.x = NULL,
    cv="leave-one-out", k = dim(connectome)[3], correlation = "pearson", 
    thresh = .01, edge = "separate", seed = 1220
)

Arguments

connectome: an array of dimension M*M*N holding one M*M connectivity matrix per subject, where M is the number of nodes (brain regions) and N is the number of subjects.
behavior: a vector containing the behavior measure for all subjects (continuous, binary, or categorical, matching the function used).
x: a data frame containing the non-imaging variables in the model.
external.connectome: an optional external connectivity array, in the same format as connectome, used for prediction.
external.x: an optional external data frame containing the non-imaging variables for prediction.
cv: a character string indicating the cross-validation method. The default is "leave-one-out".
k: the number of folds for k-fold cross-validation. The default, dim(connectome)[3], equals the number of subjects N.
correlation: the method used to correlate each edge with the behavior: "pearson" (default), "spearman", or "kendall".
thresh: the p-value threshold for selecting edges significantly associated with the behavior. The default is .01.
edge: a character string specifying whether models are fitted separately on positive and negative edges ("separate") or jointly on all selected edges ("combined"). The default is "separate".
seed: the random seed used when assigning subjects to cross-validation folds. The default is 1220.
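The cv, k, and seed arguments together determine how subjects are split into folds. Because the default k equals the number of subjects, the default amounts to leave-one-out cross-validation. The hypothetical helper `assign_folds` below shows one common way to assign folds reproducibly; GenCPM's exact assignment scheme may differ.

```r
# Hypothetical helper: reproducibly assign N subjects to k folds.
# Note: set.seed() here resets the global random number stream.
assign_folds <- function(N, k = N, seed = 1220) {
  set.seed(seed)
  sample(rep(seq_len(k), length.out = N))   # balanced fold labels, shuffled
}

folds_loo <- assign_folds(10)          # k = N: leave-one-out, one subject per fold
folds_5   <- assign_folds(10, k = 5)   # 5-fold: two subjects per fold

# Test-subject indices for each fold. Edge selection and model fitting
# should use only the remaining (training) subjects to avoid leakage.
test_sets <- split(seq_len(10), folds_5)
```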

Output

Each function returns a list containing the following elements:

positive_edges: all selected edges with a significantly positive relationship with the behavior response.
negative_edges: all selected edges with a significantly negative relationship with the behavior response.
r_mat: a list of matrices containing the Pearson correlation coefficients between edges and behavior.
p_mat: a list of matrices containing the p-values from the Pearson correlations between edges and behavior.
positive_model: Fitted model using positive edges. Only applicable when users input external.connectome.
negative_model: Fitted model using negative edges. Only applicable when users input external.connectome.
combined_model: Fitted model using both positive and negative edges. Only applicable when users input external.connectome.
positive_predicted_behavior: predicted behaviors from the model fitted using positive edges separately. Not applicable when the argument edge = "combined".
negative_predicted_behavior: predicted behaviors from the model fitted using negative edges separately. Not applicable when the argument edge = "combined".
predicted_behavior: predicted behaviors from the model fitted using all edges. Not applicable when the argument edge = "separate".
actual_behavior: Actual values of behavior response.

`cox.GenCPM`

cox.GenCPM(
    connectome, x=NULL, time, status,
    external.connectome = NULL, external.x = NULL,
    cv="leave-one-out", k = dim(connectome)[3], 
    thresh = .01, edge="separate", seed = 1220
)

Arguments

connectome: an array of dimension M*M*N holding one M*M connectivity matrix per subject, where M is the number of nodes (brain regions) and N is the number of subjects.
x: a data frame containing the non-imaging variables in the model.
time: the follow-up time for each subject.
status: the status indicator, typically 0 = alive (censored) and 1 = event.
external.connectome: an optional external connectivity array, in the same format as connectome, used for prediction.
external.x: an optional external data frame containing the non-imaging variables for prediction.
cv: a character string indicating the cross-validation method. The default is "leave-one-out".
k: the number of folds for k-fold cross-validation. The default, dim(connectome)[3], equals the number of subjects N.
thresh: the p-value threshold for selecting edges significantly associated with the survival outcome. The default is .01.
edge: a character string specifying whether models are fitted separately on positive and negative edges ("separate") or jointly on all selected edges ("combined"). The default is "separate".
seed: the random seed used when assigning subjects to cross-validation folds. The default is 1220.

Output

positive_edges: all selected edges having a significantly positive relationship with survival outcome in a marginal test.
negative_edges: all selected edges having a significantly negative relationship with survival outcome in a marginal test.
positive_model: Fitted model using positive edges. Only applicable when users input external.connectome.
negative_model: Fitted model using negative edges. Only applicable when users input external.connectome.
combined_model: Fitted model using both positive and negative edges. Only applicable when users input external.connectome.
positive_predicted_linear_predictor: predicted linear predictors from the Cox model fitted using positive edges separately. Not applicable when the argument edge = "combined".
negative_predicted_linear_predictor: predicted linear predictors from the Cox model fitted using negative edges separately. Not applicable when the argument edge = "combined".
predicted_linear_predictor: predicted linear predictors from the Cox model fitted using all edges. Not applicable when the argument edge = "separate".
actual_time: Actual values of survival time.
actual_status: Actual values of status indicator.
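For survival outcomes, edges cannot be screened with an ordinary correlation coefficient. One natural marginal test, sketched below under that assumption, fits a univariate Cox model per edge and uses the sign and p-value of its coefficient. Here "positive" is taken to mean a positive Cox coefficient (higher connectivity, higher hazard); whether GenCPM uses this convention is an assumption. The helper `cox_screen` is hypothetical and relies on the survival package, which ships with R as a recommended package.

```r
library(survival)

# Hypothetical helper: marginal Cox screening of an N x E edge matrix.
cox_screen <- function(edges, time, status, thresh = 0.01) {
  stats <- apply(edges, 2, function(e) {
    s <- summary(coxph(Surv(time, status) ~ e))$coefficients
    c(beta = s[1, "coef"], p = s[1, "Pr(>|z|)"])   # coefficient and Wald p-value
  })
  list(pos = stats["p", ] < thresh & stats["beta", ] > 0,   # higher hazard
       neg = stats["p", ] < thresh & stats["beta", ] < 0)   # lower hazard
}

set.seed(3)
N <- 300
edges  <- matrix(rnorm(N * 5), N, 5)
time   <- rexp(N, rate = exp(edges[, 1] - edges[, 2]))  # edge 1 raises, edge 2 lowers hazard
status <- rep(1, N)                                      # no censoring, for simplicity
sel <- cox_screen(edges, time, status)
```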

3. `.regularized.GenCPM`

linear.regularized.GenCPM, logit.regularized.GenCPM, multinom.regularized.GenCPM, and cox.regularized.GenCPM are penalized versions of the four .GenCPM functions that add LASSO, ridge, or elastic-net regularization.
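Fig. 1(b) notes that the penalty is applied only to edge features, leaving covariates unpenalized. With the glmnet package this is commonly done through its penalty.factor argument. To stay dependency-free, the sketch below shows the same idea for the ridge case in closed form, solving (X'X + lambda*D) beta = X'y, where D is diagonal with 1 for edge columns and 0 for the intercept and covariates. The function `ridge_partial` is hypothetical and is not GenCPM's implementation.

```r
# Hypothetical helper: ridge regression that penalizes only edge columns.
# edges: N x E matrix; covars: N x P matrix; y: numeric response.
ridge_partial <- function(edges, covars, y, lambda) {
  X   <- cbind(1, covars, edges)                              # intercept, covariates, edges
  pen <- c(0, rep(0, ncol(covars)), rep(1, ncol(edges)))      # 1 = penalized column
  beta <- solve(crossprod(X) + lambda * diag(pen), crossprod(X, y))
  drop(beta)
}

set.seed(4)
N <- 100
covars <- matrix(rnorm(N), N, 1)           # e.g. a standardized covariate
edges  <- matrix(rnorm(N * 20), N, 20)
y <- 3 * covars[, 1] + edges[, 1] + rnorm(N)

b_small <- ridge_partial(edges, covars, y, lambda = 1e-8)  # close to ordinary least squares
b_large <- ridge_partial(edges, covars, y, lambda = 1e6)   # edges shrunk toward 0
```

With a very large lambda the edge coefficients collapse toward zero while the covariate coefficient stays near its true value of 3, which is the point of leaving covariates unpenalized.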

`linear.regularized.GenCPM`, `logit.regularized.GenCPM`, `multinom.regularized.GenCPM`

linear.regularized.GenCPM(
    connectome, behavior, x, 
    external.connectome = NULL, external.x = NULL,
    cv="leave-one-out", k=dim(connectome)[3], correlation = "pearson",
    thresh=.01, edge="separate", type="lasso", 
    lambda=NULL, alpha=NULL, seed=1220
)

logit.regularized.GenCPM(
    connectome, behavior, x, 
    external.connectome = NULL, external.x = NULL,
    cv="leave-one-out", k=dim(connectome)[3], correlation = "pearson",
    thresh=.01, edge="separate", type="lasso", 
    lambda=NULL, alpha=NULL, seed=1220
)

multinom.regularized.GenCPM(
    connectome, behavior, x, 
    external.connectome = NULL, external.x = NULL,
    cv="leave-one-out", k = dim(connectome)[3], correlation = "pearson",
    thresh = .01, edge = "separate", type="lasso", 
    lambda=NULL, alpha=NULL, seed = 1220
)

Arguments

connectome: an array of dimension M*M*N holding one M*M connectivity matrix per subject, where M is the number of nodes (brain regions) and N is the number of subjects.
behavior: a vector containing the behavior measure for all subjects (continuous, binary, or categorical, matching the function used).
x: a data frame containing the non-imaging variables in the model.
external.connectome: an optional external connectivity array, in the same format as connectome, used for prediction.
external.x: an optional external data frame containing the non-imaging variables for prediction.
cv: a character string indicating the cross-validation method. The default is "leave-one-out".
k: the number of folds for k-fold cross-validation. The default, dim(connectome)[3], equals the number of subjects N.
correlation: the method used to correlate each edge with the behavior: "pearson" (default), "spearman", or "kendall".
thresh: the p-value threshold for selecting edges significantly associated with the behavior. The default is .01.
edge: a character string specifying whether models are fitted separately on positive and negative edges ("separate") or jointly on all selected edges ("combined"). The default is "separate".
type: the type of penalty (LASSO, ridge, or elastic net). The default is "lasso".
lambda: the value of the penalty parameter. The default is NULL.
alpha: the mixing parameter for the elastic-net penalty. The default is NULL.
seed: the random seed used when assigning subjects to cross-validation folds. The default is 1220.
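For reference, type, lambda, and alpha correspond to the standard penalized-regression objective. In the widely used glmnet parameterization (assumed here; the source does not state GenCPM's exact convention), with edge coefficients $\beta$, unpenalized covariate coefficients $\gamma$, and log-likelihood $\ell$ (Gaussian, binomial, multinomial, or Cox partial likelihood), the fit solves the problem below. Setting $\alpha = 1$ gives LASSO and $\alpha = 0$ gives ridge.

```latex
(\hat{\beta}_0, \hat{\beta}, \hat{\gamma})
  = \arg\min_{\beta_0, \beta, \gamma}
    \; -\frac{1}{N}\,\ell(\beta_0, \beta, \gamma)
    + \lambda \left[ \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2
    + \alpha\,\lVert \beta \rVert_1 \right]
```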

Output

positive_edges: all selected edges having a significantly positive relationship with behavior response.
negative_edges: all selected edges having a significantly negative relationship with behavior response.
positive_model: Fitted model using positive edges. Only applicable when users input external.connectome.
negative_model: Fitted model using negative edges. Only applicable when users input external.connectome.
combined_model: Fitted model using both positive and negative edges. Only applicable when users input external.connectome.
positive_predicted_behavior: predicted behaviors from the model fitted using positive edges separately. Not applicable when the argument edge = "combined".
negative_predicted_behavior: predicted behaviors from the model fitted using negative edges separately. Not applicable when the argument edge = "combined".
predicted_behavior: predicted behaviors from the model fitted using all edges. Not applicable when the argument edge = "separate".
actual_behavior: actual values of behavior response.
positive_lambda_total: the final lambda (penalty value) used in each cross-validation fold for the model fitted with positive edges. Not applicable when edge = "combined".
negative_lambda_total: the final lambda (penalty value) used in each cross-validation fold for the model fitted with negative edges. Not applicable when edge = "combined".
lambda_total: the final lambda (penalty value) used in each cross-validation fold for the model fitted with all edges. Not applicable when edge = "separate".

`cox.regularized.GenCPM`

cox.regularized.GenCPM(
    connectome, x=NULL, time, status,
    cv="leave-one-out", k = dim(connectome)[3], thresh = .01, 
    edge="separate", type="lasso", lambda=NULL, alpha=NULL, seed = 1220
)

Arguments

connectome: an array of dimension M*M*N holding one M*M connectivity matrix per subject, where M is the number of nodes (brain regions) and N is the number of subjects.
x: a data frame containing the non-imaging variables in the model.
time: the follow-up time for each subject.
status: the status indicator, typically 0 = alive (censored) and 1 = event.
external.connectome: an optional external connectivity array, in the same format as connectome, used for prediction.
external.x: an optional external data frame containing the non-imaging variables for prediction.
cv: a character string indicating the cross-validation method. The default is "leave-one-out".
k: the number of folds for k-fold cross-validation. The default, dim(connectome)[3], equals the number of subjects N.
thresh: the p-value threshold for selecting edges significantly associated with the survival outcome. The default is .01.
edge: a character string specifying whether models are fitted separately on positive and negative edges ("separate") or jointly on all selected edges ("combined"). The default is "separate".
type: the type of penalty (LASSO, ridge, or elastic net). The default is "lasso".
lambda: the value of the penalty parameter. The default is NULL.
alpha: the mixing parameter for the elastic-net penalty. The default is NULL.
seed: the random seed used when assigning subjects to cross-validation folds. The default is 1220.

Output

positive_edges: all selected edges having a significantly positive relationship with survival outcome in a marginal test.
negative_edges: all selected edges having a significantly negative relationship with survival outcome in a marginal test.
positive_model: Fitted model using positive edges. Only applicable when users input external.connectome.
negative_model: Fitted model using negative edges. Only applicable when users input external.connectome.
combined_model: Fitted model using both positive and negative edges. Only applicable when users input external.connectome.
positive_predicted_linear_predictor: predicted linear predictor from the model fitted using positive edges separately. Not applicable when the argument edge = "combined".
negative_predicted_linear_predictor: predicted linear predictor from the model fitted using negative edges separately. Not applicable when the argument edge = "combined".
predicted_linear_predictor: predicted linear predictor from the model fitted using all edges. Not applicable when the argument edge = "separate".
actual_status: actual values of status indicator.
actual_time: actual values of survival time.
positive_lambda_total: the final lambda (penalty value) used in each cross-validation fold for the model fitted with positive edges. Not applicable when edge = "combined".
negative_lambda_total: the final lambda (penalty value) used in each cross-validation fold for the model fitted with negative edges. Not applicable when edge = "combined".
lambda_total: the final lambda (penalty value) used in each cross-validation fold for the model fitted with all edges. Not applicable when edge = "separate".

4. `assess.GenCPM`

assess.GenCPM evaluates model performance across the test folds, using metrics appropriate to the model type.

assess.GenCPM(
    object, model="linear", edge="separate"
)

Arguments

object: returned GenCPM object from .GenCPM or .regularized.GenCPM functions.
model: a character string naming one of the built-in regression models: "linear" for linear.GenCPM and linear.regularized.GenCPM; "logistic" for logit.GenCPM and logit.regularized.GenCPM; "multinom" for multinom.GenCPM and multinom.regularized.GenCPM; and "cox" for cox.GenCPM and cox.regularized.GenCPM. The default is "linear".
edge: how edges were used to fit the models; it must match the edge setting used to create object. "separate" means two separate models were fitted on positive and negative edges, respectively, and "combined" means a single model was fitted on all selected edges. The default is "separate". The function reports an error if edge is not specified correctly.

Output

assess.GenCPM returns a list containing the performance metric (MSE for linear, AUC for logistic, multi-class AUC for multinomial, and C-index for Cox models), the predicted response, and the actual response.
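The sketch below shows what three of these metrics measure, using their standard definitions: MSE as the mean squared prediction error, AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and Harrell's C-index as the proportion of comparable subject pairs whose predicted risks are correctly ordered. The helpers are hypothetical simplifications and may differ from the refinements assess.GenCPM applies.

```r
mse <- function(pred, actual) mean((pred - actual)^2)

# AUC via the Mann-Whitney formulation (ties count 1/2).
auc <- function(score, label) {
  pos <- score[label == 1]
  neg <- score[label == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Harrell's C-index: higher predicted risk should go with shorter survival.
# Simplified: pairs with tied survival times are skipped.
c_index <- function(risk, time, status) {
  conc <- 0; comp <- 0
  for (i in seq_along(time)) {
    if (status[i] != 1) next                 # only observed events anchor comparable pairs
    j <- which(time > time[i])               # subjects who outlived subject i
    comp <- comp + length(j)
    conc <- conc + sum(risk[i] > risk[j]) + 0.5 * sum(risk[i] == risk[j])
  }
  conc / comp
}

mse(c(1, 2, 3), c(1, 2, 5))
auc(c(0.9, 0.2, 0.3, 0.1), c(1, 1, 0, 0))
c_index(risk = c(3, 2, 1), time = c(1, 2, 3), status = c(1, 1, 1))
```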

5. `heatmap.GenCPM`

heatmap.GenCPM visualizes the edges selected by thresholding or regularization, grouped by the 10-network labels of the Shen268 atlas.

heatmap.GenCPM(
    cpm, foldThreshold = .5
)

Arguments

cpm: returned GenCPM object from .GenCPM or .regularized.GenCPM functions.
foldThreshold: the proportion of cross-validation folds in which an edge must be selected for it to be plotted. For example, with .5, edges selected in at least half of the folds are plotted.
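Conceptually, foldThreshold filters edges by how consistently they are selected across folds. The hypothetical sketch below computes that selection frequency from a list of per-fold logical selection masks and keeps edges at or above the threshold (treating the boundary as inclusive is an assumption). How heatmap.GenCPM then colors and groups the edges by network is not reproduced here.

```r
# Hypothetical: proportion of folds in which each edge was selected.
selection_frequency <- function(masks) Reduce(`+`, masks) / length(masks)

# Four folds, four candidate edges (TRUE = selected in that fold)
masks <- list(c(TRUE, TRUE,  FALSE, FALSE),
              c(TRUE, FALSE, TRUE,  FALSE),
              c(TRUE, TRUE,  FALSE, FALSE),
              c(TRUE, FALSE, FALSE, FALSE))
freq <- selection_frequency(masks)
kept <- which(freq >= 0.5)   # edges that would be plotted with foldThreshold = .5
```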

Output

The output of heatmap.GenCPM is a heatmap in which color intensity reflects the strength of the correlation between connectivity and the response, with red indicating a positive correlation and blue a negative correlation.

Example

We use simulated data to illustrate some of the package's functions. The following example walks through linear.GenCPM; the other model-fitting functions are used in the same way.

First, we generate connectome, a 268*268*500 array, and the behavior response y. We use 268 nodes because heatmap.GenCPM labels the selected edges using the 10-network labels of the Shen268 atlas.

set.seed(123)
N <- 500 # 500 individuals
M <- 268 # 268 nodes (Shen268 atlas)
connectome <- array(0, dim = c(M, M, N)) # initialize the 3D array of connectivity matrices
edge <- matrix(NA, nrow = N, ncol = (M+1)*M/2) # store the upper triangle (including the diagonal) of each matrix

index <- 1:((M+1)*M/2)
n_sel <- floor(((M+1)*M/2)/3) # number of edges in each associated set (must be an integer)
pos_ind <- sample(index, n_sel, replace = FALSE) # randomly choose 1/3 of the edges to be positively associated with the response
neg_ind <- sample(index[-pos_ind], n_sel, replace = FALSE) # randomly choose another 1/3 to be negatively associated with the response

for (i in 1:N) {
  mat <- matrix(runif(M*M, min = -1, max = 1), nrow = M) # generate a random connectivity matrix
  sym_mat <- (mat + t(mat)) / 2   # make the matrix symmetric
  diag(sym_mat) <- 1   # set the diagonal to 1
  connectome[, , i] <- sym_mat
  edge[i, ] <- sym_mat[upper.tri(sym_mat, diag = TRUE)]
}

corr <- rep(0, (M+1)*M/2) # coefficient 0 for edges unrelated to the response
corr[pos_ind] <- 0.8 # coefficient 0.8 for positively associated edges
corr[neg_ind] <- -0.8 # coefficient -0.8 for negatively associated edges
epsilon <- rnorm(N) # generate the error term
y <- as.vector(edge %*% corr + epsilon) # generate the response y as a plain vector

Next, we pass the simulated data to linear.GenCPM to fit a linear regression model and generate predictions. This example omits the non-imaging covariates x and keeps all other arguments at their defaults.

lm.fit <- linear.GenCPM(connectome, y)

Next, we assess the predictions with assess.GenCPM.

assess.GenCPM(lm.fit, model = "linear", edge = "separate")

Make sure edge matches the setting used when fitting the model with linear.GenCPM; otherwise, assess.GenCPM reports an error. Likewise, change model ("logistic", "multinom", or "cox") when assessing a logistic, multinomial logistic, or Cox model.

Finally, we can visualize the significant edges identified by GenCPM in a heatmap.

heatmap.GenCPM(lm.fit, foldThreshold = 0.8)

With foldThreshold = 0.8, only edges selected in at least 80% of the folds are plotted. Adjust this value as needed.

Citation

Please cite the paper when you use the GenCPM package:

Xu B, Ding S, Xu W, Fredericks C and Zhao Y (2025) GenCPM: a toolbox for generalized connectome-based predictive modeling. Front. Neurosci. 19:1627497. doi: 10.3389/fnins.2025.1627497
