A method to estimate rare variant heritability. Its versatility lies in the ability to estimate both gene-level and exome-wide heritability, in a fast, accurate, and unbiased manner.

GMELab/RARity

The rare variant heritability (RARity) estimator is a framework for assessing rare variant heritability (h2RV) without assuming a particular genetic architecture. It is fast, accurate, and unbiased, and it computes both gene-level and exome-wide heritability estimates for continuous traits.

Method Overview

The RARity method computes, in parallel for each consecutive genetic block, the adjusted R2 from an ordinary least squares (OLS) multiple linear regression, which serves as an unbiased estimator of block-wise heritability. The adjusted R2 estimates are then summed over all blocks to give the overall heritability estimate of a trait.
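
For reference, the block-wise estimator and the total can be written out as follows. This is a sketch that assumes the standard definition of adjusted R2 for an OLS model with an intercept. The symbols n (individuals), p_b (rare variants in block b), and B (number of blocks) are notation introduced here and are not taken from the RARity scripts.

```latex
R^2_{\mathrm{adj},b} = 1 - \left(1 - R^2_b\right)\frac{n - 1}{n - p_b - 1},
\qquad
\hat{h}^2_{RV} = \sum_{b=1}^{B} R^2_{\mathrm{adj},b}
```

The demo output in this README is consistent with this definition: R2 = 0.106309 with n = 10,000 and p_b = 1,000 gives an adjusted R2 of about 0.007.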

To reduce the influence of long-range LD between variants, which would otherwise inflate the overall heritability estimate, highly correlated RVs should be removed by LD pruning in PLINK 1.9. Pruning removes variants with a pairwise Pearson's r2 > 0.1 within a 50 Mb window that is shifted by 500 bases at the end of each step.

The following figure provides a summary of the RARity pipeline:

RARity pipeline

Figure 1: Summary of the Rare variant heritability (RARity) estimator pipeline.

The RARity pipeline consists of pre-treatment of the genotype and phenotype data, followed by application of the statistical model to each block, and finally estimation of the total heritability from all blocks. *The model may be modified to prioritize variants by implementing LD clumping instead of pruning. **Adjustment for medications may involve applying correction factors or removing individuals who use the medications. MAF = minor allele frequency, MAC = minor allele count, LD = linkage disequilibrium, SD = standard deviation, PCs = principal components.

Demo Requirements

Hardware Requirements (Full-scale version)

Any standard computer (macOS, Linux, Windows).

Requires a Unix-like virtual environment with at least 250 GB of RAM for in-memory operations.

Software Requirements

Essential Dependencies: programs

| Program | Description | Download |
| --- | --- | --- |
| BASH (≥ 5.0) | A Unix shell and command language | https://ubuntu.com/download/desktop |
| R (≥ 3.6.0) | R programming language | https://cran.r-project.org/ |

Essential Dependencies: R packages

| R package | Install | Reference |
| --- | --- | --- |
| dplyr | install.packages("dplyr") | https://www.r-project.org/nosvn/pandoc/dplyr.html |
| data.table | install.packages("data.table") | https://cran.r-project.org/package=data.table |

Demo (with instructions for use)

The following is a demonstration of the RARity algorithm to estimate rare coding variant heritability.

The algorithm uses the rarity_demo.r script from the RARity repository. The script outputs the RV heritability for each genotype block, as well as the overall heritability estimate based on all genotype blocks.

Input files

For demonstration purposes, and to keep the run time short, we will use the following simulated data as input.

  1. 5 genotype matrices: geno_block1.RData, geno_block2.RData, geno_block3.RData, geno_block4.RData and geno_block5.RData, each with a dimension of 10,000 individuals x 1,000 RVs. Assume that the genotype matrices are pre-processed as per Figure 1 and standardized to mean = 0, sd = 1.

  2. 1 phenotype matrix: phenos.RData, with a dimension of 10,000 individuals x 3 phenotypes. Assume that the phenotypes are pre-processed as per Figure 1 above and standardized to mean = 0, sd = 1.

Steps

Step 1: Create the following working directory in bash:

mkdir rarity_practice

Step 2: Download the "rarity_demo.r" script and the input data (5 genotype block matrices and 1 phenotype matrix) to the rarity_practice directory.

Step 3: In bash, tell the system that it has permission to execute the scripts:

chmod +x /your_directory/rarity_practice/*

Step 4: The script takes the following two positional arguments, in this order:

  1. The file pattern that identifies the genotype block data: "geno_block"

  2. The phenotype file (pheno_file): "phenos.RData"

Run the script in bash as follows:

cd your_directory/rarity_practice

Rscript ./rarity_demo.r geno_block phenos.RData

Run time: The run time for this script on a standard computer should be around 10 seconds.
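
Before running Step 4, it can help to confirm that everything from Step 2 actually landed in the working directory. The following is a minimal sketch of such a check. The function name check_inputs and its arguments are hypothetical and are not part of the RARity repository. It assumes that the genotype block files start with the pattern given in Step 4 and end in .RData.

```shell
# check_inputs DIR PATTERN PHENO_FILE EXPECTED_BLOCKS
# Hypothetical pre-flight helper (not part of the RARity repository).
# Returns 0 if the demo script, the phenotype file and the expected number
# of genotype block files are present in DIR, and 1 otherwise.
check_inputs() {
  local dir=$1 pattern=$2 pheno=$3 expected=$4
  local missing=0 n_blocks=0 f

  # The demo script and the phenotype file must both be present.
  for f in rarity_demo.r "$pheno"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done

  # Count the genotype block files whose names start with the pattern.
  # If nothing matches, the glob stays literal and the -f test fails.
  for f in "$dir/$pattern"*.RData; do
    if [ -f "$f" ]; then
      n_blocks=$((n_blocks + 1))
    fi
  done

  if [ "$n_blocks" -ne "$expected" ]; then
    echo "expected $expected genotype blocks, found $n_blocks" >&2
    missing=1
  fi

  return "$missing"
}
```

For the demo data, `check_inputs your_directory/rarity_practice geno_block phenos.RData 5` should return success before `Rscript ./rarity_demo.r geno_block phenos.RData` is run.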

Sample Output

A successful run produces:

  1. A single file, BLOCK_HERITABILITY.txt, reporting the RV heritability for each block.

Here is an example of the fields produced and their meaning:

| phenotype | block | N | N_RV | r2 | adj_r2 | block_r2_variance | block_adj_r2_variance |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pheno_1 | geno_block1 | 10000 | 1000 | 0.106309 | 0.007 | 2.75E-05 | 3.39E-05 |
| pheno_2 | geno_block1 | 10000 | 1000 | 0.104668 | 0.005175 | 2.72E-05 | 3.35E-05 |
| pheno_3 | geno_block1 | 10000 | 1000 | 0.099288 | -0.0008 | 2.61E-05 | 3.22E-05 |

Field descriptions: N = number of individuals; N_RV = number of rare variants; r2 = R2 of the block regression; adj_r2 = adjusted R2; block_r2_variance = variance of R2 in each block; block_adj_r2_variance = variance of adjusted R2 in each block.
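
As a quick sanity check, the adj_r2 column can be recomputed from the r2, N and N_RV columns. The sketch below assumes the standard adjusted R2 for an OLS model with an intercept. The helper name adj_r2 is hypothetical and not part of the RARity scripts.

```shell
# adj_r2 R2 N P
# Hypothetical helper: adjusted R2 for an OLS fit with an intercept,
# N individuals and P predictors (here, rare variants in the block).
adj_r2() {
  awk -v r2="$1" -v n="$2" -v p="$3" \
    'BEGIN { printf "%.6f\n", 1 - (1 - r2) * (n - 1) / (n - p - 1) }'
}

# Row pheno_1 / geno_block1 from the table above.
adj_r2 0.106309 10000 1000
```

For the three rows shown, this gives values that round to the reported adj_r2 entries (about 0.007, 0.0052 and -0.0008). A slightly negative value, as for pheno_3, is expected when the true block heritability is close to zero.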

  2. A single file, TOTAL_HERITABILITY.txt, showing the total heritability for each trait. The expected output of this file is:

| phenotype | N | N_RV | Heritability | LCL | UCL |
| --- | --- | --- | --- | --- | --- |
| pheno_1 | 10000 | 5000 | 0.005413 | -0.01962 | 0.030442 |
| pheno_2 | 10000 | 5000 | 0.009938 | -0.01517 | 0.035046 |
| pheno_3 | 10000 | 5000 | 0.008781 | -0.01631 | 0.033873 |

Field descriptions: N = number of individuals; N_RV = number of rare variants; LCL = lower confidence limit; UCL = upper confidence limit.
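
The relationship between the two output files can be sketched as a per-phenotype sum over blocks. The following is an illustration only, and rarity_demo.r may compute its totals differently. It assumes that BLOCK_HERITABILITY.txt is whitespace-delimited with the header row shown above, that blocks are independent so their variances add, and that the limits come from a normal approximation with z = 1.96. The helper name total_h2 is hypothetical.

```shell
# total_h2 FILE
# Hypothetical helper: sums block-wise adjusted R2 per phenotype.
# Assumptions (not confirmed by the RARity source):
#   - FILE is whitespace-delimited with one header row
#   - blocks are independent, so block variances add
#   - limits use a normal approximation with z = 1.96
# Output columns: phenotype N N_RV heritability lower upper
total_h2() {
  awk 'NR > 1 {
         h2[$1]  += $6    # adj_r2
         var[$1] += $8    # block_adj_r2_variance
         nrv[$1] += $4    # N_RV
         n[$1]    = $3    # N, taken from the last block read
       }
       END {
         for (ph in h2) {
           half = 1.96 * sqrt(var[ph])
           printf "%s %d %d %.6f %.6f %.6f\n",
                  ph, n[ph], nrv[ph], h2[ph], h2[ph] - half, h2[ph] + half
         }
       }' "$1" | sort
}
```

Running `total_h2 BLOCK_HERITABILITY.txt` after the demo gives one line per phenotype, which can be compared against TOTAL_HERITABILITY.txt to see whether these assumptions match what the script does.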

License

GNU General Public License v3.0

Citation

Nature Communications

Pathan, N., Deng, W.Q., Di Scipio, M. et al. A method to estimate the contribution of rare coding variants to complex trait heritability. Nat Commun 15, 1245 (2024). https://doi.org/10.1038/s41467-024-45407-8

Contact Information

Any queries pertaining to RARity scripts or methodological framework can be addressed to either: Nazia Pathan (pathann@mcmaster.ca) or Guillaume Pare (pareg@mcmaster.ca)
