wangbo17/ChromCall

ChromCall

Assigning chromatin status to predefined genomic regions from epigenomic profiling data


🔍 Overview

ChromCall is an R package for region-based chromatin enrichment analysis of epigenomic profiling data, including ChIP-seq, CUT&RUN, CUT&Tag, and ATAC-seq. It provides a transparent, statistically principled framework to quantify enrichment at predefined genomic regions (e.g. promoters or enhancers), enabling region-matched comparisons across samples and experiments without relying on data-dependent peak boundaries.


🚀 Key Features

  • Region-centric analysis
    Quantifies chromatin enrichment directly within predefined genomic windows, enabling consistent, region-matched comparisons across samples and experiments.

  • Transparent statistical framework
    Employs a Negative Binomial–based background model incorporating:

    • experiment-specific genome-wide background estimation
    • region-specific, control-derived modulation factors (when available)

  • Control-aware enrichment testing (optional)
    Supports integration of matched control experiments to account for local background variation. In the absence of a control, ChromCall relies on genome-wide background estimation.

  • Multiple complementary metrics
    For each region and experiment, ChromCall reports:

    • FDR-adjusted p-values
    • enrichment score (log₂ observed / expected)
    • z-score
    • significance-based region classification

  • Multi-experiment and multi-sample support
    Supports joint analysis of multiple chromatin marks and pairwise comparisons between samples within a unified framework.

  • Optional expression integration
    Gene-level expression values mapped to regions using TSS-based annotation can be incorporated to enable integrated chromatin–transcription analyses.


🧠 Statistical Model Overview

ChromCall models read counts using a Negative Binomial (NB) distribution, which accommodates overdispersion beyond the Poisson assumption. Such overdispersion arises from technical variability in sequencing depth, library preparation, and local chromatin accessibility, so the NB model is a more robust choice for sequencing-based count data.

Background Estimation

For each experiment, a genome-wide background rate ($\lambda_g$) is estimated as the mean read count per non-blacklisted genomic tile:

$$ \lambda_g = \frac{1}{N} \sum_{i=1}^{N} y_i $$

where $y_i$ denotes the read count in the $i$-th tile and $N$ is the total number of non-blacklisted tiles.
Zero-count tiles are retained by default to avoid upward bias in sparse datasets and to ensure that $\lambda_g$ reflects global background signal rather than local enrichment.
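
As a minimal sketch of this estimate in base R, assume the per-tile counts are already available as a numeric vector. The names `tile_counts` and `blacklisted` are hypothetical inputs for illustration, not ChromCall objects.

```r
# Hypothetical per-tile read counts for one experiment (not ChromCall output).
tile_counts <- c(0, 3, 1, 0, 250, 2, 0, 4)
# Hypothetical mask: TRUE marks tiles that overlap a blacklisted region.
blacklisted <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Keep every non-blacklisted tile, including those with zero reads.
y <- tile_counts[!blacklisted]

# Genome-wide background rate: mean count per retained tile.
lambda_g <- mean(y)

# For comparison only: dropping zero-count tiles inflates the estimate.
lambda_g_nonzero <- mean(y[y > 0])
```

With these toy values the retained tiles give a background rate of 10/7 (about 1.43), whereas excluding the zero-count tiles would give 2.5, which illustrates the upward bias mentioned above.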

Control-based Local Modulation

To account for region-specific variability, ChromCall optionally derives a modulation factor from a matched control experiment:

$$ m_i = \max\left(1, \frac{y_i^{(\mathrm{ctrl})}}{\lambda_g^{(\mathrm{ctrl})}}\right) $$

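where $y_i^{(\mathrm{ctrl})}$ is the control read count in region $i$ and $\lambda_g^{(\mathrm{ctrl})}$ is the genome-wide background rate of the control. Because $m_i$ is floored at 1, the control can raise the expected signal in a region but never lower it below the genome-wide background.

The expected signal for region $i$ in experiment $j$ is then defined as: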

$$ \lambda_{t,i}^{(j)} = m_i \times \lambda_g^{(j)} $$

In the absence of a control dataset, $m_i$ defaults to 1, and $\lambda_{t,i}^{(j)}$ is determined solely by genome-wide background estimation.
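
The two formulas above can be sketched in base R as follows. All values and variable names are hypothetical and chosen only to show the arithmetic.

```r
# Hypothetical values, for illustration only.
lambda_g_ctrl <- 2.0              # genome-wide background rate of the control
lambda_g_exp  <- 1.5              # genome-wide background rate of experiment j
y_ctrl        <- c(0, 2, 6, 10)   # control read counts in four regions

# Modulation factor: control signal relative to its background, floored at 1
# so that a low control count never reduces the expected signal.
m <- pmax(1, y_ctrl / lambda_g_ctrl)

# Expected signal per region in experiment j.
lambda_t <- m * lambda_g_exp

# Without a control, every m_i is 1 and lambda_t equals lambda_g_exp.
lambda_t_no_ctrl <- rep(1, length(y_ctrl)) * lambda_g_exp
```

Here the first two regions keep the genome-wide expectation of 1.5, while the two regions with elevated control signal receive expectations of 4.5 and 7.5.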

Statistical Testing and Effect Sizes

ChromCall evaluates region-level enrichment using a one-sided Negative Binomial test:

$$ p_i^{(j)} = P\left(Y \ge y_i^{(j)} \mid Y \sim \mathrm{NB}(\mu = \lambda_{t,i}^{(j)},\ \text{size} = \theta^{(j)})\right) $$

where $\theta^{(j)}$ is a global dispersion parameter estimated per experiment. The dispersion parameter is estimated from genome-wide background tiles, enabling stable estimation by borrowing information across regions in the absence of replicates.
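
The estimator ChromCall uses for $\theta^{(j)}$ is not spelled out in this README. As one possible illustration, the sketch below applies a method-of-moments estimate, which follows from the NB variance $\mu + \mu^2/\theta$. This is an assumption for demonstration, not necessarily the package's procedure, and `estimate_size_mom` is a hypothetical helper.

```r
# Hypothetical helper: method-of-moments estimate of the NB size parameter.
# From Var(Y) = mu + mu^2 / size, solving for size gives mu^2 / (var - mu).
estimate_size_mom <- function(y) {
  mu <- mean(y)
  v  <- var(y)
  # No overdispersion detected: treat as the Poisson limit (size = Inf).
  if (v <= mu) return(Inf)
  mu^2 / (v - mu)
}

# Simulated background tiles with known parameters, for illustration.
set.seed(1)
y_bg <- rnbinom(1e5, mu = 2, size = 0.5)
theta_hat <- estimate_size_mom(y_bg)
```

On the simulated tiles the estimate should land close to the true size of 0.5. A single global estimate like this is stable because it pools all background tiles rather than relying on replicates.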

When dispersion is negligible (i.e. $\theta \to \infty$), the model reduces to a Poisson distribution.

Multiple testing correction is applied across all regions using the Benjamini–Hochberg false discovery rate (FDR) procedure.
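
The test and correction described above can be sketched with base R's `pnbinom` and `p.adjust`. The counts, expected signals, and dispersion are hypothetical, and the sketch does not claim to reproduce ChromCall's internal code.

```r
# Hypothetical observed counts, expected signals, and dispersion.
y        <- c(0, 2, 8, 30)
lambda_t <- c(1.5, 1.5, 4.5, 7.5)
theta    <- 2

# One-sided upper-tail probability P(Y >= y).
# pnbinom(q, lower.tail = FALSE) returns P(Y > q), hence the y - 1.
p_value <- pnbinom(y - 1, mu = lambda_t, size = theta, lower.tail = FALSE)

# Benjamini-Hochberg adjustment across all tested regions.
p_adj <- p.adjust(p_value, method = "BH")

# Poisson limit: a very large size gives nearly the Poisson tail probability.
p_nb_large_theta <- pnbinom(y - 1, mu = lambda_t, size = 1e6, lower.tail = FALSE)
p_pois           <- ppois(y - 1, lambda = lambda_t, lower.tail = FALSE)
```

The `y - 1` shift matters: without it the observed count itself would be excluded from the tail, and a region with zero reads would not receive a p-value of exactly 1.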

In addition to significance testing, ChromCall reports complementary effect-size metrics:

  • Enrichment score

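$$ s_i^{(j)} = \log_2\left(\frac{y_i^{(j)} + \epsilon}{\lambda_{t,i}^{(j)} + \epsilon}\right) $$

where $\epsilon$ is a small positive pseudocount that keeps the score finite when the observed count is zero.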

  • z-score

$$ z_i^{(j)} = \frac{y_i^{(j)} - \lambda_{t,i}^{(j)}}{\sqrt{\lambda_{t,i}^{(j)} + \frac{(\lambda_{t,i}^{(j)})^2}{\theta^{(j)}}}} $$

Together, these metrics provide complementary measures of enrichment strength, effect size, and statistical confidence.
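
Both effect-size formulas translate directly into vectorised R. In the sketch below the inputs are hypothetical, and the pseudocount of 1 is an assumed value because the one ChromCall uses is not stated here.

```r
# Hypothetical inputs, for illustration only.
y        <- c(0, 2, 8, 30)
lambda_t <- c(1.5, 1.5, 4.5, 7.5)
theta    <- 2
eps      <- 1   # assumed pseudocount; ChromCall's actual value is not stated

# Enrichment score: log2 ratio of observed to expected, stabilised by eps.
score <- log2((y + eps) / (lambda_t + eps))

# NB z-score: deviation from expectation in units of the NB standard deviation,
# where the NB variance is lambda_t + lambda_t^2 / theta.
z_nb <- (y - lambda_t) / sqrt(lambda_t + lambda_t^2 / theta)
```

The score describes fold change, while the z-score also accounts for how variable a region with that expectation is, so two regions with the same score can differ in z-score.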


🧬 Implementation and Data Structures

ChromCall is implemented in R and builds upon the Bioconductor ecosystem, ensuring interoperability with standard genomic data structures and downstream analysis workflows:

  • GRanges for representing genomic intervals
  • SummarizedExperiment for storing structured assay outputs and metadata
  • GenomicAlignments for importing aligned sequencing reads from BAM files
  • GenomeInfoDb and Seqinfo for genome annotation and consistency checks

Each processed sample is returned as a SummarizedExperiment object containing raw region-level read counts together with experiment-level background parameters (bg_mean, bg_size). After statistical testing, additional assays including lambda_t, p_value, p_adj, score, and z_nb are appended.

Pairwise sample comparisons generate region-level Δ enrichment and Δ z-score metrics, enabling direct comparative analysis of chromatin states across biological conditions.
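
To make the container layout concrete, the sketch below builds a toy SummarizedExperiment by hand using the Bioconductor packages named above. The orientation (regions in rows, experiments in columns) and the placement of `bg_mean` and `bg_size` in `colData` are assumptions for illustration; the objects ChromCall actually returns may be organised differently.

```r
library(GenomicRanges)          # GRanges(); also attaches IRanges and S4Vectors
library(SummarizedExperiment)   # SummarizedExperiment(), assay(), assayNames()

# Three hypothetical promoter windows.
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(1000, 5000, 2000), width = 2000)
)

# Regions in rows, experiments in columns (an assumed layout).
counts <- matrix(
  c(12, 0, 40, 3, 25, 1),
  nrow = 3,
  dimnames = list(NULL, c("H3K27me3", "H3K4me3"))
)

# Background parameters stored per experiment in colData (an assumption).
se <- SummarizedExperiment(
  assays    = list(counts = counts),
  rowRanges = regions,
  colData   = DataFrame(bg_mean = c(1.5, 2.0), bg_size = c(2, 3),
                        row.names = colnames(counts))
)

# Append a derived assay, as the testing step is described as doing.
# Without a control, the expected signal is the per-experiment background.
assay(se, "lambda_t") <- matrix(se$bg_mean, nrow = nrow(se), ncol = ncol(se),
                                byrow = TRUE, dimnames = dimnames(counts))
assayNames(se)
```

Individual assays can then be pulled out with `assay(se, "counts")` or `assay(se, "lambda_t")`, and the region coordinates with `rowRanges(se)`.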


📦 Installation

ChromCall is available as a development version on GitHub and can be installed using remotes:

# install.packages("remotes")
remotes::install_github("GliomaGenomics/ChromCall")

🧪 Basic Workflow

Build a ChromCall sample

library(ChromCall)

# One BAM file per experiment; "Control" is referenced by control_name below.
sample <- build_chromcall_sample(
  sample_name     = "sampleA",
  experiments     = list(
    H3K27me3 = "h3k27me3.bam",
    H3K4me3  = "h3k4me3.bam",
    Control  = "control.bam"
  ),
  control_name    = "Control",
  genome_file     = "genome.txt",
  region_file     = "promoters.bed",
  window_size     = 2000,
  blacklist_file  = "blacklist.bed",
  expression_file = "expression_tss.bed"
)

Perform region-level enrichment testing

result <- test_region_counts(sample)

Compare two samples

# resultA and resultB are test_region_counts() outputs for the two samples.
comparison <- compare_samples(resultA, resultB, threshold = 0.25)

Export results

write_experiment_results(result, "H3K4me3", "results.tsv")
write_comparison_results(comparison, "comparison.tsv")

📈 Outputs

| Metric | Description |
|---|---|
| counts | Raw read count per region |
| lambda_g | Genome-wide background rate |
| lambda_t | Locally adjusted expected signal |
| p_value, p_adj | NB test p-values and FDR-adjusted values |
| score | log₂(Observed / Expected) enrichment |
| z_nb | NB-based z-score |
| DeltaEnrichment, DeltaZscore | Pairwise comparison metrics |

💡 Contact

For questions, issues, or feature requests, please open a GitHub issue.
