ChromCall

Assigning chromatin status to predefined genomic regions from epigenomic profiling data

🔍 Overview

ChromCall is an R package for region-based chromatin enrichment analysis of epigenomic profiling data, including ChIP-seq, CUT&RUN, CUT&Tag, and ATAC-seq. It provides a transparent, statistically principled framework to quantify enrichment at predefined genomic regions (e.g. promoters or enhancers), enabling region-matched comparisons across samples and experiments without relying on data-dependent peak boundaries.

🚀 Key Features

Region-centric analysis
Quantifies chromatin enrichment directly within predefined genomic windows, enabling consistent, region-matched comparisons across samples and experiments.
Transparent statistical framework
Employs a Negative Binomial–based background model incorporating:
- experiment-specific genome-wide background estimation
- region-specific, control-derived modulation factors (when available)
Control-aware enrichment testing (optional)
Supports integration of matched control experiments to account for local background variation. In the absence of a control, ChromCall relies on genome-wide background estimation.
Multiple complementary metrics
For each region and experiment, ChromCall reports:
- FDR-adjusted p-values
- enrichment score (log₂ observed / expected)
- z-score
- significance-based region classification
Multi-experiment and multi-sample support
Supports joint analysis of multiple chromatin marks and pairwise comparisons between samples within a unified framework.
Optional expression integration
Gene-level expression values mapped to regions using TSS-based annotation can be incorporated to enable integrated chromatin–transcription analyses.

🧠 Statistical Model Overview

ChromCall models read counts with a Negative Binomial (NB) distribution. Unlike the Poisson distribution, whose variance equals its mean, the NB distribution accommodates overdispersion arising from technical and biological variability (e.g. sequencing depth, library preparation, and local chromatin accessibility), which makes it a more robust model for sequencing count data.

Background Estimation

For each experiment, a genome-wide background rate $\lambda_g$ is estimated as the mean read count per non-blacklisted genomic tile:

$$ \lambda_g = \frac{1}{N} \sum_{i=1}^{N} y_i $$

where $y_i$ denotes the read count in the $i$-th tile and $N$ is the total number of non-blacklisted tiles (in this section only, $i$ indexes background tiles; in the sections below it indexes the predefined regions).
Zero-count tiles are retained by default to avoid upward bias in sparse datasets and to ensure that $\lambda_g$ reflects global background signal rather than local enrichment.
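As a minimal sketch, assuming per-tile read counts and a logical blacklist flag are already available as plain R vectors (ChromCall itself starts from BAM files and `GRanges` objects), the background rate is simply a mean over the retained tiles. `estimate_lambda_g()` is a hypothetical helper for illustration, not part of the package API.

```r
# Hypothetical helper: genome-wide background rate lambda_g.
# Assumption: tile counts and blacklist flags are plain vectors of equal length.
estimate_lambda_g <- function(tile_counts, blacklisted) {
  stopifnot(length(tile_counts) == length(blacklisted))
  keep <- !blacklisted
  # Zero-count tiles are kept, so they pull the mean down instead of being ignored
  mean(tile_counts[keep])
}

tile_counts <- c(0, 3, 0, 5, 120, 2)                       # toy counts
blacklisted <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)  # tile 5 is blacklisted
lambda_g <- estimate_lambda_g(tile_counts, blacklisted)    # (0 + 3 + 0 + 5 + 2) / 5 = 2
```

Dropping the zero-count tiles instead, `mean(tile_counts[!blacklisted & tile_counts > 0])`, would give 10/3 ≈ 3.33 for the same toy data, which illustrates the upward bias the default avoids.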

Control-based Local Modulation

To account for region-specific variability, ChromCall optionally derives a modulation factor from a matched control experiment:

$$ m_i = \max\left(1, \frac{y_i^{(\mathrm{ctrl})}}{\lambda_g^{(\mathrm{ctrl})}}\right) $$

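where $y_i^{(\mathrm{ctrl})}$ is the control read count in region $i$ and $\lambda_g^{(\mathrm{ctrl})}$ is the control's genome-wide background rate. Taking the maximum with 1 means the control can raise, but never lower, the expected signal for a region. The expected signal for region $i$ in experiment $j$ is then defined as: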

$$ \lambda_{t,i}^{(j)} = m_i \times \lambda_g^{(j)} $$

In the absence of a control dataset, $m_i$ defaults to 1, and $\lambda_{t,i}^{(j)}$ is determined solely by genome-wide background estimation.
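The modulation factor and expected signal are simple vectorised operations. The sketch below assumes region-level control counts and both background rates have already been computed; `modulation_factor()` and `expected_signal()` are hypothetical names used only for illustration.

```r
# Hypothetical helper: m_i = max(1, y_ctrl / lambda_g_ctrl), computed per region
modulation_factor <- function(ctrl_counts, lambda_g_ctrl) {
  # Regions where the control is at or below its genome-wide background get m_i = 1
  pmax(1, ctrl_counts / lambda_g_ctrl)
}

# Hypothetical helper: lambda_t = m_i * lambda_g of the target experiment
expected_signal <- function(lambda_g_target, m = 1) {
  # With no control, m defaults to 1 and lambda_t equals the target's lambda_g
  m * lambda_g_target
}

ctrl_counts <- c(1, 4, 12)
m <- modulation_factor(ctrl_counts, lambda_g_ctrl = 4)  # 1, 1, 3
lambda_t <- expected_signal(lambda_g_target = 2.5, m = m)  # 2.5, 2.5, 7.5
```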

Statistical Testing and Effect Sizes

ChromCall evaluates region-level enrichment using a one-sided Negative Binomial test:

$$ p_i^{(j)} = P\left(Y \ge y_i^{(j)} \mid Y \sim \mathrm{NB}(\mu = \lambda_{t,i}^{(j)},\ \text{size} = \theta^{(j)})\right) $$

where $\theta^{(j)}$ is the NB size (inverse dispersion) parameter, estimated once per experiment from the genome-wide background tiles. Sharing a single value across all regions borrows information genome-wide and keeps the estimate stable even in the absence of replicates.
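The README does not say which estimator is used for $\theta^{(j)}$. One common choice, shown here purely as an assumption, is the method of moments: because the NB variance is $\mu + \mu^2/\theta$, solving for $\theta$ gives $\mu^2/(v - \mu)$ from the background mean $\mu$ and variance $v$. `estimate_theta_mom()` is a hypothetical helper.

```r
# Hypothetical method-of-moments estimate of the NB size parameter theta.
# Assumption: illustrative only; ChromCall's actual estimator is not documented here.
estimate_theta_mom <- function(tile_counts) {
  mu <- mean(tile_counts)
  v  <- var(tile_counts)
  # No excess variance over the mean: treat as the Poisson limit (theta -> Inf)
  if (v <= mu) return(Inf)
  # NB variance is mu + mu^2 / theta, so theta = mu^2 / (v - mu)
  mu^2 / (v - mu)
}

theta <- estimate_theta_mom(c(0, 2, 4, 6, 8))  # mu = 4, var = 10, theta = 16 / 6
```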

When dispersion is negligible (i.e. $\theta \to \infty$), the model reduces to a Poisson distribution.
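In base R, the one-sided test maps onto `pnbinom()`, which accepts the same `size`/`mu` parameterisation as the formula above. Because `lower.tail = FALSE` returns the strict tail $P(Y > q)$, the count is shifted down by one to obtain $P(Y \ge y)$. This is a sketch; `nb_upper_pvalue()` is a hypothetical helper and ChromCall's internal code may differ.

```r
# Hypothetical helper: one-sided NB p-value P(Y >= y) with mean lambda_t and size theta
nb_upper_pvalue <- function(y, lambda_t, theta) {
  # pnbinom(q, lower.tail = FALSE) gives P(Y > q), so use q = y - 1
  pnbinom(y - 1, size = theta, mu = lambda_t, lower.tail = FALSE)
}

p_nb <- nb_upper_pvalue(y = 15, lambda_t = 5, theta = 2)

# Poisson limit: with a very large size, the NB tail approaches the Poisson tail
p_large_theta <- nb_upper_pvalue(15, 5, theta = 1e6)
p_pois <- ppois(14, lambda = 5, lower.tail = FALSE)
```

With `theta = 2` the overdispersed NB assigns a much larger tail probability than the Poisson model with the same mean, so the same count is judged less extreme.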

Multiple testing correction is applied across all regions using the Benjamini–Hochberg false discovery rate (FDR) procedure.
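Benjamini–Hochberg adjustment is available in base R through `p.adjust()`; whether ChromCall calls this function internally is an assumption. For four toy p-values:

```r
pvals <- c(0.001, 0.02, 0.03, 0.5)
# Each p-value is scaled by n / rank, then a running minimum from the largest
# rank downwards keeps the adjusted values monotone
p_adj <- p.adjust(pvals, method = "BH")  # 0.004, 0.04, 0.04, 0.5
```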

In addition to significance testing, ChromCall reports complementary effect-size metrics:

Enrichment score

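$$ s_i^{(j)} = \log_2\left(\frac{y_i^{(j)} + \epsilon}{\lambda_{t,i}^{(j)} + \epsilon}\right) $$

where $\epsilon$ is a small pseudocount that keeps the ratio and its logarithm finite when counts are zero.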
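The README does not state the value of $\epsilon$; the sketch below assumes a pseudocount of 1 purely for illustration, with `enrichment_score()` as a hypothetical helper.

```r
# Hypothetical helper: log2((y + eps) / (lambda_t + eps)); eps = 1 is an assumption
enrichment_score <- function(y, lambda_t, eps = 1) {
  log2((y + eps) / (lambda_t + eps))
}

scores <- enrichment_score(c(0, 3, 15), lambda_t = 3)  # -2, 0, 2
```

A zero count therefore yields a finite negative score rather than `-Inf`.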

z-score

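$$ z_i^{(j)} = \frac{y_i^{(j)} - \lambda_{t,i}^{(j)}}{\sqrt{\lambda_{t,i}^{(j)} + \frac{(\lambda_{t,i}^{(j)})^2}{\theta^{(j)}}}} $$

The denominator is the NB standard deviation, $\sqrt{\mu + \mu^2/\theta}$, evaluated at $\mu = \lambda_{t,i}^{(j)}$.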
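A direct transcription of the z-score formula, with `nb_zscore()` as a hypothetical helper. Setting `theta = Inf` removes the quadratic variance term, so the denominator becomes the Poisson standard deviation $\sqrt{\lambda}$.

```r
# Hypothetical helper: NB-based z-score (y - lambda_t) / sqrt(lambda_t + lambda_t^2 / theta)
nb_zscore <- function(y, lambda_t, theta) {
  (y - lambda_t) / sqrt(lambda_t + lambda_t^2 / theta)
}

z_over <- nb_zscore(y = 14, lambda_t = 4, theta = 2)  # 10 / sqrt(12), about 2.887
z_pois <- nb_zscore(14, 4, theta = Inf)               # 10 / sqrt(4) = 5
```

The overdispersed model gives the smaller z-score for the same count, because it expects more spread around $\lambda_t$.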

Together, these metrics provide complementary measures of enrichment strength, effect size, and statistical confidence.

🧬 Implementation and Data Structures

ChromCall is implemented in R and builds upon the Bioconductor ecosystem, ensuring interoperability with standard genomic data structures and downstream analysis workflows:

- GRanges for representing genomic intervals
- SummarizedExperiment for storing structured assay outputs and metadata
- GenomicAlignments for importing aligned sequencing reads from BAM files
- GenomeInfoDb and Seqinfo for genome annotation and consistency checks

Each processed sample is returned as a SummarizedExperiment object containing raw region-level read counts together with experiment-level background parameters (`bg_mean` and `bg_size`, corresponding to $\lambda_g$ and $\theta$ respectively). After statistical testing, additional assays are appended, including `lambda_t`, `p_value`, `p_adj`, `score`, and `z_nb`.

Pairwise sample comparisons generate region-level Δ enrichment and Δ z-score metrics, enabling direct comparative analysis of chromatin states across biological conditions.
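The README does not give the exact definitions of the Δ metrics. Assuming they are region-wise differences (sample A minus sample B) of the enrichment score and z-score over identical, identically ordered regions, they could be sketched as below. `delta_metrics()` is hypothetical, and the `threshold` argument of `compare_samples()` is not modelled because its meaning is not documented here.

```r
# Hypothetical sketch: region-matched differences between two samples.
# Assumption: Delta = sample A minus sample B, regions in the same order.
delta_metrics <- function(score_a, score_b, z_a, z_b) {
  stopifnot(length(score_a) == length(score_b), length(z_a) == length(z_b))
  data.frame(DeltaEnrichment = score_a - score_b,
             DeltaZscore     = z_a - z_b)
}

d <- delta_metrics(score_a = c(2, 0.5), score_b = c(0.5, 0.5),
                   z_a = c(4, 1), z_b = c(1, 1.5))
# DeltaEnrichment: 1.5, 0; DeltaZscore: 3, -0.5
```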

📦 Installation

ChromCall is available as a development version on GitHub and can be installed using remotes:

```r
# install.packages("remotes")
remotes::install_github("GliomaGenomics/ChromCall")
```

🧪 Basic Workflow

Build a ChromCall sample

```r
sample <- build_chromcall_sample(
  sample_name     = "sampleA",
  experiments     = list(
    H3K27me3 = "h3k27me3.bam",
    H3K4me3  = "h3k4me3.bam",
    Control  = "control.bam"
  ),
  control_name    = "Control",
  genome_file     = "genome.txt",
  region_file     = "promoters.bed",
  window_size     = 2000,
  blacklist_file  = "blacklist.bed",
  expression_file = "expression_tss.bed"
)
```

Perform region-level enrichment testing

```r
result <- test_region_counts(sample)
```

Compare two samples

```r
# resultA and resultB: outputs of test_region_counts() for two different samples
comparison <- compare_samples(resultA, resultB, threshold = 0.25)
```

Export results

```r
write_experiment_results(result, "H3K4me3", "results.tsv")
write_comparison_results(comparison, "comparison.tsv")
```

📈 Outputs

| Metric | Description |
|---|---|
| `counts` | Raw read count per region |
| `lambda_g` | Genome-wide background rate |
| `lambda_t` | Locally adjusted expected signal |
| `p_value`, `p_adj` | NB test p-values and FDR-adjusted values |
| `score` | log₂(Observed / Expected) enrichment |
| `z_nb` | NB-based z-score |
| `DeltaEnrichment`, `DeltaZscore` | Pairwise comparison metrics |

💡 Contact

For questions, issues, or feature requests, please open a 👉 GitHub issue
