Cancer development and response to treatment are evolutionary processes, but characterizing evolutionary dynamics at a clinically meaningful scale has remained challenging. Here we develop a new methodology called EVOFLUx, based on natural DNA methylation barcodes fluctuating over time, that quantitatively infers evolutionary dynamics using only a bulk tumour methylation profile as input. We apply EVOFLUx to 1,976 well-characterized lymphoid cancer samples spanning a broad spectrum of diseases and show that initial tumour growth rate, malignancy age and epimutation rates vary by orders of magnitude across disease types. We measure that subclonal selection occurs only infrequently within bulk samples and detect occasional examples of multiple independent primary tumours. Clinically, we observe faster initial tumour growth in more aggressive disease subtypes, and that evolutionary histories are strong independent prognostic factors in two series of chronic lymphocytic leukaemia. Using EVOFLUx for phylogenetic analyses of aggressive Richter-transformed chronic lymphocytic leukaemia samples detected that the seed of the transformed clone existed decades before presentation. Orthogonal verification of EVOFLUx inferences is provided using additional genetic data, including long-read nanopore sequencing, and clinical variables. Collectively, we show how widely available, low-cost bulk DNA methylation data precisely measure cancer evolutionary dynamics, and provide new insights into cancer biology and clinical behaviour.

Peripheral blood mononuclear cells (PBMCs); whole blood (WB); precursor B- and T-acute lymphoblastic leukemias (B- and T-ALL, respectively); 149 mantle cell lymphomas (MCL); chronic lymphocytic leukemias (CLL); monoclonal B-cell lymphocytosis (MBL); diffuse large B-cell lymphoma (DLBCL); Richter transformation (RT); multiple myeloma (MM) and monoclonal gammopathy of undetermined significance (MGUS).
Here, we provide code to perform some analyses from our manuscript and the revision process:
Full code and explanations are available here.
Initial quality control (QC) and normalization of Illumina arrays (450k and EPIC). The resulting DNA methylation matrix is available on Zenodo.
QC
Many epigenetic clocks have been published, but fCpGs do not fit into any of these previously reported clocks. Instead, fCpGs appear to represent previously unrecognized, cell-specific and neutral lineage markers. We verify that fCpGs act as "evolving barcodes": they show independent, ongoing, allele-specific changes in methylation that uniquely label cell lineages.
fCpGs and epigenetic clocks
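The "evolving barcode" idea can be illustrated with a toy simulation. The sketch below is not the EVOFLUx model: it assumes each allele of each site flips between methylated and unmethylated independently with a fixed per-generation probability, and the function names, the number of sites and the switching rate are all hypothetical values chosen for illustration.

```python
import random


def evolve_barcode(barcode, generations, switch_prob, rng):
    """Return a copy of `barcode` after independent per-allele flips.

    `barcode` is a list of (allele_a, allele_b) pairs, each allele 0 or 1.
    `switch_prob` is an assumed per-allele, per-generation flip probability.
    """
    current = [list(site) for site in barcode]
    for _ in range(generations):
        for site in current:
            for allele in (0, 1):
                if rng.random() < switch_prob:
                    site[allele] = 1 - site[allele]
    return [tuple(site) for site in current]


def beta_values(barcode):
    """Per-site methylation fraction of one diploid cell: 0, 0.5 or 1."""
    return [(a + b) / 2 for a, b in barcode]


def barcode_distance(x, y):
    """Fraction of alleles whose state differs between two barcodes."""
    diffs = sum((a1 != a2) + (b1 != b2) for (a1, b1), (a2, b2) in zip(x, y))
    return diffs / (2 * len(x))


rng = random.Random(0)  # fixed seed so the toy run is reproducible
# Hypothetical ancestor cell with 500 two-allele sites.
ancestor = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(500)]
# Two lineages that diverge from the same ancestor and fluctuate independently.
lineage_1 = evolve_barcode(ancestor, 200, 0.001, rng)
lineage_2 = evolve_barcode(ancestor, 200, 0.001, rng)

print("ancestor vs itself:", barcode_distance(ancestor, ancestor))
print("lineage 1 vs lineage 2:", barcode_distance(lineage_1, lineage_2))
```

Under these assumptions, two lineages share an identical barcode at the moment they split and drift apart as independent flips accumulate, which is the sense in which a barcode labels a lineage.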
We validated the fCpG methylation dynamics in matched long-read nanopore and Illumina array data as well as in additional whole-genome bisulfite sequencing data.
Long-read nanopore methylation analyses
Long-read nanopore methylation haplotype blocks
WGBS data analyses
We thoroughly investigated the possibility that fCpG methylation could be influenced by genetics. Comparison of methylation SNPs with fCpGs, database annotations, a data-driven approach capturing possible cancer-specific methylation-genetic confounding, analyses of longitudinal samples, and long-read nanopore data all ruled out any significant genetic confounding of fCpG methylation values. We also repeated the inferences using only fCpGs that map to diploid regions of the genome and found no significant impact on the estimated evolutionary variables.
Analyses with Illumina control SNPs
Extensive SNP annotation and comparison with fCpGs as well as "Gap hunting" algorithm to detect cancer-specific genetic-methylation variation
Longitudinal fCpG methylation and genetic evolution
CNA plots
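Restricting fCpGs to diploid regions, as described above, amounts to intersecting probe positions with copy-number segments. The following is a minimal sketch under stated assumptions: the `Segment` and `CpG` records, the probe names and the coordinates are hypothetical, and "diploid" is taken to mean a total copy number of 2, which does not exclude copy-neutral loss of heterozygosity.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int        # inclusive
    end: int          # inclusive
    copy_number: int  # total copy number called for this segment


class CpG(NamedTuple):
    probe_id: str
    chrom: str
    pos: int


def diploid_only(cpgs, segments):
    """Keep CpGs covered by at least one segment, all with copy number 2.

    CpGs that fall outside every segment are dropped, because their
    copy-number state is unknown.
    """
    kept = []
    for cpg in cpgs:
        hits = [
            s for s in segments
            if s.chrom == cpg.chrom and s.start <= cpg.pos <= s.end
        ]
        if hits and all(s.copy_number == 2 for s in hits):
            kept.append(cpg)
    return kept


# Toy input: made-up segments and probes, not data from the study.
segments = [
    Segment("chr1", 1, 1_000, 2),
    Segment("chr1", 1_001, 2_000, 3),  # gained region
    Segment("chr2", 1, 5_000, 1),      # lost region
]
cpgs = [
    CpG("site_a", "chr1", 500),    # diploid, kept
    CpG("site_b", "chr1", 1_500),  # in a gain, dropped
    CpG("site_c", "chr2", 100),    # in a loss, dropped
    CpG("site_d", "chr3", 100),    # no segment, dropped
]

diploid_cpgs = diploid_only(cpgs, segments)
print([c.probe_id for c in diploid_cpgs])
```

The linear scan over segments is fine for a toy example; for array-scale data an interval index would be the more practical choice.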
As the DNA methylome is influenced by age, we tested if fCpGs showed evidence of age-dependent epigenetic modulation. In normal blood samples, mean fCpG methylation was not correlated with age, suggesting fluctuations continue throughout life, whereas fCpG methylation variance increased with age. Variance is higher in samples where there has been a recent clonal expansion (i.e. homozygous methylated/unmethylated alleles become more prominent), suggesting fCpGs are detecting age-related clonal expansions of cells of the hematopoietic system.
fCpG dynamics in aging
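The age comparison described above rests on two per-sample summaries: the mean and the variance of methylation across fCpGs. The sketch below shows one way to compute them and correlate each with age. It assumes a Pearson correlation (the text does not name the statistic), and the sample names, ages and beta values are made-up toy numbers constructed so that the spread widens with age while the mean stays flat.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def sample_summaries(beta_by_sample):
    """Map each sample to (mean, variance) of its fCpG beta values."""
    return {
        name: (mean(betas), pvariance(betas))
        for name, betas in beta_by_sample.items()
    }


# Toy data (hypothetical): four samples, four fCpG beta values each.
ages = {"s1": 20, "s2": 40, "s3": 60, "s4": 80}
beta_by_sample = {
    "s1": [0.45, 0.50, 0.55, 0.50],
    "s2": [0.42, 0.52, 0.62, 0.52],
    "s3": [0.32, 0.52, 0.72, 0.52],
    "s4": [0.10, 0.50, 0.90, 0.50],
}

summaries = sample_summaries(beta_by_sample)
names = sorted(ages)
age_values = [ages[n] for n in names]
r_mean = pearson(age_values, [summaries[n][0] for n in names])
r_var = pearson(age_values, [summaries[n][1] for n in names])

print("age vs mean fCpG methylation:", round(r_mean, 3))
print("age vs fCpG methylation variance:", round(r_var, 3))
```

In this toy input the mean carries essentially no age signal while the variance rises with age, mirroring the pattern described for normal blood; real data would also need a significance test and enough samples to support it.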
RNA-seq analysis demonstrated that genes associated with fCpGs have significantly lower expression levels, and that fCpG methylation status is not associated with expression of the corresponding gene in matched cases. In addition, there was no correlation between fCpG methylation and the expression of key DNA methylation modifier genes.
RNA-seq analyses
We show that evolutionary variables derived from EVOFLUx have a strong clinical impact in two series of chronic lymphocytic leukemia (CLL), even when other well-established biological and clinical parameters are taken into account.
Clinical analyses in CLL
No new methylation bead array data were generated in the course of this study. The harmonised and filtered methylation matrix was deposited on Zenodo. Previously published DNA methylation data re-analysed in this study can be found under accession codes: B cells, EGAS00001001196; ALL, GSE56602, GSE49032, GSE76585, GSE69229; MCL, EGAS00001001637, EGAS00001004165; CLL, EGAD00010000871, EGAD00010000948, EGAD00010001975; MM, EGAS00001000841; DLBCL, EGAD00010001974. External DNA methylation data for sorted immune cells are available under GSE137594 and GSE184269, and for whole-blood samples under GSE72773, GSE55763, GSE40279 and GSE36054. CLL gene expression data are available under accessions EGAS00001000374 and EGAS00001001306. ChIP-seq datasets are available from Blueprint https://www.blueprint-epigenome.eu/ under the accession EGAS00001000326. Matched WES and WGS are available under accessions EGAS00000000092 and EGAD00001008954, respectively.
