Tree inference from mitochondrial mutations

mt-SCITE infers trees from a matrix of mutation probabilities built from a statistical model of alternate read counts in mitochondrial sequencing reads.

Installation

clang++ src/*.cpp -o mtscite

Usage

mtscite -i <mut_probabilities> -n <n_sites> -m <n_cells> -l <n_iters> -seed <random_seed>

End-to-end analysis

We also provide notebooks and scripts to perform an end-to-end tree inference analysis using mt-SCITE.

Environment

conda create -n mtSCITE python=3.10
conda activate mtSCITE
conda install -c conda-forge biopython
conda install -c anaconda pandas
conda install -c anaconda scipy
conda install -c conda-forge matplotlib
conda install -c anaconda seaborn
conda install -c anaconda graphviz
pip install --global-option=build_ext --global-option="-I$HOME/miniconda3/envs/mtSCITE/include" --global-option="-L$HOME/miniconda3/envs/mtSCITE/lib/" pygraphviz
conda install -c anaconda python-graphviz
conda install -c anaconda pydot
conda install -c anaconda networkx
conda install -c anaconda numpy=1.22

Workflow

Raw sequencing data were processed with the Snakemake pipeline in preprocessing_pipeline/Snakefile. This generated TSV files specifying, for each sample and each position in the mitochondrial genome, the number of reads supporting each of the four nucleotides (A, C, G, T) and the total number of reads.

Computing mutation probabilities

compute_mutations_probabilities.ipynb was run to generate mutation probability matrices for a range of error rates. The matrices were stored in /path/to/matrices/.

Selecting the error rate

We used 3-fold cross-validation to select the error rate for tree inference with mt-SCITE. After generating probability matrices for different error rates and storing them in /path/to/matrices/, you can run the cross-validation with this command:

python scripts/cv.py </path/to/matrices/> </path/to/mtscite>

This will create a CSV file named val_scores.csv with the 3-fold cross-validation likelihood scores.

learn_error_rate.ipynb was run to analyze val_scores.csv and to select the error rate to be used for tree building.
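
As a rough illustration of the selection step, the sketch below averages the fold scores per error rate and returns the rate with the highest mean. It is not the notebook's code. The column names error_rate and score are hypothetical, so check the header of val_scores.csv first. It also assumes the scores are log-likelihoods, where higher is better.

```python
import csv
from collections import defaultdict
from statistics import mean


def best_error_rate(csv_path, rate_col="error_rate", score_col="score"):
    """Return (rate, mean_score) for the rate with the highest mean score.

    rate_col and score_col are hypothetical column names; pass the ones
    that actually appear in the header of val_scores.csv.
    """
    scores = defaultdict(list)
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            # Group the per-fold scores by error rate (kept as a string key).
            scores[row[rate_col]].append(float(row[score_col]))
    if not scores:
        raise ValueError("no rows found in " + str(csv_path))
    means = {rate: mean(vals) for rate, vals in scores.items()}
    best = max(means, key=means.get)  # assumes higher score = better fit
    return best, means[best]
```

Calling best_error_rate("val_scores.csv") returns the selected rate as it is written in the file, together with its mean score across folds.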

Build trees

Run mt-SCITE with this command:

</path/to/mtscite> -i pmat.csv -n <n_mutations> -m <n_samples> -r 1 -l 200000 -fd 0.0001 -ad 0.0001 -cc 0.0 -s -a -o </path/to/output/run_id>

where pmat.csv is the mutation probability matrix generated with the learned error rate, n_mutations is the number of rows in pmat.csv and n_samples is the number of columns in pmat.csv.
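
Counting rows and columns by hand is error-prone, so the values for -n and -m can be read from the matrix itself. The sketch below does this and assembles the command shown above. The helper names matrix_shape and build_command are made up for illustration. It also assumes that pmat.csv has no header row or index column and that fields are separated by commas or whitespace.

```python
import re
from pathlib import Path


def matrix_shape(path):
    """Return (n_rows, n_cols) of a plain numeric matrix file.

    Assumption: no header row or index column; fields are separated by
    commas and/or whitespace. Adjust if pmat.csv is laid out differently.
    """
    rows = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:  # skip blank lines
            rows.append([f for f in re.split(r"[,\s]+", line) if f])
    if not rows:
        raise ValueError("empty matrix file: " + str(path))
    widths = {len(r) for r in rows}
    if len(widths) != 1:
        raise ValueError(f"ragged matrix: row widths {sorted(widths)}")
    return len(rows), widths.pop()


def build_command(mtscite, pmat, out_prefix, chain_length=200000):
    """Assemble the mt-SCITE argument list from the command above."""
    n_mutations, n_samples = matrix_shape(pmat)  # rows, columns
    return [
        str(mtscite), "-i", str(pmat),
        "-n", str(n_mutations), "-m", str(n_samples),
        "-r", "1", "-l", str(chain_length),
        "-fd", "0.0001", "-ad", "0.0001", "-cc", "0.0",
        "-s", "-a", "-o", str(out_prefix),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the run.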

Choosing the number of runs and chain length

We advise running mt-SCITE several times (we used 10 repetitions in our experimental analyses). Each repetition starts mt-SCITE from a different starting point; alternatively, you can set a different random seed for each run with -seed. Then check that all chains recover the same best-scoring region. If the chains end in very different log-likelihood regions, increase the chain length (we used 10^6). Because the bottleneck is the number of mutations rather than the number of cells, scale the search budget with the mutation count.
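
One way to organize seeded repetitions is sketched below. The helper names seeded_commands and chains_agree are hypothetical, and so is the default tolerance of 1.0 log-likelihood units, which you should choose for your own data. The sketch does not parse mt-SCITE output, so the best log-likelihood of each chain has to be collected separately.

```python
def seeded_commands(base_cmd, out_dir, seeds):
    """Return one mt-SCITE argument list per seed.

    base_cmd is the command without -seed and -o; each run gets its own
    seed and its own output prefix so that results are not overwritten.
    """
    return [
        list(base_cmd) + ["-seed", str(s), "-o", f"{out_dir}/seed_{s}"]
        for s in seeds
    ]


def chains_agree(best_scores, tol=1.0):
    """True if the best log-likelihoods of all chains lie within tol.

    tol is an arbitrary illustrative threshold, not a recommended value.
    """
    if not best_scores:
        raise ValueError("need at least one score")
    return max(best_scores) - min(best_scores) <= tol
```

If chains_agree returns False for the collected scores, rerun with a larger value for -l.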
