LaVOUS (Lineage-aware Variational Ornstein-Uhlenbeck Stochastics) is a lineage-aware variational model for single-cell RNA-seq count data. It models latent expression on a cell lineage tree with Brownian motion (BM) or Ornstein-Uhlenbeck (OU) dynamics, maps latent expression through a softplus link, and evaluates observed counts with a negative-binomial observation model.
The package implements three analysis workflows:
- Expression heritability: tests whether expression follows lineage structure using a likelihood-ratio test between a BM model with Pagel's lambda fixed at 0 and a BM model with free Pagel's lambda.
- Differential expression: tests whether expression shifts across regimes on the tree using an OU likelihood-ratio test between one shared theta and multiple regime-specific theta values. Empirical-null calibration can be run separately from the fitted null model.
- History reconstruction: reconstructs latent expression histories on the lineage tree using fitted OU/BM parameters and variational leaf beliefs.
The source package is in singlecellstochastics/. The project name is retained
for package compatibility; the model implemented here is LAVOUS.
Install locally from the repository root:
pip install -e .The package currently exposes both descriptive lavous-* commands and older
run-* command names for compatibility.
The main workflows require:
--tree: Newick lineage tree. Leaf names must match expression-matrix rows.--expression: raw read-count matrix with cells as rows and genes as columns. Do not log-transform counts.--regime: node-to-regime labels for OU workflows. Two formats are accepted:node_name,regimefor named tree nodes, ornode,node2,regimewhere a node is represented by the MRCA of one or two leaves.--null: regime label used as the null/background regime for OU tests.--library: optional per-cell library-size factor, ordered or named by cell.
Multiple trees, count matrices, regime files, and library files can be supplied as comma-separated lists. The code aligns cells to tree leaves during preprocessing.
lavous-heritability \
--tree examples/input_data/tree_demo.nwk \
--expression examples/input_data/readcounts_demo.tsv \
--outfile examples/output_results/heritability.tsvThis workflow fits BM/NB models under lambda=0 and free lambda and reports the likelihood-ratio statistic, p-value, Benjamini-Hochberg q-value, and fitted BM parameters.
lavous-diff \
--tree examples/input_data/tree_demo.nwk \
--expression examples/input_data/readcounts_demo.tsv \
--regime examples/input_data/regime_demo.csv \
--null 0 \
--outdir examples/output_results \
--prefix diffThe differential-expression workflow writes:
{prefix}_chi-squared.tsv: fitted parameters, losses, LR statistic, p-value, q-value, and significance indicator.{prefix}_model-params.tsv: long-form fitted OU parameters with one row per gene, hypothesis, and regime. This is the preferred parameter file for calibration diagnostics and reconstruction.{prefix}_meta.json: run metadata needed for empirical-null calibration.{prefix}_h0_q-mean-std_*.tsvand{prefix}_h1_q-mean-std_*.tsv: variational leaf means and standard deviations. Columns are namedq_mean_{cell}andq_std_{cell}.
The result table reports lrt = 2 * (h0_loss - h1_loss).
Chi-squared p-values are computed from lrt; empirical
calibration compares simulated and observed (h0_loss - h1_loss).
After running the differential-expression test, optionally calibrate p-values from null simulations:
lavous-calibrate \
--chi examples/output_results/diff_chi-squared.tsv \
--sim_all 1000Use --sim_each N for per-gene null simulations. This is much more expensive
because it refits the LRT to N simulated datasets per gene.
lavous-reconstruct \
--tree examples/input_data/tree_demo.nwk \
--q_params examples/output_results/diff_h1_q-mean-std_0.tsv \
--read_counts examples/input_data/readcounts_demo.tsv \
--gene Gene_2 \
--model ou \
--regime examples/input_data/regime_demo.csv \
--ou examples/output_results/diff_model-params.tsv \
--out_tsv examples/output_results/history_gene2.tsv \
--out_fig examples/output_results/history_gene2.pngThe expression input for reconstruction should contain leaf-level variational
beliefs, such as the wide q-parameter files written by
lavous-diff when --gene is supplied. Reconstruction normalizes tree branch
lengths by default to match the fitted OU/BM model scale; use
--no_normalize_tree only for parameters fitted on raw branch lengths.
In addition, to enerate a small simulated read-count matrix from a tree and regime file:
lavous-simulate \
--tree examples/input_data/tree_demo.nwk \
--regime examples/input_data/regime_demo.csv \
--test 1 \
--background 1 \
--n_genes 5 \
--sigma 3 \
--optim 3 \
--alpha 1 \
--dispersion 5 \
--out examples/input_data \
--label demoThis writes simulation examples/input_data/readcounts_demo.tsv.
preprocess.py: tree, count, library-size, and regime preprocessing.likelihood.py: Gaussian BM/OU tree likelihoods.approx.py: softplus/exp moment approximations used by the ELBO.elbo.py: variational objective for latent expression and count likelihoods.optimize.py: PyTorch and SciPy optimization routines.plasticity.py: expression-heritability LRT CLI.ou_diff.py: differential-expression LRT CLI.calibrate.py: empirical-null calibration CLI.reconstruct.py: Gaussian belief propagation for history reconstruction.stochas_sim.pyandsimulate.py: simulation utilities.
More detailed developer notes are in docs/source_map.md.
Run a syntax/import check from the repository root:
python -m compileall singlecellstochastics
python - <<'PY'
import singlecellstochastics
print(singlecellstochastics.__version__)
PY