Running on Expression Matrices
signatureanalyzer can decompose expression matrices (e.g. single-cell RNA-seq or bulk RNA-seq) to identify de novo expression signatures. This page summarizes important considerations when running the method on these data types.
As with most clustering methods applied to transcriptional data, we recommend a highly variable gene selection step before decomposition. Such selection methods are well documented for both single-cell and bulk RNA-seq data and typically reduce the input feature space to a few thousand genes of interest.
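As a minimal illustration of the idea, the sketch below ranks genes by variance across samples and keeps the top `n_top`. The function name `select_top_variable_genes` and the default of 2000 genes are hypothetical choices for this example. It is a plain variance ranking, not one of the dedicated single-cell or bulk HVG methods, which usually model the mean-variance relationship.

```python
import numpy as np


def select_top_variable_genes(expr, gene_names, n_top=2000):
    """Keep the n_top genes with the highest variance across samples.

    expr: normalized expression, shape (n_samples, n_genes).
    Returns the reduced matrix and the kept gene names, in original order.
    """
    expr = np.asarray(expr, dtype=float)
    if expr.ndim != 2 or expr.shape[1] != len(gene_names):
        raise ValueError("expr must be (n_samples, n_genes) and match gene_names")
    n_top = min(n_top, expr.shape[1])
    variances = expr.var(axis=0)  # per-gene variance across samples
    # indices of the n_top largest variances, restored to original gene order
    top_idx = np.sort(np.argsort(variances)[::-1][:n_top])
    return expr[:, top_idx], [gene_names[i] for i in top_idx]
```

For example, `X_hvg, genes_hvg = select_top_variable_genes(X, genes)` on a log-normalized samples x genes matrix `X`.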
For expression signatures, we assume a Gaussian distribution of normalized counts and use Févotte & Tan's derivation of a Gaussian objective function for ARD-NMF. It is therefore important to use the gaussian objective function on appropriately normalized data. We recommend the following normalization methods:
- Bulk RNA-seq: log2(TPM+1); normalize TPMs with DESeq2 size factors
- Single-cell RNA-seq: ln(CP10K+1); normalize CP10K with scran size factors
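A sketch of both transformations, assuming a (samples x genes) array of raw counts. The helper names are hypothetical, and the way size factors are combined with TPM or CP10K here (dividing each sample's values by its size factor before the log) is an assumption, not a documented part of signatureanalyzer. `median_of_ratios_size_factors` reimplements the median-of-ratios idea behind DESeq2's size factors in NumPy. scran's pooling-based size factors are not reimplemented and would be passed in precomputed.

```python
import numpy as np


def tpm(counts, gene_lengths_bp):
    """Transcripts per million from raw counts of shape (n_samples, n_genes)."""
    counts = np.asarray(counts, dtype=float)
    rate = counts / (np.asarray(gene_lengths_bp, dtype=float) / 1e3)  # reads per kilobase
    return rate / rate.sum(axis=1, keepdims=True) * 1e6


def median_of_ratios_size_factors(counts):
    """DESeq2-style median-of-ratios size factors, one per sample (row)."""
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=0)  # genes with nonzero counts in every sample
    log_counts = np.log(counts[:, expressed])
    log_geo_means = log_counts.mean(axis=0)  # per-gene log geometric mean
    return np.exp(np.median(log_counts - log_geo_means, axis=1))


def bulk_log2_tpm(counts, gene_lengths_bp, size_factors=None):
    """log2(TPM + 1), optionally dividing each sample by its size factor first."""
    values = tpm(counts, gene_lengths_bp)
    if size_factors is not None:
        values = values / np.asarray(size_factors, dtype=float)[:, None]
    return np.log2(values + 1.0)


def sc_ln_cp10k(counts, size_factors=None):
    """ln(CP10K + 1); size factors (e.g. computed with scran) are optional."""
    counts = np.asarray(counts, dtype=float)
    cp10k = counts / counts.sum(axis=1, keepdims=True) * 1e4
    if size_factors is not None:
        cp10k = cp10k / np.asarray(size_factors, dtype=float)[:, None]
    return np.log1p(cp10k)
```

Both outputs are non-negative, which NMF requires. The median-of-ratios estimate is undefined if no gene is expressed in every sample.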
Use:
--objective gaussian
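The Gaussian objective corresponds to the β = 2 case of the β-divergence used in Tan & Févotte's ARD-NMF, i.e. half the squared Euclidean distance between X and WH. The sketch below computes only this data-fit term, with the β = 1 (Poisson / Kullback-Leibler) case for contrast. It leaves out the ARD prior terms and is not signatureanalyzer's own code. The function name `beta_divergence` is hypothetical.

```python
import numpy as np


def beta_divergence(X, WH, beta):
    """Sum of elementwise beta-divergences d_beta(X | WH).

    beta=2 -> half squared Euclidean distance (Gaussian likelihood)
    beta=1 -> generalized Kullback-Leibler divergence (Poisson likelihood)
    """
    X = np.asarray(X, dtype=float)
    WH = np.asarray(WH, dtype=float)
    if beta == 2:
        return 0.5 * np.sum((X - WH) ** 2)
    if beta == 1:
        # entries with x == 0 contribute 0 * log(0 / y) = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            term = np.where(X > 0, X * np.log(X / WH), 0.0)
        return np.sum(term - X + WH)
    raise ValueError("this sketch only handles beta in {1, 2}")
```

Because log-normalized expression values are continuous rather than integer counts, the squared-error (β = 2) form is the natural fit.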
The input type (-t) determines whether COSMIC mapping is performed on your dataset. COSMIC mapping is not relevant for expression matrices.
Use:
-t matrix
We generally impose an exponential (L1) or half-normal (L2) prior on the W & H matrices for non-negative matrix factorization.
Use:
--prior_on_H L1 --prior_on_W L1
# or
--prior_on_H L2 --prior_on_W L2
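To see what the two options penalize, the sketch below evaluates the negative log-density of each prior for non-negative entries x with scale λ (`lam`). The exponential prior adds a term linear in x (an L1 penalty), and the half-normal prior adds a term quadratic in x (an L2 penalty). The function names are hypothetical. In ARD-NMF, λ is a per-component relevance weight shared by a column of W and the matching row of H, and this sketch leaves that coupling out.

```python
import numpy as np


def neg_log_exponential_prior(x, lam):
    """-log p(x) for i.i.d. exponential entries with mean lam (an L1 penalty)."""
    x = np.asarray(x, dtype=float)
    return np.sum(x) / lam + x.size * np.log(lam)


def neg_log_half_normal_prior(x, lam):
    """-log p(x) for i.i.d. half-normal entries, lam being the variance of the
    underlying normal; the x-dependent part is an L2 penalty."""
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2) / (2.0 * lam) + x.size * 0.5 * np.log(np.pi * lam / 2.0)
```

The L1 form penalizes all nonzero entries at the same rate and so tends to favor sparser W and H. The L2 form penalizes large entries more heavily than small ones.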
The method takes as input a matrix of (n samples x m features).
Use:
signatureanalyzer -n 10 \
-t matrix \
--objective gaussian \
--max_iter 30000 \
--prior_on_H L1 \
--prior_on_W L1 \
input.tsv
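If your expression data is held in memory, a sketch like the following writes it as a tab-separated (n samples x m features) file. The layout used here is an assumption: a header row of feature names with an empty top-left cell, then one row per sample starting with its ID. Check it against the input format your signatureanalyzer version expects. The hypothetical `write_expression_tsv` also rejects negative or non-finite values, because NMF requires a non-negative input matrix.

```python
import csv
import math


def write_expression_tsv(path, sample_ids, feature_names, values):
    """Write a (n samples x m features) matrix as a tab-separated file.

    Assumed layout: header row of feature names (empty top-left cell),
    then one row per sample beginning with its sample ID.
    """
    if len(values) != len(sample_ids):
        raise ValueError("one row of values per sample is required")
    rows = []
    for sample_id, row in zip(sample_ids, values):
        if len(row) != len(feature_names):
            raise ValueError(f"sample {sample_id}: expected {len(feature_names)} values")
        # NMF needs finite, non-negative input; validate before touching the file
        if any(not math.isfinite(v) or v < 0 for v in row):
            raise ValueError(f"sample {sample_id}: values must be finite and >= 0")
        rows.append([sample_id] + [repr(float(v)) for v in row])  # full precision
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow([""] + list(feature_names))
        writer.writerows(rows)
```

The resulting file can then be passed as `input.tsv` in the command above.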