Dear Dr. Kiselev, Dr. Hemberg,
I have been encountering an issue when running the scmap-cluster pipeline and would like to ask you for some input on this matter. It seems that independent runs result in variable and inconsistent assignments. There seem to be two main sets of results, one of which seems fair based on prior knowledge about the composition of the query dataset. The second set of results comprises assignments that seem to reflect an imperfect match rather than a complete mess. While it might still be to some degree possible that the assignment that I think is problematic is actually correct, the issue remains of the variability in the mapping results. I should stress that the "wrong" mapping occurs much more frequently than the one I feel would be correct, if I run scmap from scratch repeatedly, suggesting it may be right after all. Still, the variability troubles me.
I have possibly narrowed down the steps resulting in variable outcomes to the reference normalization (see script below). I tried increasing the number of selected features thinking that this may mitigate the impact of the previous steps, but that didn´t seem to be the case.
sf2 <- 2^rnorm(ncol(sce2))
sf2 <- sf2/mean(sf2)
normcounts(sce2) <- t(t(counts(sce2))/sf2)
counts(sce2) <- normcounts(sce2)
logcounts(sce2) <- log2(normcounts(sce2)+1)
rowData(sce2)$feature_symbol<-rownames(sce2)
I would be happy to receive some suggestions from your side.
Dear Dr. Kiselev, Dr. Hemberg,
I have been encountering an issue when running the scmap-cluster pipeline and would like to ask you for some input on this matter. It seems that independent runs result in variable and inconsistent assignments. There seem to be two main sets of results, one of which seems fair based on prior knowledge about the composition of the query dataset. The second set of results comprises assignments that seem to reflect an imperfect match rather than a complete mess. While it might still be to some degree possible that the assignment that I think is problematic is actually correct, the issue remains of the variability in the mapping results. I should stress that the "wrong" mapping occurs much more frequently than the one I feel would be correct, if I run scmap from scratch repeatedly, suggesting it may be right after all. Still, the variability troubles me.
I have possibly narrowed down the steps resulting in variable outcomes to the reference normalization (see script below). I tried increasing the number of selected features thinking that this may mitigate the impact of the previous steps, but that didn´t seem to be the case.
sf2 <- 2^rnorm(ncol(sce2))
sf2 <- sf2/mean(sf2)
normcounts(sce2) <- t(t(counts(sce2))/sf2)
counts(sce2) <- normcounts(sce2)
logcounts(sce2) <- log2(normcounts(sce2)+1)
rowData(sce2)$feature_symbol<-rownames(sce2)
I would be happy to receive some suggestions from your side.