Hi, I have a question about how reads are assigned as conflicting, unassignable etc.
I've been trying to use SNPsplit on noisy PacBio long reads to help with haplotype-resolved assembly (which I know is a little optimistic), and I'm losing a lot of reads. Here's the allele-tagging report:
Processed 7590179 read alignments in total
Reads were unaligned and hence skipped: 264205 (3.48%)
5789031 reads were unassignable (76.27%)
111956 reads were specific for genome 1 (1.48%)
69824 reads were specific for genome 2 (0.92%)
155723 reads did not contain one of the expected bases at known SNP positions (2.05%)
611385 contained conflicting allele-specific SNPs (8.05%)
I'm expecting around 5% of reads to be assigned to each genome, so it looks like I'm losing a lot due to reads either not containing an expected base at known SNP positions (presumably because of single-base or insertion sequencing errors), or reads containing conflicting SNPs (presumably mainly due to sequencing error deletions of one of the N-masked sites).
Is there a way to tune these features so that e.g. a read can still be assigned to genome 1 if it contains a genome 1-specific SNP, but has something other than the genome 2 SNP at another N-masked position (like an obvious sequencing error)?
Thanks!
Hi, I have a question about how reads are assigned as conflicting, unassignable etc.
I've been trying to use SNPsplit on noisy PacBio long reads to help with haplotype-resolved assembly (which I know is a little optimistic), and I'm losing a lot of reads. Here's the allele-tagging report:
I'm expecting around 5% of reads to be assigned to each genome, so it looks like I'm losing a lot due to reads either not containing an expected base at known SNP positions (presumably because of single-base or insertion sequencing errors), or reads containing conflicting SNPs (presumably mainly due to sequencing error deletions of one of the N-masked sites).
Is there a way to tune these features so that e.g. a read can still be assigned to genome 1 if it contains a genome 1-specific SNP, but has something other than the genome 2 SNP at another N-masked position (like an obvious sequencing error)?
Thanks!