Question about assigning 'unassignable' reads #79

@RobertBaird

Hi, I have a question about how reads are classified as conflicting, unassignable, etc.

I've been trying to use SNPsplit on noisy PacBio long reads to help with haplotype-resolved assembly (which I know is a little optimistic), and I'm losing a lot of reads. Here's the allele-tagging report:

Processed 7590179 read alignments in total
Reads were unaligned and hence skipped: 264205 (3.48%)
5789031 reads were unassignable (76.27%)
111956 reads were specific for genome 1 (1.48%)
69824 reads were specific for genome 2 (0.92%)
155723 reads did not contain one of the expected bases at known SNP positions (2.05%)
611385 contained conflicting allele-specific SNPs (8.05%)

I'm expecting around 5% of reads to be assigned to each genome, so it looks like I'm losing a lot of reads in two categories. The first is reads that do not contain an expected base at known SNP positions (presumably because of single-base or insertion sequencing errors). The second is reads that contain conflicting SNPs (presumably mainly because of sequencing-error deletions at one of the N-masked sites).

Is there a way to tune this behaviour so that, for example, a read can still be assigned to genome 1 if it contains a genome 1-specific SNP but has something other than the genome 2 SNP at another N-masked position (such as an obvious sequencing error)?
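
To make the request concrete, the kind of tolerant rule described above could be sketched in Python as follows. This is only an illustration of the desired behaviour, not SNPsplit's actual logic or an existing option: the function name `assign_read`, the labels `"g1"`, `"g2"` and `"other"`, and the thresholds `min_margin` and `max_minor_fraction` are all hypothetical.

```python
from collections import Counter


def assign_read(calls, min_margin=1, max_minor_fraction=0.2):
    """Assign one read from its per-SNP calls (hypothetical rule).

    calls: iterable of labels, one per N-masked SNP position covered by
    the read: "g1" (genome 1 base), "g2" (genome 2 base), or "other"
    (neither expected base, e.g. a likely sequencing error).
    """
    counts = Counter(calls)
    g1, g2 = counts["g1"], counts["g2"]
    informative = g1 + g2

    # No position carries an expected base: nothing to assign on.
    if informative == 0:
        return "unassignable"

    major, minor = max(g1, g2), min(g1, g2)

    # Treat the read as conflicting only if the two genomes are too
    # evenly supported; "other" calls are simply ignored.
    if major - minor < min_margin or minor / informative > max_minor_fraction:
        return "conflicting"

    return "genome1" if g1 > g2 else "genome2"
```

Under this sketch, a read with one genome 1 SNP and an unexpected base at a second position would go to genome 1, while a read with one SNP from each genome would still be reported as conflicting. The default thresholds are arbitrary and would need tuning against the error rate of the reads.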

Thanks!
