Question about assigning 'unassignable' reads

Hi, I have a question about how reads get classified as conflicting, unassignable, etc.

I've been trying to use SNPsplit on noisy PacBio long reads to help with haplotype-resolved assembly (which I know is a little optimistic), and I'm losing a lot of reads. Here's the allele-tagging report:

```
Processed 7590179 read alignments in total
Reads were unaligned and hence skipped: 264205 (3.48%)
5789031 reads were unassignable (76.27%)
111956 reads were specific for genome 1 (1.48%)
69824 reads were specific for genome 2 (0.92%)
155723 reads did not contain one of the expected bases at known SNP positions (2.05%)
611385 contained conflicting allele-specific SNPs (8.05%)
```

I'm expecting around 5% of reads to be assigned to each genome, so it looks like I'm losing a lot of them, either because they don't contain an expected base at known SNP positions (presumably due to single-base or insertion sequencing errors) or because they contain conflicting SNPs (presumably mainly due to sequencing-error deletions at one of the N-masked sites).

Is there a way to tune these features so that e.g. a read can still be assigned to genome 1 if it contains a genome 1-specific SNP, but has something other than the genome 2 SNP at another N-masked position (like an obvious sequencing error)?
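
To make the rule I have in mind concrete, here is a rough Python sketch. This is not SNPsplit's actual logic or API: the function name `assign_read` and the input layout are made up for illustration, and it assumes the two genomes always differ at each listed position. A base that matches neither allele is treated as a sequencing error and ignored, rather than counting against the read.

```python
def assign_read(calls):
    """Classify one read under a relaxed, hypothetical assignment rule.

    calls: list of (read_base, genome1_base, genome2_base) tuples, one per
    N-masked SNP position covered by the read. Assumes genome1_base and
    genome2_base differ at every position.
    """
    # Count positions where the read supports each genome.
    g1_hits = sum(1 for base, g1, g2 in calls if base == g1)
    g2_hits = sum(1 for base, g1, g2 in calls if base == g2)

    # Positions matching neither allele are ignored (assumed sequencing error).
    if g1_hits and g2_hits:
        return "conflicting"
    if g1_hits:
        return "genome1"
    if g2_hits:
        return "genome2"
    return "unassignable"


# One genome 1 SNP plus one position matching neither allele
print(assign_read([("A", "A", "G"), ("T", "C", "G")]))
```

So a read with one clean genome 1 SNP and an unexpected base at a second N-masked site would still go to genome 1, while a read that supports both genomes stays conflicting.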

Thanks!

Question about assigning 'unassignable' reads #79
