Large files and method of parsing BAM

BAM files derived from real experimental data have shown that the parsing method currently employed is very inefficient for large files.

At present, sequences are processed in a fixed order, which makes determining the offset within the contact matrix trivial: it is accumulated once per iteration of the outer loop. Unfortunately, the pysam method `fetch(seq_name)` appears to require substantial up-front IO that scales with BAM file size, resulting in a significant delay on each invocation. When there are many reference sequences (for example, from a fragmented WGS assembly), this becomes a huge penalty.

Therefore, we need a method that determines the contact matrix offset from the predetermined sequence order and the fields available in the BAM file, so that the file can be read without a per-sequence `fetch`. This should not pose a problem.
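
A rough sketch of how this might look, not a definitive implementation. It assumes the contact matrix is binned at a fixed `bin_size` and that each contact is taken from a read and its mate; both are assumptions, as are the helper names `build_offsets`, `matrix_index` and `iter_contacts`. The offsets are computed once from the BAM header, and the file is then read in a single pass with `fetch(until_eof=True)`, using each read's `reference_id` to look up its offset.

```python
def build_offsets(seq_order, ref_names, ref_lengths, bin_size):
    """Map each BAM reference id to its first row in the contact matrix.

    seq_order   -- sequence names in the predetermined matrix order
    ref_names   -- reference names in BAM header order
    ref_lengths -- reference lengths in BAM header order
    bin_size    -- assumed fixed bin width in bp (hypothetical parameter)
    """
    length_of = dict(zip(ref_names, ref_lengths))
    id_of = {name: i for i, name in enumerate(ref_names)}
    offsets = {}
    total = 0
    for name in seq_order:
        offsets[id_of[name]] = total
        # Integer ceiling division: number of bins this sequence occupies.
        total += -(-length_of[name] // bin_size)
    return offsets, total


def matrix_index(offsets, ref_id, pos, bin_size):
    """Matrix row (or column) for a 0-based position on reference ref_id."""
    return offsets[ref_id] + pos // bin_size


def iter_contacts(bam_path, seq_order, bin_size):
    """Yield (row, col) pairs from one sequential pass over the BAM file."""
    import pysam  # imported lazily so the helpers above work without it

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        offsets, _ = build_offsets(
            seq_order, bam.references, bam.lengths, bin_size
        )
        # until_eof=True reads every record in file order, without
        # seeking to a region for each reference sequence.
        for read in bam.fetch(until_eof=True):
            if not read.is_paired or not read.is_read1:
                continue  # count each pair once, via its first read
            if read.is_unmapped or read.mate_is_unmapped:
                continue
            r1, r2 = read.reference_id, read.next_reference_id
            if r1 not in offsets or r2 not in offsets:
                continue  # sequence excluded from the matrix
            yield (
                matrix_index(offsets, r1, read.reference_start, bin_size),
                matrix_index(offsets, r2, read.next_reference_start, bin_size),
            )
```

Because the offset lookup is a dictionary access keyed on `reference_id`, the order in which reads appear in the file no longer matters, and the per-sequence cost of `fetch(seq_name)` is avoided.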

