Population scale merging with Jasmine for short-reads #64

Hi,

We intend to test Jasmine in a population-scale study of more than 10k whole-genome sequencing (WGS) samples sequenced with Illumina short reads (SR).

As a test run, we are using the 3202 WGS samples (30x coverage, processed with Illumina DRAGEN) from the AWS Open Data registry:
https://registry.opendata.aws/ilmn-dragen-1kgp/

Running Jasmine on the 3202 per-sample sv.vcf files from the DRAGEN output did not complete: it timed out after 3 days on a machine with 64 CPUs and 128 GB of RAM.

I tried to split the VCFs with split_jasmine, as suggested in the issue below, but the jar failed, most likely because of the missing CHR2 INFO tag, which the SV caller does not output.
https://github.com/mkirsche/Jasmine/issues/18 
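If the split_jasmine failure really is caused by the missing CHR2 tag, one workaround worth trying is to add the tag before splitting. Below is a minimal, stdlib-only Python sketch under these assumptions: for BND records CHR2 is taken from the mate chromosome in the ALT allele (standard VCF breakend notation such as `N[chr2:12345[`), for all other records CHR2 is set to CHROM, and the header Description text and script name are our own choices, not something defined by Jasmine or DRAGEN.

```python
import gzip
import re
import sys

# Mate locus inside a VCF breakend ALT, e.g. "N[chr2:12345[" or "]chr2:12345]N".
# The greedy (.+) backtracks to the last colon, so contig names containing ':' still parse.
BND_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")

# Header line for the added tag; the Description wording is our own (assumption).
CHR2_HEADER = ('##INFO=<ID=CHR2,Number=1,Type=String,'
               'Description="Chromosome of the second breakpoint">\n')


def add_chr2(line):
    """Return the VCF line with CHR2 appended to INFO, unless it is a header or already has CHR2."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    chrom, alt, info = fields[0], fields[4], fields[7]
    if any(kv.split("=", 1)[0] == "CHR2" for kv in info.split(";")):
        return line
    mate = BND_MATE.search(alt)
    # BND: mate chromosome from ALT; everything else: same chromosome (assumption).
    chr2 = mate.group(1) if mate else chrom
    fields[7] = f"CHR2={chr2}" if info == "." else f"{info};CHR2={chr2}"
    return "\t".join(fields) + "\n"


def process(lines):
    """Yield the rewritten VCF, adding the CHR2 header before #CHROM if it is not defined yet."""
    seen_header = False
    for line in lines:
        if line.startswith("##INFO=<ID=CHR2,"):
            seen_header = True
        if line.startswith("#CHROM") and not seen_header:
            yield CHR2_HEADER
        yield add_chr2(line)


def main(src, dst):
    # Reads plain or gzip/bgzip-compressed input; writes an uncompressed VCF.
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(process(fin))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be something like `python add_chr2.py sample.sv.vcf.gz sample.sv.chr2.vcf` (hypothetical file names); recompress with bgzip afterwards if the downstream step needs a compressed, indexed VCF.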

We are considering two options:
1) Running Jasmine in smaller batches of about 500-1000 samples and then merging the resulting batch VCFs in a stepwise manner.
2) Splitting all 3202 sample VCFs by chromosome and then running Jasmine merge on all 3202 samples separately for each of the 24 chromosomes (1-22, X and Y).
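For route 1, the batching itself is simple bookkeeping. The Python sketch below plans two levels of file lists: level-1 lists of at most `batch_size` per-sample VCFs, and one level-2 list naming each batch's merged output. All file names (`batch_001.txt`, `batch_001.merged.vcf`, `level2.txt`) are hypothetical. With 3202 samples and batches of 1000 this gives four lists of 1000, 1000, 1000 and 202 VCFs plus a level-2 list of four merged files.

```python
from pathlib import Path


def plan_batches(vcfs, batch_size):
    """Map file-list names to the VCF paths each list should contain.

    Level 1: one list per batch of at most batch_size per-sample VCFs.
    Level 2: one list naming every batch's merged output, for the final merge.
    File names like batch_001.txt and batch_001.merged.vcf are hypothetical.
    """
    plan = {}
    merged_outputs = []
    for n, start in enumerate(range(0, len(vcfs), batch_size), start=1):
        name = f"batch_{n:03d}"
        plan[f"{name}.txt"] = vcfs[start:start + batch_size]
        merged_outputs.append(f"{name}.merged.vcf")
    plan["level2.txt"] = merged_outputs
    return plan


def write_plan(plan, outdir):
    """Write each list as plain text, one path per line."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    for name, paths in plan.items():
        (outdir / name).write_text("".join(p + "\n" for p in paths))


if __name__ == "__main__":
    # Hypothetical inputs: 3202 per-sample VCFs in batches of 1000.
    samples = [f"sample_{i:04d}.sv.vcf.gz" for i in range(3202)]
    for name, paths in plan_batches(samples, 1000).items():
        print(name, len(paths))
```

According to the Jasmine README, inputs are given as `file_list=` (one VCF path per line) and the output as `out_file=`, so each level-1 run would look like `jasmine file_list=batch_001.txt out_file=batch_001.merged.vcf`, followed by a final `jasmine file_list=level2.txt out_file=all.merged.vcf`. Note that in the second-level merge each batch VCF is a single input, so per-input support fields (e.g. SUPP_VEC) would likely refer to batches rather than to the original samples.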

Does route 2 affect BND events, which are abundant (~20%) in the DRAGEN VCFs?
Also, which route would you recommend?
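Before choosing route 2, it may help to quantify how many BND records are inter-chromosomal. If the split is done on the CHROM column, each BND record stays with its own CHROM, but for an inter-chromosomal event the mate breakend record (when the caller reports one) lands in a different chromosome's file and is merged in a separate run. The stdlib-only Python sketch below counts intra- versus inter-chromosomal BND records in a VCF; it assumes `SVTYPE=BND` in INFO and standard VCF breakend notation in ALT.

```python
import gzip
import re
import sys
from collections import Counter

# Mate locus inside a VCF breakend ALT, e.g. "N[chr5:100[" or "]chr5:100]N".
BND_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def classify_bnd(lines):
    """Count BND records whose mate is on the same ("intra") or another ("inter") chromosome."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, alt, info = fields[0], fields[4], fields[7]
        if "SVTYPE=BND" not in info.split(";"):
            continue
        mate = BND_MATE.search(alt)
        if mate is None:
            counts["unparsed"] += 1   # e.g. single breakends with no mate locus in ALT
        elif mate.group(1) == chrom:
            counts["intra"] += 1
        else:
            counts["inter"] += 1      # mate breakend, if reported, falls in another chromosome's split
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as fh:
            print(path, dict(classify_bnd(fh)))
```

Running it over a handful of the DRAGEN sv.vcf files would show what fraction of the ~20% BND calls would have their two breakends merged in different per-chromosome runs.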

Regards
Solyris 
