Hi,
We intend to test Jasmine for a population-based experiment with more than 10k WGS samples sequenced with Illumina short reads (SR).
We are doing a test run on the 3202 WGS samples (30x coverage, processed with Illumina Dragen) that are available as open-access data on AWS:
https://registry.opendata.aws/ilmn-dragen-1kgp/
The Jasmine run on the 3202 per-sample sv.vcf files from the Dragen analysis output did not complete; it timed out after 3 days on a machine with 64 CPUs and 128 GB RAM.
I tried running split_jasmine on the VCFs as described in the issue below, but the jar failed, likely because of the missing CHR2 INFO tag, which the SV caller does not output by default.
#18
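In case it helps the discussion, here is a minimal sketch of how we might add the missing tag ourselves before splitting. It assumes that CHR2 should hold the mate chromosome parsed from the BND ALT allele (VCF breakend notation such as `N[chr5:200[`) and should equal CHROM for all other SV types. The function name `add_chr2` is our own, and we have not verified that this is what split_jasmine expects.

```python
import re

# Mate locus inside a BND ALT allele, e.g. "N[chr5:200[" or "]chr5:200]N".
# The greedy ".+" tolerates contig names that themselves contain colons.
BND_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def add_chr2(record: str) -> str:
    """Return a VCF data line with CHR2 appended to INFO if it is missing.

    Assumption: CHR2 is the mate chromosome for BND records and the
    record's own chromosome for every other SV type.
    """
    fields = record.rstrip("\n").split("\t")
    chrom, alt, info = fields[0], fields[4], fields[7]

    # Leave the record untouched if the caller already wrote CHR2.
    if re.search(r"(^|;)CHR2=", info):
        return "\t".join(fields)

    mate = BND_MATE.search(alt)
    chr2 = mate.group(1) if mate else chrom

    # An empty INFO column is written as "." in VCF.
    fields[7] = f"CHR2={chr2}" if info in ("", ".") else f"{info};CHR2={chr2}"
    return "\t".join(fields)
```

Header lines (starting with `#`) would need to be passed through unchanged, and a matching `##INFO=<ID=CHR2,...>` definition would have to be added to the header for the output to remain a valid VCF.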
We were thinking of either:
1. Running Jasmine in smaller batches of 500-1000 samples and then merging each batch's resulting VCF in a step-wise manner.
2. Splitting all 3202 samples by chromosome and then running the Jasmine merge across the 3202 samples separately for each of the 24 chromosomes (1-22, X and Y).
Does route 2 (splitting by chromosome) affect the BND events, which are abundant (~20%) in the Dragen VCFs?
Also, please comment on which route is preferable.
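To put a number on the BND concern, a rough sketch like the one below could count how many records in a Dragen VCF have a breakend mate on a different chromosome, since those are the records a per-chromosome split would separate from their partner. The function name `bnd_split_summary` and the category labels are our own, and the sketch only looks at the ALT column, so it assumes one ALT allele per record.

```python
import re
from collections import Counter

# Mate chromosome inside a BND ALT allele, e.g. "N[chr5:200[" or "]chr5:200]N".
MATE = re.compile(r"[\[\]](.+):\d+[\[\]]")


def bnd_split_summary(lines):
    """Count VCF records by how a per-chromosome split would treat them."""
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, alt = fields[0], fields[4]
        mate = MATE.search(alt)
        if mate is None:
            counts["non_bnd"] += 1          # DEL, DUP, INS, ...
        elif mate.group(1) == chrom:
            counts["bnd_same_chrom"] += 1   # both ends stay in one split
        else:
            counts["bnd_other_chrom"] += 1  # mate lands in another split
    return counts
```

For example, `bnd_split_summary(open("sample.sv.vcf"))` on an uncompressed VCF (the file name is a placeholder) would return the three counts for that sample.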
Regards
Solyris