Population scale merging with Jasmine for short-reads #64

Description

@Solyris83

Hi,

We intend to test Jasmine for a population-based experiment with more than 10k WGS samples sequenced with Illumina short reads (SR).

We are doing a test run on 3202 WGS samples (30x coverage, processed with Illumina Dragen) from the AWS Open Data registry:
https://registry.opendata.aws/ilmn-dragen-1kgp/

The Jasmine run on the 3202 per-sample sv.vcf files from the Dragen analysis output did not complete; it timed out after 3 days on a machine with 64 CPUs and 128 GB RAM.

I tried to split the VCFs with split_jasmine as mentioned in the issue linked below, but the jar script failed, likely because of a missing CHR2 INFO tag, which the SV caller does not output natively.
#18
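
One possible workaround for the missing tag, as a minimal sketch: add a CHR2 INFO field to each record before splitting. This assumes CHR2 should hold the chromosome of the second breakpoint, taken from the ALT field for BND records (VCF breakend notation) and equal to CHROM for everything else. Whether that is exactly what the split script expects has not been confirmed. The name `add_chr2` and the header description text are hypothetical.

```python
import re

# Mate locus inside VCF breakend ALT notation, e.g. "N[chr5:12345[" or "]chr5:12345]N".
BND_MATE = re.compile(r"[\[\]]([^\[\]]+):(\d+)[\[\]]")

# Hypothetical header line for the added tag.
CHR2_HEADER = ('##INFO=<ID=CHR2,Number=1,Type=String,'
               'Description="Chromosome of the second breakpoint (added by sketch)">')


def add_chr2(lines):
    """Yield VCF lines, adding a CHR2 INFO tag to records that lack one."""
    header_seen = False
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("##"):
            if line.startswith("##INFO=<ID=CHR2,"):
                header_seen = True
            yield line
            continue
        if line.startswith("#CHROM"):
            if not header_seen:
                yield CHR2_HEADER  # declare the tag before the column header
            yield line
            continue
        fields = line.split("\t")
        chrom, alt, info = fields[0], fields[4], fields[7]
        has_chr2 = any(kv.split("=")[0] == "CHR2" for kv in info.split(";"))
        if not has_chr2:
            mate = BND_MATE.search(alt)
            # Assumption: non-BND records have both breakpoints on CHROM.
            chr2 = mate.group(1) if mate else chrom
            fields[7] = f"CHR2={chr2}" if info in (".", "") else f"{info};CHR2={chr2}"
        yield "\t".join(fields)
```

Applied to an open VCF, this could be used as `out.writelines(l + "\n" for l in add_chr2(src))`.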

We were thinking of either

  1. Running in smaller batches of maybe 500-1000 samples, then merging each batch's resulting VCF in a stepwise manner.
  2. Splitting all 3202 samples by chromosome, then running a Jasmine merge across the 3202 samples for each of the 24 chromosomes (1-22, X and Y).
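
To make route 1 concrete, here is a rough sketch of the batching step only. It groups the per-sample VCF paths into fixed-size batches and writes one file list per batch, on the assumption that each list is then passed to a separate Jasmine run as its file list. The function names and the `batch_NNN.txt` naming are hypothetical, and the second-level merge of the batch outputs is not shown.

```python
from pathlib import Path


def make_batches(vcf_paths, batch_size):
    """Split VCF paths into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    paths = sorted(str(p) for p in vcf_paths)  # sorted for reproducible batches
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


def write_file_lists(batches, out_dir):
    """Write one text file per batch (one VCF path per line); return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    list_paths = []
    for n, batch in enumerate(batches):
        list_path = out_dir / f"batch_{n:03d}.txt"
        list_path.write_text("\n".join(batch) + "\n")
        list_paths.append(list_path)
    return list_paths
```

With 3202 samples and a batch size of 500, this gives 7 batches: six of 500 samples and one of 202.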

Would route 2 affect the BND events, which are abundant (~20%) in the Dragen VCFs?
Also, please comment on which route is preferable.
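
For context on the BND question: a per-chromosome split places each record in the file for its CHROM, so a BND whose mate lies on another chromosome ends up separated from that mate. The sketch below only counts how many records fall in that category. It says nothing about how Jasmine treats them. It assumes mates are encoded in the ALT field using VCF breakend notation; single breakends and symbolic alleles count as `non_bnd`. The name `classify_bnds` is hypothetical.

```python
import re
from collections import Counter

# Mate chromosome inside VCF breakend ALT notation, e.g. "N[chr5:12345[".
BND_MATE = re.compile(r"[\[\]]([^\[\]]+):\d+[\[\]]")


def classify_bnds(lines):
    """Count records by whether the ALT breakend mate is on the same chromosome."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, alt = fields[0], fields[4]
        mate = BND_MATE.search(alt)
        if mate is None:
            counts["non_bnd"] += 1
        elif mate.group(1) == chrom:
            counts["bnd_same_chrom"] += 1
        else:
            counts["bnd_other_chrom"] += 1  # would be separated by a per-chromosome split
    return counts
```

Running this on a few sample VCFs would show what fraction of the ~20% BND records are inter-chromosomal.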

Regards
Solyris
