Best practices to go from raw data to a clean seqdataset #10

@MiqG

Hi,

thanks for developing such an efficient and needed tool!

I have been looking through this and other ML4GLand repositories for examples of best practices for reading a genome FASTA together with a BAM or BED file to produce one-hot encoded sequences and the corresponding coverage arrays. However, most of the examples I found start from an already existing Zarr object. Is there an example available that shows how to build such a dataset from the raw files?
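
For concreteness, the two target arrays can be sketched in plain NumPy on toy data. This is only an illustration of the desired output, not the SeqData API: the helpers `one_hot_encode` and `coverage_from_intervals` are hypothetical names, the sequence is a hard-coded string rather than a FASTA record, and the intervals are hard-coded half-open (start, end) pairs in BED style rather than parsed from a BAM or BED file.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order; any fixed order works


def one_hot_encode(seq: str) -> np.ndarray:
    """Return a (len(seq), 4) uint8 array; non-ACGT bases such as N become all-zero rows."""
    # Byte-value lookup table: -1 marks characters outside the alphabet.
    lookup = np.full(256, -1, dtype=np.int64)
    for i, base in enumerate(ALPHABET):
        lookup[ord(base)] = i
    codes = lookup[np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)]
    out = np.zeros((len(seq), len(ALPHABET)), dtype=np.uint8)
    valid = codes >= 0
    out[np.nonzero(valid)[0], codes[valid]] = 1
    return out


def coverage_from_intervals(intervals, start: int, end: int) -> np.ndarray:
    """Per-base coverage over [start, end) from half-open (start, end) intervals."""
    # Difference array: +1 where an interval begins, -1 where it ends.
    diff = np.zeros(end - start + 1, dtype=np.int64)
    for s, e in intervals:
        s, e = max(s, start), min(e, end)  # clip to the requested window
        if s < e:
            diff[s - start] += 1
            diff[e - start] -= 1
    return np.cumsum(diff[:-1])


# Toy inputs standing in for a FASTA region and the reads overlapping it.
seq = "ACGTNACG"
reads = [(0, 4), (2, 6), (5, 8)]

x = one_hot_encode(seq)                           # shape (8, 4)
y = coverage_from_intervals(reads, 0, len(seq))   # shape (8,)
print(x.shape, y.tolist())
```

The difference-array approach keeps the coverage computation linear in the number of intervals plus the window length, which is the kind of efficiency concern raised here. Whether the library does something equivalent internally is not something this sketch assumes.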

I saw the API documentation reference and can guess how to do it, but I am unsure whether I would end up doing it in the most efficient way. I hope I did not miss something...

Thanks very much in advance, best,

Miquel
