Best practices to go from raw data to a clean seqdataset #10

Hi,

Thanks for developing such an efficient and much-needed tool!

I have been looking through this and other ML4GLand repositories for examples of best practices for reading a genome FASTA file together with a BAM or BED file to produce one-hot encoded sequences and the corresponding coverage arrays. However, most of the examples I found start from an already existing Zarr store. Is an example of building such a dataset from raw files already available?

I have read the API reference and can guess how to do it, but I am not sure I would end up with the most efficient approach. I hope I have not missed something.

Thanks very much in advance. Best,

Miquel