
Enable retrieving fasta file from a seqcol server DRS endpoint #31

@nsheff

Description


It would be nice if this could do the following from the CLI:

refget cache ABCXYZ                          # store it in the local disk cache
refget extract ABCXYZ --regions query.bed    # look up sequences for a set of intervals
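The `extract` step boils down to reading BED intervals and slicing the matching sequences. Here is a minimal sketch, assuming the collection's sequences are already loaded into a dict keyed by sequence name. `parse_bed` and `extract_regions` are hypothetical names, not part of refget. BED coordinates are 0-based and half-open, so they map directly onto Python slices.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# A BED interval: (chrom, 0-based inclusive start, exclusive end).
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield (chrom, start, end) from BED lines, skipping blanks and header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()  # BED is tab-delimited; split() also tolerates spaces
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"invalid BED interval: {line!r}")
        yield chrom, start, end


def extract_regions(sequences: Dict[str, str], intervals: Iterable[Interval]) -> List[str]:
    """Return the subsequence for each interval; BED coordinates slice directly."""
    out = []
    for chrom, start, end in intervals:
        seq = sequences[chrom]  # KeyError if the collection lacks this sequence name
        if end > len(seq):
            raise ValueError(f"{chrom}:{start}-{end} exceeds length {len(seq)}")
        out.append(seq[start:end])
    return out
```

For example, with `{"chr1": "ACGTACGTAC", "chr2": "GGGCCC"}` and the intervals `chr1 0 4` and `chr2 2 5`, this returns `["ACGT", "GCC"]`.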

To do that, I need to retrieve sequences from a remote server.
Two options:

  1. I can create a new DRS endpoint for just the FASTA file.
  2. I can use refgenie to get a FASTA asset.

Option 1: Add a DRS endpoint to the refget seqcol server

Where is the pointer from a digest to the FASTA file location stored? In the database? In a new table, e.g. Files?

  • Add a Files pydantic model.
  • Add a Files agent: given a digest, retrieve the DRSInfo, etc. Basically, make the Files agent into a DRS provider.
  • Add a data loader to data_loaders that can insert Files information.
  • The endpoint itself is easy: it just uses the Files agent.
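The Option 1 plan could look roughly like this. It is a sketch under several assumptions: plain dataclasses stand in for the proposed pydantic `Files` model so the example has no dependencies, and `FileRecord`, `FilesAgent`, `add`, and `get_drs_object` are hypothetical names. The returned dict follows the general shape of a GA4GH DRS object (`id`, `self_uri`, `size`, `created_time`, `checksums`, `access_methods`), but field coverage here is deliberately minimal.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for the proposed Files model (pydantic in the real server)."""
    digest: str  # seqcol digest this file belongs to
    url: str     # where the FASTA file is hosted
    size: int    # file size in bytes
    md5: str     # checksum of the file contents
    created_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class FilesAgent:
    """Hypothetical agent: given a digest, return a DRS-style object for its file."""

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname
        self._records: Dict[str, FileRecord] = {}  # in-memory stand-in for a DB table

    def add(self, record: FileRecord) -> None:
        # What a Files data loader would call to insert a row.
        self._records[record.digest] = record

    def get_drs_object(self, digest: str) -> Optional[dict]:
        record = self._records.get(digest)
        if record is None:
            return None
        scheme = record.url.split("://", 1)[0]  # e.g. "https", "s3"
        return {
            "id": record.digest,
            "self_uri": f"drs://{self.hostname}/{record.digest}",
            "size": record.size,
            "created_time": record.created_time.isoformat(),
            "checksums": [{"checksum": record.md5, "type": "md5"}],
            "access_methods": [{"type": scheme, "access_url": {"url": record.url}}],
        }
```

A route handler would then just return `agent.get_drs_object(digest)`, or a 404 when it is `None`, which is why the endpoint itself stays small.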

Option 2: Just use refgenie

Maybe let refgenie handle the distribution of the sequence data?

Differences

  • A refgenie asset is indexed by the asset digest, not the seqcol digest.
  • A refgenie asset includes not just the FASTA file but other files too.
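Both differences mean refget needs an extra hop: seqcol digest to asset digest, then asset to its FASTA file. A minimal sketch of that indirection, assuming an asset can be modeled as a mapping from file role to path; `RefgenieAsset`, `SeqcolToAssetIndex`, and the `"fasta"` role key are hypothetical and do not reflect refgenie's actual data model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RefgenieAsset:
    """Hypothetical view of a refgenie asset: several files under one asset digest."""
    asset_digest: str
    files: Dict[str, str]  # role -> path, e.g. {"fasta": ..., "fai": ...}


class SeqcolToAssetIndex:
    """Hypothetical lookup bridging the seqcol and refgenie digest spaces."""

    def __init__(self) -> None:
        self._seqcol_to_asset: Dict[str, str] = {}
        self._assets: Dict[str, RefgenieAsset] = {}

    def register(self, seqcol_digest: str, asset: RefgenieAsset) -> None:
        self._seqcol_to_asset[seqcol_digest] = asset.asset_digest
        self._assets[asset.asset_digest] = asset

    def fasta_path(self, seqcol_digest: str) -> str:
        # Hop 1: seqcol digest -> asset digest (raises KeyError if unknown).
        asset = self._assets[self._seqcol_to_asset[seqcol_digest]]
        # Hop 2: the asset carries other files too; refget only needs the FASTA.
        return asset.files["fasta"]
```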

Decision

I should use refgenie.

  • refgenie will be storing the FASTA files anyway.
  • refget automatically benefits from refgenie's content delivery networks.
  • It's a good use case for refgenie.

Implementation

How would I use this, then? How do I pass information about the file from refgenie to the local refget extractor?

  • refget extract runs in Python.

  • Add refgenie as an optional dependency? Could it operate as a plugin?

  • refget cache looks up the refgenie asset, pulls it out, and creates the refget-rs on-disk representation? Then there would be a $REFGETCACHE location where the seqcols are stored.

  • refget extract is then a lightweight Python wrapper around the Rust extraction command?
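The three implementation ideas (optional refgenie dependency, a `$REFGETCACHE` directory, and a thin Python wrapper over a Rust binary) could fit together as sketched below. Assumptions to note: the default cache path, the one-directory-per-digest layout, the binary name `refget-rs`, and its `extract ... --regions` arguments are all hypothetical; the real Rust CLI may look different.

```python
import importlib.util
import os
import subprocess
from pathlib import Path
from typing import List


def refgenie_available() -> bool:
    """True if the optional refgenie dependency is installed (no import side effects)."""
    return importlib.util.find_spec("refgenie") is not None


def require_refgenie() -> None:
    # `refget cache` would call this first, so refgenie stays an optional dependency.
    if not refgenie_available():
        raise ImportError("refget cache needs the optional 'refgenie' package; please install it")


def cache_root() -> Path:
    """Resolve the cache directory from $REFGETCACHE; the default path is an assumption."""
    default = Path.home() / ".refget_cache"
    return Path(os.environ.get("REFGETCACHE", str(default)))


def cached_collection_dir(seqcol_digest: str) -> Path:
    """Hypothetical layout: one subdirectory per seqcol digest."""
    return cache_root() / seqcol_digest


def build_extract_command(binary: str, seqcol_digest: str, bed: str) -> List[str]:
    # Hypothetical arguments for the Rust extractor.
    return [binary, "extract", str(cached_collection_dir(seqcol_digest)), "--regions", bed]


def extract(seqcol_digest: str, bed: str, binary: str = "refget-rs") -> str:
    """Thin wrapper: check the cache, then shell out to the Rust binary."""
    if not cached_collection_dir(seqcol_digest).is_dir():
        raise FileNotFoundError(f"{seqcol_digest} is not cached; run `refget cache` first")
    cmd = build_extract_command(binary, seqcol_digest, bed)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Using `importlib.util.find_spec` lets the CLI report a missing plugin cleanly instead of failing at import time, which keeps refget usable without refgenie installed.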
