Description
It would be useful if this could be done from the CLI:
refget cache ABCXYZ // stores it into local disk cache
refget extract ABCXYZ --regions query.bed // look up sequences for a set of intervals
I need to retrieve sequences from a remote server.
Two options:
- I can use refgenie to get a fasta asset.
- I can create a new DRS endpoint for just the fasta file.
Option 1: Add DRS endpoint to the refget seqcol server
Where is the pointer from digest to FASTA file location stored? In the database... a new table? Files?
- add a Files pydantic model
- add a Files agent: given a digest, retrieve the DRSInfo, etc. Basically make the files agent into a DRS provider
- add to the data_loaders a data loader that can insert Files information.
- the endpoint is super easy, just use the files agent.
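A rough sketch of how Option 1 could fit together, not an actual refget implementation. Everything named here (`FileRecord`, `FilesAgent`, `drs_object`, the field names) is hypothetical. Plain dataclasses stand in for the pydantic model and a dict stands in for the new database table, so the sketch has no dependencies. The returned dictionary is only loosely modelled on a GA4GH DRS object, and the chosen subset of fields and the checksum type are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical 'Files' row: maps a seqcol digest to a FASTA location."""
    digest: str    # seqcol digest used as the lookup key
    url: str       # where the FASTA file can be fetched
    size: int      # file size in bytes
    checksum: str  # checksum of the FASTA file (type assumed below)


class FilesAgent:
    """Hypothetical agent; a dict stands in for the database table."""

    def __init__(self) -> None:
        self._records: Dict[str, FileRecord] = {}

    def add(self, record: FileRecord) -> None:
        # This is what a data loader would call to insert Files information.
        self._records[record.digest] = record

    def get(self, digest: str) -> Optional[FileRecord]:
        return self._records.get(digest)

    def drs_object(self, digest: str) -> Optional[dict]:
        # Build a DRS-like response for the endpoint; None if unknown digest.
        record = self.get(digest)
        if record is None:
            return None
        return {
            "id": record.digest,
            "size": record.size,
            # "sha-256" is an assumed checksum type for illustration.
            "checksums": [{"type": "sha-256", "checksum": record.checksum}],
            "access_methods": [
                {"type": "https", "access_url": {"url": record.url}}
            ],
        }
```

With this shape the endpoint itself stays trivial: it calls `drs_object(digest)` and returns a 404 when the result is `None`.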
Option 2: Just use refgenie idea
Maybe let refgenie handle the distribution of the sequence data?
Differences
- refgenie asset is indexed by the asset digest, not the seqcol digest
- refgenie asset includes not just the fasta file but other stuff.
Decision
I should use refgenie.
- they'll be storing the fasta files anyway.
- refget benefits from the refgenie content delivery networks, automatically
- a good use case of refgenie
Implementation
How would I use this, then? How do I pass information about the file from refgenie to the local refget extractor?
- refget extract is happening in Python;
- add an optional dependency of refgenie? Can it operate like a plugin?
- refget cache looks up the refgenie asset, pulls it out, and creates the refget-rs on-disk representation? So there's a $REFGETCACHE location where the seqcols are stored.
- refget extract then is a lightweight Python wrapper around the Rust extraction command?
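To make the Python side of this concrete, here is a minimal sketch under stated assumptions. The function names, the `$REFGETCACHE/<digest>` layout, and the `~/.refget` fallback are all hypothetical. The in-memory slicing in `extract` is only a stand-in for the Rust extraction command, whose real interface is not described here.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end)


def cache_dir(digest: str) -> Path:
    """Assumed layout: $REFGETCACHE/<digest>; ~/.refget is a made-up default."""
    root = os.environ.get("REFGETCACHE", str(Path.home() / ".refget"))
    return Path(root) / digest


def parse_bed(lines: Iterable[str]) -> Iterator[Region]:
    """Yield (chrom, start, end) from BED lines (0-based, half-open)."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)


def extract(sequences: Dict[str, str], regions: Iterable[Region]) -> List[str]:
    """Slice each region out of in-memory sequences.

    Stand-in for the Rust extractor, which would read the on-disk
    representation under cache_dir(digest) instead.
    """
    return [sequences[chrom][start:end] for chrom, start, end in regions]
```

In this sketch, `refget extract ABCXYZ --regions query.bed` would roughly correspond to opening `query.bed`, passing its lines to `parse_bed`, and handing the regions to the extractor pointed at `cache_dir("ABCXYZ")`.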