Skip to content

feat: Add network request retry for ncbi_lrg_refseqgene.py#84

Merged
jarbesfeld merged 4 commits into
mainfrom
issue-83
Jan 27, 2026
Merged

feat: Add network request retry for ncbi_lrg_refseqgene.py#84
jarbesfeld merged 4 commits into
mainfrom
issue-83

Conversation

@jarbesfeld

Copy link
Copy Markdown
Contributor

closes #83

Note: I had ChatGPT assist me here, but this seems to be a logical solution for the ncbi_lrg_refseqgene.py module. There are other harvesters that don't have the network retry logic, so I am wondering if we should add this in other places, or add a new module that can imported from utils that does the network retry

@jarbesfeld jarbesfeld self-assigned this Jan 27, 2026
@jarbesfeld jarbesfeld requested a review from a team as a code owner January 27, 2026 18:12
@jarbesfeld jarbesfeld added bug Something isn't working priority:high High priority labels Jan 27, 2026
@jarbesfeld jarbesfeld requested a review from zealws January 27, 2026 18:15

@zealws zealws left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had imagined you'd wrap these requests calls in a session to do the retries for you. That would be the more sophisticated approach to this. The documentation is probably a better resource than ChatGPT, fwiw.

This approach is crude, but it does work (except for the the exception issue I highlighted), however, I think it'd be better to expose a helper function that does the retries so you could switch to a session later if necessary, and not complicate the call site with the retry logic.

I'll defer to @jsstevenson as to which of those approaches are acceptable.

Comment thread src/wags_tails/ncbi_lrg_refseqgene.py Outdated
@zealws

zealws commented Jan 27, 2026

Copy link
Copy Markdown

To clarify exactly how to use the session most effectively:

Here's the example code from the docs:

s = Session()
retries = Retry(
    total=3,
    backoff_factor=0.1,
    status_forcelist=[502, 503, 504],
    allowed_methods={'POST'},
)
s.mount('https://', HTTPAdapter(max_retries=retries))

You would create a session early on in the process, set up the retries as pictured here, pass the session to any code that needs to make HTTP requests, and then call session.get(...) instead of requests.get(...).

I'm not sure how amenable the wags-tails codebase would be to passing a session around in a bunch of places, so it might be acceptable to use a helper function that just creates a session for a single request. YMMV

@jsstevenson

jsstevenson commented Jan 27, 2026

Copy link
Copy Markdown
Member

I'm not sure how amenable the wags-tails codebase would be to passing a session around in a bunch of places

In general, the only shared state is within the class dedicated to a specific resource, so you're usually talking about a max of 2 distinct requests (if resource isn't available locally: 1 request to get it, if it's available locally: 1 request to check if up-to-date, and possibly a 2nd to acquire if local is out of date). I don't think it would be that hard to manage state like a session for this group of requests, but that's not a huge savings since it's a max of 2 requests, and typically just the 1. To share a session across individual getters, we could update DataSource.__init__ to receive a session instance, and maybe create it if not given (and then the caller is responsible for creating and reusing a session for all of its resource getters). We could also just force a shared state with a global or a singleton class or something. I think that's a little trickier though.

You could extract this out into a helper function like request_get_with_retries

Yeah I think that's reasonable

@jarbesfeld

Copy link
Copy Markdown
Contributor Author

@zealws @jsstevenson Were you two thinking of a helper function like this (see most recent commit)? I'm not entirely sure what sources this would be used be in since the download command is different for each one?

Comment thread src/wags_tails/utils/downloads.py Outdated
Comment thread src/wags_tails/ncbi_lrg_refseqgene.py
zealws
zealws previously approved these changes Jan 27, 2026

@zealws zealws left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I don't have permission to greenlight changes in this repo though, so I think you'll also need a review from @jsstevenson or another reviewer.

Comment thread src/wags_tails/ncbi_lrg_refseqgene.py Outdated
@jsstevenson jsstevenson changed the title feat!: Add network request retry for ncbi_lrg_refseqgene.py feat: Add network request retry for ncbi_lrg_refseqgene.py Jan 27, 2026
@jarbesfeld jarbesfeld merged commit df60eeb into main Jan 27, 2026
18 checks passed
@jarbesfeld jarbesfeld deleted the issue-83 branch January 27, 2026 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working priority:high High priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Retry network requests on failure

3 participants