Close netcdf dataset after getting its size #29

Open
ninsbl wants to merge 3 commits into ioos:main from ninsbl:close_nc

ninsbl commented Oct 1, 2021

Thanks for this cool Python library. I looked at alternatives (like siphon), but this one is still the most to-the-point solution with the features/functions I need.
Hopefully, the fact that there have not been any commits to the master branch in recent years is a sign of the library's reliability and that it just works (and not that it is no longer actively maintained).

When I used it to crawl a larger THREDDS server, I noticed that at some point the server returned a 502 Bad Gateway error. This may be related to the issue this PR addresses: netCDF datasets are not closed after their size is computed, which could leave the server with many open datasets.
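A minimal sketch of the idea behind this PR: open the dataset, compute its size, and always close it again. `dataset_size_mb` and its `open_dataset` parameter are hypothetical names, not thredds_crawler's API; `open_dataset` is assumed to behave like `netCDF4.Dataset` (returns an object with a `.variables` dict and a `.close()` method), and the size estimate (elements times bytes per element, summed over all variables) is an assumption about how the size is computed. `contextlib.closing` guarantees `.close()` runs even if the computation raises.

```python
from contextlib import closing
from typing import Any, Callable


def dataset_size_mb(open_dataset: Callable[[str], Any], url: str) -> float:
    """Approximate uncompressed size of a dataset in megabytes (hypothetical helper).

    Assumes `open_dataset(url)` returns a netCDF4.Dataset-like object
    with a `.variables` dict and a `.close()` method.
    """
    # closing() calls ds.close() on exit, even if an exception is raised,
    # so the remote dataset is released as soon as its size is known.
    with closing(open_dataset(url)) as ds:
        total_bytes = 0
        for var in ds.variables.values():
            # number of elements times bytes per element
            total_bytes += var.size * var.dtype.itemsize
    return total_bytes / 1_000_000
```

With netCDF4 installed, `dataset_size_mb(netCDF4.Dataset, opendap_url)` would read only the dataset's metadata (shapes and dtypes) and then close the connection to it.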

A related question: if I am not interested in the size of a dataset and only want its URLs, opening the dataset to compute its size is wasted time. Would you accept changing the default so that the size is only computed on request? I could look into that and make a separate PR...
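One way the "size only on request" default could look, sketched with a lazily computed, cached property. `LeafRecord` and `size_fn` are hypothetical and not thredds_crawler's classes; the sketch assumes the first URL in `urls` is the one the size should be computed from.

```python
from functools import cached_property
from typing import Callable, List


class LeafRecord:
    """Hypothetical catalog leaf: URLs are available right away,
    the size is computed only when someone asks for it."""

    def __init__(self, name: str, urls: List[str], size_fn: Callable[[str], float]):
        self.name = name
        self.urls = urls
        # e.g. a function that opens the dataset and returns its size in MB
        self._size_fn = size_fn

    @cached_property
    def size(self) -> float:
        # Runs on first access only; the result is cached on the instance.
        # Assumption: urls[0] is the endpoint the size should come from.
        return self._size_fn(self.urls[0])
```

A crawl that only reads `.urls` never opens a dataset, while callers that do read `.size` keep the same attribute name and pay the cost once per leaf.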

If this library is no longer maintained, though, I would be really happy if you could point me to an alternative library with the same features that could serve as a replacement...

ninsbl (author) commented Oct 3, 2021

It turned out that the main problem was actually the TCP connections that newer versions of requests keep alive for some time. These accumulated into thousands of open connections on a bigger THREDDS server, effectively taking the server down.

This could probably be solved more elegantly, e.g. as discussed here:
https://stackoverflow.com/questions/54876452/run-parallel-request-session-in-python
However, closing the connection after the data has been read already limits the number of open connections to the number of workers, and performance seems unchanged.
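A minimal sketch of "close the connection after reading" with requests and a thread pool. `fetch_and_close` and `fetch_all` are hypothetical helpers, not part of thredds_crawler; the worker count, timeout, and the explicit `Connection: close` header are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

import requests


def fetch_and_close(url: str, timeout: float = 30.0) -> str:
    # A short-lived session: leaving the `with` block closes the session and
    # the pooled connections it opened, so no idle keep-alive sockets linger.
    with requests.Session() as session:
        # "Connection: close" additionally asks the server to drop the TCP
        # connection once it has sent this response.
        resp = session.get(url, headers={"Connection": "close"}, timeout=timeout)
        resp.raise_for_status()
        # stream=False (the default) means the body is already fully read here.
        return resp.text


def fetch_all(urls: List[str], workers: int = 4) -> List[str]:
    # At most `workers` requests run at once, so at most `workers`
    # connections are open at any time. Results keep the order of `urls`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_and_close, urls))
```

The cost of this design is a fresh TCP (and, for HTTPS, TLS) handshake per request, in exchange for a hard upper bound on the number of connections the crawler holds open on the server.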

