Fix getting filenames out of netCDF datasets#8
Conversation
| if hasattr(bound_method, "__self__"): | ||
| file_handles.append(Path(bound_method.__self__._filename)) | ||
|
|
||
| return file_handles |
There was a problem hiding this comment.
overall structure looks good and I can follow what the function does, though I'm not familiar with the specific functions/classes/attributes used (can infer what they do from use, but not confirm that they do what they seem to)
| files.append(versioneer_file) | ||
| present = False | ||
| try: | ||
| with open(".gitattributes", "r") as fobj: |
There was a problem hiding this comment.
I am curious why the removal of "r" is beneficial. Wouldn't it be more futureproof to specify read only?
There was a problem hiding this comment.
This is just a versioneer artefact, not sure what's caused this to change tbh.
| engine="netcdf4", | ||
| ) | ||
| else: | ||
| ds = xr.open_dataset(fpath, decode_timedelta=False, engine="netcdf4") |
There was a problem hiding this comment.
open_mfdataset seems to cope with a single path (as best I can tell, it uses it as a glob string that returns a single file); is there a reason to use open_dataset explicitly for only one filepath?
There was a problem hiding this comment.
Good call, fixed
There was a problem hiding this comment.
nope. Working on updating the related examples now anyway so will fix in there
| fhandles = _get_file_handles(ds) | ||
|
|
||
| if isinstance(fpath, list): | ||
| assert fhandles == [Path(f) for f in fpath] |
There was a problem hiding this comment.
would this work if fpath contains the same paths but in a different order to fhandles? Would you want this to pass or fail in that instance
There was a problem hiding this comment.
in this case it'd fail, you could replace it with a set intersection to make it pass. Whether you'd want it to pass/fail is more of a philosophical question I guess
| else: | ||
| ds = xr.open_dataset(fpath, decode_timedelta=False, engine="netcdf4") | ||
| with warnings.catch_warnings(): | ||
| chunk_dict = validate_chunkspec( |
There was a problem hiding this comment.
not sure if this is the right place to mention this, but does validate_chunkspec care if the chunks are not an integer divisor of file size in that dimension? Ie, if files have 3 timesteps, in disc chunks of 1, is it okay with chunks of two? in which case you'd end up with every second chunk being half as big as the other. Are we okay with that?
There was a problem hiding this comment.
Good question. I actually don't know whether this would cause performance issues. I suspect not but can't say for certain..
Mostly closes #7
As of right now I'm not sure that we can get all the filenames out of a multifile dataarray, only the first. With that said, probably unimportant in nearly all situations. I've added an xfailing test for it, but in practice I think it'll give the right answer if a multifile datarray is passed to this function.
@jemmajeffree think this might be up your alley - no rush with the review though.