Add pipeline value operations tests#234
Add pipeline value operations tests#234edoyango wants to merge 12 commits intoACCESS-Community-Hub:developfrom
Conversation
Clip is more appropriate as there is an equivalent dask/numpy/xarray function. Additionally, no normalisation is occuring.
This aligns xarray FillNan with numpy FillNan
Clip is more appropriate as there is an equivalent dask/numpy/xarray operation.
when creating a new variable with Derive, the new variable was being added to the input dataset. This commit fixes that by making a shallow copy of the input and adding the variables to that shallow copy before returning it.
Clip is more appropriate as there is an equivalent dask/numpy/xarray operation.
Pull Request Test Coverage Report for Build 20835553598Details
💛 - Coveralls |
|
Thanks for the changes. Just had a minor comment (see above). Otherwise, it looks good. appreciate the comments around deep copy and what they affect. Given that lot of applications of the pipeline can be memory hungry |
|
|
||
|
|
||
| class ForceNormalised(DaskOperation): | ||
| class Clip(DaskOperation): |
There was a problem hiding this comment.
I'm fine with this change, and there are no issues with dependencies on it at the moment. However, the docs will need to be updated (e.g. docs/api/pipeline/pipeline_api.md) and the notebooks (I think maybe only docs/notebooks/pipeline/Operations.ipynb).
| def apply_func(self, sample: T) -> T: | ||
| return sample.fillna(self.nan) | ||
|
|
||
| if not (isinstance(sample, xr.DataArray) or isinstance(sample, xr.Dataset)): |
There was a problem hiding this comment.
I wonder if we should have a general xarray subclass of operation which handles xarray type checking rather than doing it down in apply_func... this is okay, but it might be better to put it up higher.
There was a problem hiding this comment.
I did have a similar review comment, but I didn't mention it because I wasn't familiar enough with the codebase to suggest where an ideal place would be, and how much impact putting it at a higher level would have on everything else.
I guess (in hindsight) those would be the considerations, and also if it needs to be in a separate issue. But yes, it does seem like a very common "entrypoint" check that applies to many things.
|
|
||
| def apply_func(self, sample: T) -> T: | ||
| return sample.fillna(self.nan) | ||
|
|
There was a problem hiding this comment.
Just reading the docstring above (it won't let me comment on unchanged lines), it says "If no value is passed then positive infinity values will be replaced with a very large number. Defaults to None.". But None is not a very large number. So by default that comment doesn't parse for me.

The tests covers all the operation classes in
operations/{dask,numpy,xarray}/values.py. Some important changes:NotImplementedError, and just needed to add a missing arg inda.nan_to_numto make it work.Replaceclass instead ofreplace_valueclass frompyearthtool.data.transforms.maskasreplace_valuedoesn't exist anymore.xarray.DerivewhereDerive.apply_funcwas also updating the input dataset.pyearthtools.data.transforms.deriveby replacing the deprecatedDataset.dropwithDataset.drop_vars