[XV] Add support for complex data structure#2937
Conversation
## Changes to cross-validation expression block schema
### New expression block structure
Expression blocks (`left_hand` / `right_hand`) now require one of two mutually exclusive sub-blocks. The legacy flat format (`operation` and `ordered_variables` at the top level) is no longer supported.
**Aggregation sub-block**
```json
{
"aggregation": {
"operation": "sum | difference | average | product | no_op",
"ordered_variables": ["alias_0", "alias_1"],
"apply_to": "individual | group"
},
"save_as": "result_alias"
}
```
**Array-of-dicts sub-block**
```json
{
"iterate_array_of_dicts": {
"variable_name": "alias_for_list_of_dicts",
"attribute_of_interest": "attribute_name",
"comparison_value": "alias_for_comparison_value",
"relationship": "equal | greater | ...",
"filter_array": true
},
"save_as": "result_alias"
}
```
`save_as` is optional and always lives at the outer expression-block level in both forms.
---
### `iterate_array_of_dicts` behaviour
| `filter_array` | Result |
|---|---|
| `true` | Returns the subset of entries where `entry[attribute_of_interest] <relationship> comparison_value` |
| `false` | Returns `[true]` if **all** entries satisfy the condition, `[false]` otherwise |
---
### Bug fixes in `data_validator.py`
- **`_evaluate_expression` always returned success** — `return result, True` was changed to `return result, evaluated`. Previously, a failed aggregation or array-iteration returned `(None, True)`, bypassing the failure check in `_evaluate_condition`.
- **Wrong function name in `info_map`** — Error log entries in `_evaluate_aggregation_block` referenced `_evaluate_expression.__name__` instead of `_evaluate_aggregation_block.__name__`.
- **`comparison_value` not list-wrapped in `_evaluate_iterate_array_of_dicts`** — All relation-mapping functions compare two lists. A scalar value fetched from the alias pool (e.g. `"CALF"`) was passed raw, making every comparison silently return `false`. Now wrapped: `if not isinstance(comparison_value, list): comparison_value = [comparison_value]`.
---
### Test updates (`tests/test_data_validator.py`)
- Removed `test_extract_aggregation_block` — `_extract_aggregation_block` was deleted
- Removed `test_evaluate_expression_nested_aggregation_format` — now redundant
- Updated `test_evaluate_expression_unknown_operation`, `test_evaluate_expression_no_ordered_variables`, `test_evaluate_expression_apply_to_individual`, `test_evaluate_expression_apply_to_group` — expression blocks wrapped in `{"aggregation": {...}}` with `save_as` moved to the outer level
- Added tests for `_evaluate_iterate_array_of_dicts`: filter mode, enforce mode, unknown relationship (eager and non-eager), and `save_as` propagation through `_evaluate_expression`
|
Commit 6767190 has the following changes: New feature:
|
filter_array |
Result |
|---|---|
true |
Returns the filtered subset of list[dict] entries where entry[attribute_of_interest] <relationship> comparison_value |
false |
Returns [true] if all entries satisfy the condition, [false] otherwise |
Changes to data_validator.py
_evaluate_expressionrefactored — now dispatches to_evaluate_aggregation_blockor_evaluate_iterate_array_of_dictsbased on which sub-block key is present; raisesValueErrorif neither is found_evaluate_aggregation_blockextracted — aggregation logic moved out of_evaluate_expressioninto its own method_evaluate_iterate_array_of_dictsadded — new method implementing the array-of-dicts filter/enforce feature
Bug fixes caught during review:
_evaluate_expressionwas hardcoded toreturn result, True, masking failures from sub-expressions — fixed toreturn result, evaluatedinfo_mapentries in_evaluate_aggregation_blockreferenced_evaluate_expression.__name__instead of_evaluate_aggregation_block.__name__comparison_valuefetched from the alias pool was passed raw to relation functions which expect a list — addedif not isinstance(comparison_value, list): comparison_value = [comparison_value]
Changes to example_cross_validation.json
- All existing rules updated to the new
aggregationsub-block format - Typo
attirbute_of_interest→attribute_of_interestfixed - Added a fourth example rule demonstrating
iterate_array_of_dictswithfilter_array: false
Changes to tests/test_data_validator.py
test_evaluate_expression_unknown_operation,test_evaluate_expression_no_ordered_variables,test_evaluate_expression_apply_to_individual,test_evaluate_expression_apply_to_group— all expression blocks updated to the new{"aggregation": {...}}format withsave_asat the outer level- Added
test_evaluate_iterate_array_of_dicts_success— parametrized over filter mode, enforce mode (all-pass and partial-fail) - Added
test_evaluate_iterate_array_of_dicts_unknown_relationship_no_eagerand_eager— error handling paths - Added
test_evaluate_expression_with_iterate_array_of_dicts_and_save_as— verifiessave_asis applied at the expression-block level when usingiterate_array_of_dicts
|
Current Coverage: 99% Mypy errors on new_xv_feature branch: 1191 |
|
🚨 Please update the changelog. This PR cannot be merged until |
|
Current Coverage: 99% Mypy errors on new_xv_feature branch: 1191 |
|
🚨 Please update the changelog. This PR cannot be merged until |
…w_ed [Animal][Reproduction] Refactor `Reproduction.execute_cow_ed_protocol()`
|
Current Coverage: 99% Mypy errors on new_xv_feature branch: 1191 |
|
🚨 Please update the changelog. This PR cannot be merged until |
|
Current Coverage: % Mypy errors on new_xv_feature branch: 1194 |
|
🚨 Please update the changelog. This PR cannot be merged until |
matthew7838
left a comment
There was a problem hiding this comment.
Seems like a good start, great work, Allister. This PR should be merged with a careful/detailed update to documentation in/out of the code. The example makes sense to me, but I'm not fully understanding how this will be used to incorporate the "reverse lookup"; an example could be helpful.
| { | ||
| "description": "Number of stalls in growing pens", | ||
| "target_and_save": { | ||
| "aliases": { |
There was a problem hiding this comment.
This is a good naming change.
| } | ||
| }, | ||
| "relationship": "equal" | ||
| "operator": "equal" |
There was a problem hiding this comment.
My two cents are that relationship is the more intuitive choice here.
| "for_each": { | ||
| "mode": "enforce", | ||
| "in": "crop_configurations", | ||
| "field": "optimal_temperature", | ||
| "compare_field": "minimum_temperature", | ||
| "operator": "greater_or_equal_to" |
There was a problem hiding this comment.
This naming differs from what's mentioned in the comment. Which one are we finalizing?
| "mode": "enforce", | ||
| "in": "crop_configurations", | ||
| "field": "optimal_temperature", | ||
| "compare_field": "minimum_temperature", |
There was a problem hiding this comment.
| "compare_field": "minimum_temperature", | |
| "field_to_compare": "minimum_temperature", |
| "in": "crop_configurations", | ||
| "field": "optimal_temperature", | ||
| "compare_field": "minimum_temperature", | ||
| "operator": "greater_or_equal_to" |
There was a problem hiding this comment.
| "operator": "greater_or_equal_to" | |
| "relation_to_check": "greater_or_equal_to" |
Or we change the operator to something like check_greater_or_equal_to.
There was a problem hiding this comment.
This seems like changes from other PRs
| expression_block["aggregation"], eager_termination, relationship | ||
| ) | ||
| else: | ||
| raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}") |
There was a problem hiding this comment.
| raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}") | |
| raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}. Supported blocks are "aggregation" and "for_each". ") |
| ] | ||
| "aggregation": { | ||
| "function": "no_op", | ||
| "mode": "aggregate", |
There was a problem hiding this comment.
What's the reason to switch from "operation"?
| operator: str = iter_block.get("operator", "") | ||
| mode: str = iter_block.get("mode", "enforce") | ||
|
|
||
| compare_fn = self.relation_mapping.get(operator) |
There was a problem hiding this comment.
| compare_fn = self.relation_mapping.get(operator) | |
| compare_function = self.relation_mapping.get(operator) |
| if "for_each" in expression_block: | ||
| result, evaluated = self._evaluate_iterate_array_of_dicts( | ||
| expression_block["for_each"], eager_termination, relationship | ||
| ) | ||
|
|
||
| Notes | ||
| ----- | ||
| Expression block: | ||
| >>> { | ||
| ... "operation": "sum | difference | average | product | no_op", # optional, defaults to "no_op" | ||
| ... "apply_to": "individual | group", # optional | ||
| ... "ordered_variables": ["alias_0", "alias_1"], | ||
| ... "save_as": "alias_2" # optional | ||
| ... } | ||
| elif "aggregation" in expression_block: | ||
| result, evaluated = self._evaluate_aggregation_block( | ||
| expression_block["aggregation"], eager_termination, relationship | ||
| ) |
There was a problem hiding this comment.
Since the arguments are very similar, I suggest we use function mapping here to improve clarity and extendability.
Add support for filtering/iterating list[dict]
New feature:
iterate_array_of_dictsexpression block + schema refactorNew expression block schema
Expression blocks (
left_hand/right_hand) now require one of two mutually exclusive named sub-blocks. The flat format (operation,ordered_variablesat the top level) is no longer supported.Aggregation sub-block
{ "aggregation": { "operation": "sum | difference | average | product | no_op", "ordered_variables": ["alias_0", "alias_1"], "apply_to": "individual | group" }, "save_as": "result_alias" }Array-of-dicts sub-block
{ "iterate_array_of_dicts": { "variable_name": "alias_for_list_of_dicts", "attribute_of_interest": "attribute_name", "comparison_value": "alias_for_comparison_value", "relationship": "equal | greater | ...", "filter_array": true }, "save_as": "result_alias" }save_asis optional and always lives at the outer expression-block level in both forms.iterate_array_of_dictsbehaviourfilter_arraytruelist[dict]entries whereentry[attribute_of_interest] <relationship> comparison_valuefalse[true]if all entries satisfy the condition,[false]otherwiseChanges to
data_validator.py_evaluate_expressionrefactored — now dispatches to_evaluate_aggregation_blockor_evaluate_iterate_array_of_dictsbased on which sub-block key is present; raisesValueErrorif neither is found_evaluate_aggregation_blockextracted — aggregation logic moved out of_evaluate_expressioninto its own method_evaluate_iterate_array_of_dictsadded — new method implementing the array-of-dicts filter/enforce featureBug fixes caught during review:
_evaluate_expressionwas hardcoded toreturn result, True, masking failures from sub-expressions — fixed toreturn result, evaluatedinfo_mapentries in_evaluate_aggregation_blockreferenced_evaluate_expression.__name__instead of_evaluate_aggregation_block.__name__comparison_valuefetched from the alias pool was passed raw to relation functions which expect a list — addedif not isinstance(comparison_value, list): comparison_value = [comparison_value]Changes to
example_cross_validation.jsonaggregationsub-block formatattirbute_of_interest→attribute_of_interestfixediterate_array_of_dictswithfilter_array: falseChanges to
tests/test_data_validator.pytest_evaluate_expression_unknown_operation,test_evaluate_expression_no_ordered_variables,test_evaluate_expression_apply_to_individual,test_evaluate_expression_apply_to_group— all expression blocks updated to the new{"aggregation": {...}}format withsave_asat the outer leveltest_evaluate_iterate_array_of_dicts_success— parametrized over filter mode, enforce mode (all-pass and partial-fail)test_evaluate_iterate_array_of_dicts_unknown_relationship_no_eagerand_eager— error handling pathstest_evaluate_expression_with_iterate_array_of_dicts_and_save_as— verifiessave_asis applied at the expression-block level when usingiterate_array_of_dictsContext
Issue(s) closed by this pull request: closes #
What
Why
How
Test plan
Input Changes
Output Changes
Filter