Skip to content

[XV] Add support for complex data structure#2937

Draft
allisterakun wants to merge 26 commits intodevfrom
new_xv_feature
Draft

[XV] Add support for complex data structure#2937
allisterakun wants to merge 26 commits intodevfrom
new_xv_feature

Conversation

@allisterakun
Copy link
Copy Markdown
Collaborator

@allisterakun allisterakun commented Apr 7, 2026

Add support for filtering/iterating list[dict]

New feature: iterate_array_of_dicts expression block + schema refactor

New expression block schema

Expression blocks (left_hand / right_hand) now require one of two mutually exclusive named sub-blocks. The flat format (operation, ordered_variables at the top level) is no longer supported.

Aggregation sub-block

{
  "aggregation": {
    "operation": "sum | difference | average | product | no_op",
    "ordered_variables": ["alias_0", "alias_1"],
    "apply_to": "individual | group"
  },
  "save_as": "result_alias"
}

Array-of-dicts sub-block

{
  "iterate_array_of_dicts": {
    "variable_name": "alias_for_list_of_dicts",
    "attribute_of_interest": "attribute_name",
    "comparison_value": "alias_for_comparison_value",
    "relationship": "equal | greater | ...",
    "filter_array": true
  },
  "save_as": "result_alias"
}

save_as is optional and always lives at the outer expression-block level in both forms.


iterate_array_of_dicts behaviour

filter_array Result
true Returns the filtered subset of list[dict] entries where entry[attribute_of_interest] <relationship> comparison_value
false Returns [true] if all entries satisfy the condition, [false] otherwise

Changes to data_validator.py

  • _evaluate_expression refactored — now dispatches to _evaluate_aggregation_block or _evaluate_iterate_array_of_dicts based on which sub-block key is present; raises ValueError if neither is found
  • _evaluate_aggregation_block extracted — aggregation logic moved out of _evaluate_expression into its own method
  • _evaluate_iterate_array_of_dicts added — new method implementing the array-of-dicts filter/enforce feature

Bug fixes caught during review:

  • _evaluate_expression was hardcoded to return result, True, masking failures from sub-expressions — fixed to return result, evaluated
  • info_map entries in _evaluate_aggregation_block referenced _evaluate_expression.__name__ instead of _evaluate_aggregation_block.__name__
  • comparison_value fetched from the alias pool was passed raw to relation functions which expect a list — added if not isinstance(comparison_value, list): comparison_value = [comparison_value]

Changes to example_cross_validation.json

  • All existing rules updated to the new aggregation sub-block format
  • Typo attirbute_of_interestattribute_of_interest fixed
  • Added a fourth example rule demonstrating iterate_array_of_dicts with filter_array: false

Changes to tests/test_data_validator.py

  • test_evaluate_expression_unknown_operation, test_evaluate_expression_no_ordered_variables, test_evaluate_expression_apply_to_individual, test_evaluate_expression_apply_to_group — all expression blocks updated to the new {"aggregation": {...}} format with save_as at the outer level
  • Added test_evaluate_iterate_array_of_dicts_success — parametrized over filter mode, enforce mode (all-pass and partial-fail)
  • Added test_evaluate_iterate_array_of_dicts_unknown_relationship_no_eager and _eager — error handling paths
  • Added test_evaluate_expression_with_iterate_array_of_dicts_and_save_as — verifies save_as is applied at the expression-block level when using iterate_array_of_dicts

Context

Issue(s) closed by this pull request: closes #

What

Why

How

Test plan

Input Changes

Output Changes

  • N/A

Filter

allisterakun and others added 12 commits April 6, 2026 20:01
## Changes to cross-validation expression block schema

### New expression block structure

Expression blocks (`left_hand` / `right_hand`) now require one of two mutually exclusive sub-blocks. The legacy flat format (`operation` and `ordered_variables` at the top level) is no longer supported.

**Aggregation sub-block**
```json
{
  "aggregation": {
    "operation": "sum | difference | average | product | no_op",
    "ordered_variables": ["alias_0", "alias_1"],
    "apply_to": "individual | group"
  },
  "save_as": "result_alias"
}
```

**Array-of-dicts sub-block**
```json
{
  "iterate_array_of_dicts": {
    "variable_name": "alias_for_list_of_dicts",
    "attribute_of_interest": "attribute_name",
    "comparison_value": "alias_for_comparison_value",
    "relationship": "equal | greater | ...",
    "filter_array": true
  },
  "save_as": "result_alias"
}
```

`save_as` is optional and always lives at the outer expression-block level in both forms.

---

### `iterate_array_of_dicts` behaviour

| `filter_array` | Result |
|---|---|
| `true` | Returns the subset of entries where `entry[attribute_of_interest] <relationship> comparison_value` |
| `false` | Returns `[true]` if **all** entries satisfy the condition, `[false]` otherwise |

---

### Bug fixes in `data_validator.py`

- **`_evaluate_expression` always returned success** — `return result, True` was changed to `return result, evaluated`. Previously, a failed aggregation or array-iteration returned `(None, True)`, bypassing the failure check in `_evaluate_condition`.
- **Wrong function name in `info_map`** — Error log entries in `_evaluate_aggregation_block` referenced `_evaluate_expression.__name__` instead of `_evaluate_aggregation_block.__name__`.
- **`comparison_value` not list-wrapped in `_evaluate_iterate_array_of_dicts`** — All relation-mapping functions compare two lists. A scalar value fetched from the alias pool (e.g. `"CALF"`) was passed raw, making every comparison silently return `false`. Now wrapped: `if not isinstance(comparison_value, list): comparison_value = [comparison_value]`.

---

### Test updates (`tests/test_data_validator.py`)

- Removed `test_extract_aggregation_block` — `_extract_aggregation_block` was deleted
- Removed `test_evaluate_expression_nested_aggregation_format` — now redundant
- Updated `test_evaluate_expression_unknown_operation`, `test_evaluate_expression_no_ordered_variables`, `test_evaluate_expression_apply_to_individual`, `test_evaluate_expression_apply_to_group` — expression blocks wrapped in `{"aggregation": {...}}` with `save_as` moved to the outer level
- Added tests for `_evaluate_iterate_array_of_dicts`: filter mode, enforce mode, unknown relationship (eager and non-eager), and `save_as` propagation through `_evaluate_expression`
@allisterakun
Copy link
Copy Markdown
Collaborator Author

allisterakun commented Apr 7, 2026

Commit 6767190 has the following changes:

New feature: iterate_array_of_dicts expression block + schema refactor

New expression block schema

Expression blocks (left_hand / right_hand) now require one of two mutually exclusive named sub-blocks. The flat format (operation, ordered_variables at the top level) is no longer supported.

Aggregation sub-block

{
  "aggregation": {
    "operation": "sum | difference | average | product | no_op",
    "ordered_variables": ["alias_0", "alias_1"],
    "apply_to": "individual | group"
  },
  "save_as": "result_alias"
}

Array-of-dicts sub-block

{
  "iterate_array_of_dicts": {
    "variable_name": "alias_for_list_of_dicts",
    "attribute_of_interest": "attribute_name",
    "comparison_value": "alias_for_comparison_value",
    "relationship": "equal | greater | ...",
    "filter_array": true
  },
  "save_as": "result_alias"
}

save_as is optional and always lives at the outer expression-block level in both forms.


iterate_array_of_dicts behaviour

filter_array Result
true Returns the filtered subset of list[dict] entries where entry[attribute_of_interest] <relationship> comparison_value
false Returns [true] if all entries satisfy the condition, [false] otherwise

Changes to data_validator.py

  • _evaluate_expression refactored — now dispatches to _evaluate_aggregation_block or _evaluate_iterate_array_of_dicts based on which sub-block key is present; raises ValueError if neither is found
  • _evaluate_aggregation_block extracted — aggregation logic moved out of _evaluate_expression into its own method
  • _evaluate_iterate_array_of_dicts added — new method implementing the array-of-dicts filter/enforce feature

Bug fixes caught during review:

  • _evaluate_expression was hardcoded to return result, True, masking failures from sub-expressions — fixed to return result, evaluated
  • info_map entries in _evaluate_aggregation_block referenced _evaluate_expression.__name__ instead of _evaluate_aggregation_block.__name__
  • comparison_value fetched from the alias pool was passed raw to relation functions which expect a list — added if not isinstance(comparison_value, list): comparison_value = [comparison_value]

Changes to example_cross_validation.json

  • All existing rules updated to the new aggregation sub-block format
  • Typo attirbute_of_interestattribute_of_interest fixed
  • Added a fourth example rule demonstrating iterate_array_of_dicts with filter_array: false

Changes to tests/test_data_validator.py

  • test_evaluate_expression_unknown_operation, test_evaluate_expression_no_ordered_variables, test_evaluate_expression_apply_to_individual, test_evaluate_expression_apply_to_group — all expression blocks updated to the new {"aggregation": {...}} format with save_as at the outer level
  • Added test_evaluate_iterate_array_of_dicts_success — parametrized over filter mode, enforce mode (all-pass and partial-fail)
  • Added test_evaluate_iterate_array_of_dicts_unknown_relationship_no_eager and _eager — error handling paths
  • Added test_evaluate_expression_with_iterate_array_of_dicts_and_save_as — verifies save_as is applied at the expression-block level when using iterate_array_of_dicts

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

Current Coverage: 99%

Mypy errors on new_xv_feature branch: 1191
Mypy errors on dev branch: 1191
No difference in error counts

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

🚨 Please update the changelog. This PR cannot be merged until changelog.md is updated.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

Current Coverage: 99%

Mypy errors on new_xv_feature branch: 1191
Mypy errors on dev branch: 1191
No difference in error counts

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

🚨 Please update the changelog. This PR cannot be merged until changelog.md is updated.

@github-actions
Copy link
Copy Markdown
Contributor

Current Coverage: 99%

Mypy errors on new_xv_feature branch: 1191
Mypy errors on dev branch: 1191
No difference in error counts

@github-actions
Copy link
Copy Markdown
Contributor

🚨 Please update the changelog. This PR cannot be merged until changelog.md is updated.

@github-actions
Copy link
Copy Markdown
Contributor

Current Coverage: %

Mypy errors on new_xv_feature branch: 1194
Mypy errors on dev branch: 1191
3 more errors on new_xv_feature branch

@github-actions
Copy link
Copy Markdown
Contributor

🚨 Please update the changelog. This PR cannot be merged until changelog.md is updated.
🚨 Flake8 linting errors were found. Please fix the linting issues.
🚨 Some tests have failed.

@allisterakun allisterakun requested review from ew3361zh and matthew7838 and removed request for ew3361zh April 14, 2026 13:30
Copy link
Copy Markdown
Collaborator

@matthew7838 matthew7838 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good start, great work, Allister. This PR should be merged with a careful/detailed update to documentation in/out of the code. The example makes sense to me, but I'm not fully understanding how this will be used to incorporate the "reverse lookup"; an example could be helpful.

{
"description": "Number of stalls in growing pens",
"target_and_save": {
"aliases": {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good naming change.

}
},
"relationship": "equal"
"operator": "equal"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents are that relationship is the more intuitive choice here.

Comment on lines +205 to +210
"for_each": {
"mode": "enforce",
"in": "crop_configurations",
"field": "optimal_temperature",
"compare_field": "minimum_temperature",
"operator": "greater_or_equal_to"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This naming differs from what's mentioned in the comment. Which one are we finalizing?

"mode": "enforce",
"in": "crop_configurations",
"field": "optimal_temperature",
"compare_field": "minimum_temperature",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"compare_field": "minimum_temperature",
"field_to_compare": "minimum_temperature",

"in": "crop_configurations",
"field": "optimal_temperature",
"compare_field": "minimum_temperature",
"operator": "greater_or_equal_to"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"operator": "greater_or_equal_to"
"relation_to_check": "greater_or_equal_to"

Or we change the operator to something like check_greater_or_equal_to.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like changes from other PRs

Comment thread RUFAS/data_validator.py
expression_block["aggregation"], eager_termination, relationship
)
else:
raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}")
raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}. Supported blocks are "aggregation" and "for_each". ")

]
"aggregation": {
"function": "no_op",
"mode": "aggregate",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason to switch from "operation"?

Comment thread RUFAS/data_validator.py
operator: str = iter_block.get("operator", "")
mode: str = iter_block.get("mode", "enforce")

compare_fn = self.relation_mapping.get(operator)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
compare_fn = self.relation_mapping.get(operator)
compare_function = self.relation_mapping.get(operator)

Comment thread RUFAS/data_validator.py
Comment on lines +1916 to +1924
if "for_each" in expression_block:
result, evaluated = self._evaluate_iterate_array_of_dicts(
expression_block["for_each"], eager_termination, relationship
)

Notes
-----
Expression block:
>>> {
... "operation": "sum | difference | average | product | no_op", # optional, defaults to "no_op"
... "apply_to": "individual | group", # optional
... "ordered_variables": ["alias_0", "alias_1"],
... "save_as": "alias_2" # optional
... }
elif "aggregation" in expression_block:
result, evaluated = self._evaluate_aggregation_block(
expression_block["aggregation"], eager_termination, relationship
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the arguments are very similar, I suggest we use function mapping here to improve clarity and extendability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants