[XV] Add support for complex data structure by allisterakun · Pull Request #2937 · RuminantFarmSystems/RuFaS

allisterakun · 2026-04-07T16:58:35Z

Add support for filtering/iterating list[dict]

New feature: `iterate_array_of_dicts` expression block + schema refactor

New expression block schema

Expression blocks (left_hand / right_hand) now require one of two mutually exclusive named sub-blocks. The flat format (operation, ordered_variables at the top level) is no longer supported.

Aggregation sub-block

{
  "aggregation": {
    "operation": "sum | difference | average | product | no_op",
    "ordered_variables": ["alias_0", "alias_1"],
    "apply_to": "individual | group"
  },
  "save_as": "result_alias"
}

Array-of-dicts sub-block

{
  "iterate_array_of_dicts": {
    "variable_name": "alias_for_list_of_dicts",
    "attribute_of_interest": "attribute_name",
    "comparison_value": "alias_for_comparison_value",
    "relationship": "equal | greater | ...",
    "filter_array": true
  },
  "save_as": "result_alias"
}

save_as is optional and always lives at the outer expression-block level in both forms.
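The shape rule above can be sketched as a small check. This is a minimal illustration, not the code in `data_validator.py`: the helper name `split_expression_block` is hypothetical, and rejecting a block that carries both sub-block keys is an assumption read from "mutually exclusive" (the description only states that a `ValueError` is raised when neither key is found).

```python
from typing import Any, Dict, Optional, Tuple

# The two sub-block keys named in the schema above.
SUB_BLOCK_KEYS = ("aggregation", "iterate_array_of_dicts")


def split_expression_block(
    expression_block: Dict[str, Any],
) -> Tuple[str, Dict[str, Any], Optional[str]]:
    """Return (sub-block key, sub-block body, optional save_as alias)."""
    present = [key for key in SUB_BLOCK_KEYS if key in expression_block]
    if len(present) != 1:
        # Assumption: zero sub-blocks and two sub-blocks are both errors here.
        raise ValueError(
            f"Expected exactly one of {SUB_BLOCK_KEYS}, found {present or 'none'}"
        )
    key = present[0]
    # save_as is read from the outer level, never from inside the sub-block.
    return key, expression_block[key], expression_block.get("save_as")


block = {
    "aggregation": {"operation": "sum", "ordered_variables": ["alias_0", "alias_1"]},
    "save_as": "result_alias",
}
key, body, alias = split_expression_block(block)
```

A block in the old flat format (with `operation` at the top level) has neither key, so this sketch rejects it with a `ValueError`.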

`iterate_array_of_dicts` behaviour

| `filter_array` | Result |
|---|---|
| `true` | Returns the filtered subset of `list[dict]` entries where `entry[attribute_of_interest] <relationship> comparison_value` |
| `false` | Returns `[true]` if all entries satisfy the condition, `[false]` otherwise |
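The two modes in the table can be sketched as follows. This is a simplified illustration under stated assumptions, not the method added in this PR: `RELATIONS` is a hypothetical two-entry subset of the relationship mapping, values are compared as scalars (the description says the real relation functions compare lists), and `comparison_value` is passed directly instead of being resolved from an alias.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical subset of relationships, compared as plain scalars here.
RELATIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "equal": operator.eq,
    "greater": operator.gt,
}


def iterate_array_of_dicts(
    entries: List[Dict[str, Any]],
    attribute_of_interest: str,
    comparison_value: Any,
    relationship: str,
    filter_array: bool,
) -> List[Any]:
    compare = RELATIONS[relationship]
    matching = [
        entry
        for entry in entries
        if compare(entry[attribute_of_interest], comparison_value)
    ]
    if filter_array:
        # Filter mode: return the entries that satisfy the condition.
        return matching
    # Enforce mode: a one-element list saying whether every entry passed.
    return [len(matching) == len(entries)]


animals = [{"type": "CALF", "weight": 40}, {"type": "COW", "weight": 600}]
calves = iterate_array_of_dicts(animals, "type", "CALF", "equal", filter_array=True)
all_over_30 = iterate_array_of_dicts(animals, "weight", 30, "greater", filter_array=False)
all_over_100 = iterate_array_of_dicts(animals, "weight", 100, "greater", filter_array=False)
```

In this sketch an empty input list passes enforce mode vacuously; whether the real method behaves the same way is not stated in the description.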

Changes to `data_validator.py`

- `_evaluate_expression` refactored: now dispatches to `_evaluate_aggregation_block` or `_evaluate_iterate_array_of_dicts` based on which sub-block key is present; raises `ValueError` if neither is found
- `_evaluate_aggregation_block` extracted: aggregation logic moved out of `_evaluate_expression` into its own method
- `_evaluate_iterate_array_of_dicts` added: new method implementing the array-of-dicts filter/enforce feature

Bug fixes caught during review:

- `_evaluate_expression` was hardcoded to `return result, True`, masking failures from sub-expressions; fixed to `return result, evaluated`
- `info_map` entries in `_evaluate_aggregation_block` referenced `_evaluate_expression.__name__` instead of `_evaluate_aggregation_block.__name__`
- `comparison_value` fetched from the alias pool was passed raw to relation functions, which expect a list; added `if not isinstance(comparison_value, list): comparison_value = [comparison_value]`
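The list-wrapping fix can be illustrated in isolation. In this sketch `lists_equal` is a hypothetical stand-in for a relation function that compares two lists, and `wrap_comparison_value` is a hypothetical helper name; only the `isinstance` guard itself is taken from the fix described above.

```python
from typing import Any, List


def lists_equal(left: List[Any], right: List[Any]) -> bool:
    """Hypothetical stand-in for a relation function that compares two lists."""
    return left == right


def wrap_comparison_value(comparison_value: Any) -> List[Any]:
    # Same guard as the fix: a scalar becomes a one-element list.
    if not isinstance(comparison_value, list):
        comparison_value = [comparison_value]
    return comparison_value


# A raw scalar never equals a list, so the comparison fails without any error.
raw_result = lists_equal(["CALF"], "CALF")
# After wrapping, the same comparison succeeds.
wrapped_result = lists_equal(["CALF"], wrap_comparison_value("CALF"))
```

A value that is already a list passes through the guard unchanged.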

Changes to `example_cross_validation.json`

- All existing rules updated to the new aggregation sub-block format
- Typo `attirbute_of_interest` → `attribute_of_interest` fixed
- Added a fourth example rule demonstrating `iterate_array_of_dicts` with `filter_array: false`

Changes to `tests/test_data_validator.py`

- `test_evaluate_expression_unknown_operation`, `test_evaluate_expression_no_ordered_variables`, `test_evaluate_expression_apply_to_individual`, `test_evaluate_expression_apply_to_group`: all expression blocks updated to the new `{"aggregation": {...}}` format with `save_as` at the outer level
- Added `test_evaluate_iterate_array_of_dicts_success`: parametrized over filter mode and enforce mode (all-pass and partial-fail)
- Added `test_evaluate_iterate_array_of_dicts_unknown_relationship_no_eager` and `_eager`: error handling paths
- Added `test_evaluate_expression_with_iterate_array_of_dicts_and_save_as`: verifies `save_as` is applied at the expression-block level when using `iterate_array_of_dicts`

Context

Issue(s) closed by this pull request: closes #

What

Why

How

Test plan

Input Changes

Output Changes

N/A

Filter

## Changes to cross-validation expression block schema

### New expression block structure

Expression blocks (`left_hand` / `right_hand`) now require one of two mutually exclusive sub-blocks. The legacy flat format (`operation` and `ordered_variables` at the top level) is no longer supported.

**Aggregation sub-block**

```json
{
  "aggregation": {
    "operation": "sum | difference | average | product | no_op",
    "ordered_variables": ["alias_0", "alias_1"],
    "apply_to": "individual | group"
  },
  "save_as": "result_alias"
}
```

**Array-of-dicts sub-block**

```json
{
  "iterate_array_of_dicts": {
    "variable_name": "alias_for_list_of_dicts",
    "attribute_of_interest": "attribute_name",
    "comparison_value": "alias_for_comparison_value",
    "relationship": "equal | greater | ...",
    "filter_array": true
  },
  "save_as": "result_alias"
}
```

`save_as` is optional and always lives at the outer expression-block level in both forms.

---

### `iterate_array_of_dicts` behaviour

| `filter_array` | Result |
|---|---|
| `true` | Returns the subset of entries where `entry[attribute_of_interest] <relationship> comparison_value` |
| `false` | Returns `[true]` if **all** entries satisfy the condition, `[false]` otherwise |

---

### Bug fixes in `data_validator.py`

- **`_evaluate_expression` always returned success**: `return result, True` was changed to `return result, evaluated`. Previously, a failed aggregation or array-iteration returned `(None, True)`, bypassing the failure check in `_evaluate_condition`.
- **Wrong function name in `info_map`**: Error log entries in `_evaluate_aggregation_block` referenced `_evaluate_expression.__name__` instead of `_evaluate_aggregation_block.__name__`.
- **`comparison_value` not list-wrapped in `_evaluate_iterate_array_of_dicts`**: All relation-mapping functions compare two lists. A scalar value fetched from the alias pool (e.g. `"CALF"`) was passed raw, making every comparison silently return `false`. Now wrapped: `if not isinstance(comparison_value, list): comparison_value = [comparison_value]`.

---

### Test updates (`tests/test_data_validator.py`)

- Removed `test_extract_aggregation_block`: `_extract_aggregation_block` was deleted
- Removed `test_evaluate_expression_nested_aggregation_format`: now redundant
- Updated `test_evaluate_expression_unknown_operation`, `test_evaluate_expression_no_ordered_variables`, `test_evaluate_expression_apply_to_individual`, `test_evaluate_expression_apply_to_group`: expression blocks wrapped in `{"aggregation": {...}}` with `save_as` moved to the outer level
- Added tests for `_evaluate_iterate_array_of_dicts`: filter mode, enforce mode, unknown relationship (eager and non-eager), and `save_as` propagation through `_evaluate_expression`

allisterakun · 2026-04-07T17:01:14Z

Commit 6767190 has the changes described in the pull request description above.

github-actions · 2026-04-07T17:04:51Z

Current Coverage: 99%

Mypy errors on new_xv_feature branch: 1191
Mypy errors on dev branch: 1191
No difference in error counts

github-actions · 2026-04-07T17:04:52Z

🚨 Please update the changelog. This PR cannot be merged until changelog.md is updated.

github-actions · 2026-04-07T19:35:53Z

Current Coverage: 99%

Mypy errors on new_xv_feature branch: 1191
Mypy errors on dev branch: 1191
No difference in error counts

github-actions · 2026-04-07T19:35:53Z

🚨 Please update the changelog. This PR cannot be merged until changelog.md is updated.

github-actions · 2026-04-10T14:44:18Z

Current Coverage: 99%

Mypy errors on new_xv_feature branch: 1191
Mypy errors on dev branch: 1191
No difference in error counts

github-actions · 2026-04-10T14:44:19Z

🚨 Please update the changelog. This PR cannot be merged until changelog.md is updated.

github-actions · 2026-04-13T14:20:05Z

Current Coverage: %

Mypy errors on new_xv_feature branch: 1194
Mypy errors on dev branch: 1191
3 more errors on new_xv_feature branch

github-actions · 2026-04-13T14:20:05Z

🚨 Please update the changelog. This PR cannot be merged until changelog.md is updated.
🚨 Flake8 linting errors were found. Please fix the linting issues.
🚨 Some tests have failed.

matthew7838

Seems like a good start, great work, Allister. This PR should be merged together with a careful, detailed update to the documentation, both inside and outside the code. The example makes sense to me, but I don't fully understand how this will be used to incorporate the "reverse lookup"; an example would be helpful.

matthew7838 · 2026-04-15T06:08:53Z

    {
      "description": "Number of stalls in growing pens",
-      "target_and_save": {
+      "aliases": {


This is a good naming change.

matthew7838 · 2026-04-15T06:11:36Z

+            }
          },
-          "relationship": "equal"
+          "operator": "equal"


My two cents: `relationship` is the more intuitive choice here.

matthew7838 · 2026-04-15T06:24:40Z

+            "for_each": {
+              "mode": "enforce",
+              "in": "crop_configurations",
+              "field": "optimal_temperature",
+              "compare_field": "minimum_temperature",
+              "operator": "greater_or_equal_to"


This naming differs from what's mentioned in the comment. Which one are we finalizing?

matthew7838 · 2026-04-15T06:25:17Z

+              "mode": "enforce",
+              "in": "crop_configurations",
+              "field": "optimal_temperature",
+              "compare_field": "minimum_temperature",


Suggested change

"compare_field": "minimum_temperature",

"field_to_compare": "minimum_temperature",

matthew7838 · 2026-04-15T06:28:37Z

+              "in": "crop_configurations",
+              "field": "optimal_temperature",
+              "compare_field": "minimum_temperature",
+              "operator": "greater_or_equal_to"


Suggested change

"operator": "greater_or_equal_to"

"relation_to_check": "greater_or_equal_to"

Or we change the operator to something like check_greater_or_equal_to.

matthew7838 · 2026-04-15T06:30:10Z

This seems like changes from other PRs

matthew7838 · 2026-04-15T06:35:12Z

+                expression_block["aggregation"], eager_termination, relationship
+            )
+        else:
+            raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}")


Suggested change

raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}")

raise ValueError(f"Cross-validation error: Unknown expression block: {expression_block}. Supported blocks are 'aggregation' and 'for_each'.")

matthew7838 · 2026-04-15T06:42:31Z

-            ]
+            "aggregation": {
+              "function": "no_op",
+              "mode": "aggregate",


What's the reason to switch from "operation"?

matthew7838 · 2026-04-15T06:48:48Z

+        operator: str = iter_block.get("operator", "")
+        mode: str = iter_block.get("mode", "enforce")
+
+        compare_fn = self.relation_mapping.get(operator)


Suggested change

compare_fn = self.relation_mapping.get(operator)

compare_function = self.relation_mapping.get(operator)

matthew7838 · 2026-04-15T06:56:59Z

+        if "for_each" in expression_block:
+            result, evaluated = self._evaluate_iterate_array_of_dicts(
+                expression_block["for_each"], eager_termination, relationship
+            )

-        Notes
-        -----
-        Expression block:
-        >>> {
-        ...  "operation": "sum | difference | average | product | no_op", # optional, defaults to "no_op"
-        ...  "apply_to": "individual | group", # optional
-        ...  "ordered_variables": ["alias_0", "alias_1"],
-        ...  "save_as": "alias_2" # optional
-        ... }
+        elif "aggregation" in expression_block:
+            result, evaluated = self._evaluate_aggregation_block(
+                expression_block["aggregation"], eager_termination, relationship
+            )


Since the arguments are very similar, I suggest we use a function mapping here to improve clarity and extensibility.
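One way the function-mapping suggestion could look is sketched below. It is only an illustration: the keys `for_each` and `aggregation` and the three-argument call come from the diff hunk above, while the class name `ExpressionDispatcher`, the `_handlers` attribute, and the stub handler bodies are hypothetical placeholders for the real validator methods.

```python
from typing import Any, Callable, Dict, Tuple

Handler = Callable[[Dict[str, Any], bool, str], Tuple[Any, bool]]


class ExpressionDispatcher:
    """Illustrative stand-in for the validator class; handlers are stubs."""

    def __init__(self) -> None:
        # Supporting a new sub-block type means adding one entry here.
        self._handlers: Dict[str, Handler] = {
            "for_each": self._evaluate_iterate_array_of_dicts,
            "aggregation": self._evaluate_aggregation_block,
        }

    def _evaluate_iterate_array_of_dicts(
        self, block: Dict[str, Any], eager_termination: bool, relationship: str
    ) -> Tuple[Any, bool]:
        return ("for_each", block), True  # stub result for the sketch

    def _evaluate_aggregation_block(
        self, block: Dict[str, Any], eager_termination: bool, relationship: str
    ) -> Tuple[Any, bool]:
        return ("aggregation", block), True  # stub result for the sketch

    def evaluate_expression(
        self, expression_block: Dict[str, Any], eager_termination: bool, relationship: str
    ) -> Tuple[Any, bool]:
        for key, handler in self._handlers.items():
            if key in expression_block:
                return handler(expression_block[key], eager_termination, relationship)
        supported = ", ".join(f"'{key}'" for key in self._handlers)
        raise ValueError(
            f"Cross-validation error: Unknown expression block: {expression_block}. "
            f"Supported blocks are {supported}."
        )


dispatcher = ExpressionDispatcher()
result, evaluated = dispatcher.evaluate_expression(
    {"aggregation": {"function": "no_op"}}, False, "equal"
)
```

With this layout the list of supported blocks in the error message is derived from the mapping, so it cannot drift out of date when a new block type is added.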

allisterakun and others added 12 commits April 6, 2026 20:01

[Animal][Reproduction] Refactor `Reproduction.execute_cow_ed_protocol()`

9784709

Update changelog.md

d66e53e

Merge d66e53e into 472a3fe

ad93ddf

Apply Black Formatting

ac804b0

Update badges on README

30d36c2

flake8

5f47519

Merge 5f47519 into 472a3fe

74a3a2c

Apply Black Formatting

8d77836

Update badges on README

e2a85ee

Merge 6767190 into 5334967

f1251a4

Apply Black Formatting

683f125

allisterakun and others added 3 commits April 7, 2026 15:29

Update example_cross_validation.json

eae6de9

Merge eae6de9 into 5334967

ec7367f

Apply Black Formatting

6477821

allisterakun and others added 7 commits April 8, 2026 10:19

Merge branch 'dev' into refactor_execute_cow_ed

445bed0

Merge 445bed0 into 5334967

2f5a2fe

Apply Black Formatting

4b8f15f

Merge pull request #2934 from RuminantFarmSystems/refactor_execute_co…

9606fad

…w_ed [Animal][Reproduction] Refactor `Reproduction.execute_cow_ed_protocol()`

Update example_cross_validation.json

df3d7ca

Merge df3d7ca into 9606fad

53ea4f6

Apply Black Formatting

6a3bc7e

updated

71d3004

allisterakun and others added 3 commits April 13, 2026 14:14

Merge 71d3004 into 9606fad

12b67f0

Apply Black Formatting

8ac064a

Update badges on README

5c905db

allisterakun requested review from ew3361zh and matthew7838 and removed request for ew3361zh April 14, 2026 13:30

