fix(bench): avoid OOM in array_replace bench #22120

Merged: alamb merged 1 commit into apache:main from kumarUjjawal:fix/array_expression_memory_issue on May 12, 2026
@kumarUjjawal (Contributor)

Which issue does this PR close?

  • Closes #18447.

Rationale for this change

The `array_replace` benchmark allocated ~90GB of memory before running (3 × 100M `Expr` literals), causing OOM on normal machines.
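
As a rough check on the figures above, the arithmetic can be sketched as follows. The only inputs are the numbers quoted in this PR; the per-literal footprint is merely what those numbers imply, not a measured `size_of` value, and the "new" estimate assumes the same three vectors and the same per-literal cost, which may not hold after the change.

```rust
// Figures quoted in the PR description; "~90GB" is taken as decimal gigabytes.
const OLD_ARRAY_LEN: u64 = 100_000_000;
const NEW_ARRAY_LEN: u64 = 100_000;
const LITERAL_VECTORS: u64 = 3;
const REPORTED_TOTAL_BYTES: u64 = 90_000_000_000;

/// Per-literal footprint implied by the reported total (an estimate, not a measurement).
fn implied_bytes_per_literal() -> u64 {
    REPORTED_TOTAL_BYTES / (LITERAL_VECTORS * OLD_ARRAY_LEN)
}

/// Estimated up-front allocation for a given array length,
/// assuming the per-literal cost stays constant.
fn estimated_total_bytes(array_len: u64) -> u64 {
    LITERAL_VECTORS * array_len * implied_bytes_per_literal()
}

fn main() {
    let per_literal = implied_bytes_per_literal();
    println!("implied bytes per literal: {per_literal}");
    println!("estimate at old length: {} bytes", estimated_total_bytes(OLD_ARRAY_LEN));
    println!("estimate at new length: {} bytes", estimated_total_bytes(NEW_ARRAY_LEN));
}
```

Under these assumptions, cutting the length by a factor of 1,000 cuts the up-front allocation by the same factor, from roughly 90GB to roughly 90MB.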

What changes are included in this PR?

  • Reduce `array_len` in the `array_expression` bench from `100_000_000` to `100_000`.
  • Remove a broken `assert_eq!` (and its unused `expected_array`) that compared the `ScalarFunction` `Expr` returned by `array_replace_all` against the unmodified input. The two `Expr` trees are never equal, so the assertion could not pass; the OOM previously hid the failure.
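
To illustrate why such an assertion cannot hold, here is a minimal sketch using a toy expression type. `ToyExpr` and `toy_array_replace_all` are hypothetical stand-ins invented for this example, not DataFusion's actual `Expr` or `array_replace_all`. The point is only that a builder returning an unevaluated function-call node yields a tree that is structurally different from its own input, even when the replacement would be a no-op at evaluation time.

```rust
// Toy stand-in for an expression tree; NOT DataFusion's real `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Literal(i64),
    List(Vec<ToyExpr>),
    ScalarFunction { name: String, args: Vec<ToyExpr> },
}

// Hypothetical builder: it does not evaluate anything, it only wraps
// its arguments in a function-call node.
fn toy_array_replace_all(array: ToyExpr, from: ToyExpr, to: ToyExpr) -> ToyExpr {
    ToyExpr::ScalarFunction {
        name: "array_replace_all".to_string(),
        args: vec![array, from, to],
    }
}

fn main() {
    let input = ToyExpr::List(vec![ToyExpr::Literal(1), ToyExpr::Literal(2)]);

    // Replacing 2 with 2 would leave the data unchanged once evaluated,
    // but the returned value is still a call node, not a list.
    let call = toy_array_replace_all(input.clone(), ToyExpr::Literal(2), ToyExpr::Literal(2));

    // Structural equality compares the trees themselves, so these differ.
    assert_ne!(call, input);
    println!("call node differs from its input: {}", call != input);
}
```

An assertion on results would need to evaluate the expression first and compare the produced arrays, which is outside what a benchmark setup step normally does.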

Are these changes tested?

Yes, `cargo bench -p datafusion-functions-nested --bench array_expression` passes locally.

Are there any user-facing changes?

No

github-actions bot added the `functions` label (Changes to functions implementation) on May 12, 2026
@alamb (Contributor) commented May 12, 2026

Thanks @kumarUjjawal

alamb added this pull request to the merge queue on May 12, 2026
Merged via the queue into apache:main with commit a1b788c on May 12, 2026
35 checks passed
gstvg pushed a commit to gstvg/arrow-datafusion that referenced this pull request May 14, 2026
## Which issue does this PR close?

- Closes apache#18447.
Development

Successfully merging this pull request may close these issues.

array_expression benchmark uses too much memory and is sigkill'd by the OS