perf: Use NullBuffer::union_many #22070

Merged
mbutrovich merged 2 commits into apache:main from neilconway:neilc/perf-null-buffer-union-many
May 8, 2026

@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

Now that NullBuffer::union_many has landed in arrow-rs 58.2, we can use it to simplify and speed up the places in DataFusion that union more than two null buffers. In those cases, union_many avoids the intermediate allocations made by repeated union calls.
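To illustrate where the intermediate allocations come from, here is a minimal standalone sketch. It does not use arrow-rs: `Validity`, `union_pair`, `union_repeated`, and this `union_many` are hypothetical names that model a validity bitmap as packed `u64` words (a set bit means "valid"), and all inputs are assumed to have the same length. The real `NullBuffer::union_many` signature may differ.

```rust
/// Validity bitmap modeled as packed 64-bit words; a set bit means "valid".
/// This is a standalone model, not the arrow-rs `NullBuffer` type.
type Validity = Vec<u64>;

/// Pairwise union of nulls: a slot stays valid only if it is valid in both
/// inputs. Every call allocates a fresh bitmap for its result.
fn union_pair(a: &Validity, b: &Validity) -> Validity {
    a.iter().zip(b).map(|(x, y)| x & y).collect()
}

/// Repeated pairwise union: each fold step allocates a new bitmap, and every
/// one except the last is discarded immediately.
fn union_repeated(inputs: &[Validity]) -> Option<Validity> {
    let (first, rest) = inputs.split_first()?;
    Some(rest.iter().fold(first.clone(), |acc, next| union_pair(&acc, next)))
}

/// Single-pass union: allocate the output once (a copy of the first input),
/// then AND every remaining input into it in place.
fn union_many(inputs: &[Validity]) -> Option<Validity> {
    let (first, rest) = inputs.split_first()?;
    let mut out = first.clone();
    for input in rest {
        for (o, w) in out.iter_mut().zip(input) {
            *o &= *w;
        }
    }
    Some(out)
}
```

Both functions produce the same bitmap; the difference is that the single-pass version performs one allocation regardless of how many inputs there are.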

What changes are included in this PR?

  • Adopt NullBuffer::union_many throughout the codebase
  • Along the way, clean up compute_null_mask in the Spark crate to be more readable and avoid needlessly padding scalars out to arrays just to compute the null mask
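As a rough illustration of the second point, the sketch below shows one way a null mask can be computed without expanding scalars into arrays. It is not the code from this PR: `Arg`, `NullMask`, and `compute_null_mask_sketch` are hypothetical, and it assumes the usual rule that an output row is null if any argument is null in that row.

```rust
/// Hypothetical stand-in for a function argument: a single scalar or a
/// column of values. `None` marks a null.
enum Arg {
    Scalar(Option<i64>),
    Array(Vec<Option<i64>>),
}

/// Hypothetical result of combining null information across arguments.
#[derive(Debug, PartialEq)]
enum NullMask {
    /// No argument is null in any row.
    NoNulls,
    /// A null scalar makes every output row null.
    AllNull,
    /// Per-row validity: `true` means every array argument is valid there.
    PerRow(Vec<bool>),
}

fn compute_null_mask_sketch(args: &[Arg], num_rows: usize) -> NullMask {
    // A null scalar decides the answer without looking at any array.
    if args.iter().any(|a| matches!(a, Arg::Scalar(None))) {
        return NullMask::AllNull;
    }
    // Non-null scalars contribute nothing, so only arrays are scanned.
    let mut valid = vec![true; num_rows];
    let mut any_null = false;
    for arg in args {
        if let Arg::Array(values) = arg {
            for (v, x) in valid.iter_mut().zip(values) {
                if x.is_none() {
                    *v = false;
                    any_null = true;
                }
            }
        }
    }
    if any_null {
        NullMask::PerRow(valid)
    } else {
        NullMask::NoNulls
    }
}
```

The scalar check is constant work per scalar argument, whereas padding a scalar to an array would cost an allocation of `num_rows` elements just to read back a single null flag.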

Are these changes tested?

Yes, covered by existing tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the functions (Changes to functions implementation) and spark labels May 7, 2026
@neilconway
Contributor Author

There's one remaining place we could use union_many in the grouping code, but that is fixed separately in #22068 because there were other bugs in that code path.

@neilconway
Contributor Author

FYI @mbutrovich

@mbutrovich mbutrovich self-requested a review May 7, 2026 21:16
Contributor

@kosiew kosiew left a comment

@neilconway
Thanks for working on this.
Looks 👍 to me

@mbutrovich mbutrovich added this pull request to the merge queue May 8, 2026
Merged via the queue into apache:main with commit 53517af May 8, 2026
35 checks passed
@neilconway neilconway deleted the neilc/perf-null-buffer-union-many branch May 8, 2026 13:32
@alamb
Contributor

alamb commented May 8, 2026

AMAZING!

Labels

functions (Changes to functions implementation), spark

Development

Successfully merging this pull request may close these issues.

Use NullBuffer::union_many when appropriate

4 participants