Skip to content

fix: reduce memory allocation overhead during partial aggregation ear…#22165

Open
ariel-miculas wants to merge 1 commit into
apache:mainfrom
ariel-miculas:fix-extra-memory-allocated-in-take-n
Open

fix: reduce memory allocation overhead during partial aggregation ear…#22165
ariel-miculas wants to merge 1 commit into
apache:mainfrom
ariel-miculas:fix-extra-memory-allocated-in-take-n

Conversation

@ariel-miculas
Copy link
Copy Markdown
Contributor

…ly emit

Which issue does this PR close?

Rationale for this change

When the partial hash aggregation (with GroupOrdering::None) cannot reserve additional memory, it calls self.emit(EmitTo::First(n), where n is the largest integer multiple of self.batch_size, meaning it's close to the length of group_values. Using drain + collect leads to an additional copy of all these n values, since collect allocates a new Vec and drain doesn't modify the original vector's capacity. To avoid copying the largest allocation, choose a strategy which always allocates the minimum between n and remaining.

What changes are included in this PR?

Are these changes tested?

Yes, added unit tests

Are there any user-facing changes?

No

…ly emit

When the partial hash aggregation (with GroupOrdering::None) cannot
reserve additional memory, it calls self.emit(EmitTo::First(n), where n
is the largest integer multiple of self.batch_size, meaning it's close
to the length of group_values.  Using drain + collect leads to an
additional copy of all these n values, since collect allocates a new Vec
and drain doesn't modify the original vector's capacity. To avoid
copying the largest allocation, choose a strategy which always allocates
the minimum between n and remaining.

Fixes apache#22164
@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extra memory allocated during partial aggregation early emit during OOM handling

1 participant