Skip to content

Conversation

@lyne7-sc
Copy link
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

The current array_repeat implementation built intermediate arrays row by row using MutableArrayData + concat. This caused unnecessary allocations and per-row overhead.

This PR replaces it with a vectorized implementation based on offsets + take, which constructs the output in a single pass and reduces intermediate memory usage.

What changes are included in this PR?

Reimplemented general_repeat and general_list_repeat using take, eliminating per-row materialization.

Benchmarks

group                                               after                                   before
-----                                               -----                                   ------
array_repeat_boolean/repeat_50_count/100            1.00     18.1±0.59µs        ? ?/sec     5.92    106.8±3.29µs        ? ?/sec
array_repeat_boolean/repeat_50_count/1000           1.00    100.0±8.00µs        ? ?/sec     9.34   934.2±20.43µs        ? ?/sec
array_repeat_boolean/repeat_50_count/10000          1.00   882.5±18.07µs        ? ?/sec     10.45     9.2±0.16ms        ? ?/sec
array_repeat_boolean/repeat_5_count/100             1.00     10.3±0.58µs        ? ?/sec     6.82     70.6±2.74µs        ? ?/sec
array_repeat_boolean/repeat_5_count/1000            1.00     20.5±1.05µs        ? ?/sec     28.07  574.6±20.67µs        ? ?/sec
array_repeat_boolean/repeat_5_count/10000           1.00    119.1±6.41µs        ? ?/sec     47.08     5.6±0.12ms        ? ?/sec
array_repeat_float64/repeat_50_count/100            1.00     15.0±0.34µs        ? ?/sec     6.47     96.9±4.63µs        ? ?/sec
array_repeat_float64/repeat_50_count/1000           1.00     68.6±2.91µs        ? ?/sec     12.23  839.5±29.00µs        ? ?/sec
array_repeat_float64/repeat_50_count/10000          1.00   700.7±29.21µs        ? ?/sec     12.21     8.6±0.21ms        ? ?/sec
array_repeat_float64/repeat_5_count/100             1.00      9.7±0.27µs        ? ?/sec     7.12     69.2±1.75µs        ? ?/sec
array_repeat_float64/repeat_5_count/1000            1.00     18.2±0.81µs        ? ?/sec     30.75  558.7±19.02µs        ? ?/sec
array_repeat_float64/repeat_5_count/10000           1.00     98.8±8.05µs        ? ?/sec     55.35     5.5±0.23ms        ? ?/sec
array_repeat_int64/repeat_50_count/100              1.00     15.5±0.70µs        ? ?/sec     6.46     99.8±2.94µs        ? ?/sec
array_repeat_int64/repeat_50_count/1000             1.00     69.3±3.29µs        ? ?/sec     12.42  860.6±14.99µs        ? ?/sec
array_repeat_int64/repeat_50_count/10000            1.00   672.0±20.81µs        ? ?/sec     12.54     8.4±0.14ms        ? ?/sec
array_repeat_int64/repeat_5_count/100               1.00      9.6±0.51µs        ? ?/sec     6.03     57.7±1.94µs        ? ?/sec
array_repeat_int64/repeat_5_count/1000              1.00     17.9±0.73µs        ? ?/sec     32.14  574.2±22.44µs        ? ?/sec
array_repeat_int64/repeat_5_count/10000             1.00     99.2±6.90µs        ? ?/sec     55.65     5.5±0.20ms        ? ?/sec
array_repeat_nested_int64/repeat_50_count/100       1.00     53.6±2.21µs        ? ?/sec     2.67    143.0±5.55µs        ? ?/sec
array_repeat_nested_int64/repeat_50_count/1000      1.00   474.7±30.74µs        ? ?/sec     2.95  1398.8±40.50µs        ? ?/sec
array_repeat_nested_int64/repeat_50_count/10000     1.00     14.3±0.41ms        ? ?/sec     1.39     19.9±2.98ms        ? ?/sec
array_repeat_nested_int64/repeat_5_count/100        1.00     14.2±0.49µs        ? ?/sec     6.62     93.8±3.66µs        ? ?/sec
array_repeat_nested_int64/repeat_5_count/1000       1.00     57.1±1.81µs        ? ?/sec     14.09  803.8±18.81µs        ? ?/sec
array_repeat_nested_int64/repeat_5_count/10000      1.00   509.5±27.43µs        ? ?/sec     18.27     9.3±0.36ms        ? ?/sec
array_repeat_nested_string/repeat_50_count/100      1.00    138.6±5.27µs        ? ?/sec    1.60   221.4±10.37µs        ? ?/sec
array_repeat_nested_string/repeat_50_count/1000     1.00  1300.5±49.34µs        ? ?/sec    1.70      2.2±0.05ms        ? ?/sec
array_repeat_nested_string/repeat_50_count/10000    1.00     28.6±1.21ms        ? ?/sec    1.06     30.3±1.91ms        ? ?/sec
array_repeat_nested_string/repeat_5_count/100       1.00     24.2±1.61µs        ? ?/sec    4.37    106.1±3.95µs        ? ?/sec
array_repeat_nested_string/repeat_5_count/1000      1.00    144.5±4.22µs        ? ?/sec    7.04  1016.5±39.20µs        ? ?/sec
array_repeat_nested_string/repeat_5_count/10000     1.00  1471.1±60.22µs        ? ?/sec    8.11     11.9±0.60ms        ? ?/sec
array_repeat_string/repeat_50_count/100             1.00     30.9±2.19µs        ? ?/sec     4.52    139.8±5.38µs        ? ?/sec
array_repeat_string/repeat_50_count/1000            1.00    231.5±4.96µs        ? ?/sec     5.25  1214.5±23.72µs        ? ?/sec
array_repeat_string/repeat_50_count/10000           1.00      4.0±0.12ms        ? ?/sec     3.18     12.9±0.49ms        ? ?/sec
array_repeat_string/repeat_5_count/100              1.00     11.5±0.35µs        ? ?/sec     6.47     74.5±1.81µs        ? ?/sec
array_repeat_string/repeat_5_count/1000             1.00     35.6±1.21µs        ? ?/sec     16.66  593.5±14.59µs        ? ?/sec
array_repeat_string/repeat_5_count/10000            1.00   284.3±26.10µs        ? ?/sec     21.21     6.0±0.14ms        ? ?/sec

Are these changes tested?

Yes. Existing SLT tests pass without modification.

Are there any user-facing changes?

No.

let mut outer_offsets = Vec::with_capacity(counts.len() + 1);
outer_offsets.push(O::zero());
let mut running_offset = 0usize;
for &count in counts.iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use OffsetBuffer::from_lengths here I believe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using OffsetBuffer::from_lengths cause an extra loop over the lengths array internally?
I'm a bit concerned this could introduce some performance overhead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we already have to iterate over counts, which is equivalent to what from_lengths is doing I believe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I misread that, the outer offsets can indeed be replaced with from_lengths, which is clearer. Thanks!

let data_type = list_array.data_type();
let value_type = list_array.value_type();
let mut new_values = vec![];
let counts = count_array.values();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if an element of count_array is null? It looks like we still access the underlying primitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will produce an empty array. This function seems to assume count[i] >= 0 and doesn’t handle count[i] being null. This also confused me, but I kept the original behavior since I’m not sure whether existing users rely on null count being treated as empty array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need a followup issue if this is the existing behaviour; we can't rely on null slots being 0 value (potential bug), moreover I feel we should be consistent with null semantics (usually a null input would return null 🤔)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I’ve opened a follow-up issue to track this #20075.

Since this could potentially introduce a user-facing behavioral change around null semantics, I’d prefer to keep this PR focused on the performance improvement. I can address the null-handling semantics separately in a follow-up PR.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants