Optimize arrow bytes view map #20055
base: main
Conversation
- Use values.views() instead of values.iter() for direct u128 access
- Use is_valid(i) for efficient null checking via validity bitmap
- Avoid dereferencing overhead for inline strings
- No additional memory overhead in Entry struct

Closes apache#19961
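A minimal sketch of the access pattern this commit describes, assuming the arrow crate's StringViewArray API (the map's actual fields and loop structure are omitted):

```rust
use arrow::array::{Array, StringViewArray};

fn main() {
    // "short" (≤ 12 bytes) is stored inline in the view word itself.
    let values = StringViewArray::from_iter_values(["short", "a much longer string value"]);

    // views() exposes the raw u128 view words directly, with no per-row
    // dereference into the data buffers.
    let views = values.views();
    for (i, &view) in views.iter().enumerate() {
        // is_valid(i) reads the validity bitmap instead of materializing a value.
        if !values.is_valid(i) {
            continue;
        }
        let len = view as u32; // low 32 bits of a view hold the length
        if len <= 12 {
            println!("row {i}: inline value, compare as u128");
        } else {
            println!("row {i}: buffer-backed value, len = {len}");
        }
    }
}
```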
Following review feedback, changed from usize to u32 for the builder index field. This saves 4 bytes per entry on 64-bit systems while still supporting up to 4 billion distinct values - more than enough for practical workloads.
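As an illustration only (the real Entry layout in binary_view_map.rs may differ), the saving comes from shrinking the index field:

```rust
// Hypothetical entry shape: a u32 index instead of usize saves 4 bytes of
// payload per entry on 64-bit targets, while 4 billion distinct values still fit.
struct Entry {
    view: u128, // raw view word for direct comparison (hypothetical field)
    hash: u64,  // cached hash (hypothetical field)
    index: u32, // index into the builder; was usize before this change
}

fn main() {
    // 16 + 8 + 4 = 28 bytes of fields, padded to 32 by alignment.
    println!("size_of::<Entry>() = {}", std::mem::size_of::<Entry>());
}
```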
Per reviewer feedback, replaced Vec<bool> with BooleanBufferBuilder for tracking null values. This uses bit-packed storage (1 bit per value) instead of byte-per-value, reducing memory usage by 8x for the null bitmap. Also fixed clippy warnings for mem_replace_with_default.
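A short sketch of the replacement, assuming arrow's BooleanBufferBuilder (the surrounding map struct is omitted):

```rust
use arrow::array::builder::BooleanBufferBuilder;
use arrow::buffer::NullBuffer;

fn main() {
    // Bit-packed: 1 bit per value instead of the 1 byte per value of Vec<bool>.
    let mut validity = BooleanBufferBuilder::new(3);
    validity.append(true);  // valid row
    validity.append(false); // null row
    validity.append(true);  // valid row

    // Appending validity (true = valid) directly also avoids the inversion
    // pass that a Vec<bool> of is_null flags would need later.
    let nulls = NullBuffer::new(validity.finish());
    println!("null_count = {}", nulls.null_count());
}
```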
…etch This change optimizes the interning process in ArrowBytesViewMap by:
1. Performing direct u128 view comparisons for inlined values.
2. Fetching the input value only once per row to avoid redundant work during hash collisions and insertion.
3. Addressing feedback from @Dandandan regarding redundant value fetches.
…-fetch optimization
Pull request overview
This PR optimizes the ArrowBytesViewMap by replacing the GenericByteViewBuilder with manual buffer management, enabling direct u128 view comparisons for inline strings and reducing redundant memory access during hash collisions.
Changes:
- Replaced GenericByteViewBuilder with manual buffer management (views, in_progress, completed, nulls vectors)
- Implemented direct u128 bitwise equality checks for inline strings (≤12 bytes) as a fast path
- Refactored comparison logic to use view-based comparisons before falling back to byte-slice comparison (sketched below)
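To make the fast path concrete, here is a standalone sketch of the comparison order described above; views_equal and the compare_bytes fallback are hypothetical names, and the view layout (length in the low 32 bits, 4-byte prefix above it, data stored inline for ≤ 12 bytes) follows the Arrow ByteView spec:

```rust
/// Compare two view words, touching the data buffers only when unavoidable.
fn views_equal(a: u128, b: u128, compare_bytes: impl FnOnce() -> bool) -> bool {
    let len = a as u32;
    if len != b as u32 {
        return false; // different lengths can never be equal
    }
    if len <= 12 {
        // Inline value: the whole string is packed into the view word,
        // so equality is a single u128 comparison.
        return a == b;
    }
    // Long value: bits 32..64 hold a 4-byte prefix; compare it first.
    if (a >> 32) as u32 != (b >> 32) as u32 {
        return false;
    }
    // Prefixes match: fall back to a full byte-slice comparison.
    compare_bytes()
}

fn main() {
    // Inline views for "a" and "b": length 1, first byte stored at bits 32..40.
    let a = (b'a' as u128) << 32 | 1;
    let b = (b'b' as u128) << 32 | 1;
    assert!(!views_equal(a, b, || unreachable!()));
    assert!(views_equal(a, a, || unreachable!()));
    println!("fast path decided both comparisons without touching buffers");
}
```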
Comments suppressed due to low confidence (5)
datafusion/physical-expr-common/src/binary_view_map.rs:447
- For accurate memory accounting, the nulls size calculation should use self.nulls.capacity() * size_of::<bool>() to match the pattern used for in_progress_size at line 445 and to account for actual allocated capacity rather than just used length. Similarly, views_size at line 444 should consider using capacity for consistency, though this depends on whether the intent is to track allocated or used memory.
let views_size = self.views.len() * size_of::<u128>();
let in_progress_size = self.in_progress.capacity();
let completed_size: usize = self.completed.iter().map(|b| b.len()).sum();
let nulls_size = self.nulls.len();
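A sketch of the suggested accounting, with the completed buffers modeled as plain Vec<u8> for illustration (field names follow the snippet above):

```rust
use std::mem::size_of;

// Hypothetical standalone version of size(), using allocated capacity
// consistently, per the suggestion above.
fn size(
    views: &Vec<u128>,
    in_progress: &Vec<u8>,
    completed: &[Vec<u8>], // stand-in for the completed arrow Buffers
    nulls: &Vec<bool>,
) -> usize {
    let views_size = views.capacity() * size_of::<u128>();
    let in_progress_size = in_progress.capacity();
    let completed_size: usize = completed.iter().map(|b| b.len()).sum();
    let nulls_size = nulls.capacity() * size_of::<bool>();
    views_size + in_progress_size + completed_size + nulls_size
}

fn main() {
    let views: Vec<u128> = Vec::with_capacity(8);
    println!("{}", size(&views, &Vec::new(), &[], &Vec::new()));
}
```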
datafusion/physical-expr-common/src/binary_view_map.rs:117
- The constant BYTE_VIEW_MAX_BLOCK_SIZE is defined in an unusual location between the documentation comment and the struct definition. For better code organization and readability, this constant should be moved to the module level (e.g., after the imports around line 30) or before the struct's documentation comment. This follows Rust conventions where constants are typically defined at the module level or clearly separated from type definitions.
/// Max size of the in-progress buffer before flushing to completed buffers
const BYTE_VIEW_MAX_BLOCK_SIZE: usize = 2 * 1024 * 1024;
datafusion/physical-expr-common/src/binary_view_map.rs:386
- The unsafe call to BinaryViewArray::new_unchecked lacks a SAFETY comment explaining why it's safe. Consider adding a comment explaining that: (1) all views are correctly constructed via make_view to point to valid data in the buffers, (2) buffer indices and offsets are valid by construction through append_value, and (3) the null buffer has the correct length matching the views. This is separate from the SAFETY comment for to_string_view_unchecked at line 391.
let array =
unsafe { BinaryViewArray::new_unchecked(views, self.completed, null_buffer) };
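The suggested comment might read something like this (documentation only; the call itself is unchanged from the snippet above, so this is a fragment, not a complete function):

```rust
// SAFETY: all views were constructed via make_view over data actually
// written to the buffers, buffer indices and offsets are in bounds by
// construction in append_value, and the null buffer (when present) has
// exactly one bit per view.
let array =
    unsafe { BinaryViewArray::new_unchecked(views, self.completed, null_buffer) };
```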
datafusion/physical-expr-common/src/binary_view_map.rs:415
- The cast to u32 at line 414 could theoretically overflow if self.completed.len() exceeds u32::MAX. While this is extremely unlikely in practice (it would require billions of 2MB buffers), consider adding a debug assertion like debug_assert!(self.completed.len() <= u32::MAX as usize, "Too many completed buffers") to catch this during development. The same applies to the offset cast at line 415.
let buffer_index = self.completed.len() as u32;
let offset = self.in_progress.len() as u32;
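In context, the suggested guards might look like this (a fragment continuing the snippet above, not a complete function):

```rust
// Hypothetical debug-only guards per the suggestion: u32 casts silently
// truncate on overflow in release builds, so fail loudly in development.
debug_assert!(
    self.completed.len() <= u32::MAX as usize,
    "Too many completed buffers"
);
debug_assert!(
    self.in_progress.len() <= u32::MAX as usize,
    "In-progress buffer offset overflows u32"
);
let buffer_index = self.completed.len() as u32;
let offset = self.in_progress.len() as u32;
```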
datafusion/physical-expr-common/src/binary_view_map.rs:382
- The null buffer construction iterates over self.nulls twice: once to check if any nulls exist, and once to invert the boolean values. Consider optimizing by inverting and collecting in a single pass, then checking if all values are true to decide whether to create a NullBuffer. However, this is a minor performance consideration and may not be worth the code complexity.
let null_buffer = if self.nulls.iter().any(|&is_null| is_null) {
Some(NullBuffer::from(
self.nulls
.iter()
.map(|&is_null| !is_null)
.collect::<Vec<_>>(),
))
} else {
None
};
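A standalone sketch of the single-pass alternative, using a plain Vec<bool> of is_null flags as in the snippet above:

```rust
use arrow::buffer::NullBuffer;

fn main() {
    let nulls = vec![false, true, false]; // is_null flags

    // One pass: invert into a validity mask while remembering whether any
    // null was seen, instead of scanning once for any() and once for map().
    let mut any_null = false;
    let validity: Vec<bool> = nulls
        .iter()
        .map(|&is_null| {
            any_null |= is_null;
            !is_null
        })
        .collect();
    let null_buffer = any_null.then(|| NullBuffer::from(validity));

    println!("null_count = {:?}", null_buffer.map(|n| n.null_count()));
}
```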
} else {
    &in_progress[offset..offset + stored_len]
Copilot AI (Jan 28, 2026)
The condition checking which buffer to read from should be more defensive. If buffer_index > completed.len(), this would incorrectly try to access in_progress with potentially invalid offsets. While this shouldn't happen with the current append logic, consider adding an assertion or explicit error handling to catch any future bugs. For example: else if buffer_index == completed.len() followed by else { panic!("Invalid buffer_index") }.
Suggested change:
- } else {
-     &in_progress[offset..offset + stored_len]
+ } else if buffer_index == completed.len() {
+     &in_progress[offset..offset + stored_len]
+ } else {
+     panic!("Invalid buffer_index: {}", buffer_index);
run benchmarks

Run benchmarks

🤖: Benchmark completed (Details)

🤖: Benchmark completed (Details)

run benchmarks
let len = view_u128 as u32;

// Check if value already exists
let mut value: Option<&[u8]> = None;
This seems not right?
Good catch — you’re right to flag this.
The intent here was to defer materializing the byte slice until we know it’s needed, but the current structure makes that unclear and introduces unnecessary indirection.
I’ll refactor this so the slice is explicitly materialized once (at the top of the loop) and then reused consistently for both lookup and insertion paths. That should make the logic clearer and align better with the original optimization goal.
} else {
    // no existing value, make a new one
-   let value: &[u8] = values.value(i).as_ref();
+   let value = value.unwrap_or_else(|| values.value(i).as_ref());
This still accesses it here?
Yes, you’re absolutely right — this still ends up calling values.value(i) in the insert path.
That contradicts the goal of fetching the value once per row. I’ll rework this to eagerly bind the slice to a local variable and thread that through both the lookup and insert branches so values.value(i) is only accessed a single time.
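A minimal standalone sketch of that restructuring, with hypothetical names standing in for the real map internals:

```rust
use std::collections::HashMap;

// Fetch each row's bytes exactly once, then share the binding between the
// probe and insert paths (hypothetical shape of the planned refactor).
fn intern<'a>(rows: &[&'a [u8]]) -> Vec<u32> {
    let mut map: HashMap<&[u8], u32> = HashMap::new();
    let mut groups = Vec::with_capacity(rows.len());
    for &value in rows {
        // Single eager binding per row: no second values.value(i) call in
        // either the lookup or the insert branch.
        let next = map.len() as u32;
        let group = *map.entry(value).or_insert(next);
        groups.push(group);
    }
    groups
}

fn main() {
    let rows: Vec<&[u8]> = vec![b"a".as_slice(), b"bb".as_slice(), b"a".as_slice()];
    assert_eq!(intern(&rows), vec![0, 1, 0]);
    println!("ok");
}
```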
🤖: Benchmark completed (Details)
Rationale for this change
The ArrowBytesViewMap is a high-performance structure used for interning string and binary views in operators like GROUP BY and COUNT DISTINCT. This PR addresses performance bottlenecks identified during maintainer review:
- Redundant value fetches: the input value (values.value(i)) was fetched and converted to a byte slice multiple times during hash collisions and insertion. This added unnecessary overhead to the interning loop.
- Missed fast path: inlined values can be compared directly via u128 bitwise equality.
What changes are included in this PR?
- u128 view comparison: updated the interning logic to perform bitwise equality checks on the u128 views for inlined values, providing a single-cycle fast path for inlined strings (≤12 bytes).
- Generic over GenericByteViewArray<B> to provide specialized access for StringView/BinaryView types (see the sketch below).
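A small sketch of what being generic over GenericByteViewArray<B> enables, assuming arrow's ByteViewType trait; count_inline is a hypothetical helper, not part of this PR:

```rust
use arrow::array::{Array, GenericByteViewArray, StringViewArray};
use arrow::datatypes::ByteViewType;

// Works for StringViewArray and BinaryViewArray alike, since both are
// GenericByteViewArray<B> for some B: ByteViewType.
fn count_inline<B: ByteViewType>(values: &GenericByteViewArray<B>) -> usize {
    values
        .views()
        .iter()
        .enumerate()
        .filter(|&(i, &view)| values.is_valid(i) && (view as u32) <= 12)
        .count()
}

fn main() {
    let strings = StringViewArray::from_iter_values(["hi", "a very long string indeed!"]);
    println!("inline values: {}", count_inline(&strings)); // prints 1
}
```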
Are these changes tested?
Yes. I have verified the changes using the existing unit test suite:
cargo test -p datafusion-physical-expr-common --lib binary_view_map::tests

Are there any user-facing changes?
No. This is an internal performance optimization.