Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 5 additions & 4 deletions datafusion/physical-expr-common/src/binary_view_map.rs
Original file line number Diff line number Diff line change
Expand Up @@ -291,8 +291,8 @@ where

// Extract length from the view (first 4 bytes of u128 in little-endian)
let len = view_u128 as u32;

// Check if value already exists
let mut value: Option<&[u8]> = None;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you’re right to flag this.

The intent here was to defer materializing the byte slice until we know it’s needed, but the current structure makes that unclear and introduces unnecessary indirection.

I’ll refactor this so the slice is explicitly materialized once (at the top of the loop) and then reused consistently for both lookup and insertion paths. That should make the logic clearer and align better with the original optimization goal.

let maybe_payload = {
// Borrow completed and in_progress for comparison
let completed = &self.completed;
Expand Down Expand Up @@ -328,8 +328,9 @@ where
} else {
&in_progress[offset..offset + stored_len]
Comment on lines 328 to 329
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition checking which buffer to read from should be more defensive. If buffer_index > completed.len(), this would incorrectly try to access in_progress with potentially invalid offsets. While this shouldn't happen with the current append logic, consider adding an assertion or explicit error handling to catch any future bugs. For example: else if buffer_index == completed.len() followed by else { panic!("Invalid buffer_index") }.

Suggested change
} else {
&in_progress[offset..offset + stored_len]
} else if buffer_index == completed.len() {
&in_progress[offset..offset + stored_len]
} else {
panic!("Invalid buffer_index: {}", buffer_index);

Copilot uses AI. Check for mistakes.
};
let input_value: &[u8] = values.value(i).as_ref();
stored_value == input_value
let input_value =
value.get_or_insert_with(|| values.value(i).as_ref());
stored_value == *input_value
})
.map(|entry| entry.payload)
};
Expand All @@ -338,7 +339,7 @@ where
payload
} else {
// no existing value, make a new one
let value: &[u8] = values.value(i).as_ref();
let value = value.unwrap_or_else(|| values.value(i).as_ref());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still accesses it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you’re absolutely right — this still ends up calling values.value(i) in the insert path.

That contradicts the goal of fetching the value once per row. I’ll rework this to eagerly bind the slice to a local variable and thread that through both the lookup and insert branches so values.value(i) is only accessed a single time.

let payload = make_payload_fn(Some(value));

// Create view pointing to our buffers
Expand Down