-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Simplify Spark hex function #20044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Simplify Spark hex function #20044
Conversation
|
|
||
| Self { | ||
| signature: Signature::one_of(variants, Volatility::Immutable), | ||
| aliases: vec![], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No aliases, remove this
| fn aliases(&self) -> &[String] { | ||
| &self.aliases | ||
| fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> { | ||
| make_scalar_function(compute_hex, vec![Hint::AcceptsSingular])(&args.args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Hint::AcceptsSingular makes our code simpler in handling scalars; we could use ColumnarValue but for now can make use of make_scalar_function
| i -= 1; | ||
| buffer[i] = HEX_CHARS_UPPER[(n & 0xF) as usize]; | ||
| n >>= 4; | ||
| fn hex_values(array: &ArrayRef) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deduplicating this match arm (previously had same for dictionary)
| } | ||
|
|
||
| /// Hex encoding lookup tables for fast byte-to-hex conversion | ||
| const HEX_CHARS_UPPER: &[u8; 16] = b"0123456789ABCDEF"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lowercase was only meant for sha2; remove the code here, as sha2 wasn't using it from here
| fn hex_encode_bytes<'a, I, T>(iter: I) -> Result<ArrayRef> | ||
| where | ||
| I: Iterator<Item = Option<T>>, | ||
| I: ExactSizeIterator<Item = Option<T>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ExactSizeIterator lets us get the length from the iterator, instead of passing it separately
| use datafusion_expr::ColumnarValue; | ||
|
|
||
| #[test] | ||
| fn test_dictionary_hex_utf8() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving tests to SLTs
| FROM VALUES (1), (2), (NULL), (3) | ||
| ) SELECT hex(column1), arrow_typeof(hex(column1)) FROM values | ||
| ---- | ||
| 1 Utf8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally these would stay dictionary, but theres known issue with passing dictionary types via coercible API
| } | ||
|
|
||
| /// Hex encoding lookup tables for fast byte-to-hex conversion | ||
| const HEX_CHARS_UPPER: &[u8; 16] = b"0123456789ABCDEF"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Jefffrey great PR and appreciate commented changes, WDYT about hex bench results? datafusion/spark/benches/hex.rs
I ran the bench on my M4 Mac: DetailsI find the supposed int regressions surprising; the main hex loop was not altered apart from simplifying how len is passed in. Maybe it was already so fast that adding |
Which issue does this PR close?
N/A
Rationale for this change
Simplify code.
What changes are included in this PR?
See comments.
Are these changes tested?
Yes.
Are there any user-facing changes?
Removed some public APIs, but they weren't meant to be public anyway.