Faster split_at_semicolon #17
    if !found {
        total += 1
    };
    found |= *buffer.get_unchecked(pos - 1) == b';';
    if !found {
        total += 1
    };
    pos = pos - total;
How does changing pos directly compare? It should save a couple of operations:
    if !found {
        pos -= 1;
    };
    found |= *buffer.get_unchecked(pos) == b';';
    if !found {
        pos -= 1;
    };
Another idea, with some bool magic to be branchless:
    pos -= 1 - (found as usize);
    found |= *buffer.get_unchecked(pos) == b';';
    pos -= 1 - (found as usize);
I think that changing pos for the next read based on the previous read could cause a pipeline stall, or at least a delay of a few cycles until the CPU knows what value was read. (Note: found |= ... does not short-circuit, so it still reads from the location. found = found || ... would short-circuit, but would introduce a branch.) The compiler might be able to mask the delay by scheduling other instructions while the CPU waits for the result, but perhaps not. Reading just the 4th and 5th bytes from the end and basing all calculations on those two bytes would probably be better.
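To make the short-circuit point concrete, here is a minimal sketch. The helper `is_semicolon` and the functions `eager_or` and `lazy_or` are hypothetical names, not part of the PR; the counter exists only to show how many comparisons actually run.

```rust
use std::cell::Cell;

/// Hypothetical helper: compares one byte and counts how often it is called.
fn is_semicolon(byte: u8, reads: &Cell<u32>) -> bool {
    reads.set(reads.get() + 1);
    byte == b';'
}

/// `|=` on bools is a plain bitwise OR, so the right-hand side always runs.
fn eager_or(bytes: &[u8]) -> (bool, u32) {
    let reads = Cell::new(0);
    let mut found = is_semicolon(bytes[1], &reads);
    found |= is_semicolon(bytes[0], &reads);
    (found, reads.get())
}

/// `||` short-circuits: the second comparison is skipped once `found` is true.
fn lazy_or(bytes: &[u8]) -> (bool, u32) {
    let reads = Cell::new(0);
    let mut found = is_semicolon(bytes[1], &reads);
    found = found || is_semicolon(bytes[0], &reads);
    (found, reads.get())
}
```

With the semicolon in the byte that is checked first, `eager_or` still performs both comparisons while `lazy_or` performs one. Whether the skipped comparison is worth the extra branch is something only a benchmark can settle.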
It could be written a bit faster/clearer perhaps:
    let not_found1 = (*buffer.get_unchecked(pos) != b';') as usize;
    let not_found2 = (*buffer.get_unchecked(pos - 1) != b';') as usize;
    pos -= not_found1 + (not_found1 & not_found2);
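As a self-contained sketch of the same arithmetic: the function name `semicolon_pos` is hypothetical, the temperature is assumed to be 3 to 5 bytes long, and checked indexing is used here instead of `get_unchecked` so the snippet stays safe.

```rust
/// Index of the `;` in a line shaped like `<name>;<temperature>`, where the
/// temperature is assumed to be 3 to 5 bytes (`N.N` up to `-NN.N`).
fn semicolon_pos(line: &[u8]) -> usize {
    // The shortest possible line is `a;1.2`, so `len - 5` never underflows.
    debug_assert!(line.len() >= 5);
    let pos = line.len() - 4;
    // Both reads use fixed offsets from the end; neither waits on the other.
    let not_found1 = (line[pos] != b';') as usize;
    let not_found2 = (line[pos - 1] != b';') as usize;
    // Offset is 0 if `;` is at len-4, 1 if at len-5, 2 otherwise (assumed len-6).
    pos - (not_found1 + (not_found1 & not_found2))
}
```

For example, `semicolon_pos(b"Oslo;-12.3")` finds neither probe byte to be a semicolon and so steps back two positions from `len - 4`.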
Aside: Parsing the temperature would probably be faster if it were based solely on the last 5 bytes of the line, rather than being split into a variable length slice (but semicolon location is still useful for the string length).
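One possible shape for that idea, as a rough sketch rather than the project's actual parser: the function name `temperature_tenths` and the choice to return tenths of a degree as an `i32` are assumptions, and the line is assumed to end in `N.N`, `NN.N`, `-N.N`, or `-NN.N`.

```rust
/// Parses the temperature at the end of `line` into tenths of a degree,
/// e.g. `-12.3` becomes -123. Only the last 5 bytes are inspected.
fn temperature_tenths(line: &[u8]) -> i32 {
    let n = line.len();
    debug_assert!(n >= 5);
    // The last three bytes are always `N.N`.
    let frac = (line[n - 1] - b'0') as i32;
    let ones = (line[n - 3] - b'0') as i32;
    // line[n - 4] is a tens digit, a minus sign, or the semicolon.
    let b4 = line[n - 4];
    let tens = if b4.is_ascii_digit() { (b4 - b'0') as i32 } else { 0 };
    // The sign sits at n-4 (`-N.N`) or at n-5 (`-NN.N`). The n-5 byte only
    // counts when n-4 is a digit, otherwise it belongs to the station name.
    let negative = b4 == b'-' || (b4.is_ascii_digit() && line[n - 5] == b'-');
    let magnitude = tens * 100 + ones * 10 + frac;
    if negative { -magnitude } else { magnitude }
}
```

The guard on the `n - 5` byte matters for station names that end in a hyphen, such as `Foo-;1.2`, which must not be read as negative.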
I think you are right about the stall: the CPU will not be able to predict the position.
The semicolon is found either at buffer[buffer.len()-4] or at buffer[buffer.len()-5]; otherwise it is assumed to be at buffer[buffer.len()-6]. About 5% faster overall with no looping.
The buffer always has a length of at least 5 bytes, because the station name has at least 1 byte, followed by a semicolon and a temperature of at least 3 bytes, so it is safe to check at buffer.len()-5.
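The three-position check described above could be sketched as follows. This is an illustration under the stated length assumption, not the code merged in the PR; `split_line` is a hypothetical name, and it uses checked indexing and an explicit assertion where the real code relies on `get_unchecked`.

```rust
/// Splits `<name>;<temperature>` into its two halves without scanning the name.
/// Returns the name and the temperature bytes, with the semicolon excluded.
fn split_line(line: &[u8]) -> (&[u8], &[u8]) {
    let len = line.len();
    // Shortest valid line: 1-byte name + `;` + 3-byte temperature.
    assert!(len >= 5, "shortest valid line is 5 bytes, e.g. `a;1.2`");
    let pos = if line[len - 4] == b';' {
        len - 4
    } else if line[len - 5] == b';' {
        len - 5
    } else {
        len - 6 // assumed, not checked
    };
    (&line[..pos], &line[pos + 1..])
}
```

A 5-byte line can only have its semicolon at `len - 4`, so the `len - 6` fallback is never reached for the shortest valid input.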