
Faster split_at_semicolon #17

Open
b-i-z wants to merge 3 commits into jonhoo:main from b-i-z:faster_split_at_semicolon


@b-i-z commented Dec 8, 2025

The semicolon is found at either buffer[buffer.len()-4] or buffer[buffer.len()-5]; otherwise it is assumed to be at buffer[buffer.len()-6]. This removes the loop and is about 5% faster overall.

The buffer is always at least 5 bytes long (the station name has at least 1 byte, followed by the semicolon and a temperature of at least 3 bytes), so it is safe to check buffer[buffer.len()-5].
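For illustration, here is a minimal standalone sketch of the idea. It is not the PR's code: the name split_at_semicolon_sketch and its signature are hypothetical. It assumes each record arrives without its trailing newline and, as the PR's offsets imply, that the temperature is always 3 to 5 bytes (for example 1.2, 12.3, -1.2 or -12.3), so at most two bytes need to be compared.

```rust
/// Hypothetical sketch (not the PR's code): split a record such as
/// b"Hamburg;12.0" into (station, temperature) without a loop.
/// Assumes no trailing newline and a temperature of 3 to 5 bytes,
/// so the ';' can only be at len-4, len-5 or len-6.
fn split_at_semicolon_sketch(line: &[u8]) -> (&[u8], &[u8]) {
    let len = line.len();
    // name (>= 1 byte) + ';' + temperature (>= 3 bytes) => len >= 5
    debug_assert!(len >= 5);
    let pos = if line[len - 4] == b';' {
        len - 4 // e.g. "1.2"
    } else if line[len - 5] == b';' {
        len - 5 // e.g. "12.3" or "-1.2"
    } else {
        // Not checked: assumed to hold for well-formed input ("-12.3").
        len - 6
    };
    (&line[..pos], &line[pos + 1..])
}
```

This sketch uses plain if/else for readability; the PR itself expresses the same choice through a found flag and a running total.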

Review comment on src/main.rs (outdated), lines +296 to +303:
```rust
if !found {
    total += 1
};
found |= *buffer.get_unchecked(pos - 1) == b';';
if !found {
    total += 1
};
pos = pos - total;
```

How does changing pos directly compare? It should save a couple of operations.

Suggested change:

```diff
-if !found {
-    total += 1
-};
-found |= *buffer.get_unchecked(pos - 1) == b';';
-if !found {
-    total += 1
-};
-pos = pos - total;
+if !found {
+    pos -= 1;
+};
+found |= *buffer.get_unchecked(pos) == b';';
+if !found {
+    pos -= 1
+};
```

Another idea, using some bool-to-integer casting to make it branchless:

Suggested change:

```diff
-if !found {
-    total += 1
-};
-found |= *buffer.get_unchecked(pos - 1) == b';';
-if !found {
-    total += 1
-};
-pos = pos - total;
+// found as usize is 0 or 1, matching the usize type of pos
+pos -= 1 - found as usize;
+found |= *buffer.get_unchecked(pos) == b';';
+pos -= 1 - found as usize;
```
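To check the branchless idea in isolation, here is a hypothetical safe wrapper (the name semicolon_pos_dependent and the setup of pos and found before line 296 are assumptions, since that code is not shown). It casts with as usize because pos indexes a slice, and it assumes the same record format as the PR: no trailing newline, semicolon at len-4, len-5 or len-6.

```rust
/// Hypothetical wrapper around the reviewer's branchless suggestion.
/// Assumes pos starts at len-4 and found is the test of that byte.
fn semicolon_pos_dependent(buffer: &[u8]) -> usize {
    let mut pos = buffer.len() - 4;
    let mut found = buffer[pos] == b';';
    // true as usize == 1 and false as usize == 0, so this subtracts 1
    // only when the semicolon has not been found yet.
    pos -= 1 - found as usize;
    // The address of this load depends on the result of the previous one.
    found |= buffer[pos] == b';';
    pos -= 1 - found as usize;
    pos
}
```

Note that the second read goes to a position computed from the first read, which is the dependency the author raises in the reply.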

Author


I think that changing pos for the next read based on the previous read could cause a pipeline stall, or at least a delay of a few cycles until the CPU knows what value was read. (Note: found |= ... is not short-circuiting, so it still reads from that location. Using found = found || ... would short-circuit, but would introduce branching.) Perhaps the compiler could hide the delay by scheduling other instructions for the CPU to execute while it waits for the result, but perhaps not. Reading just the 4th and 5th bytes from the end and basing all calculations on those two bytes would probably be better.
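The note about |= versus || can be made concrete with a small sketch that counts how often the right-hand side runs. The helper names is_semicolon, with_bitor_assign and with_short_circuit are hypothetical and exist only for this demonstration.

```rust
use std::cell::Cell;

/// Hypothetical helper: compares a byte to ';' and counts each evaluation.
fn is_semicolon(byte: u8, reads: &Cell<u32>) -> bool {
    reads.set(reads.get() + 1);
    byte == b';'
}

/// `|=` on bool always evaluates its right-hand side (no short circuit).
/// Returns the final flag and how many times the byte was examined.
fn with_bitor_assign(found: bool, byte: u8) -> (bool, u32) {
    let reads = Cell::new(0);
    let mut found = found;
    found |= is_semicolon(byte, &reads);
    (found, reads.get())
}

/// `||` skips its right-hand side when the left side is already true,
/// which avoids the read but can require a conditional branch.
fn with_short_circuit(found: bool, byte: u8) -> (bool, u32) {
    let reads = Cell::new(0);
    let found = found || is_semicolon(byte, &reads);
    (found, reads.get())
}
```

With found already true, with_bitor_assign still examines the byte once, while with_short_circuit does not examine it at all.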

Perhaps it could be written a bit faster, or at least more clearly:

Suggested change:

```diff
-if !found {
-    total += 1
-};
-found |= *buffer.get_unchecked(pos - 1) == b';';
-if !found {
-    total += 1
-};
-pos = pos - total;
+let not_found1 = (*buffer.get_unchecked(pos) != b';') as usize;
+let not_found2 = (*buffer.get_unchecked(pos - 1) != b';') as usize;
+// subtract 0 (';' at pos), 1 (';' at pos - 1) or 2 (assumed at pos - 2)
+pos -= not_found1 + (not_found1 & not_found2);
```
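A hypothetical safe, standalone version of this suggestion is sketched below (the name semicolon_pos_independent is illustrative, and pos is assumed to start at len-4 as in the PR). Both bytes are read at fixed offsets from the end, so neither load has to wait for the other; the position is then derived with integer arithmetic only.

```rust
/// Hypothetical standalone version of the author's suggestion.
/// Assumes no trailing newline and ';' at len-4, len-5 or len-6.
fn semicolon_pos_independent(buffer: &[u8]) -> usize {
    let pos = buffer.len() - 4;
    // Both loads use addresses known up front (len-4 and len-5).
    let not_found1 = (buffer[pos] != b';') as usize;
    let not_found2 = (buffer[pos - 1] != b';') as usize;
    // ';' at len-4: 0 + 0 = 0
    // ';' at len-5: 1 + (1 & 0) = 1
    // otherwise:    1 + (1 & 1) = 2, i.e. assumed at len-6
    pos - (not_found1 + (not_found1 & not_found2))
}
```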

Aside: parsing the temperature would probably be faster if it were based solely on the last 5 bytes of the line, rather than on a variable-length slice (the semicolon location is still needed for the station name's length).
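One way the aside might look, as a minimal sketch under stated assumptions: the function name parse_temp_tail and the tenths-of-a-degree i16 result are hypothetical choices, not the repository's. It assumes the temperature matches -?\d?\d\.\d and the record (without newline) is at least 5 bytes, so the last three bytes are always digit, '.', digit.

```rust
/// Hypothetical sketch: parse the temperature in tenths of a degree
/// from only the last 5 bytes of a record such as b"Abha;-23.4".
fn parse_temp_tail(line: &[u8]) -> i16 {
    // e.g. b"-23.4", b";12.3", b";-1.2" or b"A;1.2"
    let t = &line[line.len() - 5..];
    let ones = (t[2] - b'0') as i16;
    let tenths = (t[4] - b'0') as i16;
    // t[1] is a tens digit, a '-' sign, or the ';' separator.
    let tens_is_digit = t[1].is_ascii_digit();
    let tens = if tens_is_digit { (t[1] - b'0') as i16 } else { 0 };
    // The sign is at t[1] ("-1.2") or, for two-digit values, at t[0]
    // ("-12.3"). t[0] is only inspected when t[1] is a digit, so a
    // station name ending in '-' is not mistaken for a sign.
    let negative = t[1] == b'-' || (tens_is_digit && t[0] == b'-');
    let magnitude = tens * 100 + ones * 10 + tenths;
    if negative { -magnitude } else { magnitude }
}
```

The semicolon position is still useful for slicing out the station name, but the temperature no longer depends on it.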


I think you're right about the stall, since the CPU will not be able to predict the position.

2 participants