SIMD acceleration for rapidxml byte-scan hot paths #81
Draft
NelsonVides wants to merge 4 commits into perf/to_binary from
git-subtree-dir: c_src/simde
git-subtree-split: 71fd833d9666141edcd1d3c109a80e228303d8d7
Add SIMDe-based 128-bit (SSE2/NEON) fast paths for rapidxml::xml_document::skip() and the longer delimiter scans (comment, CDATA, PI, declaration end). Each predicate dispatches to a SIMD specialization via int/long overload tag, falling back to the original scalar loop when no specialization exists.

- New c_src/simd_skip.hpp with 11 predicate specializations, scalar prefix for short runs (< 16 bytes), and a single-movemask OR-reduction pattern
- rapidxml.hpp: forward declarations + skip() dispatch hooks; trim trailing whitespace and comment/CDATA/PI/declaration scans wired to SIMD helpers
- exml.cpp: 15-byte zero tail padding so 16-byte SIMD loads stay in bounds; parse_next adjusted to use the logical (pre-padding) buffer end
- rebar.config: -I c_src/simde for exml_nif.so (baseline NIF exml_nif_base.so stays scalar for A/B benchmarking)

Uses 128-bit lane width via SIMDe so the same code maps 1:1 to both SSE2 (x86_64) and NEON (AArch64/Apple Silicon) without platform-specific ifdefs or compile flags.
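For readers who have not met the int/long overload tag idiom mentioned above, here is a minimal sketch of how such a dispatch can work. It is not the code from this PR: `digit_pred`, `whitespace_pred`, `fast_skip`, `skip_impl` and the `used_fast` flag are hypothetical names invented for illustration, and the "fast" routine is a plain loop standing in for a SIMD specialization.

```cpp
#include <cassert>

// Hypothetical predicate WITHOUT a specialised fast path.
struct digit_pred {
    static bool test(unsigned char c) { return c >= '0' && c <= '9'; }
};

// Hypothetical predicate WITH a fast path. In a real SIMD build this would
// be a vectorised routine; here it is a plain loop so the sketch stays portable.
struct whitespace_pred {
    static bool test(unsigned char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }
    static const char* fast_skip(const char* p, const char* end) {
        while (p != end && test(static_cast<unsigned char>(*p))) ++p;
        return p;
    }
};

// Preferred overload: the literal 0 is an int, so this is an exact match.
// The trailing decltype removes it from overload resolution (SFINAE)
// when Pred has no fast_skip member.
template <class Pred>
auto skip_impl(const char* p, const char* end, bool& used_fast, int)
    -> decltype(Pred::fast_skip(p, end)) {
    used_fast = true;
    return Pred::fast_skip(p, end);
}

// Fallback overload: needs an int -> long conversion, so it is only
// selected when the overload above is not viable.
template <class Pred>
const char* skip_impl(const char* p, const char* end, bool& used_fast, long) {
    used_fast = false;
    while (p != end && Pred::test(static_cast<unsigned char>(*p))) ++p;
    return p;
}

// Single entry point: always passes 0 and lets overload resolution decide.
template <class Pred>
const char* skip(const char* p, const char* end, bool& used_fast) {
    return skip_impl<Pred>(p, end, used_fast, 0);
}
```

Calling `skip<whitespace_pred>(...)` in this sketch takes the `int` overload, while `skip<digit_pred>(...)` silently falls back to the scalar loop, so call sites never need to know which predicates have a fast path.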
This is controversial and ugly: it vendors in SIMDe, it's a super verbose change, and C++26 is going to standardise most of this mess into hopefully something cleaner (though C++ has a history of making a worse mess of all great ideas, and compilers won't be ready until like... 2028, if not later). But still, the results range from no improvement (no changes, no regressions) to up to 5x faster. The improvement grows with the length of the string that doesn't need escapes (big bodies, CDATA, etc.).
I'm experimenting with SIMD in general, might close this later :)
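As background on the 15-byte tail padding from the commit message, the sketch below shows why padding by 15 zero bytes keeps every 16-byte load in bounds, and why a scan must still stop at the logical (pre-padding) end. It is a portable scalar model, not SIMD and not the PR's code: `make_padded`, `match_mask16` and `find_byte` are hypothetical helpers, and `match_mask16` only imitates the "load 16 bytes, compare, movemask" step with a loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kLane = 16;            // bytes per 128-bit load
constexpr std::size_t kTailPad = kLane - 1;  // 15 zero bytes of padding

// Hypothetical helper: copy the input and append the zero padding.
// The logical length stays input.size(), not buf.size().
std::vector<char> make_padded(const std::string& input) {
    std::vector<char> buf(input.size() + kTailPad, '\0');
    std::copy(input.begin(), input.end(), buf.begin());
    return buf;
}

// Scalar stand-in for a vector compare plus movemask:
// bit i of the result is set when byte i of the block equals `needle`.
std::uint32_t match_mask16(const char* p, char needle) {
    unsigned char block[kLane];
    std::memcpy(block, p, kLane);  // always reads a full 16 bytes
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < kLane; ++i)
        if (block[i] == static_cast<unsigned char>(needle)) mask |= (1u << i);
    return mask;
}

// Offset of the first `needle` within the logical range, or logical_len
// when absent. Assumes buf.size() == logical_len + kTailPad.
std::size_t find_byte(const std::vector<char>& buf, std::size_t logical_len,
                      char needle) {
    for (std::size_t off = 0; off < logical_len; off += kLane) {
        // off <= logical_len - 1, so off + 16 <= logical_len + 15 == buf.size().
        std::uint32_t mask = match_mask16(buf.data() + off, needle);
        if (mask != 0) {
            std::size_t i = 0;
            while (((mask >> i) & 1u) == 0) ++i;     // lowest set bit
            return std::min(off + i, logical_len);   // ignore hits in padding
        }
    }
    return logical_len;
}
```

The clamp to `logical_len` plays the same role as using the pre-padding buffer end in the parser: a match that lands inside the padding (for example when searching for a zero byte) is reported as "not found" rather than as real input.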