SIMD acceleration for rapidxml byte-scan hot paths by NelsonVides · Pull Request #81 · esl/exml

NelsonVides · 2026-04-09T19:52:22Z

This is controversial and ugly: it vendors in SIMDe, it's a super verbose change, and C++26 is going to standardise most of this mess into hopefully something cleaner (though C++ has a history of making a worse mess of all great ideas, and compilers will be ready by like... 2028 if not later). Still, the speedup ranges from none (no change, no regressions) up to 5x, and it grows with the length of the run that needs no escapes (big bodies, CDATA, etc.).

I'm experimenting with SIMD in general, might close this later :)

Add SIMDe-based 128-bit (SSE2/NEON) fast paths for
rapidxml::xml_document::skip() and the longer delimiter scans (comment,
CDATA, PI, declaration end). Each predicate dispatches to a SIMD
specialization via int/long overload tag, falling back to the original
scalar loop when no specialization exists.
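
As a rough illustration of the int/long overload tag described above, here is a minimal sketch. It is not the code from this PR: `fast_skip`, `skip_impl`, the two predicates, the explicit `end` pointer and the `fast_path_hits` counter are all hypothetical names made up for the example, and the "fast" path is a plain loop standing in for a SIMD one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical predicates standing in for rapidxml's lookup-table predicates.
struct whitespace_pred {
    static bool test(unsigned char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }
};
struct digit_pred {
    static bool test(unsigned char c) { return c >= '0' && c <= '9'; }
};

// Counts how often the specialized path runs (only here to make it observable).
static int fast_path_hits = 0;

// Primary template: no fast path. Specializations add a static skip().
template <class Pred> struct fast_skip {};

template <> struct fast_skip<whitespace_pred> {
    static const char* skip(const char* p, const char* end) {
        ++fast_path_hits;
        // A real fast path would be SIMD; a plain loop keeps the sketch short.
        while (p != end && whitespace_pred::test(static_cast<unsigned char>(*p))) ++p;
        return p;
    }
};

// Preferred overload: the literal 0 is an int, so this is an exact match.
// It drops out of overload resolution (SFINAE) when fast_skip<Pred>::skip
// does not exist.
template <class Pred>
auto skip_impl(const char* p, const char* end, int)
    -> decltype(fast_skip<Pred>::skip(p, end)) {
    return fast_skip<Pred>::skip(p, end);
}

// Fallback: needs an int -> long conversion, so it is only chosen when the
// overload above is not viable. This is the original scalar loop.
template <class Pred>
const char* skip_impl(const char* p, const char* end, long) {
    while (p != end && Pred::test(static_cast<unsigned char>(*p))) ++p;
    return p;
}

template <class Pred>
const char* skip(const char* p, const char* end) {
    assert(p <= end);
    return skip_impl<Pred>(p, end, 0);
}
```

With this shape, `skip<whitespace_pred>` resolves to the specialized path, while `skip<digit_pred>` silently falls back to the scalar loop because no `fast_skip<digit_pred>::skip` exists.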

- New c_src/simd_skip.hpp with 11 predicate specializations, scalar
  prefix for short runs (< 16 bytes), and a single-movemask OR-reduction
  pattern
- rapidxml.hpp: forward declarations + skip() dispatch hooks; trim
  trailing whitespace and comment/CDATA/PI/declaration scans wired to
  SIMD helpers
- exml.cpp: 15-byte zero tail padding so 16-byte SIMD loads stay in
  bounds; parse_next adjusted to use the logical (pre-padding) buffer
  end
- rebar.config: -I c_src/simde for exml_nif.so (baseline NIF
  exml_nif_base.so stays scalar for A/B benchmarking)
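
The tail padding idea from the exml.cpp item can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: `padded_buffer`, `make_padded` and `load16_fits` are hypothetical names, and the sketch assumes that a 16-byte load may start at any logical byte of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kSimdWidth = 16;             // bytes per 128-bit load
constexpr std::size_t kTailPad = kSimdWidth - 1;   // 15 zero bytes of slack

// Owns the padded copy and remembers where the real input ends.
struct padded_buffer {
    std::vector<unsigned char> storage;
    std::size_t logical_size = 0;

    const unsigned char* begin() const { return storage.data(); }
    // Parsing stops here, not at the end of the padded storage.
    const unsigned char* logical_end() const { return storage.data() + logical_size; }
};

// Copies `size` bytes and appends 15 zero bytes after them.
padded_buffer make_padded(const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    padded_buffer buf;
    buf.storage.assign(size + kTailPad, 0);
    if (size != 0) std::memcpy(buf.storage.data(), data, size);
    buf.logical_size = size;
    return buf;
}

// True when a 16-byte load starting at `offset` stays inside the storage.
bool load16_fits(const padded_buffer& buf, std::size_t offset) {
    return offset + kSimdWidth <= buf.storage.size();
}
```

A load that starts at the last logical byte covers that byte plus exactly the 15 padding bytes, which is why 15 is enough under the assumption above. Anything that needs the true input length keeps using `logical_end()` rather than the padded size.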

Uses 128-bit lane width via SIMDe so the same code maps 1:1 to both SSE2
(x86_64) and NEON (AArch64/Apple Silicon) without platform-specific
ifdefs or compile flags.
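
For readers unfamiliar with the single-movemask OR-reduction mentioned above, here is a minimal sketch of the idea for whitespace skipping. To stay self-contained it uses raw SSE2 intrinsics where available and a scalar loop otherwise, which is exactly the kind of ifdef the PR avoids by going through SIMDe (SIMDe offers the same operations under `simde_`-prefixed names). `whitespace_mask16` and `skip_whitespace` are hypothetical names, the scalar prefix for short runs is left out, and the caller is assumed to guarantee at least 15 readable non-whitespace bytes after `end`.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Returns a 16-bit mask: bit i is set when block[i] is XML whitespace.
inline unsigned whitespace_mask16(const unsigned char* block) {
#ifdef SKETCH_HAVE_SSE2
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));
    // One byte-wise compare per candidate, OR-ed together in vector registers...
    __m128i m = _mm_cmpeq_epi8(v, _mm_set1_epi8(' '));
    m = _mm_or_si128(m, _mm_cmpeq_epi8(v, _mm_set1_epi8('\t')));
    m = _mm_or_si128(m, _mm_cmpeq_epi8(v, _mm_set1_epi8('\n')));
    m = _mm_or_si128(m, _mm_cmpeq_epi8(v, _mm_set1_epi8('\r')));
    // ...then a single movemask turns the combined result into a bitmask.
    return static_cast<unsigned>(_mm_movemask_epi8(m));
#else
    // Portable stand-in that produces the same mask one byte at a time.
    unsigned mask = 0;
    for (int i = 0; i < 16; ++i) {
        const unsigned char c = block[i];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') mask |= 1u << i;
    }
    return mask;
#endif
}

// Skips leading whitespace in [p, end), 16 bytes at a time.
// Precondition: 15 readable bytes follow `end` (tail padding).
inline const unsigned char* skip_whitespace(const unsigned char* p,
                                            const unsigned char* end) {
    assert(p <= end);
    while (p < end) {
        // Bits set where the byte is NOT whitespace, i.e. where to stop.
        unsigned stop = ~whitespace_mask16(p) & 0xFFFFu;
        if (stop != 0) {
            std::size_t i = 0;
            while ((stop & 1u) == 0) { stop >>= 1; ++i; }  // index of first stop byte
            p += i;
            return p < end ? p : end;  // never report a position inside the padding
        }
        p += 16;  // whole block was whitespace
    }
    return end;
}
```

The point of OR-ing the compare results before extracting the mask is that the comparatively expensive vector-to-scalar move happens once per 16-byte block, regardless of how many byte values the predicate matches.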

git-subtree-dir: c_src/simde git-subtree-split: 71fd833d9666141edcd1d3c109a80e228303d8d7


NelsonVides added 4 commits April 9, 2026 21:44

benchmark

dcfb2b5

Squashed 'c_src/simde/' content from commit 71fd833d

36efeed

git-subtree-dir: c_src/simde git-subtree-split: 71fd833d9666141edcd1d3c109a80e228303d8d7

Merge commit '36efeed5d183e204d6f65399a4fc6f01a8742cb0' as 'c_src/simde'

dab142b

NelsonVides self-assigned this Apr 9, 2026

NelsonVides added the WIP Don't review WIP-s! label Apr 9, 2026

NelsonVides wants to merge 4 commits into perf/to_binary from perf/simd_parse
