Another approach to speed up tokenizer #24

Open
nene wants to merge 5 commits into tenderlove:master from nene:jump-table

nene commented Jun 13, 2013

My previous PR for this was somewhat hacky, so I took a different approach: building a lookup table keyed by character, so that at each tokenization step we can quickly determine which tokens can follow just by looking at the next character.

Combined with some other optimizations, this gives about a 2.5x speedup of the tokenizer. Tokenization now takes about 30% of the total parsing time (instead of about 50%), so the main bottleneck is now the parser itself.

nene added 5 commits June 5, 2013 19:47
Instead of looping through the array of all lexemes on every step,
create a map of characters-->lexemes telling which lexemes can
begin with a certain character.

So that when tokenizing we peek the next character, look up those
few lexemes that can begin with it, and only try to match these.

My benchmarks show a 2x tokenizing performance increase with this
optimization.
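
To illustrate the idea, here is a minimal sketch of a character-to-lexemes lookup table. The lexeme names and patterns are hypothetical and much simpler than the real ones, and the table is built by probing each pattern with a single ASCII character, which only works because every pattern here accepts its first character on its own. Non-ASCII input is not handled.

```ruby
require 'strscan'

# Hypothetical lexemes: [name, pattern] pairs in priority order.
LEXEMES = [
  [:NUMBER, /\d+/],
  [:IDENT,  /[a-z_]\w*/],
  [:S,      /\s+/],
  [:PUNCT,  /[-+*\/=();]/]
].freeze

# Map each ASCII character to the lexemes that can begin with it.
# Assumption: probing with a one-character string is enough, because
# each pattern above matches its own first character in isolation.
LOOKUP = (0..127).each_with_object({}) do |code, table|
  char = code.chr
  candidates = LEXEMES.select { |_name, pattern| pattern.match?(char) }
  table[char] = candidates unless candidates.empty?
end.freeze

def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    # Peek at the next character and try only the lexemes listed for it.
    candidates = LOOKUP.fetch(scanner.peek(1), [])
    name, = candidates.find { |_name, pattern| scanner.scan(pattern) }
    raise ArgumentError, "unexpected character at offset #{scanner.pos}" unless name
    tokens << [name, scanner.matched]
  end
  tokens
end
```

With these definitions, `tokenize("x = 42;")` tries exactly one pattern per step, since no two of the sample lexemes share a starting character.
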
Because we're now using StringScanner, all the regexes will only
match at the beginning anyway.  So \A is redundant.
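
A small check of that behaviour, using only standard `strscan` calls. The input string is made up for the example:

```ruby
require 'strscan'

scanner = StringScanner.new("var x")

# "x" occurs later in the string, but not at the scan pointer,
# so scan returns nil and the scanner does not advance.
miss = scanner.scan(/x/)

# check matches at the scan pointer without advancing it.
# With and without \A the result is the same here.
hit_plain    = scanner.check(/var/)
hit_anchored = scanner.check(/\Avar/)
```

Here `miss` is `nil`, while both `hit_plain` and `hit_anchored` are `"var"`.
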
By just moving the :LITERALS lexeme between :REGEXP and :SINGLE_CHAR,
the lexemes are now ordered such that we can blindly return the
first one that matches.

Also moved the :S lexeme alongside other one-line definitions.

Because of the previous char-lookup-table optimization, this one
improves the speed only slightly.  But IMHO the code is a bit cleaner
this way.
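
Why ordering matters for first-match selection can be shown with a minimal sketch. The operator lexemes below are hypothetical stand-ins, not the actual :REGEXP, :LITERALS, or :SINGLE_CHAR definitions:

```ruby
require 'strscan'

# Hypothetical operator lexemes, ordered so that longer operators
# come before the shorter ones that are their prefixes.
ORDERED = [
  [:STRICT_EQ, /===/],
  [:EQ,        /==/],
  [:ASSIGN,    /=/]
].freeze

# Return [name, text] for the first lexeme that matches at the scan
# pointer, or nil if none does. No comparison between candidates.
def first_match(scanner, lexemes)
  lexemes.each do |name, pattern|
    text = scanner.scan(pattern)
    return [name, text] if text
  end
  nil
end

good = first_match(StringScanner.new("=== 1"), ORDERED)
bad  = first_match(StringScanner.new("=== 1"), ORDERED.reverse)
```

With the intended order, `good` is `[:STRICT_EQ, "==="]`. With the order reversed, `bad` is `[:ASSIGN, "="]`, which is why returning the first match is only safe once the list is ordered correctly.
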
This mainly speeds up the last tokenization step - converting tokens to
racc tokens.

That's only about a 1.2x speedup of the tokenizer though.