Another approach to speed up tokenizer by nene · Pull Request #24 · tenderlove/recma

nene · 2013-06-13T14:21:12Z

My previous PR for this was somewhat hacky, so I took a different approach: building a lookup table keyed by character, so that at each tokenization step we can quickly determine which tokens can possibly follow by looking at the next character.

Combining this with some other optimizations, I get about a 2.5x speedup of the tokenizer. With this, tokenization now takes about 30% of the total parsing time (instead of about 50%), so the main bottleneck is now the parser itself.

Instead of looping through the array of all lexemes on every step, create a map from characters to lexemes that tells which lexemes can begin with a given character. When tokenizing, we peek at the next character, look up the few lexemes that can begin with it, and only try to match those. My benchmarks show a 2x increase in tokenizing performance with this optimization.
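
A minimal sketch of the character lookup table described above. It is not the code from this PR: the `TinyTokenizer` class, the `add` helper, the `Lexeme` struct, and the four lexemes are all hypothetical, and the sketch assumes each lexeme declares its possible first characters explicitly.

```ruby
require 'strscan'

# Hypothetical lexeme record: a token name plus the regex that matches it.
Lexeme = Struct.new(:name, :pattern)

class TinyTokenizer
  def initialize
    # Map from a single character to the lexemes that can begin with it.
    @lexemes = Hash.new { |hash, key| hash[key] = [] }
    add(:NUMBER, /\d+/, ('0'..'9').to_a)
    add(:IDENT, /[A-Za-z_]\w*/, ('a'..'z').to_a + ('A'..'Z').to_a + ['_'])
    add(:S, /[ \t]+/, [' ', "\t"])
    add(:PLUS, /\+/, ['+'])
  end

  # Register a lexeme under every character it may start with.
  def add(name, pattern, first_chars)
    lexeme = Lexeme.new(name, pattern)
    first_chars.each { |ch| @lexemes[ch] << lexeme }
  end

  def tokenize(source)
    scanner = StringScanner.new(source)
    tokens = []
    until scanner.eos?
      # Peek at the next character and try only the lexemes filed under it.
      # (peek is byte-based, which is fine for this ASCII-only sketch.)
      candidates = @lexemes.fetch(scanner.peek(1), [])
      lexeme = candidates.find { |l| scanner.scan(l.pattern) }
      raise "unexpected character #{scanner.peek(1).inspect}" unless lexeme
      tokens << [lexeme.name, scanner.matched]
    end
    tokens
  end
end

tokens = TinyTokenizer.new.tokenize("a1 + 42")
```

With only a handful of lexemes the saving is small; the benefit grows with the number of lexemes, because each step tries only the candidates filed under one character instead of the whole list.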

Because we're now using StringScanner, all the regexes will only match at the beginning anyway, so the \A metacharacter is redundant.
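
To illustrate why the anchor is unnecessary: `StringScanner#scan` only attempts a match at the current scan pointer and never searches ahead. The input string below is made up for the example.

```ruby
require 'strscan'

scanner = StringScanner.new("foo 42")

miss   = scanner.scan(/\d+/)  # nil: digits occur later, but not at the pointer
word   = scanner.scan(/\w+/)  # "foo"
space  = scanner.scan(/\s+/)  # " "
number = scanner.scan(/\d+/)  # "42": the pointer now sits right before the digits
```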

By just moving the :LITERALS lexeme between :REGEXP and :SINGLE_CHAR, the order of lexemes is now such that we can blindly return the first one that matches. Also moved the :S lexeme alongside the other one-line definitions. Because of the previous char-lookup-table optimization, this one improves the speed only slightly, but IMHO the code is a bit cleaner this way.
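
A rough sketch of why ordering matters for first-match selection. The patterns below are simplified stand-ins, not the real :LITERALS and :SINGLE_CHAR definitions, and the `longest_match` helper assumes the earlier strategy was to pick the longest match, which the text above does not spell out.

```ruby
require 'strscan'

# Hypothetical, simplified lexemes: multi-character literals listed before
# single characters, so the first match is also the longest one.
ORDERED = [
  [:LITERALS,    /===|==|\+=|\+\+/],
  [:SINGLE_CHAR, /[=+]/],
].freeze

# Return the first lexeme that matches and consume its text.
def first_match(scanner, lexemes)
  lexemes.each do |name, pattern|
    text = scanner.scan(pattern)
    return [name, text] if text
  end
  nil
end

# Order-independent alternative: try every lexeme and keep the longest match.
def longest_match(scanner, lexemes)
  best = nil
  lexemes.each do |name, pattern|
    text = scanner.check(pattern) # check matches without advancing
    best = [name, text] if text && (best.nil? || text.length > best[1].length)
  end
  scanner.pos += best[1].bytesize if best
  best
end

good  = first_match(StringScanner.new("==1"), ORDERED)
wrong = first_match(StringScanner.new("==1"), ORDERED.reverse)
slow  = longest_match(StringScanner.new("==1"), ORDERED.reverse)
```

With the wrong order, first-match splits `==` into two single-character tokens, which is why the lexeme order has to be right before the exhaustive comparison can be dropped.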

Replacing the keyword lookup arrays with hashes mainly speeds up the last tokenization step: converting tokens to racc tokens. That's only about a 1.2x speedup of the tokenizer, though.
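
A small sketch of the array-to-hash change, under assumptions: the keyword list is abbreviated, and `KEYWORD_TOKENS` and `racc_token` are hypothetical names rather than the ones used in the tokenizer.

```ruby
# Abbreviated, illustrative keyword list.
KEYWORDS = %w[break case do else for if return var while].freeze

# Hash from keyword text to its token symbol. A lookup here is a constant-time
# hash probe, where Array#include? would scan the list element by element.
KEYWORD_TOKENS = KEYWORDS.each_with_object({}) { |kw, h| h[kw] = kw.upcase.to_sym }.freeze

# Convert a raw [name, value] token into the pair handed to the parser.
def racc_token(name, value)
  keyword = KEYWORD_TOKENS[value] if name == :IDENT
  [keyword || name, value]
end

kw_token    = racc_token(:IDENT, "while")
ident_token = racc_token(:IDENT, "foo")
```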

nene added 5 commits June 5, 2013 19:47

Trim trailing whitespace in tokenizer.rb

f3e4958

Eliminate \A metacharacter from tokenizer regexes.

f071ba9

Replace keyword lookup arrays with hashes.

7abcb28

nene mentioned this pull request Jun 13, 2013

Additional speedup of Tokenizer #20

Closed

nene wants to merge 5 commits into tenderlove:master from nene:jump-table