Additional speedup of Tokenizer by nene · Pull Request #20 · tenderlove/recma

nene · 2013-06-05T17:02:02Z

I did some further tuning of Tokenizer. It is now about 2x faster, but I had to change the code around considerably more, mostly by reordering things. See the commit message for details.

Also eliminated a few bits of trailing whitespace.

Previously the tokenizer looped over all lexemes, picking the longest one that matched. But that's too much effort. We can order the lexemes so that we only need to look for the first one that matches. Additionally, I measured a bunch of JavaScript files to see which lexemes appear most often. From most to least frequent, the order was:

- :SINGLE_CHAR
- :S
- :RAW_IDENT
- :STRING
- :COMMENT
- :LITERALS
- :NUMBER
- :REGEXP

As :SINGLE_CHAR matched any character, I split it into two lexemes: one at the beginning that matches single chars that are never part of a longer lexeme, and one at the end that matches the remaining ones. I also needed to tweak the :S lexeme to only match non-empty whitespace. All of this together gave about a 2x speedup.
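The reordering can be sketched as a "first match wins" scanner. This is a minimal sketch, not the actual RECMA code: the `tokenize` method, the `LEXEMES` and `OPERATORS` constants, and every regex below are hypothetical simplifications; only the lexeme names and their order come from the commit message. :REGEXP is left out because telling a regexp literal apart from the division operator depends on the preceding token.

```ruby
require 'strscan'

# Hypothetical multi-char operator set (ES5-style), used for :LITERALS.
OPERATORS = %w[>>>= === !== >>> <<= >>= == != <= >= && || ++ -- += -= *= /= %= &= |= ^= << >>].freeze

# Lexemes in frequency order. Patterns are simplified stand-ins, not RECMA's.
LEXEMES = [
  # Single chars that never start a longer lexeme, so they are safe to try first.
  [:SINGLE_CHAR, /[(){}\[\];,~?:]/],
  # Must be non-empty: with first-match, a pattern that can match "" would
  # always win and the scanner would never advance.
  [:S,           /\s+/],
  [:RAW_IDENT,   /[A-Za-z_$][\w$]*/],
  [:STRING,      /"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*'/],
  # Tried before the operators so "//" and "/*" are not split into "/" tokens.
  [:COMMENT,     %r{//[^\n]*|/\*.*?\*/}m],
  # Longest operators first: Ruby's alternation takes the first alternative
  # that matches, not the longest one.
  [:LITERALS,    Regexp.union(OPERATORS.sort_by { |op| -op.length })],
  [:NUMBER,      /\d+(?:\.\d+)?/],
  # Catch-all for every remaining character (e.g. "=", "+", "/", ".").
  [:SINGLE_CHAR, /./m]
].freeze

def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    # Stop at the first lexeme that matches at the current position instead
    # of trying all of them and keeping the longest match.
    name = LEXEMES.find { |_, pattern| scanner.scan(pattern) }.first
    tokens << [name, scanner.matched]
  end
  tokens
end
```

For example, `tokenize("a===b")` returns `[[:RAW_IDENT, "a"], [:LITERALS, "==="], [:RAW_IDENT, "b"]]`. The trailing catch-all is what makes first-match safe: some pattern always matches, so `find` never returns `nil` and the scanner always advances.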

nene · 2013-06-13T14:22:04Z

Closing this in favor of #24

nene added 2 commits June 5, 2013 19:47

Trim trailing whitespace in tokenizer.rb

f3e4958

nene mentioned this pull request Jun 13, 2013

Another approach to speed up tokenizer #24

Open

nene closed this Jun 13, 2013

nene wants to merge 2 commits into tenderlove:master from nene:master
