
Additional speedup of Tokenizer #20

Closed
nene wants to merge 2 commits into tenderlove:master from nene:master



nene commented Jun 5, 2013

I did some further tuning of the Tokenizer and made it about 2x faster, but this required changing the code considerably more, mostly by reordering things. See the commit message for details.

I also removed a few bits of trailing whitespace.

nene added 2 commits June 5, 2013 19:47
Previously the tokenizer looped over all lexemes and picked the longest
one that matched.  But that's more effort than needed: if we order the
lexemes carefully, we only need to look for the first one that matches.
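To make the difference concrete, here is a minimal Ruby sketch contrasting the two strategies. The lexeme table, its simplified regular expressions, and the helper names `longest_match`, `first_match`, and `tokenize` are hypothetical illustrations, not the tokenizer's actual code. The two strategies agree whenever the table is ordered so that, at every position, the first lexeme that matches is also the one with the longest match.

```ruby
# Hypothetical lexeme table of [type, regexp] pairs. The names echo the commit
# message; the patterns are simplified and are NOT the tokenizer's real ones.
LEXEMES = [
  [:S,           /\A\s+/],
  [:RAW_IDENT,   /\A[A-Za-z_$][\w$]*/],
  [:NUMBER,      /\A\d+(?:\.\d+)?/],
  [:SINGLE_CHAR, /\A./m], # catch-all, so it must come last for first-match
].freeze

# Old strategy: try every lexeme and keep the longest match
# (ties go to the earlier lexeme in the table).
def longest_match(src)
  best = nil
  LEXEMES.each do |type, re|
    m = re.match(src)
    next unless m
    best = [type, m[0]] if best.nil? || m[0].length > best[1].length
  end
  best
end

# New strategy: the table is ordered so the first match is the right one,
# so we stop as soon as any lexeme matches.
def first_match(src)
  LEXEMES.each do |type, re|
    m = re.match(src)
    return [type, m[0]] if m
  end
  nil
end

# Repeatedly apply a matcher (given by method name) until the input is consumed.
def tokenize(src, matcher)
  match = method(matcher)
  tokens = []
  until src.empty?
    type, text = match.call(src)
    tokens << [type, text]
    src = src[text.length..-1]
  end
  tokens
end
```

With this table, `tokenize("var x = 12.5;", :first_match)` and `tokenize("var x = 12.5;", :longest_match)` return the same tokens, but the first-match version stops at the first successful regexp instead of running every lexeme at every position.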

Additionally, I measured a bunch of JavaScript files to see which
lexemes appear most often.  From most to least frequent, the order was:

- :SINGLE_CHAR
- :S
- :RAW_IDENT
- :STRING
- :COMMENT
- :LITERALS
- :NUMBER
- :REGEXP
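A frequency measurement like this only needs a tally over token types. The Ruby sketch below assumes a tokenizer that yields `[type, text]` pairs; the helper name `lexeme_frequencies` is hypothetical, and the sample tokens are made up for illustration rather than taken from the measurement described above.

```ruby
# Tally how often each lexeme type occurs in a token stream and return
# [type, count] pairs from most to least frequent (ties sorted by name).
# Assumes tokens are [type, text] pairs; concatenate streams to cover many files.
def lexeme_frequencies(tokens)
  counts = Hash.new(0)
  tokens.each { |type, _text| counts[type] += 1 }
  counts.sort_by { |type, n| [-n, type.to_s] }
end

# Made-up sample stream for "x = 1;" (not real measurement data).
sample = [
  [:RAW_IDENT, "x"], [:S, " "], [:SINGLE_CHAR, "="],
  [:S, " "], [:NUMBER, "1"], [:SINGLE_CHAR, ";"],
]
p lexeme_frequencies(sample)
# prints [[:S, 2], [:SINGLE_CHAR, 2], [:NUMBER, 1], [:RAW_IDENT, 1]]
```

The resulting order is what a first-match tokenizer would use to decide which lexemes to try first, since checking the most common ones early means fewer failed regexp attempts on average.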

As :SINGLE_CHAR matched any character, I split it into two lexemes:
one at the beginning that matches single chars that are never part of a
longer lexeme, and one at the end that matches the remaining ones.
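The split can be sketched as two entries of the same type in an ordered Ruby table. The exact character set below is an assumption (chosen because, in pre-ES2015 JavaScript, none of these characters starts a longer token), and the table, its simplified patterns, and `first_token` are hypothetical, not the tokenizer's real definitions.

```ruby
# Hypothetical split of the catch-all :SINGLE_CHAR lexeme. We assume that in
# (pre-ES2015) JavaScript these characters never begin a longer token.
SAFE_SINGLE_CHAR = /\A[(){}\[\];,~?:]/

ORDERED_LEXEMES = [
  [:SINGLE_CHAR, SAFE_SINGLE_CHAR],             # frequent and unambiguous: try first
  [:S,           /\A\s+/],
  [:RAW_IDENT,   /\A[A-Za-z_$][\w$]*/],
  [:COMMENT,     %r{\A(?://[^\n]*|/\*.*?\*/)}m],
  [:NUMBER,      /\A\d+(?:\.\d+)?/],
  [:SINGLE_CHAR, /\A./m],                       # everything else, e.g. "/", "=", "."
].freeze

# Return [type, text] for the first lexeme that matches at the start of src.
def first_token(src)
  ORDERED_LEXEMES.each do |type, re|
    m = re.match(src)
    return [type, m[0]] if m
  end
  nil
end
```

Here `first_token("(a)")` returns `[:SINGLE_CHAR, "("]` on the very first regexp, while `first_token("// note\nx")` returns `[:COMMENT, "// note"]` because `/` is deliberately left out of the early set and only falls through to the catch-all, as in `first_token("/ 2")`, when no longer lexeme matches.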

I also needed to tweak the :S lexeme so that it only matches non-empty whitespace.
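The reason for this tweak shows up in a few lines of Ruby. Assuming the old whitespace pattern could match zero characters (something like `/\A\s*/`, which is a guess rather than the original pattern), it would succeed at every position once lexemes are tried first-match in frequency order. The helper `whitespace_token` is hypothetical.

```ruby
# Return the whitespace matched at the start of src, or nil if none.
def whitespace_token(src, pattern)
  m = pattern.match(src)
  m && m[0]
end

# \s* matches the empty string, so under first-match dispatch it would
# "succeed" at every position and produce a zero-length token that never
# advances the input. \s+ fails on non-whitespace, letting later lexemes run.
p whitespace_token("foo", /\A\s*/)   # prints ""
p whitespace_token("foo", /\A\s+/)   # prints nil
p whitespace_token("  foo", /\A\s+/) # prints "  "
```

With longest-match selection an empty whitespace match was harmless, because any real token was longer; with first-match selection it has to be ruled out in the pattern itself.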

All of this together gave about a 2x speedup.

nene commented Jun 13, 2013

Closing this in favor of #24.

nene closed this Jun 13, 2013
