fix: consume unknown suffixes on integer literals to prevent false string prefix matches #304
Always consuming unknown suffixes for number literals is the right call because the Rust tokenizer will group
Hi, @amaanq, nice to meet you! Can you help us with your analysis on that PR, please?
Thanks for working on this! I have integrated it in the Since your motivating use case is Zed, you might be interested to know that I have been working on a PR to switch Zed's Rust grammar to that fork, to fix another issue related to the attachment of attributes / doc comments. Some more work is needed to update the queries accordingly, though.
smitbarmase
left a comment
I've added a few suggestions.
I'm also curious whether we should handle the float case, such as `1.0c"foo"`. It appears that rustc tokenizes it as `1.0c` + `"foo"`, but tree-sitter doesn't.
Adjusted following your recommendations! Can you analyze again, please?
For good measure these should be tested:
87810bf to 3b77429

Integer literals like `123c` followed by a string `"foo"` were incorrectly tokenized as `123` + `c"foo"` (a C-string literal) because the lexer's longest-match rule preferred the `c"` string prefix over `c` as part of the integer. This adds a general identifier pattern to the integer suffix choices so that any trailing alphabetic characters are consumed as part of the integer literal, matching rustc's own tokenization behavior.

Fixes cases like `some_macro! { 123c"foo" }` where `c` and `b` prefixes adjacent to integers were misidentified as string literal prefixes.

Related issue: zed-industries/zed#51437
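
To make the intended splitting concrete, here is a minimal sketch of the "consume any trailing suffix" rule in plain Rust. It is neither the grammar change in this PR nor rustc's lexer: `split_integer_literal` is a hypothetical helper, and it assumes a simplified integer form (decimal ASCII digits only, no `_` separators or `0x`/`0b` prefixes).

```rust
/// Splits `input` into a leading integer literal (digits plus any
/// identifier-like suffix) and the remaining text.
/// Hypothetical helper for illustration only; simplified to decimal digits.
fn split_integer_literal(input: &str) -> Option<(&str, &str)> {
    // Require at least one leading ASCII digit.
    let digits_end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if digits_end == 0 {
        return None;
    }

    // Consume every identifier-like character after the digits as the suffix,
    // whether or not it is a known suffix such as `u8` or `i32`.
    let rest = &input[digits_end..];
    let suffix_len = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());

    let literal_end = digits_end + suffix_len;
    Some((&input[..literal_end], &input[literal_end..]))
}

fn main() {
    // The `c` is absorbed into the integer, so the string keeps no prefix.
    let (literal, rest) = split_integer_literal("123c\"foo\"").unwrap();
    println!("{literal} | {rest}");
}
```

Under these assumptions, `123c"foo"` splits into `123c` and `"foo"`, so the following string is a plain string literal rather than a C-string; the same holds for a trailing `b`.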