fix: consume unknown suffixes on integer literals to prevent false string prefix matches#304

Open
Dnreikronos wants to merge 2 commits intotree-sitter:masterfrom
Dnreikronos:fix/integer-literal-suffix-consuming

Conversation

@Dnreikronos

Integer literals like `123c` followed by a string `"foo"` were incorrectly tokenized as `123` + `c"foo"` (a C-string literal) because the lexer's longest-match rule preferred the `c"` string prefix over treating `c` as part of the integer. This adds a general identifier pattern to the integer suffix choices so that any trailing alphabetic characters are consumed as part of the integer literal, matching rustc's own tokenization behavior.

Fixes cases like `some_macro! { 123c"foo" }`, where a `c` or `b` directly after an integer was misidentified as a string literal prefix.

Related issue: zed-industries/zed#51437

@jordanhalase

Always consuming unknown suffixes for number literals is the right call because the Rust tokenizer will group 123ghijkl as a single token in a TokenStream.
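
As a rough illustration of that point, here is a minimal sketch using a token-counting macro. The names `count_tts` and `suffix_token_counts` are made up for this example and are not part of the PR. Because the macro only matches `tt` fragments and never parses the literal as an expression, rustc should accept the unknown suffix without complaint.

```rust
// Hypothetical helper: counts top-level token trees in its input.
// The tokens are discarded, so literals with unknown suffixes are never
// validated as expressions.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// Hypothetical demo function returning the token counts for two inputs.
fn suffix_token_counts() -> (usize, usize) {
    (
        // A number followed by arbitrary identifier characters.
        count_tts!(123ghijkl),
        // The input from the PR description: literal `123c`, then `"foo"`.
        count_tts!(123c"foo"),
    )
}
```

If rustc lexes these as described in this thread, `suffix_token_counts()` evaluates to `(1, 2)`: one token for `123ghijkl`, and two tokens (`123c` and `"foo"`) for the second input.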

@Dnreikronos
Author

Hi, @amaanq, nice to meet you!
Sorry for pinging you.

Could you help us by reviewing this PR, please?

@wetneb

wetneb commented Mar 16, 2026

Thanks for working on this! I have integrated it in the tree-sitter-rust-orchard fork, since external contributions are infrequently merged in this repository.

Since your motivating use case is Zed, you might be interested to know that I have been working on a PR to switch Zed's Rust grammar to that fork, to fix another issue related to the attachment of attributes / doc comments. Some more work is needed to update the queries accordingly, though.

@smitbarmase smitbarmase left a comment

I've added a few suggestions.

I'm also curious whether we should handle the float case, such as 1.0c"foo". It appears that rustc tokenizes it as 1.0c + "foo", but tree-sitter doesn't.

Comment thread src/parser.c Outdated
@Dnreikronos
Author

Dnreikronos commented Mar 18, 2026

I've added a few suggestions.

I'm also curious whether we should handle the float case, such as 1.0c"foo". It appears that rustc tokenizes it as 1.0c + "foo", but tree-sitter doesn't.

Adjusted following your recommendations! Could you take another look, please?

@jordanhalase

For good measure these should be tested:

  • 123c"string" should become 123c "string"
  • abc"string" should become abc "string"
  • 123b"string" should become 123b "string"
  • cab"string" should become cab "string"
  • Otherwise b"string" and c"string" highlight as the same color
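
The cases listed above can be sketched with a toy lexer. This is only an illustration under stated assumptions, not the grammar change in this PR. `Kind`, `toy_lex`, and `skip_string` are hypothetical names, and the lexer handles only ASCII, has no escape sequences, and ignores edition-specific rules. The key idea is that a number greedily takes any identifier-like suffix, and `b`/`c` count as a string prefix only when the quote follows immediately at the start of a token.

```rust
/// Token kinds for this toy lexer (illustrative names only).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    Number,      // digits plus any identifier-like suffix, e.g. `123c`
    Ident,       // identifier, e.g. `abc`
    Str,         // plain string literal, e.g. `"string"`
    PrefixedStr, // `b"..."` or `c"..."`
}

fn is_ident_continue(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Returns the index just past the closing quote (or end of input).
/// Assumes `bytes[open]` is the opening quote; escapes are not handled.
fn skip_string(bytes: &[u8], open: usize) -> usize {
    let mut i = open + 1;
    while i < bytes.len() && bytes[i] != b'"' {
        i += 1;
    }
    (i + 1).min(bytes.len())
}

/// Splits `src` into (kind, text) pairs, skipping anything unrecognized.
fn toy_lex(src: &str) -> Vec<(Kind, &str)> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let b = bytes[i];
        let kind = if b.is_ascii_digit() {
            // Digits first, then every trailing identifier character is
            // consumed as a suffix, so `123c` never leaves a stray `c`.
            while i < bytes.len() && is_ident_continue(bytes[i]) {
                i += 1;
            }
            Kind::Number
        } else if (b == b'b' || b == b'c') && bytes.get(i + 1) == Some(&b'"') {
            // A lone `b` or `c` directly before a quote is a string prefix.
            i = skip_string(bytes, i + 1);
            Kind::PrefixedStr
        } else if b.is_ascii_alphabetic() || b == b'_' {
            // Longer identifiers such as `abc` or `cab` stay identifiers.
            while i < bytes.len() && is_ident_continue(bytes[i]) {
                i += 1;
            }
            Kind::Ident
        } else if b == b'"' {
            i = skip_string(bytes, i);
            Kind::Str
        } else {
            // Whitespace and anything else is skipped in this sketch.
            i += 1;
            continue;
        };
        tokens.push((kind, &src[start..i]));
    }
    tokens
}
```

With this sketch, `toy_lex("123c\"string\"")` yields a `Number` token `123c` followed by a `Str` token, while `toy_lex("c\"string\"")` yields a single `PrefixedStr` token.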

@Dnreikronos
Author

For good measure these should be tested:

  • 123c"string" should become 123c "string"
  • abc"string" should become abc "string"
  • 123b"string" should become 123b "string"
  • cab"string" should become cab "string"
  • Otherwise b"string" and c"string" highlight as the same color

I built Zed locally, pointing it at the tree-sitter grammar with my changes, and got this:
image

@Dnreikronos Dnreikronos force-pushed the fix/integer-literal-suffix-consuming branch from 87810bf to 3b77429 Compare April 2, 2026 12:17
@Dnreikronos Dnreikronos requested a review from smitbarmase April 2, 2026 12:18