Fix string/comment parsing and member_expression associativity (#52 and #53) #67
Open
mallyskies wants to merge 6 commits into
1. Double-quoted string atomicity (grammar-literals.js): make double-
quoted strings atomic so :// inside URL literals (e.g. "http://...")
is not consumed as a // line comment by the tree-sitter extras rule.
2. enum abstract declaration (grammar-declarations.js, grammar.js): add
enum_abstract_declaration rule for `enum abstract Foo(Int) { ... }`
syntax including optional from/to type conversions. Remove bare 'enum'
from the keyword rule since it is now a proper declaration keyword.
3. Struct typedef (grammar-declarations.js, grammar.js): add
$.structure_type to typedef_declaration rhs choices so
`typedef X = { field: T }` parses correctly. Add conflict pairs for
[enum_abstract_declaration, enum_declaration] and
[typedef_declaration, structure_type].
Tested against 5,490 .hx files (Masque Publishing codebase); 0 parse
errors after these fixes (2 files intentionally empty/commented-out).
a.b.c now parses as (a.b).c instead of a.(b.c). Switch from prec.right/repeat1 to prec.left binary form, allow member_expression in object position for left-recursion, add conflict pair [member_expression, _lhs_expression].
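
To make the difference in tree shape concrete, here is a small sketch that builds both groupings for a dotted chain and prints them. It is not part of the grammar; the `{ object, member }` node shape and the helper names are made up for illustration.

```javascript
// Illustration only: a leaf is a string, an inner node is { object, member }.

// Left-associative grouping: ((a.b).c), the shape this commit produces.
function leftAssoc(parts) {
  return parts.slice(1).reduce(
    (object, member) => ({ object, member }),
    parts[0],
  );
}

// Right-associative grouping: (a.(b.c)), the shape produced before the fix.
function rightAssoc(parts) {
  return parts.slice(0, -1).reduceRight(
    (member, object) => ({ object, member }),
    parts[parts.length - 1],
  );
}

// Render a tree with explicit parentheses so the grouping is visible.
function show(node) {
  return typeof node === 'string'
    ? node
    : `(${show(node.object)}.${show(node.member)})`;
}

console.log(show(leftAssoc(['a', 'b', 'c'])));  // ((a.b).c)
console.log(show(rightAssoc(['a', 'b', 'c']))); // (a.(b.c))
```

With the left-associative shape, the outermost node's object is the full `a.b` access, which is what a consumer walking receiver chains would normally expect.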
_pascalCaseIdentifier and _camelCaseIdentifier only allowed underscores in the leading run of characters, not after a case shift. This silently truncated any type/constructor reference shaped like CAPS_Suffix (e.g. POTI_Reel) at the first underscore wherever it appeared as a super_class_name, interface_name, or constructor target, since those fields resolve through type_name -> _pascalCaseIdentifier. Also widen _pascalCaseIdentifier's leading class to accept '_', since Haxe type names may start with an underscore (module-private types) per the language spec.
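
The truncation can be reproduced with a pair of simplified regular expressions. These are hypothetical stand-ins, not the grammar's actual patterns (which are not quoted in this PR text); they only show how a character class without `_` stops the match at the first underscore, and how widening the classes fixes it.

```javascript
// Hypothetical, simplified patterns for illustration only.
// "Before": no underscore after the first character, none in leading position.
const pascalBefore = /^[A-Z][A-Za-z0-9]*/;
// "After": underscore accepted both as the first character and later on.
const pascalAfter = /^[A-Z_][A-Za-z0-9_]*/;

// Return the prefix of `text` that the pattern accepts ('' if none).
function matchedPrefix(pattern, text) {
  const m = text.match(pattern);
  return m ? m[0] : '';
}

console.log(matchedPrefix(pascalBefore, 'POTI_Reel')); // POTI
console.log(matchedPrefix(pascalAfter, 'POTI_Reel'));  // POTI_Reel
console.log(matchedPrefix(pascalBefore, '_Hidden'));   // (empty string)
console.log(matchedPrefix(pascalAfter, '_Hidden'));    // _Hidden
```

The simplified "after" pattern is deliberately looser than a real PascalCase rule would be; it is only meant to show the effect of adding `_` to both character classes.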
Ran `tree-sitter generate --no-bindings` with tree-sitter-cli 0.23.2, pinned to match the py-tree-sitter 0.23.2 runtime (ABI 14) used by graphify's Haxe extractor -- the latest CLI (0.26.x) emits ABI 15, which that runtime rejects. src/tree_sitter/parser.h picks up one added TSLexer field (log) required by that runtime version.
This reverts commit 884044a.
This reverts commit f27cdb3.
Fixes two parsing bugs affecting Haxe 4.x code:
Issue #53: Strings swallowing inline comments
The string rule consumed `//` as part of the string content, causing any string literal followed by a line comment to produce a malformed AST for the remainder of the expression. Fixed by anchoring the string rule to stop before `//`.

Issue #52: member_expression was right-associative
`a.b.c` was parsed as `a.(b.c)` instead of `(a.b).c`. The prior rule used `repeat1(field('member', $._lhs_expression))`, which greedily consumed nested member expressions into the RHS. Fixed by switching to a left-recursive binary form with `prec.left`. Also added `[$.member_expression, $._lhs_expression]` to the `conflicts` array to resolve the GLR conflict that arises in optional-chaining contexts.
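
As a rough sketch of the rule shape described above (not the exact code from this PR), the left-recursive form might look like the following. The precedence value `1`, the `object` field name, and `$.identifier` as the member are assumptions. The small DSL stand-ins at the top exist only so the snippet runs under plain Node; in a real `grammar.js`, `seq`, `choice`, `field`, and `prec` are globals supplied by the tree-sitter CLI.

```javascript
// Minimal stand-ins for the tree-sitter grammar DSL (illustration only).
const seq = (...members) => ({ type: 'SEQ', members });
const choice = (...members) => ({ type: 'CHOICE', members });
const field = (name, content) => ({ type: 'FIELD', name, content });
const prec = {
  left: (value, content) => ({ type: 'PREC_LEFT', value, content }),
};
// `$` resolves any rule name to a symbol reference, as in grammar.js.
const $ = new Proxy({}, { get: (_, name) => ({ type: 'SYMBOL', name }) });

// Assumed rule shape: object '.' member, grouped to the left.
// Allowing member_expression in the object position is what makes the
// rule left-recursive, so a.b.c nests as (a.b).c.
const member_expression = ($) =>
  prec.left(1, seq(
    field('object', choice($.member_expression, $._lhs_expression)),
    '.',
    field('member', $.identifier), // hypothetical member rule
  ));

// Conflict pair named in the PR description.
const conflicts = ($) => [[$.member_expression, $._lhs_expression]];

const rule = member_expression($);
console.log(JSON.stringify(rule, null, 2));
console.log(conflicts($).map((pair) => pair.map((s) => s.name)));
```

In the real grammar these two functions would sit under the `rules` and `conflicts` keys of the `grammar({...})` call rather than being standalone constants.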

Bonus fixes included in the first commit:
- `enum abstract` declarations now parse correctly
- `typedef` bodies parse correctly

Verified against ~7,000 Haxe source files with zero parse errors.