Fix string/comment parsing and member_expression associativity (#52 and #53) #67

Open
mallyskies wants to merge 6 commits into vantreeseba:main from masquepublishing:fix/grammar-issues-52-53

@mallyskies

Fixes two parsing bugs affecting Haxe 4.x code:

Issue #53 — Strings swallowing inline comments

The string rule consumed // as part of the string content, causing any string
literal followed by a line comment to produce a malformed AST for the remainder
of the expression. Fixed by anchoring the string rule to stop before //.
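
To make the intended boundary concrete, here is a minimal sketch in plain JavaScript, outside tree-sitter. It treats a double-quoted string as one atomic token, so a `//` inside the literal is never taken as a comment start, while a `//` after the closing quote still is. The helper name `lineCommentStart` and the sample line are hypothetical, and the regex is an assumption for illustration, not the rule used in the grammar.

```javascript
// Hypothetical helper, not part of the grammar: returns the index where a
// line comment starts, or -1, treating double-quoted strings as atomic.
function lineCommentStart(line) {
  // Alternation order matters: at any position a complete string literal
  // is tried before "//", so "//" inside quotes is skipped with the string.
  const token = /"(?:[^"\\]|\\.)*"|\/\//g;
  let m;
  while ((m = token.exec(line)) !== null) {
    if (m[0] === "//") return m.index;
  }
  return -1;
}

const src = 'var url = "http://example.com"; // endpoint';
const at = lineCommentStart(src);
console.log(src.slice(0, at)); // code part, string literal intact
console.log(src.slice(at));    // comment part
```

Single-quoted Haxe strings (which support interpolation) are deliberately left out of this sketch.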

Issue #52 — member_expression was right-associative

a.b.c was parsed as a.(b.c) instead of (a.b).c. The prior rule used
repeat1(field('member', $._lhs_expression)), which greedily consumed nested
member expressions into the RHS. Fixed by switching to a left-recursive binary
form with prec.left:

member_expression: ($) =>
  prec.left(1,
    seq(
      field('object', choice('this', $.identifier, $.member_expression, $._literal)),
      choice(token('.'), seq(alias('?', $.operator), '.')),
      field('member', $.identifier),
    ),
  ),

Also added [$.member_expression, $._lhs_expression] to the conflicts array
to resolve the GLR conflict that arises in optional-chaining contexts.
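
For reviewers who want to see the two tree shapes side by side, the following is a small sketch in plain JavaScript that does not use tree-sitter at all. The helpers `leftAssoc`, `rightAssoc`, and `show` are hypothetical names; the object shape only mirrors the `object` and `member` fields of the rule above.

```javascript
// Hypothetical helpers for illustration; not part of the grammar.

// Left-associative: each step wraps the tree built so far as the new object,
// which is the shape the prec.left binary form is meant to produce.
function leftAssoc(parts) {
  return parts.slice(1).reduce(
    (object, member) => ({ type: "member_expression", object, member }),
    parts[0],
  );
}

// Right-associative: the rest of the chain is nested inside the member slot,
// which is the shape described for the old rule.
function rightAssoc(parts) {
  if (parts.length === 1) return parts[0];
  return {
    type: "member_expression",
    object: parts[0],
    member: rightAssoc(parts.slice(1)),
  };
}

// Render a tree with explicit parentheses around every member_expression.
function show(node) {
  return typeof node === "string"
    ? node
    : `(${show(node.object)}.${show(node.member)})`;
}

console.log(show(leftAssoc(["a", "b", "c"])));  // ((a.b).c)
console.log(show(rightAssoc(["a", "b", "c"]))); // (a.(b.c))
```

With the left-associative shape, the outermost node's `member` is always a single identifier, which matches `field('member', $.identifier)` in the new rule.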

Bonus fixes included in the first commit:

  • enum abstract declarations now parse correctly
  • struct typedef bodies parse correctly

Verified against ~7,000 Haxe source files with zero parse errors.

1. Double-quoted string atomicity (grammar-literals.js): make
   double-quoted strings atomic so :// inside URL literals (e.g.
   "http://...") is not consumed as a // line comment by the
   tree-sitter extras rule.

2. enum abstract declaration (grammar-declarations.js, grammar.js): add
   enum_abstract_declaration rule for `enum abstract Foo(Int) { ... }`
   syntax including optional from/to type conversions. Remove bare 'enum'
   from the keyword rule since it is now a proper declaration keyword.

3. Struct typedef (grammar-declarations.js, grammar.js): add
   $.structure_type to typedef_declaration rhs choices so
   `typedef X = { field: T }` parses correctly. Add conflict pairs for
   [enum_abstract_declaration, enum_declaration] and
   [typedef_declaration, structure_type].

Tested against 5,490 .hx files (Masque Publishing codebase); 0 parse
errors after these fixes (2 files intentionally empty/commented-out).

a.b.c now parses as (a.b).c instead of a.(b.c).
Switch from prec.right/repeat1 to a prec.left binary form,
allow member_expression in object position for left recursion, and
add the conflict pair [member_expression, _lhs_expression].

_pascalCaseIdentifier and _camelCaseIdentifier only allowed underscores
in the leading run of characters, not after a case shift. This silently
truncated any type/constructor reference shaped like CAPS_Suffix (e.g.
POTI_Reel) at the first underscore wherever it appeared as a
super_class_name, interface_name, or constructor target, since those
fields resolve through type_name -> _pascalCaseIdentifier. Also widen
_pascalCaseIdentifier's leading class to accept '_', since Haxe type
names may start with an underscore (module-private types) per the
language spec.
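
The truncation is easy to reproduce outside the grammar. The sketch below uses two simplified stand-in patterns: both are assumptions chosen only to reproduce the symptom described above, and neither is the regex in the grammar. `longestPrefix` is a hypothetical helper.

```javascript
// ASSUMPTION: both patterns are simplified stand-ins, not the grammar's regexes.
const narrowPascal = /^[A-Z][A-Za-z0-9]*/;  // no '_' accepted: models the truncation
const widePascal = /^[A-Z_][A-Za-z0-9_]*/;  // '_' accepted at the start and afterwards

// Returns the prefix of `name` that the pattern accepts, or null if none.
function longestPrefix(pattern, name) {
  const m = name.match(pattern);
  return m ? m[0] : null;
}

for (const name of ["POTI_Reel", "_Hidden", "Reel"]) {
  console.log(name, longestPrefix(narrowPascal, name), longestPrefix(widePascal, name));
}
```

The narrow pattern stops at `POTI` and rejects `_Hidden` outright, while the wide one accepts both names whole. The wide stand-in is deliberately lax (it would also accept a lone `_`), so it should not be read as a proposal for the final rule.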

Ran `tree-sitter generate --no-bindings` with tree-sitter-cli 0.23.2,
pinned to match the py-tree-sitter 0.23.2 runtime (ABI 14) used by
graphify's Haxe extractor. The latest CLI (0.26.x) emits ABI 15,
which that runtime rejects. src/tree_sitter/parser.h picks up one
added TSLexer field (log) required by that runtime version.