
Fix pascal/camel case identifier tokens to allow underscores #68

Open
mallyskies wants to merge 1 commit into vantreeseba:main from masquepublishing:fix/pascal-camel-case-identifier-underscores

@mallyskies

Fixes two related lexing bugs in _pascalCaseIdentifier and _camelCaseIdentifier that cause any identifier shaped like Foo_Bar (or a type name with a leading underscore) to be silently truncated or rejected wherever it is used as a type_name: extends/implements targets, new X(...) constructors, and import paths.

Truncated at the first underscore after a case shift

The original _pascalCaseIdentifier's trailing character class excludes _ (as does its leading class), so an underscore stops the match as soon as the leading run of uppercase letters ends. As a result, class Foo extends Bar_Baz {} parses Bar_Baz as just Bar, leaving _Baz as a separate ERROR node. _camelCaseIdentifier has the same issue for lowerCamel names with an underscore after the first case shift (e.g. fooBar_Baz): its leading class accepts _, but its trailing class does not.

Confirmed via direct parse: before this fix, class Foo extends Bar_Baz {} produces type_name: 'Bar' followed by an ERROR node for _Baz.
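The truncation can be reproduced without regenerating the parser. The sketch below is a rough, regex-only model and not tree-sitter's actual lexer: it applies each original pattern anchored at a given position using a sticky (y) RegExp, approximating how a token regex is tried at the current input position. The helper name lexAt is hypothetical.

```javascript
// Original (pre-fix) token patterns, copied from grammar.js.
const OLD_CAMEL = /[a-z_]+[a-zA-Z0-9]*/;
const OLD_PASCAL = /[A-Z]+[a-zA-Z0-9]*/;

// Hypothetical helper: match `pattern` anchored at `pos` (one lexer-style step)
// and return the matched text, or null if the pattern cannot start there.
function lexAt(pattern, input, pos = 0) {
  const sticky = new RegExp(pattern.source, "y");
  sticky.lastIndex = pos;
  const m = sticky.exec(input);
  return m ? m[0] : null;
}

// The PascalCase token stops at the underscore: "Bar".
console.log(lexAt(OLD_PASCAL, "Bar_Baz"));
// The camelCase token stops at the underscore after the case shift: "fooBar".
console.log(lexAt(OLD_CAMEL, "fooBar_Baz"));
// The leftover "_Baz" cannot start a PascalCase token at all: null.
console.log(lexAt(OLD_PASCAL, "Bar_Baz", 3));
```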

Type names starting with _ were rejected outright

Per the Haxe manual: "Type names must start with an upper-case letter A-Z or an underscore _." _pascalCaseIdentifier's leading character class was [A-Z]+, which can never match a leading _, so a module-private type name (Haxe's _Foo convention) fails to lex as a type_name reference at all. It is not truncated, just unrecognized.
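A minimal regex-only check of the leading-underscore case against the original patterns, assuming a whole-name match is what matters here (matchesWhole is a hypothetical helper that requires the pattern to cover the entire name). The old PascalCase pattern cannot match _Foo at all, while the old camelCase pattern, whose leading class already included _, accepts it whole.

```javascript
// Original (pre-fix) token patterns, copied from grammar.js.
const OLD_CAMEL = /[a-z_]+[a-zA-Z0-9]*/;
const OLD_PASCAL = /[A-Z]+[a-zA-Z0-9]*/;

// Hypothetical helper: true only if `pattern` matches all of `name`.
function matchesWhole(pattern, name) {
  return new RegExp(`^(?:${pattern.source})$`).test(name);
}

// A leading-underscore type name is never a PascalCase token: false.
console.log(matchesWhole(OLD_PASCAL, "_Foo"));
// The camelCase pattern accepts it because its leading class includes "_": true.
console.log(matchesWhole(OLD_CAMEL, "_Foo"));
```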

Fix

-    _camelCaseIdentifier: ($) => /[a-z_]+[a-zA-Z0-9]*/,
-    _pascalCaseIdentifier: ($) => /[A-Z]+[a-zA-Z0-9]*/,
+    _camelCaseIdentifier: ($) => /[a-z_]+[a-zA-Z0-9_]*/,
+    _pascalCaseIdentifier: ($) => /[A-Z_]+[a-zA-Z0-9_]*/,
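To see the effect of the diff in isolation, the sketch below runs the old and new patterns side by side over the example names used in this PR. It only models a single anchored regex match (via a sticky RegExp), not the generated parser or tree-sitter's token-selection rules; lexPrefix and the patterns table are hypothetical names for illustration.

```javascript
// Token patterns before and after this PR's change to grammar.js.
const patterns = {
  oldCamel: /[a-z_]+[a-zA-Z0-9]*/,
  oldPascal: /[A-Z]+[a-zA-Z0-9]*/,
  newCamel: /[a-z_]+[a-zA-Z0-9_]*/,
  newPascal: /[A-Z_]+[a-zA-Z0-9_]*/,
};

// Text the pattern consumes when anchored at index 0 of `name`,
// or "" if it cannot start there (a rough model of one lexer step).
function lexPrefix(pattern, name) {
  const m = new RegExp(pattern.source, "y").exec(name);
  return m ? m[0] : "";
}

// Example identifiers from the PR description, tagged with the rule they exercise.
const cases = [
  ["Pascal", "Bar_Baz"],
  ["Pascal", "POTI_Reel"],
  ["Pascal", "_Foo"],
  ["Camel", "fooBar_Baz"],
];

for (const [kind, name] of cases) {
  const before = lexPrefix(patterns[`old${kind}`], name);
  const after = lexPrefix(patterns[`new${kind}`], name);
  console.log(`${name}: old=${JSON.stringify(before)} new=${JSON.stringify(after)}`);
}
```

With these inputs, the old patterns stop at the first underscore (Bar, POTI, fooBar) or match nothing at all (_Foo), while the widened patterns consume each name in full.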

Verified locally by regenerating the parser and re-parsing extends, implements, new X(...), and leading-underscore cases; all resolve with no ERROR nodes. A downstream tree-sitter consumer (a Haxe-aware code-graph extractor) now correctly resolves inherits/instantiation edges for identifiers that use this naming convention; previously, those edges silently collapsed to whatever text preceded the first underscore.

The regenerated src/ output is not included in this PR, since the project's convention (per past merged PRs) is to submit grammar-source changes only and let the maintainer regenerate.

_camelCaseIdentifier only allowed underscores in its leading run of
characters, and _pascalCaseIdentifier allowed none at all, so neither
accepted an underscore after a case shift. This silently truncated any
type/constructor reference shaped like CAPS_Suffix (e.g. POTI_Reel) at
the first underscore wherever it appeared as a super_class_name,
interface_name, or constructor target, since those fields resolve
through type_name -> _pascalCaseIdentifier. Also widen
_pascalCaseIdentifier's leading class to accept '_', since Haxe type
names may start with an underscore (module-private types) per the
language spec.