Refactor built-in functions from special lexer tokens to named functions #285

Closed
Copilot wants to merge 1 commit into master from copilot/refactor-lexer-parser-functions

Copilot AI commented May 14, 2026

Built-in functions (substr, length, split, etc.) were special-cased as lexer tokens (F_SUBSTR, F_LENGTH, F_SPLIT, …) with per-function parsing logic and two separate AST node types (CallExpr vs UserCallExpr). This refactors them to be resolved by name in the parser, unifying the AST representation.

Changes

  • New BuiltinFunc enum (internal/ast/builtins.go): 22 built-in values + BuiltinNone for user-defined calls, with bidirectional name↔enum lookup via BuiltinFuncByName()
  • Unified CallExpr: Replaces both CallExpr{Func lexer.Token} and UserCallExpr{Name string} with a single CallExpr{Builtin BuiltinFunc, Name string, Args []Expr, Pos Position}. Builtin == BuiltinNone indicates a user-defined call.
  • Lexer simplified: Removed all 22 F_* token constants, FIRST_FUNC/LAST_FUNC sentinels, and built-in entries from keywordTokens. These names now lex as plain NAME tokens.
  • Parser: primary() dispatches to builtinCall() via BuiltinFuncByName() lookup on NAME tokens, preserving all per-function validation (lvalue constraints, array args, optional parens for length, variadic sprintf, etc.). concat() no longer needs the FIRST_FUNC/LAST_FUNC range check.
  • Resolver/Compiler: Switch on ast.BuiltinFunc enum instead of lexer.Token. UserCallExpr visitor paths merged into the unified CallExpr handler.
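The enum and name lookup described in the list above could look roughly like the following. This is a minimal sketch, not the code from the PR: only three of the 22 built-ins are shown, and the map names `builtinNames` and `builtinsByName`, the `String()` method, and the exact signature of `BuiltinFuncByName` are assumptions made for illustration.

```go
package main

import "fmt"

// BuiltinFunc identifies a built-in function.
// The zero value, BuiltinNone, marks a user-defined call.
type BuiltinFunc int

const (
	BuiltinNone BuiltinFunc = iota
	BuiltinLength
	BuiltinSplit
	BuiltinSubstr
	// ...the remaining built-ins would follow here.
)

// builtinNames is the single registry (hypothetical name): enum value to source name.
var builtinNames = map[BuiltinFunc]string{
	BuiltinLength: "length",
	BuiltinSplit:  "split",
	BuiltinSubstr: "substr",
}

// builtinsByName is derived from builtinNames so the two directions cannot drift apart.
var builtinsByName = func() map[string]BuiltinFunc {
	m := make(map[string]BuiltinFunc, len(builtinNames))
	for b, name := range builtinNames {
		m[name] = b
	}
	return m
}()

// BuiltinFuncByName returns the enum value for name, or BuiltinNone
// when name is not a built-in (signature assumed for this sketch).
func BuiltinFuncByName(name string) BuiltinFunc {
	return builtinsByName[name] // a missing key yields the zero value, BuiltinNone
}

// String returns the source-level name, or "" for BuiltinNone and unknown values.
func (b BuiltinFunc) String() string {
	return builtinNames[b]
}

func main() {
	fmt.Println(BuiltinFuncByName("substr") == BuiltinSubstr)
	fmt.Println(BuiltinFuncByName("my_func") == BuiltinNone)
	fmt.Println(BuiltinSplit.String())
}
```

Making `BuiltinNone` the zero value means a failed map lookup and an uninitialized `CallExpr.Builtin` field both read as "user-defined call" without extra checks.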

Before/After

// Before: two AST types, keyed by lexer token
&ast.CallExpr{Func: lexer.F_SUBSTR, Args: args}
&ast.UserCallExpr{Name: "my_func", Args: args, Pos: pos}

// After: single type, keyed by enum
&ast.CallExpr{Builtin: ast.BuiltinSubstr, Name: "substr", Args: args, Pos: pos}
&ast.CallExpr{Builtin: ast.BuiltinNone, Name: "my_func", Args: args, Pos: pos}

Adding a new built-in function now requires only a registry entry in builtins.go and a parse case in builtinCall(), rather than changes to the lexer, parser, compiler, and resolver token maps.
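To illustrate the "registry entry plus parse case" workflow, here is a simplified sketch of building the unified CallExpr from a NAME token. It is not the PR's parser: `newCall`, `builtinArity`, and the placeholder `Expr` and `Position` types are hypothetical, and a plain argument-count check stands in for the per-function validation (lvalue constraints, array arguments, optional parentheses) that the real builtinCall() is described as preserving.

```go
package main

import "fmt"

// BuiltinFunc mirrors the enum from the PR description; only two values are shown.
type BuiltinFunc int

const (
	BuiltinNone BuiltinFunc = iota // user-defined call
	BuiltinLength
	BuiltinSubstr
)

// Position and Expr are simplified placeholders for the real AST types.
type Position struct{ Line, Column int }
type Expr interface{}

// CallExpr is the unified node: Builtin == BuiltinNone means a user-defined call.
type CallExpr struct {
	Builtin BuiltinFunc
	Name    string
	Args    []Expr
	Pos     Position
}

// Registry entry: the name as it appears in source, mapped to its enum value.
var builtinsByName = map[string]BuiltinFunc{
	"length": BuiltinLength,
	"substr": BuiltinSubstr,
}

// Parse case (hypothetical): allowed argument counts per built-in.
type arity struct{ min, max int }

var builtinArity = map[BuiltinFunc]arity{
	BuiltinLength: {0, 1}, // length or length(s)
	BuiltinSubstr: {2, 3}, // substr(s, m) or substr(s, m, n)
}

// newCall resolves name against the registry and validates built-in arity.
// Names that are not built-ins pass through as user-defined calls.
func newCall(name string, args []Expr, pos Position) (*CallExpr, error) {
	b := builtinsByName[name] // zero value BuiltinNone when absent
	if b != BuiltinNone {
		a := builtinArity[b]
		if len(args) < a.min || len(args) > a.max {
			return nil, fmt.Errorf("%d:%d: %s() takes %d to %d arguments, got %d",
				pos.Line, pos.Column, name, a.min, a.max, len(args))
		}
	}
	return &CallExpr{Builtin: b, Name: name, Args: args, Pos: pos}, nil
}

func main() {
	call, err := newCall("substr", []Expr{"s", 2}, Position{1, 1})
	fmt.Println(call.Builtin == BuiltinSubstr, err)

	_, err = newCall("substr", []Expr{"s"}, Position{1, 1})
	fmt.Println(err)
}
```

In this sketch, supporting another built-in means adding one constant, one `builtinsByName` entry, and one `builtinArity` entry, with no lexer changes.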

- Add BuiltinFunc enum and registry in internal/ast/builtins.go
- Unify CallExpr and UserCallExpr into a single CallExpr with Builtin field
- Remove F_* token constants and FIRST_FUNC/LAST_FUNC from lexer
- Remove built-in function names from lexer keyword map (now plain NAME tokens)
- Update parser to look up built-in functions by name from registry
- Update resolver to switch on BuiltinFunc enum instead of lexer.Token
- Update compiler to switch on BuiltinFunc enum instead of lexer.Token
- Update tests to match new behavior

Agent-Logs-Url: https://github.com/benhoyt/goawk/sessions/22e39b39-5167-4e71-934e-f869bc7a9467

Co-authored-by: benhoyt <999033+benhoyt@users.noreply.github.com>
Owner

benhoyt commented May 17, 2026

This was an interesting exercise, but if/when I do this I'll want to do it a somewhat different way to maintain backwards compatibility. I'll probably leave the existing builtins as keywords, but add new ones as a new type (or augment UserCallExpr).

@benhoyt benhoyt closed this May 17, 2026