feat(parser): complete direct ANTLR migration and retire Instaparse #74

Open
munen wants to merge 27 commits into master from feature/antlr-direct-migration
munen (Contributor) commented Mar 24, 2026

Summary

  • complete the migration to direct ANTLR parsing on both CLJ and CLJS, including ANTLR JS runtime integration for node/doo and parity-preserving parser contracts
  • remove Instaparse migration scaffolding and dead code (resources/org.ebnf, parser macros, and Instaparse dependency), while keeping CLI and uberjar behavior intact
  • update CI/docs/checklists and benchmark snapshot to reflect migration-complete verification across lein test, lein doo node once, lein run, and lein uberjar smoke paths
  • document the large-document :S fast path strategy and benchmark workflow in README.org

Benchmark Comparison

  • Baseline snapshot command: lein run -m org-parser.benchmark 2 4 (Instaparse era)
  • Current snapshot command: lein run -m org-parser.benchmark 5 20 (current ANTLR branch)
  • Note: because the two runs use different benchmark arguments (2 4 vs. 5 20), treat this comparison as directional rather than exact.

| Case | Parse-only p50 (baseline) | Parse-only p50 (current) | Parse+transform p50 (baseline) | Parse+transform p50 (current) |
|---|---|---|---|---|
| fixture-minimal | 0.5ms | 0.337ms | 0.5ms | 0.352ms |
| fixture-bold-text | 4.1ms | 2.311ms | 4.4ms | 2.330ms |
| fixture-headlines-and-tables | 8.8ms | 5.088ms | 7.4ms | 3.064ms |
| fixture-schedule-with-repeater | 1.4ms | 0.471ms | 1.4ms | 0.507ms |
| edge-headline-umlaut | 0.3ms | 0.164ms | 0.3ms | 0.186ms |
| large-readme-derived (~31k lines) | 13300.5ms | 29.611ms | 13680.4ms | 145.212ms |
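
To make the table easier to read at a glance, the p50 values can be turned into speedup ratios (baseline divided by current). The following is a minimal Python sketch, not part of the project's benchmark tooling; the names `P50_MS` and `speedups` are made up for illustration, and the numbers are copied from the table as-is. Because the baseline and current runs used different benchmark arguments, the ratios are directional only.

```python
# p50 timings in milliseconds, copied from the benchmark table in this PR.
# Tuple order: (parse-only baseline, parse-only current,
#               parse+transform baseline, parse+transform current)
P50_MS = {
    "fixture-minimal": (0.5, 0.337, 0.5, 0.352),
    "fixture-bold-text": (4.1, 2.311, 4.4, 2.330),
    "fixture-headlines-and-tables": (8.8, 5.088, 7.4, 3.064),
    "fixture-schedule-with-repeater": (1.4, 0.471, 1.4, 0.507),
    "edge-headline-umlaut": (0.3, 0.164, 0.3, 0.186),
    "large-readme-derived": (13300.5, 29.611, 13680.4, 145.212),
}


def speedups(row):
    """Return (parse-only speedup, parse+transform speedup) for one table row."""
    parse_base, parse_cur, full_base, full_cur = row
    return parse_base / parse_cur, full_base / full_cur


for case, row in P50_MS.items():
    parse_x, full_x = speedups(row)
    print(f"{case:32s} parse-only {parse_x:7.1f}x   parse+transform {full_x:7.1f}x")
```

For the large README-derived document this works out to roughly 449x for parse-only and 94x for parse+transform; the small fixtures land between about 1.4x and 3x.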

munen force-pushed the feature/antlr-direct-migration branch 2 times, most recently from 77bc50f to 1825820, on March 24, 2026 10:05
Migrate CLJ/CLJS parsing to direct ANTLR, remove Instaparse/EBNF runtime remnants, and keep CI plus test workflows green across lein, doo, CLI, and uberjar paths.
munen force-pushed the feature/antlr-direct-migration branch from 1825820 to 63b3087 on March 24, 2026 10:07
munen added 5 commits March 24, 2026 11:24
Move shared parser API logic to parser.cljc and remove duplicated CLJ/CLJS wrapper files while keeping runtime-specific ANTLR interop split.
Rename antlr_parser_test to parser_start_rules_test and update namespace/test ids to describe parser behavior rather than backend implementation details.
Bring back richer table assertions in parser_test for the headlines_and_tables fixture using focused structural checks for org-table cells, formulas, and table.el lines without brittle full-tree equality.
Replace legacy EBNF parser wording with ANTLR-focused language and clarify that grammar remains EBNF-like while ANTLR provides stronger tooling and cross-runtime generation.
Document parser architecture and workflow, tighten parser API contracts, expand CLJ/CLJS start-rule parity tests, and extract shared AST post-processing to reduce runtime drift risk.
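
The table-assertions commit in the list above describes "focused structural checks" in place of full-tree equality. The idea can be sketched as follows, in Python for illustration only: the AST shape, the node tags, and the helper `find_nodes` are all hypothetical and do not reflect the actual org-parser tree or its test code. The test collects only the nodes it cares about and asserts on those, so unrelated changes elsewhere in the tree do not break it.

```python
def find_nodes(tree, tag):
    """Yield every node (a list whose first element equals `tag`) in a nested tree."""
    if isinstance(tree, list) and tree:
        if tree[0] == tag:
            yield tree
        for child in tree[1:]:
            yield from find_nodes(child, tag)


# Hypothetical parse result: a headline followed by a table with one row
# and one formula line. Real org-parser output is shaped differently.
ast = [
    "S",
    ["headline", ["stars", "*"], ["text", "Tables"]],
    ["table",
     ["table-org",
      ["table-row", ["table-cell", "a"], ["table-cell", "b"]],
      ["table-formula", "$2=$1*2"]]],
]

# Structural checks: assert only on the table cells and formulas,
# ignoring the headline and the exact nesting around the table.
cells = [node[1] for node in find_nodes(ast, "table-cell")]
formulas = [node[1] for node in find_nodes(ast, "table-formula")]

assert cells == ["a", "b"]
assert formulas == ["$2=$1*2"]
```

A full-tree equality check would fail if, say, the headline node gained an extra child; the checks above keep passing as long as the table content itself is unchanged.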
munen force-pushed the feature/antlr-direct-migration branch from 0574f80 to 75f8830 on March 25, 2026 08:42
munen added 11 commits March 25, 2026 11:04
Shift more line and timestamp parsing into the shared lexer/parser grammars so CLJ and CLJS rely on the same cross-platform parser behavior with less duplicated runtime logic.
Move tags, diary sexp, affiliated keywords, list items, tables, and text-styled start rules from custom regex parsing into shared grammar-backed parsing for CLJ and CLJS.
Replace manual link-format parsing with grammar-backed parsing and shared AST mapping while preserving existing escaped-bracket and link-target semantics across CLJ and CLJS.
Route link-format plus eol/word direct starts through grammar-backed parsing and keep AST behavior aligned across JVM and JS runtimes.
Shift text-sup and radio-target parsing into grammar-backed starts and wire text scanning through those ANTLR rules while preserving existing AST outputs across CLJ and CLJS.
Move noparse block parsing from custom regex logic into grammar-backed parsing and remove obsolete runtime parsing helpers while keeping CLJ/CLJS behavior aligned.
Use grammar-driven text and dynamic block parsing so the JVM and JS parsers stay aligned while removing the remaining custom text scanner.
Keep unmatched inline delimiters as plain text and reject URL-like file-link inputs so the ANTLR parser stays aligned across CLJ and CLJS. Remove the stray planning artifact and lock the behavior in regression tests.
Move the common parser AST-building and validation code into a shared cljc namespace so JVM and Node stay aligned while reducing duplicated maintenance work.
munen force-pushed the feature/antlr-direct-migration branch from 2aff51d to fc238e0 on March 25, 2026 15:40