The state system and support for the actions and states in the PEG parser notation#131

Open
andr-dots wants to merge 79 commits into textX:master from andr-dots:peg-actions-and-states

andr-dots commented Jul 11, 2025

The state system lets programmers write parsers for complex grammars, including languages with unusual features.

The PEG action system makes it possible to handle features such as templates/generics in languages like Python and Go, or labels in languages like C (the labels are linked automatically).

The parsing state system makes it possible to handle non-linear parsing scenarios, such as cases where syntax constructs overlap and depend on constructs found earlier in the parse.

Together, the state and action systems let programmers solve many known and potential problems that the original parser could not handle. Previously, such problems were most likely solved in the program's semantic layer, with much greater complexity and much less readable code.

The downside of the new system is that parsing becomes slightly slower when the state system is in use, because a state snapshot has to be taken before every match that can fail.
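To illustrate where that cost comes from, here is a minimal sketch of snapshot-and-restore around an ordered choice. It is a toy, not the code in this PR: `ParseState`, `ordered_choice`, and the two alternatives are hypothetical names invented for the example.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ParseState:
    """Hypothetical parsing state; not the class used in this PR."""
    position: int = 0
    labels: dict = field(default_factory=dict)


def ordered_choice(state, text, alternatives):
    """Try each alternative in order, rolling the state back after a failure."""
    for alternative in alternatives:
        snapshot = copy.deepcopy(state)  # taken before every attempt that can fail
        if alternative(state, text):
            return True
        # The alternative may have mutated the state before failing: restore it.
        state.position = snapshot.position
        state.labels = snapshot.labels
    return False


def label_then_semicolon(state, text):
    """Match 'name:;'. Records the label first, so a late failure leaves debris."""
    end = text.find(":", state.position)
    if end == -1:
        return False
    state.labels[text[state.position:end]] = state.position
    state.position = end + 1
    if text[state.position:state.position + 1] != ";":
        return False  # fails after the state was already changed
    state.position += 1
    return True


def label_only(state, text):
    """Match 'name:' and record the label."""
    end = text.find(":", state.position)
    if end == -1:
        return False
    state.labels[text[state.position:end]] = state.position
    state.position = end + 1
    return True


state = ParseState()
matched = ordered_choice(state, "loop:", [label_then_semicolon, label_only])

# With only the failing alternative, the state ends up exactly as it started.
clean = ParseState()
failed = ordered_choice(clean, "loop:", [label_then_semicolon])
```

The deep copy is made whether or not the alternative goes on to fail, which is the overhead described above.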

Updated on 2025-07-20:

To make the tests smoother, I improved the PEG grammar notation: a separator operator, `%`, is now available, which makes repetitions with a separator much easier to write. I considered using `,` instead, but the resulting code looked too odd.
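For readers unfamiliar with separated repetition, the behaviour such an operator typically provides is `item (separator item)*`. The sketch below shows that behaviour with plain regular expressions; the `separated` helper is hypothetical, and it does not reproduce the grammar syntax or the implementation from this PR.

```python
import re


def separated(item_pattern, sep_pattern, text, pos=0):
    """Match item (sep item)* from pos; return (items, end_pos) or None."""
    item = re.compile(item_pattern)
    sep = re.compile(sep_pattern)

    first = item.match(text, pos)
    if first is None:
        return None  # at least one item is required

    items = [first.group()]
    pos = first.end()
    while True:
        sep_match = sep.match(text, pos)
        if sep_match is None:
            break
        next_item = item.match(text, sep_match.end())
        if next_item is None:
            break  # a trailing separator is left unconsumed
        items.append(next_item.group())
        pos = next_item.end()
    return items, pos


numbers = separated(r"\d+", r"\s*,\s*", "1, 2,3 rest")
trailing = separated(r"\d+", r",", "1,2,")
nothing = separated(r"\d+", r",", "abc")
```

Without a dedicated operator, the same rule has to be spelled out by hand as an item followed by a repeated separator-item group.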

Code review checklist

  • Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
  • Title summarizes what is changing
  • Commit messages are meaningful (see [this][commit messages] for details)
  • Tests have been included and/or updated
  • Docstrings have been included and/or updated, as appropriate
  • Standalone docs have been updated accordingly
  • Changelog(s) has/have been updated, as needed (see CHANGELOG.md, no need
    to update for typo fixes and such).

andr-dots (Author) commented Jul 11, 2025

I'd also like to add that many things could be moved into the state class in the future, and that the state class, rather than the parser itself, should be passed to the parser expression's parse function.

andr-dots (Author) commented Jul 15, 2025

In a recent commit, I added support for the suppress action. This feature is crucial in the PEG parser notation, and its absence leads to overcomplicated solutions in the semantic layer.
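As a rough illustration of what suppression means, the sketch below matches every part of a sequence but keeps only the unsuppressed matches in the result. The `sequence` helper and the `(pattern, suppress)` pairs are hypothetical and are not the notation or API added in this PR.

```python
import re


def sequence(parts, text, pos=0):
    """Match (pattern, suppress) pairs in order; return (kept, end_pos) or None."""
    kept = []
    for pattern, suppress in parts:
        match = re.compile(pattern).match(text, pos)
        if match is None:
            return None  # the whole sequence fails if any part fails
        if not suppress:
            kept.append(match.group())  # suppressed parts are consumed but dropped
        pos = match.end()
    return kept, pos


# The parentheses must be present in the input, but they carry no meaning afterwards.
call = [(r"\w+", False), (r"\(", True), (r"\d+", False), (r"\)", True)]
result = sequence(call, "f(42)")
broken = sequence(call, "f(42")
```

Without suppression, the semantic layer has to filter the punctuation out of every result itself.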

andr-dots force-pushed the peg-actions-and-states branch 2 times, most recently from b6b3810 to b8a5492, July 17, 2025 18:13
andr-dots marked this pull request as draft July 18, 2025 12:43
andr-dots force-pushed the peg-actions-and-states branch 2 times, most recently from e74d29d to d37be8e, July 20, 2025 19:56
andr-dots added 23 commits July 21, 2025 19:34
andr-dots changed the title from "The state system and support for the actions and states in PEG parser notation" to "The state system and support for the actions and states in the PEG parser notation" Jul 27, 2025
andr-dots marked this pull request as ready for review July 27, 2025 10:21
andr-dots force-pushed the peg-actions-and-states branch 3 times, most recently from 180f3aa to e3ee6ab, August 6, 2025 13:47
andr-dots force-pushed the peg-actions-and-states branch from 3685c8d to 4dd394c, August 6, 2025 15:59
andr-dots force-pushed the peg-actions-and-states branch from 4dd394c to 931e0b5, August 6, 2025 18:23

igordejanovic (Member) commented

Hi @andr-dots. Thank you for your substantial work on this PR, and sorry it took me so long to respond. While the explicit state handling you've implemented offers powerful capabilities for context-sensitive languages, it also introduces significant complexity and performance considerations, as you've noted.

Given Arpeggio's design philosophy of remaining a simple PEG parser, I don't believe this rework aligns with the project's direction. Furthermore, I don't feel comfortable having to maintain this rework.

I'd suggest forking this work into a separate library where you could maintain full control over its development. This would allow users needing more advanced parsing capabilities to benefit from your work, while Arpeggio continues serving those who need a straightforward PEG parser.

andr-dots (Author) commented

As for complexity, the algorithms are pretty simple, but debugging the state system might be a challenge for users (it's still a fair price for this functionality, and it can be made easier in the future). Since Python has nothing like C's conditionally compiled `#if defined(...)` blocks, this PR can't be made an optional feature. As for the project itself and its speed, the current implementation is slow anyway, and it could only be sped up by rewriting the whole regexp management system along the lines of the Bison+Flex combination. I doubt I could maintain a fork in my free time, especially if nobody else is interested in it. But I still need this solution for my own purposes (it's a handy parser for small tasks), so I'll consider writing a similar but faster parser from scratch, which would be easier to maintain.

2 participants