The state system and support for the actions and states in the PEG parser notation by andr-dots · Pull Request #131 · textX/Arpeggio

andr-dots · 2025-07-11T13:08:35Z

The state system lets programmers write parsers for complex grammars, including languages with unusual features.

The PEG action system makes it possible to handle features such as templates/generics in languages like Python and Go, or labels in languages like C (the labels are linked automatically).

The parsing state system makes it possible to handle non-linear parsing scenarios, such as cases where syntax constructs overlap and depend on constructs found earlier in the parsing process.
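To illustrate the kind of dependency on earlier constructs described above, here is a minimal standalone sketch. It does not use Arpeggio or the notation added in this PR; `ParserState`, `remembered`, and `parse_element` are hypothetical names chosen only for this example. The closing tag is accepted only if it matches the name remembered from the opening tag.

```python
import re

OPEN_TAG = re.compile(r"<(\w+)>")


class ParserState:
    """Hypothetical state holder: a stack of remembered tokens."""

    def __init__(self):
        self.remembered = []


def parse_element(text, pos, state):
    """Match <name>body</name>; the closing name must equal the opening one.

    Returns (name, body, end_position) on success, or None on failure.
    """
    m = OPEN_TAG.match(text, pos)
    if not m:
        return None
    state.remembered.append(m.group(1))   # "remember" the opening name
    body_end = text.find("</", m.end())
    expected = state.remembered.pop()     # backreference to the remembered name
    if body_end == -1:
        return None
    closing = f"</{expected}>"
    if not text.startswith(closing, body_end):
        return None
    return expected, text[m.end():body_end], body_end + len(closing)
```

With this sketch, `parse_element("<a>hi</a>", 0, ParserState())` succeeds, while `"<a>hi</b>"` is rejected because the closing name differs from the remembered one.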

Together, the state and action systems let programmers solve many known and potential problems that the original parser could not handle. Previously, such problems were most likely solved in the semantic layer of the program, with much greater complexity and much less readable code.

The downside of the new system is that parsing becomes a little slower when the state system is in use, because it has to take a state snapshot before every match that can fail.
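The snapshot cost mentioned above can be sketched roughly as follows. This is a simplified standalone illustration, not the code from this PR: `ParserState`, `try_match`, and `word_then_bang` are hypothetical names, and the deep copy stands in for whatever snapshot mechanism the parser actually uses.

```python
import copy


class ParserState:
    """Hypothetical mutable parser state."""

    def __init__(self):
        self.position = 0
        self.remembered = []


def try_match(state, rule, text):
    """Run a rule; if it fails, restore the state captured beforehand."""
    snapshot = copy.deepcopy(state)       # taken on every attempt: this is the cost
    if rule(state, text):
        return True
    state.__dict__.update(snapshot.__dict__)  # undo all side effects of the rule
    return False


def word_then_bang(state, text):
    """Example rule: a word followed by '!'. It mutates state before it can fail."""
    start = end = state.position
    while end < len(text) and text[end].isalpha():
        end += 1
    if end == start:
        return False
    state.remembered.append(text[start:end])  # side effect before possible failure
    state.position = end
    if text[end:end + 1] != "!":
        return False
    state.position = end + 1
    return True
```

After a failed `try_match(state, word_then_bang, "hello?")`, the position and the remembered tokens are back to their values from before the attempt, so an alternative rule can start from a clean state.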

Updated on 2025-07-20:

To make the tests smoother, the PEG grammar notation was improved: the separator operator "%" is now available, which makes repetitions with a separator much easier to write. I considered using "," instead, but the code looked too odd that way.
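For readers unfamiliar with separator repetition, the behaviour it expresses can be sketched in plain Python. This is only an illustration of the concept, not the PR's implementation or its grammar syntax; `repeat_with_separator` is a hypothetical helper.

```python
import re


def repeat_with_separator(item_pattern, sep_pattern, text, pos=0):
    """Match item (sep item)* starting at pos.

    Returns (matched_items, end_position). A trailing separator that is not
    followed by an item is left unconsumed.
    """
    item = re.compile(item_pattern)
    sep = re.compile(sep_pattern)
    results = []
    m = item.match(text, pos)
    if not m:
        return results, pos
    results.append(m.group())
    pos = m.end()
    while True:
        s = sep.match(text, pos)
        if not s:
            break
        m = item.match(text, s.end())
        if not m:
            break  # do not consume a dangling separator
        results.append(m.group())
        pos = m.end()
    return results, pos
```

For example, `repeat_with_separator(r"\d+", r"\s*,\s*", "1, 2,3,")` returns `(["1", "2", "3"], 6)`: the final comma is not consumed because no item follows it.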

Code review checklist

Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
Title summarizes what is changing
Commit messages are meaningful (see [this][commit messages] for details)
Tests have been included and/or updated
Docstrings have been included and/or updated, as appropriate
Standalone docs have been updated accordingly
Changelog(s) has/have been updated, as needed (see CHANGELOG.md; no need to update for typo fixes and such).

andr-dots · 2025-07-11T13:15:20Z

Also, I'd like to add that many things can be moved to the state class in the future, and the state class should be passed to the parser expression's parse function instead of the parser itself.

andr-dots · 2025-07-15T10:35:47Z

In a recent commit, I added support for the suppress action. This feature is crucial in the PEG parser notation, and the lack of it leads to overcomplicated solutions in the semantic layer.
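As a rough illustration of what suppression saves the semantic layer from doing, here is a standalone sketch that drops suppressed nodes from a match tree. The `Node` class and the `prune` function are hypothetical and are not the classes used in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parse tree node; `suppress` marks matches to be dropped."""
    rule: str
    value: str = ""
    children: list = field(default_factory=list)
    suppress: bool = False


def prune(node):
    """Return a copy of the tree without suppressed descendants."""
    kept = [prune(child) for child in node.children if not child.suppress]
    return Node(node.rule, node.value, kept, node.suppress)


# A list such as "(1,2)" where the punctuation is marked as suppressed.
tree = Node("list", children=[
    Node("lparen", "(", suppress=True),
    Node("number", "1"),
    Node("comma", ",", suppress=True),
    Node("number", "2"),
    Node("rparen", ")", suppress=True),
])
values = [child.value for child in prune(tree).children]
```

Here `values` ends up as `["1", "2"]`: only the meaningful matches remain, so later processing does not need to filter out punctuation by hand.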


igordejanovic · 2025-10-15T17:26:59Z

Hi @andr-dots. Thank you for your substantial work on this PR, and sorry it took me so long to respond. While the explicit state handling you've implemented offers powerful capabilities for context-sensitive languages, it also introduces significant complexity and performance considerations, as you've noted.

Given Arpeggio's design philosophy of remaining a simple PEG parser, I don't believe this rework aligns with the project's direction. Furthermore, I don't feel comfortable having to maintain this rework.

I'd suggest forking this work into a separate library where you could maintain full control over its development. This would allow users needing more advanced parsing capabilities to benefit from your work, while Arpeggio continues serving those who need a straightforward PEG parser.

andr-dots · 2026-06-12T06:25:40Z

As for the complexity, the algorithms are pretty simple, but debugging the state system might be a challenge for users (it's still a fair price for this functionality, and it can be made easier in the future). Since Python has nothing like C's "#if defined"-style conditional code blocks, this MR can't be made an optional feature. As for the project itself and speed, the current implementation is slow anyway, and it could be sped up only by rewriting the whole regexp management system along the lines of the Bison+Flex combination. I doubt that I could maintain a fork in my free time, especially if nobody else is interested in it. But I still need this solution for my own needs (it's a handy parser for solving small tasks), so I'll consider writing a similar (but faster) parser from scratch (it'll be easier to maintain).

andr-dots force-pushed the peg-actions-and-states branch 2 times, most recently from b6b3810 to b8a5492 Compare July 17, 2025 18:13

andr-dots marked this pull request as draft July 18, 2025 12:43

andr-dots force-pushed the peg-actions-and-states branch 2 times, most recently from e74d29d to d37be8e Compare July 20, 2025 19:56

andr-dots added 23 commits July 21, 2025 19:34

Added support for actions that are being performed over the PEG rules

25b4cf9

Added a test for the previously added parser actions

2c77a24

Added support for remembering tokens and backreferencing any of them

7b81113

Added support for backreferencing in FIFO order

eccd1a0

Added error handling for the pop action

ada6b4c

Added the states system to solve complex parsing problems

413f93f

Added the tests for the debug=True output

3208f2d

Allowed PEG actions to be put into complex expressions

c4e323b

Moved ParserState to the initial Parser class; added support for the …

b1f1d50

…deep copy operation

Added additional checks in the PEG actions tests

0a718b4

Added support for undoing state changes after a match failure via sta…

5c109d9

…te resetting

Fixed the examples to make the tests passing

da94c24

Moved the operation token after regexp to speed up common grammar pat…

b52957c

…terns

Made the MatchState and MatchActions description in the exported dot …

e427ae9

…files clearer

Added more type annotations to the states and actions code; made some…

507e62a

… code cleanup

Moved MatchActions to the peg module

e55d78c

The states initial system was moved to the base Parser class

eb7f4b2

Added support for overriding the ParserPEGActions class in ParserPEG …

5e8b3ca

…descendants

And expressions should not modify the state of the parser

6787a89

Added some comments to the action tests grammar

5a761b1

Added the documentation for the PEG actions and states system

a0dbe2b

Refactored the PEG actions into the separate classes

bdee138

The _parse() method should be declared in the appropriate abstract class

fec790b

andr-dots changed the title ~~The state system and support for the actions and states in PEG parser notation~~ The state system and support for the actions and states in the PEG parser notation Jul 27, 2025

andr-dots marked this pull request as ready for review July 27, 2025 10:21

andr-dots mentioned this pull request Jul 27, 2025

Support for the config modifiers in the PEG notation #135

Draft

7 tasks

andr-dots force-pushed the peg-actions-and-states branch 3 times, most recently from 180f3aa to e3ee6ab Compare August 6, 2025 13:47

andr-dots added 12 commits August 6, 2025 18:59

Renamed the state layer scope class to a more common name so that cou…

a0e1fd7

…ld be reused in other cases

Added the "first longer" and "other same" actions to simplify the ind…

7ae666a

…entation parsing process

Added a test for the wrong indentation at the start of a program when…

c875384

… "first longer" action is used

Fixed a grammar error in the PEG state and action test files

7e222f5

Added support for the delimiter operator in the repetition grouping r…

67a8711

…ules of the PEG notation

Added the documentation for the repetition "%" operator

752fcaf

Added the documentation to the previously added PEG actions

3ac685c

Fixed the description of the "add" and "any" PEG actions

a37cb1f

Suppressed a false IDE warning about the wrong types

d5d5f87

Removed the obsolete code that was used before the state snapshots we…

b240676

…re introduced

Fixed the type hintings for the parse/_parse methods of some classes

f578f51

Marked the error raise method that it never returns

0efeaef

andr-dots force-pushed the peg-actions-and-states branch from 3685c8d to 4dd394c Compare August 6, 2025 15:59

andr-dots added 7 commits August 6, 2025 21:23

Fixed the copy&paste error in the PEG actions and state test that mak…

cfb961a

…es the code less readable

Made the error messages from the PEG actions more clear about the rul…

d4aa027

…es that were failed

Removed the unused parameter from the actions run() method

a904cfa

Moved describing functionality into an appropriate interface in the p…

18278c9

…arser model

Fixed an error in the exporting to dot module which caused the graph …

d479db4

…to be empty

Fixed the missing variable error

b9124d6

Added support for rule name resolving to passthrough names to action …

931e0b5

…rules

andr-dots force-pushed the peg-actions-and-states branch from 4dd394c to 931e0b5 Compare August 6, 2025 18:23

Made string representation of the match actions class clearer

7b27d5e

andr-dots wants to merge 79 commits into textX:master from andr-dots:peg-actions-and-states


2 participants
