The state system and support for the actions and states in the PEG parser notation#131
andr-dots wants to merge 79 commits into
|
Also, I'd like to add that many things could be moved to the state class in the future, and that the state class should be passed to the parser expression's parse function instead of the parser itself. |
|
In a recent commit, I added support for the suppress action. This feature is crucial in the PEG parser notation, and its absence leads to overly complex solutions in the semantic layer. |
|
Hi @andr-dots. Thank you for your substantial work on this PR, and sorry that it took me so long to respond. While the explicit state handling you've implemented offers powerful capabilities for context-sensitive languages, it also introduces significant complexity and performance considerations, as you've noted. Given Arpeggio's design philosophy of remaining a simple PEG parser, I don't believe this rework aligns with the project's direction. Furthermore, I don't feel comfortable having to maintain this rework. I'd suggest forking this work into a separate library where you could maintain full control over its development. This would allow users needing more advanced parsing capabilities to benefit from your work, while Arpeggio continues serving those who need a straightforward PEG parser. |
|
As for the complexity, the algorithms are pretty simple, but debugging the state system might be a challenge for users (it's still a fair price for this functionality, and it can be made easier in the future). Since Python has no preprocessor-style conditional code blocks like those in C, this MR can't be made an optional feature. As for the project itself and speed, the current implementation is slow anyway, and it could only be sped up by rewriting the whole regexp management system the way the Bison+Flex combination does it. I doubt that I could maintain a fork in my free time, especially if nobody else is interested in it. But I still need this solution for my own needs (it's a handy parser for solving small tasks), so I'll consider writing a similar (but faster) parser from scratch (it'll be easier to maintain). |
The state system allows programmers to write parsers for complex grammars of languages with unusual features.
The PEG action system makes it possible to handle features such as templates/generics in programming languages like Python and Go, or labels in languages such as C (automatic linkage of labels).
The parsing state system makes it possible to handle non-linear parsing scenarios (for example, cases where syntax constructions overlap and depend on constructions found earlier in the parsing process).
Together, the state and action systems let programmers solve many known and potential issues that the original parser was unable to solve. Previously, such problems were most likely solved in the semantic layer of the program (with much greater complexity and much less readable code).
The downside of the new system is that parsing becomes a little slower when the state system is in use, because a state snapshot has to be taken before every match that can fail.
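To illustrate where the snapshot cost comes from, here is a minimal, self-contained sketch of snapshot-and-restore around an ordered choice. It is not the code from this PR: `ParseState`, `literal`, `record_label`, `sequence` and `ordered_choice` are hypothetical names, and using `copy.deepcopy` for the snapshot is an assumption made for the example.

```python
import copy


class ParseState:
    """Hypothetical parser state: input position plus user-defined data."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.data = {"labels": []}


def literal(s):
    """Match the exact string s at the current position."""
    def parse(state):
        if state.text.startswith(s, state.pos):
            state.pos += len(s)
            return True
        return False
    return parse


def record_label(name):
    """An 'action' that mutates the state and always succeeds."""
    def parse(state):
        state.data["labels"].append(name)
        return True
    return parse


def sequence(*parsers):
    """Match all parsers in order; stop at the first failure."""
    def parse(state):
        return all(p(state) for p in parsers)
    return parse


def ordered_choice(*alternatives):
    """Try alternatives in order, restoring the state after each failure."""
    def parse(state):
        for alt in alternatives:
            # The snapshot is taken before every alternative that may fail;
            # this is the extra cost paid on each attempt.
            snapshot = (state.pos, copy.deepcopy(state.data))
            if alt(state):
                return True
            state.pos, state.data = snapshot
        return False
    return parse


grammar = ordered_choice(
    sequence(literal("a"), record_label("first"), literal("b")),
    sequence(literal("a"), record_label("second"), literal("c")),
)
state = ParseState("ac")
ok = grammar(state)
print(ok, state.pos, state.data)  # True 2 {'labels': ['second']}
```

Without the restore step, the label recorded by the failed first alternative would leak into the final state. The deep copy keeps the sketch simple, but it is also what makes every attempt more expensive.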
Updated on 2025-07-20:
To make the tests smoother, the PEG grammar notation was improved: the separator operator % is now available, which makes repetitions with a separator much easier to write. I thought about using a comma (,) instead, but that way the code looks too odd.
Code review checklist
CHANGELOG.md, no need to update for typo fixes and such).
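For the separator operator mentioned in the 2025-07-20 update, the sketch below shows what a repetition with a separator is generally expected to match: `item (sep item)*`. It is a standalone illustration using only the standard library, not the implementation from this PR; the `separated` helper and its signature are hypothetical.

```python
import re


def separated(item_pattern, sep_pattern, text, pos=0):
    """Match `item (sep item)*` starting at pos.

    Returns (items, new_pos), or None if not even one item matches.
    """
    item_re = re.compile(item_pattern)
    sep_re = re.compile(sep_pattern)

    m = item_re.match(text, pos)
    if m is None:
        return None
    items = [m.group()]
    pos = m.end()

    while True:
        s = sep_re.match(text, pos)
        if s is None:
            break
        m = item_re.match(text, s.end())
        if m is None:
            # A separator with no item after it is left unconsumed.
            break
        items.append(m.group())
        pos = m.end()
    return items, pos


print(separated(r"\d+", r"\s*,\s*", "1, 2,3,"))  # (['1', '2', '3'], 6)
```

Note that the trailing comma is not consumed: the position only advances once a separator is followed by another item.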