-
Notifications
You must be signed in to change notification settings - Fork 23
pest to chumsky migration #185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
pest to chumsky migration #185
Conversation
|
cc @canndrew may want to keep an eye on progress here |
6db55db to
1b1e751
Compare
|
Right now there is a working parser using the Error reporting is currently broken because we need to replace the logic of The code will be refactored because some parts are only half-finished (such as adding |
|
cc @canndrew |
| } | ||
|
|
||
| #[test] | ||
| #[ignore] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1b1e751 It's nice to see that chumsky seems to be faster than pest here.
1b1e751 to
1e7c61b
Compare
src/error.rs
Outdated
| }) | ||
| .map_or(0, |ts| u32::from(ts) as usize); | ||
|
|
||
| let start_col = file[line_start_byte..self.span.start].chars().count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to count columns as being the number of utf8 codepoints? There's no good way to define "number of columns" in general for non-ascii text, but LSP defines it as the number of utf16 codepoints and that's the closest thing to a standard that I'm aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I just checked and LSP now allows you to choose between utf{8,16,32} at your leisure. But it's moot anyway since this is just deciding how long an underline to print and that's going to depend on the terminal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider switching to ariadne for error pretty-printing, as it's the "sister-crate" for chumsky.
|
It's weird that the lexer is treating all our built-in macro/function/etc names as being keywords. I realize that's how the compiler currently works, so it's okay to land this PR as-is to keep the changes small. But obviously we'd want to eventually treat these as just being identifiers. |
bd5c30f to
24a6bc6
Compare
|
I would like to provide more context on a few points:
|
|
Also a note about performance: It would be nice if we could move the parser to a different crate, so it would not affect compile time too much, and the SimplicityHL parser could be used separately from the compiler. |
| let start_pos = index.line_col(TextSize::from(self.span.start as u32)); | ||
| let end_pos = index.line_col(TextSize::from(self.span.end as u32)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have From trait implemented to avoid repeating the TextSize::from(self.span.start as u32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Span has a start and an end, so the only way to implement the From trait would be for a tuple of TextSizes. That isn't a very pretty solution either, so I'd prefer to stick with the current variant.
Strongly agreed. If we had a public somewhat-standard AST type that the parser would produce, this would also let people implement some kinds of linters and/or formatters without needing support from us. (We will likely get some pressure to preserve whitespace and comments to help with this. Maybe we actually want two AST types, one that has whitespace and comments and one that's reduced somehow.) In any case, this is all separate from this PR. |
24a6bc6 to
b200640
Compare
|
Please rebase onto master |
b88ef62 to
4bf6252
Compare
src/lexer.rs
Outdated
| "true" => Token::Bool(true), | ||
| "false" => Token::Bool(false), | ||
| "bool" => Token::BooleanType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded built-in names - "true", "false", "bool" are treated as keywords at the lexer level
Let's create a GitHub issue, so we do not forget to resolve this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a problem for "true" or "false," because the Rust compiler doesn't allow variables to be named "true" or "false".
However, I can change it for "bool" very quickly, as it wouldn't be so difficult.
| let macros = | ||
| choice((just("assert!"), just("panic!"), just("dbg!"), just("list!"))).map(Token::Macro); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Macros are lexed as special Macro tokens, therefore adding new macros would require lexer changes, let's mention it in the issue as well
Maybe we could consider treating ident! as a general pattern
| // We would discard them for the compiler, but they are needed, for example, for the formatter. | ||
| Comment, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create an issue "Formatter support" and mention that content is discarded for now?
| /// ## Errors | ||
| /// | ||
| /// The string is not a valid SimplicityHL program. | ||
| pub fn new<Str: Into<Arc<str>>>(s: Str) -> Result<Self, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we cannot return Error type instead of String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because at this stage we only want to have pretty-printed diagnostics?
| impl TemplateProgram { | ||
| /// Parse the template of a SimplicityHL program. | ||
| /// | ||
| /// ## Errors | ||
| /// | ||
| /// The string is not a valid SimplicityHL program. | ||
| pub fn new<Str: Into<Arc<str>>>(s: Str) -> Result<Self, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TemplateProgram::new() collects errors but only returns them as single string, so are multiple errors included into single string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct.
| @@ -664,4 +675,167 @@ | |||
| .with_witness_values(WitnessValues::default()) | |||
| .assert_run_success(); | |||
| } | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could also add test that checks that multiple errors are detected?
| let compiled = | ||
| match CompiledProgram::new(prog_text, Arguments::default(), include_debug_symbols) { | ||
| Ok(program) => program, | ||
| Err(e) => { | ||
| eprintln!("{}", e); | ||
| std::process::exit(1); | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could use approach like below?
use std::process::{ExitCode, Termination};
struct CliError(String);
impl Termination for CliError {
fn report(self) -> ExitCode {
eprintln!("{}", self.0);
ExitCode::from(1)
}
}
fn main() -> Result<(), CliError> {
// ...
let compiled = CompiledProgram::new(prog_text, Arguments::default(), include_debug_symbols)
.map_err(CliError)?;
// ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this code because I was having some trouble with the error display -- it was showing up like a Debug print, and CliError doesn't solve that issue. I'd also like to avoid touching the CLI too much right now, as this PR is mainly focused on the parser changes.
src/parse.rs
Outdated
| tok.into_iter() | ||
| .map(|(tok, span)| (tok, Span::from(span))) | ||
| .filter(|(tok, _)| !matches!(tok, Token::Comment | Token::BlockComment)) | ||
| .collect::<Vec<_>>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplication with ParseFromStrWithErrors
| span: Span::DUMMY, | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No unit tests is quite troublesome, though this file is already huge
4bf6252 to
6a5bc25
Compare
6a5bc25 to
7e5a257
Compare
This commit introduce multiple changes, because it full rewrite of parsing and error Changes in `error.rs`: - Change `Span` to use byte offsets in place of old `Position` - Add `line-index` crate to calculate line and column of byte offset - Change `RichError` implementation to use new `Span` structure - Implement `chumsky` error traits, so it can be used in error reporting of parsers - add `expected..found` error Changes in `parse.rs`: - Fully rewrite `pest` parsers to `chumsky` parsers. - Change `ParseFromStr` trait to use this change.
This adds `ParseFromStrWithErrors`, which would take `ErrorCollector` and return an `Option` of AST. Also changes `TemplateProgram` to use new trait with collector
it's not slow anymore
This adds tests to ensure that the compiler using the `chumsky` parser produces the same Simplicity program as when using the `pest` parser for the default examples. The programs were compiled using an old `simc` version with debug symbols into .json files, and located in `test-data/` folder.
7e5a257 to
e53f005
Compare
|
cc @apoelstra |
No description provided.