Improve LSH documentation #805
Merged (+211 −11)

@@ -0,0 +1,10 @@
# crates

This directory contains the crates that make up Edit and its supporting tooling.

* `edit`: Main editor binary and library<br>
  It is split into a library so that benchmarks can use it.
* `lsh`: Syntax-highlighting compiler and runtime
* `lsh-bin`: A small CLI for experimenting with and debugging LSH output
* `stdext`: Shared utility code used across the workspace
* `unicode-gen`: Code generation utilities for Unicode LUTs

@@ -0,0 +1,26 @@
# lsh

`lsh` contains the compiler and runtime for Edit's syntax-highlighting system.

At a high level:
* Language definitions live in `definitions/*.lsh`
* The compiler lowers them into bytecode
* The runtime executes the bytecode on the input text line by line
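
To make the compile-then-execute split concrete, here is a deliberately tiny Rust sketch of an interpreter in the same spirit. The instruction set (`Op::MatchPrefix`, `Op::SkipToEol`, `Op::Yield`, `Op::Jump`, `Op::Return`) is hypothetical and is not the real `lsh` bytecode; literal prefixes stand in for regexes. It only illustrates how an `if`/`else if` chain, once lowered to jumps, could run against a single line of a diff.

```rust
/// A hypothetical, heavily simplified instruction set (not the real `lsh` bytecode).
#[derive(Clone, Copy)]
enum Op {
    /// Advance past `lit` if the line continues with it at the current offset;
    /// otherwise jump to `on_fail` without moving.
    MatchPrefix { lit: &'static str, on_fail: usize },
    /// Consume the rest of the line, like `/.*/`.
    SkipToEol,
    /// Color everything since the previous yield with the given kind.
    Yield(&'static str),
    /// Unconditional jump to an instruction index.
    Jump(usize),
    /// Leave the function.
    Return,
}

/// Execute `program` against one line and collect `(start, end, kind)` spans.
fn run(program: &[Op], line: &str) -> Vec<(usize, usize, &'static str)> {
    let (mut pc, mut off, mut last_yield) = (0, 0, 0);
    let mut spans = Vec::new();
    loop {
        match program[pc] {
            Op::MatchPrefix { lit, on_fail } => {
                if line[off..].starts_with(lit) {
                    off += lit.len();
                    pc += 1;
                } else {
                    pc = on_fail;
                }
            }
            Op::SkipToEol => {
                off = line.len();
                pc += 1;
            }
            Op::Yield(kind) => {
                if off > last_yield {
                    spans.push((last_yield, off, kind));
                }
                last_yield = off;
                pc += 1;
            }
            Op::Jump(target) => pc = target,
            Op::Return => return spans,
        }
    }
}

/// Hand-lowered equivalent of a simplified diff highlighter:
/// `if /-.*/ { yield markup.deleted; } else if /\+.*/ { yield markup.inserted; }`
const DIFF: &[Op] = &[
    Op::MatchPrefix { lit: "-", on_fail: 4 },
    Op::SkipToEol,
    Op::Yield("markup.deleted"),
    Op::Jump(8),
    Op::MatchPrefix { lit: "+", on_fail: 8 },
    Op::SkipToEol,
    Op::Yield("markup.inserted"),
    Op::Jump(8),
    Op::Return,
];
```

For example, `run(DIFF, "-old")` produces a single `markup.deleted` span covering the whole line, while a context line such as `" ctx"` produces no spans. The `assembly` subcommand of `lsh-bin` shows what the real compiler generates.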

To understand the definition language itself, read [definitions/README.md](definitions/README.md).

For debugging and optimizing language definitions, use `lsh-bin`.
For example, to see the generated assembly:
```sh
# Show the generated assembly of a file or directory
cargo run -p lsh-bin -- assembly crates/lsh/definitions/diff.lsh

# LSH has no include statement, so included files must be listed explicitly.
# Here, git_commit.lsh implicitly relies on diff() from diff.lsh.
cargo run -p lsh-bin -- assembly crates/lsh/definitions/git_commit.lsh crates/lsh/definitions/diff.lsh
```

Or, to render a file:
```sh
cargo run -p lsh-bin -- render --input assets/highlighting-tests/html.html crates/lsh/definitions
```

@@ -0,0 +1,153 @@
# LSH Definitions

This directory contains syntax highlighting definitions.
Each `.lsh` file describes how to highlight one or more file types.
The compiler turns these definitions into bytecode, and the runtime executes that bytecode against the input one line at a time.

Essentially, LSH is a small, line-oriented coroutine language for writing lexers.

## The basic idea

Most definitions follow the same pattern:
* Select a definition by file name or path
* Walk the current line from left to right
* Try regexes at the current position
* `yield` highlight kinds as tokens are recognized
* Use `await input` only when a construct needs to continue onto the next line

## A minimal definition

A definition is a `pub fn` with attributes that tell the editor when to use it:

```rs
#[display_name = "Diff"]
#[path = "**/*.diff"]
#[path = "**/*.patch"]
pub fn diff() {
    if /(?:diff|---|\+\+\+).*/ {
        yield meta.header;
    } else if /-.*/ {
        yield markup.deleted;
    } else if /\+.*/ {
        yield markup.inserted;
    }
}
```

`#[display_name]` sets the human-readable name.
`#[path]` is a glob pattern; you can have as many as you need.
Functions without `pub` are private helpers that can be called from other definitions.
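
`#[path]` patterns are matched against file paths. The exact glob dialect Edit uses is not described here, so the following Rust sketch assumes a common convention: `/`-separated paths (backslash separators are not handled), `**` matches zero or more whole segments, and `*` matches any run of characters within one segment. `glob_match` is a hypothetical helper, not an Edit API. Under these assumptions, `**/*.diff` matches both `fix.diff` and `src/fix.diff`.

```rust
/// Match one path segment against a pattern segment where `*` matches any run of bytes.
fn match_segment(pat: &[u8], s: &[u8]) -> bool {
    match (pat.first(), s.first()) {
        (None, None) => true,
        // `*` either stops here or swallows one more byte and tries again.
        (Some(&b'*'), _) => {
            match_segment(&pat[1..], s) || (!s.is_empty() && match_segment(pat, &s[1..]))
        }
        (Some(p), Some(c)) if p == c => match_segment(&pat[1..], &s[1..]),
        _ => false,
    }
}

/// Match path segments, where a `**` segment matches zero or more whole segments.
fn match_segments(pat: &[&str], path: &[&str]) -> bool {
    match pat.first() {
        None => path.is_empty(),
        Some(&"**") => {
            match_segments(&pat[1..], path) || (!path.is_empty() && match_segments(pat, &path[1..]))
        }
        Some(p) => {
            !path.is_empty()
                && match_segment(p.as_bytes(), path[0].as_bytes())
                && match_segments(&pat[1..], &path[1..])
        }
    }
}

/// Hypothetical `#[path]` check: does `path` match the glob `pattern`?
fn glob_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}
```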

## How execution works

The runtime feeds input to a definition one line at a time.
Within a line, matching is always left to right.

Each `if /regex/` tries to match at the current position:
* On success, the input position advances past the match and the block runs
* On failure, the input position does not move and the `else` branch, if any, runs
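
Because a failed match leaves the position untouched, every branch of an `else if` chain sees the same input, and branch order decides overlaps. The Rust sketch below models this with literal prefixes instead of regexes; `Cursor` and `try_match` are hypothetical names, not part of `lsh`. Trying `---` before `-` mirrors why the `diff()` example checks for headers first.

```rust
/// Hypothetical stand-in for the runtime's position within one line.
struct Cursor<'a> {
    line: &'a str,
    off: usize,
}

impl Cursor<'_> {
    /// Like `if /lit/`: match only at the current offset.
    /// Success advances past the match; failure leaves `off` unchanged.
    fn try_match(&mut self, lit: &str) -> bool {
        if self.line[self.off..].starts_with(lit) {
            self.off += lit.len();
            true
        } else {
            false
        }
    }
}

/// Mirrors `if /---/ { .. } else if /-/ { .. } else { .. }` and reports the
/// branch taken plus the final offset.
fn classify(line: &str) -> (&'static str, usize) {
    let mut cur = Cursor { line, off: 0 };
    if cur.try_match("---") {
        ("header", cur.off)
    } else if cur.try_match("-") {
        ("deleted", cur.off)
    } else {
        ("other", cur.off)
    }
}
```

`classify("--x")` returns `("deleted", 1)`: the `---` attempt fails without consuming anything, so the `-` branch still matches at offset 0.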

Definitions behave like coroutines:
* If execution reaches `await input`, the function suspends and resumes on the next line
* If the function returns, the next line starts again from the top of the function
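
A rough way to picture this: the only thing carried from one line to the next is where the function was suspended. The Rust sketch below models a hypothetical definition for C-style preprocessor lines, where a trailing `\` continues the directive. Stopping in the middle of a directive corresponds to `await input` (state kept); every other path corresponds to returning (state discarded). `Resume`, `step`, and `highlight` are illustrative names, not part of `lsh`.

```rust
/// Where the hypothetical definition picks up on the next line.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Resume {
    /// The function returned: start again from the top.
    Top,
    /// The function is suspended at `await input` inside a continued directive.
    InContinuation,
}

/// Process one line; return the kind for the line and where to resume next.
fn step(state: Resume, line: &str) -> (&'static str, Resume) {
    // From the top, only a line starting with `#` begins a directive.
    // While suspended inside a continuation, every line belongs to the directive.
    let in_directive = state == Resume::InContinuation || line.starts_with('#');
    if !in_directive {
        return ("other", Resume::Top); // like `return`: nothing carries over
    }
    if line.ends_with('\\') {
        ("preprocessor", Resume::InContinuation) // like `await input`: stay suspended
    } else {
        ("preprocessor", Resume::Top)
    }
}

/// The driver: feed lines one at a time, threading the suspended state through.
fn highlight(lines: &[&str]) -> Vec<&'static str> {
    let mut state = Resume::Top;
    lines
        .iter()
        .map(|line| {
            let (kind, next) = step(state, line);
            state = next;
            kind
        })
        .collect()
}
```

For the lines `#define X \`, `    1 + 2`, and `int y;`, the result is `preprocessor`, `preprocessor`, `other`: the second line is highlighted only because the function was still suspended inside the directive.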

## Highlighting with `yield`

`yield <kind>` emits a highlight span.
Everything between the previous `yield` and the current position is colored with `<kind>`.

> [!NOTE]
> This can be confusing in practice, because `yield` does not just color the regex it appears in.
> Long term, the goal is for `yield` to apply only to the regex it appears in, or to some other explicitly specified range.

Highlight kinds are dotted identifiers such as `comment`, `string`, `keyword.control`, `constant.numeric`, and `markup.bold`.
Kinds are interned at compile time. You can invent new ones, but the editor still needs to know what color to assign them.

`yield other` switches back to the default, unhighlighted kind.
Use it when you want to reset the current highlight between tokens. See [json.lsh](json.lsh) for a representative pattern.
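
The "everything since the previous `yield`" rule is easiest to see with an explicit span recorder. The Rust sketch below is a hypothetical model of that rule, not the runtime's data structure; `advance` stands in for a successful regex match and `yield_kind` for `yield`. It highlights `  return` (two leading spaces) with and without a preceding `yield other`.

```rust
/// Hypothetical model of how `yield` turns input progress into highlight spans.
struct Spans {
    off: usize,        // current input position (what `off` holds)
    last_yield: usize, // where the previous `yield` stopped
    out: Vec<(usize, usize, &'static str)>,
}

impl Spans {
    fn new() -> Self {
        Spans { off: 0, last_yield: 0, out: Vec::new() }
    }

    /// Stand-in for a successful regex match of `len` bytes.
    fn advance(&mut self, len: usize) {
        self.off += len;
    }

    /// `yield kind`: color everything from the previous yield up to `off`.
    fn yield_kind(&mut self, kind: &'static str) {
        if self.off > self.last_yield {
            self.out.push((self.last_yield, self.off, kind));
        }
        self.last_yield = self.off;
    }
}

/// Highlight `  return`, optionally resetting with `yield other` first.
fn demo(reset_first: bool) -> Vec<(usize, usize, &'static str)> {
    let mut s = Spans::new();
    s.advance(2); // the leading whitespace, matched but not yet yielded
    if reset_first {
        s.yield_kind("other");
    }
    s.advance(6); // `return`
    s.yield_kind("keyword.control");
    s.out
}
```

Without the reset, `demo(false)` returns one `keyword.control` span over bytes 0..8, sweeping the leading whitespace into the keyword. With `demo(true)`, the whitespace is closed off as `other` (0..2) and only `return` (2..8) becomes `keyword.control`.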

## Multi-line constructs

Single-line constructs need no special handling.
For constructs that can span lines, such as block comments or fenced code blocks, combine `loop` or `until` with `await input`:
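
```rs
if /\/\*/ {
    loop {
        if /\*\// {
            // Closing delimiter: highlight through it and stop.
            yield comment;
            break;
        } else if /[^*]+|\*/ {
            // Consume comment text up to the next `*` (or a lone `*`).
        } else {
            // Nothing left on this line: highlight it and move to the next one.
            yield comment;
            await input;
        }
    }
}
```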

`await input` means "advance to the next line if there is no more input to consume here."
If there is still unconsumed text on the current line, it is a no-op and execution continues immediately.

One important detail: if you want the remainder of the current line to stay highlighted, emit the appropriate `yield` before `await input`.
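
The following Rust sketch models the intended result for `/* ... */` comments under these rules. The only state carried across lines is whether a comment is still open (the equivalent of being suspended at `await input`), and a line inside an unterminated comment is consumed in full so that moving to the next line is the only remaining option. It uses plain string searching and is a hypothetical model, not code generated from any `.lsh` file.

```rust
/// Highlight `/* ... */` block comments across lines.
/// Returns, per line, the byte ranges that would be colored as `comment`.
fn block_comments(lines: &[&str]) -> Vec<Vec<(usize, usize)>> {
    let mut in_comment = false; // the only state carried across lines
    let mut result = Vec::new();
    for &line in lines {
        let mut spans = Vec::new();
        let (mut off, mut start) = (0, 0);
        loop {
            if in_comment {
                match line[off..].find("*/") {
                    Some(i) => {
                        // Terminator found: the comment span ends after `*/`.
                        off += i + 2;
                        spans.push((start, off));
                        in_comment = false;
                    }
                    None => {
                        // No terminator: the rest of the line is comment. Consuming it
                        // all means "go to the next line" is the only way forward.
                        if start < line.len() {
                            spans.push((start, line.len()));
                        }
                        break;
                    }
                }
            } else {
                match line[off..].find("/*") {
                    Some(i) => {
                        start = off + i;
                        off = start + 2; // consume the opening `/*`
                        in_comment = true;
                    }
                    None => break, // plain code up to the end of the line
                }
            }
        }
        result.push(spans);
    }
    result
}
```

For the lines `int a; /* start`, `still comment`, and `end */ int b;`, the comment spans are bytes 7..15, 0..13, and 0..6 respectively.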

## Control flow

| Expression | Meaning |
|------------|---------|
| `if /pat/ { ... }` | Match `pat` at the current position and enter the block on success |
| `else if /pat/ { ... }` | Try another pattern if the previous one failed |
| `else { ... }` | Fallback branch |
| `loop { ... }` | Repeat the body until `break` or `return` exits it |
| `until /pat/ { ... }` | Repeat the body until `pat` matches, then consume the match and exit |
| `break` | Exit the innermost loop |
| `continue` | Restart the innermost loop |
| `return` | Exit the current function |

`until /$/ { ... }` is the usual way to say "keep processing until end-of-line."
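
One reading of `until /pat/ { body }` is: check `pat` at the current position; if it matches, consume it and leave the loop, otherwise run the body and check again. The Rust sketch below models that reading; `until` and `words_until_eol` are hypothetical helpers, and checking `pat` before the first iteration is an assumption drawn from the table's description. The closure passed as `pat` plays the role of `/$/`, which matches only at the end of the line.

```rust
/// Hypothetical model of `until /pat/ { body }` on a single line.
/// `pat` returns the match length at the start of its input, if any;
/// `body` returns the new offset after doing its work.
fn until(
    line: &str,
    mut off: usize,
    pat: impl Fn(&str) -> Option<usize>,
    mut body: impl FnMut(&str, usize) -> usize,
) -> usize {
    loop {
        if let Some(len) = pat(&line[off..]) {
            return off + len; // consume the match and exit the loop
        }
        let next = body(line, off);
        assert!(next > off, "the body must consume input, or the loop never ends");
        off = next;
    }
}

/// Example: `until /$/ { ... }` whose body collects space-separated words.
fn words_until_eol(line: &str) -> (Vec<String>, usize) {
    let mut words = Vec::new();
    let end = until(
        line,
        0,
        |rest| if rest.is_empty() { Some(0) } else { None }, // like `/$/`
        |text, off| {
            let rest = &text[off..];
            match rest.find(' ') {
                Some(0) => off + 1, // skip a single space
                Some(n) => {
                    words.push(rest[..n].to_string());
                    off + n
                }
                None => {
                    words.push(rest.to_string());
                    text.len()
                }
            }
        },
    );
    (words, end)
}
```

`words_until_eol("let x = 42")` collects `let`, `x`, `=`, and `42` and stops at offset 10. The `assert!` in `until` documents a practical requirement of this model: if the body consumed nothing, the loop would never terminate.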

## Capture groups

Regexes can have capture groups.
Use `yield $N as <kind>` when only part of the match should receive a specific highlight:

```rs
if /([\w:.-]+)\s*=/ {
    yield $1 as variable;
    yield other;
}
```

The full regex match is still consumed.
Only capture group `$1` receives the `variable` highlight; everything else falls through to the following `yield`.
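
For the input `name = value`, the example consumes `name =` and produces two spans. The Rust sketch below hand-codes the regex `([\w:.-]+)\s*=` (approximating `\w` as alphanumerics plus `_`) and returns the spans implied by this description. `key_spans` is a hypothetical helper, and it only covers this example, where `$1` starts at the beginning of the match.

```rust
/// Hypothetical model of:
///     if /([\w:.-]+)\s*=/ { yield $1 as variable; yield other; }
/// Returns the highlight spans, or `None` if the pattern fails at offset 0.
fn key_spans(line: &str) -> Option<Vec<(usize, usize, &'static str)>> {
    let is_key = |c: char| c.is_alphanumeric() || c == '_' || matches!(c, ':' | '.' | '-');
    // Capture group 1: `[\w:.-]+`
    let g1_end = line.find(|c: char| !is_key(c)).unwrap_or(line.len());
    if g1_end == 0 {
        return None;
    }
    // `\s*=` completes the match; the whole match is consumed.
    let ws = line[g1_end..].len() - line[g1_end..].trim_start().len();
    let after_ws = g1_end + ws;
    if !line[after_ws..].starts_with('=') {
        return None; // no match: nothing is consumed or yielded
    }
    let match_end = after_ws + 1;
    Some(vec![
        (0, g1_end, "variable"),      // `yield $1 as variable`
        (g1_end, match_end, "other"), // `yield other` covers the rest of the match
    ])
}
```

`key_spans("name = value")` returns `variable` for bytes 0..4 and `other` for 4..6 (the ` =`); `value` is not part of the match and is left for later rules. For `x: y` the pattern fails, so nothing is consumed.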

## Variables and the input position

You can store the current input offset in a variable and compare against it later:

```rs
var indentation = off;
// ...later...
if off <= indentation {
    break;
}
```

`off` is the built-in register for the current position in the line.
[yaml.lsh](yaml.lsh) uses this pattern to detect when a multi-line string ends.
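
The Rust sketch below models that pattern for a YAML-style block scalar: remember the indentation of the line that opens it, then treat following lines as part of the string until one is indented no deeper than the opener. The trailing-`|` trigger, spaces-only indentation, and blank-line handling are simplifying assumptions; this is a hypothetical illustration, not a transcription of yaml.lsh.

```rust
/// Number of leading spaces, i.e. roughly what `off` holds after skipping indentation.
fn indent(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Mark which lines belong to a block scalar opened by a line ending in `|`.
fn block_scalar_lines(lines: &[&str]) -> Vec<bool> {
    let mut indentation: Option<usize> = None; // like `var indentation = off;`
    lines
        .iter()
        .map(|line| {
            if let Some(start) = indentation {
                // Blank lines never end the block; otherwise compare the indent against it.
                if line.trim().is_empty() || indent(line) > start {
                    return true;
                }
                indentation = None; // like `if off <= indentation { break; }`
            }
            if line.trim_end().ends_with('|') {
                indentation = Some(indent(line));
            }
            false
        })
        .collect()
}
```

For `key: |`, `  line one`, an empty line, `  line two`, and `other: 1`, the result is `false, true, true, true, false`: `other: 1` has indentation 0, which is not greater than the opener's 0, so the block ends there.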

## Calling other definitions

Definitions can call helper functions or other definitions.
This is how [markdown.lsh](markdown.lsh) delegates the contents of fenced code blocks:

```rs
if /(?i:json)/ {
    loop {
        await input;
        if /\s*```/ { return; }
        else { json(); if /.*/ {} }
    }
}
```

The `if /.*/ {}` at the end consumes any text that the nested definition did not consume itself.
Without that final match, `await input` would see remaining input on the current line and continue immediately instead of advancing to the next line.
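
The Rust sketch below models why that last match matters. `await_input` follows the rule from the multi-line section (a no-op while unconsumed text remains), and `nested_json` is a hypothetical stand-in for a nested definition that returns after the first quoted string. `AwaitOutcome`, `nested_json`, and `delegate` are illustrative names, not `lsh` APIs.

```rust
#[derive(Debug, PartialEq)]
enum AwaitOutcome {
    /// Unconsumed text remains: `await input` is a no-op and execution continues.
    SameLine,
    /// The line is fully consumed: execution resumes on the next line.
    NextLine,
}

/// `await input`: only advances once the current line has been consumed.
fn await_input(line: &str, off: usize) -> AwaitOutcome {
    if off < line.len() {
        AwaitOutcome::SameLine
    } else {
        AwaitOutcome::NextLine
    }
}

/// Stand-in for a nested definition that returns early: here it consumes only
/// a leading quoted string and returns the new offset.
fn nested_json(line: &str, off: usize) -> usize {
    let rest = &line[off..];
    match rest.strip_prefix('"').and_then(|s| s.find('"')) {
        Some(i) => off + i + 2, // opening quote + contents + closing quote
        None => off,
    }
}

/// One iteration of the delegation loop: `json();`, optionally `if /.*/ {}`, then `await input`.
fn delegate(line: &str, consume_rest: bool) -> AwaitOutcome {
    let mut off = nested_json(line, 0);
    if consume_rest {
        off = line.len(); // `if /.*/ {}`
    }
    await_input(line, off)
}
```

For the line `"a": 1`, `delegate(line, false)` gives `SameLine` because `: 1` is still unconsumed, while `delegate(line, true)` gives `NextLine`.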