Merged
10 changes: 10 additions & 0 deletions crates/README.md
@@ -0,0 +1,10 @@
# crates

This directory contains the crates that make up Edit and its supporting tooling.

* `edit`: Main editor binary and library<br>
  The library is split out of the binary so that it can be benchmarked.
* `lsh`: Syntax-highlighting compiler and runtime
* `lsh-bin`: A small CLI for experimenting with and debugging LSH output
* `stdext`: Shared utility code used across the workspace
* `unicode-gen`: Code generation utilities for Unicode LUTs
31 changes: 21 additions & 10 deletions crates/lsh-bin/src/main.rs
@@ -31,24 +31,24 @@ enum SubCommands {
 #[derive(FromArgs, PartialEq, Debug)]
 #[argh(subcommand, name = "compile", description = "Generate Rust code from .lsh files")]
 struct SubCommandOneCompile {
-    #[argh(positional, description = "source .lsh file or directory")]
-    lsh: PathBuf,
+    #[argh(positional, description = "source .lsh files or directories")]
+    lsh: Vec<PathBuf>,
 }

 #[derive(FromArgs, PartialEq, Debug)]
 #[argh(subcommand, name = "assembly", description = "Generate assembly from .lsh files")]
 struct SubCommandAssembly {
-    #[argh(positional, description = "source .lsh file or directory")]
-    lsh: PathBuf,
+    #[argh(positional, description = "source .lsh files or directories")]
+    lsh: Vec<PathBuf>,
 }

 #[derive(FromArgs, PartialEq, Debug)]
 #[argh(subcommand, name = "render", description = "Highlight text files")]
 struct SubCommandRender {
-    #[argh(positional, description = "source .lsh file or directory")]
-    lsh: PathBuf,
-    #[argh(positional, description = "source text file")]
+    #[argh(option, description = "source text file")]
     input: PathBuf,
+    #[argh(positional, description = "source .lsh files or directories")]
+    lsh: Vec<PathBuf>,
 }

 pub fn main() {
@@ -67,21 +67,32 @@ fn run() -> anyhow::Result<()> {
     let mut read_lsh = |path: &Path| {
         if path.is_dir() { generator.read_directory(path) } else { generator.read_file(path) }
     };
+    let mut read_lsh_inputs = |paths: &[PathBuf]| -> anyhow::Result<()> {
+        if paths.is_empty() {
+            bail!("At least one .lsh file or directory is required");
+        }
+
+        for path in paths {
+            read_lsh(path)?;
+        }
+
+        Ok(())
+    };

     match &command.sub {
         SubCommands::Compile(cmd) => {
-            read_lsh(&cmd.lsh)?;
+            read_lsh_inputs(&cmd.lsh)?;
             let output = generator.generate_rust()?;
             _ = stdout().write_all(output.as_bytes());
         }
         SubCommands::Assembly(cmd) => {
-            read_lsh(&cmd.lsh)?;
+            read_lsh_inputs(&cmd.lsh)?;
             let vt = stdout().is_terminal();
             let output = generator.generate_assembly(vt)?;
             _ = stdout().write_all(output.as_bytes());
         }
         SubCommands::Render(cmd) => {
-            read_lsh(&cmd.lsh)?;
+            read_lsh_inputs(&cmd.lsh)?;
             run_render(generator, &cmd.input)?;
         }
     }
26 changes: 26 additions & 0 deletions crates/lsh/README.md
@@ -0,0 +1,26 @@
# lsh

`lsh` contains the compiler and runtime for Edit's syntax-highlighting system.

At a high level:
* Language definitions live in `definitions/*.lsh`
* The compiler lowers them into bytecode
* The runtime executes the bytecode on the input text line by line

To understand the definition language itself, read [definitions/README.md](definitions/README.md).

To debug and optimize language definitions, use `lsh-bin`.
For example, to see the generated assembly:
```sh
# Show the generated assembly of a file or directory
cargo run -p lsh-bin -- assembly crates/lsh/definitions/diff.lsh

# Due to the lack of include statements, you must specify included files manually.
# Here, git_commit.lsh implicitly relies on diff() from diff.lsh.
cargo run -p lsh-bin -- assembly crates/lsh/definitions/git_commit.lsh crates/lsh/definitions/diff.lsh
```

Or to render a file:
```sh
cargo run -p lsh-bin -- render --input assets/highlighting-tests/html.html crates/lsh/definitions
```
153 changes: 153 additions & 0 deletions crates/lsh/definitions/README.md
@@ -0,0 +1,153 @@
# LSH Definitions

This directory contains syntax highlighting definitions.
Each `.lsh` file describes how to highlight one or more file types.
The compiler turns these definitions into bytecode, and the runtime executes that bytecode against the input one line at a time.

Essentially, LSH is a small, line-oriented coroutine language for writing lexers.

## The basic idea

Most definitions follow the same pattern:
* Select a definition by file name or path
* Walk the current line from left to right
* Try regexes at the current position
* `yield` highlight kinds as tokens are recognized
* Use `await input` only when a construct needs to continue onto the next line

## A minimal definition

A definition is a `pub fn` with attributes that tell the editor when to use it:

```rs
#[display_name = "Diff"]
#[path = "**/*.diff"]
#[path = "**/*.patch"]
pub fn diff() {
    if /(?:diff|---|\+\+\+).*/ {
        yield meta.header;
    } else if /-.*/ {
        yield markup.deleted;
    } else if /\+.*/ {
        yield markup.inserted;
    }
}
```

`#[display_name]` sets the human-readable name.
`#[path]` is a glob pattern; you can have as many as you need.
Functions without `pub` are private helpers that can be called from other definitions.
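
As a rough cross-check of the branch order, the same classification can be sketched in plain Rust. This is only an illustration under simplifying assumptions: `classify_diff_line` is a hypothetical helper that is not part of `lsh`, and it uses prefix checks in place of the regexes.

```rust
/// Toy stand-in for the `diff()` definition: classify one line by its prefix.
/// The branch order mirrors the definition: the header pattern is tried before
/// the single-character `-` and `+` patterns.
pub fn classify_diff_line(line: &str) -> Option<&'static str> {
    if line.starts_with("diff") || line.starts_with("---") || line.starts_with("+++") {
        Some("meta.header")
    } else if line.starts_with('-') {
        Some("markup.deleted")
    } else if line.starts_with('+') {
        Some("markup.inserted")
    } else {
        // No branch matched, so this toy model reports no highlight kind.
        None
    }
}
```

Because the header branch comes first, a line such as `--- a/file` is classified as `meta.header` rather than `markup.deleted`. Swapping the branches would change that result.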

## How execution works

The runtime feeds input to a definition one line at a time.
Within a line, matching is always left to right.

Each `if /regex/` tries to match at the current position:
* On success, the input position advances past the match and the block runs
* On failure, the input position does not move and the `else` branch, if any, runs

Definitions behave like coroutines:
* If execution reaches `await input`, the function suspends and resumes on the next line
* If the function returns, the next line starts again from the top of the function
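
The matching rule above (advance on success, stay put on failure) can be modelled with a small cursor type. This is a minimal sketch, not the actual runtime: `Cursor`, `try_match`, and `offset` are hypothetical names, and a literal prefix stands in for a regex.

```rust
/// Hypothetical cursor over one line; `off` plays the role of the input position.
pub struct Cursor<'a> {
    line: &'a str,
    off: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(line: &'a str) -> Self {
        Cursor { line, off: 0 }
    }

    /// Try to match a literal at the current position (a stand-in for `if /regex/`).
    /// On success the position advances past the match; on failure it does not move.
    pub fn try_match(&mut self, literal: &str) -> bool {
        if self.line[self.off..].starts_with(literal) {
            self.off += literal.len();
            true
        } else {
            false
        }
    }

    /// Current position within the line, in bytes.
    pub fn offset(&self) -> usize {
        self.off
    }
}
```

A failed `try_match` leaves `offset()` unchanged, which is what lets an `else if` branch retry a different pattern at the same position.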

## Highlighting with `yield`

`yield <kind>` emits a highlight span.
Everything between the previous `yield` and the current position is colored with `<kind>`.

> [!NOTE]
> This can be confusing in practice, because `yield` does not just color the regex it appears in.
> Long term, the goal is for `yield` to apply only to the regex it appears in, or to some other explicitly specified range.

Highlight kinds are dotted identifiers such as `comment`, `string`, `keyword.control`, `constant.numeric`, and `markup.bold`.
Kinds are interned at compile time. You can invent new ones, but the editor still needs to know what color to assign them.

`yield other` switches back to the default, unhighlighted kind.
Use it when you want to reset the current highlight between tokens. See [json.lsh](json.lsh) for a representative pattern.
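
The span rule ("everything between the previous `yield` and the current position") can be sketched as a tiny recorder. All names here (`Span`, `Highlighter`, `consume`, `yield_kind`) are hypothetical and chosen for illustration; the real runtime may represent spans differently.

```rust
/// One highlighted range of a line: bytes `start..end` receive `kind`.
#[derive(Debug, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
    pub kind: &'static str,
}

/// Hypothetical span recorder modelling the `yield` rule.
#[derive(Default)]
pub struct Highlighter {
    last_yield: usize, // position at the time of the previous `yield`
    off: usize,        // current input position
    spans: Vec<Span>,
}

impl Highlighter {
    /// Pretend a regex matched `len` bytes at the current position.
    pub fn consume(&mut self, len: usize) {
        self.off += len;
    }

    /// Color everything since the previous `yield` with `kind`.
    pub fn yield_kind(&mut self, kind: &'static str) {
        if self.off > self.last_yield {
            self.spans.push(Span { start: self.last_yield, end: self.off, kind });
        }
        self.last_yield = self.off;
    }

    pub fn spans(&self) -> &[Span] {
        &self.spans
    }
}
```

In this model, consuming two matches and then yielding once colors both matches with the same kind, which is the behavior the note above warns about.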

## Multi-line constructs

Single-line constructs need no special handling.
For constructs that can span lines, such as block comments or fenced code blocks, combine `loop` or `until` with `await input`:

```rs
if /\/\*/ {
    loop {
        yield comment;
        await input;
        if /\*\// {
            yield comment;
            break;
        }
    }
}
```

`await input` means "advance to the next line if there is no more input to consume here."
If there is still unconsumed text on the current line, it is a no-op and execution continues immediately.

One important detail: if you want the remainder of the current line to stay highlighted, emit the appropriate `yield` before `await input`.
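
For comparison, here is roughly what carrying a block comment across lines looks like as a hand-written state machine in Rust. It is a sketch under assumptions, not how LSH is compiled: `BlockComment` and `feed` are hypothetical names, the `in_comment` flag stands in for the state a suspended coroutine keeps implicitly at `await input`, and unlike the LSH snippet it searches for `/*` anywhere on the line.

```rust
/// Hypothetical line-by-line block-comment tracker.
#[derive(Default)]
pub struct BlockComment {
    in_comment: bool,
}

impl BlockComment {
    /// Feed one line; returns the byte ranges on that line that are comment text.
    pub fn feed(&mut self, line: &str) -> Vec<(usize, usize)> {
        let mut ranges = Vec::new();
        let mut off = 0;
        loop {
            // Where does comment text begin on the rest of this line?
            let start = if self.in_comment {
                off
            } else {
                match line[off..].find("/*") {
                    Some(i) => off + i,
                    None => return ranges,
                }
            };
            // Skip a freshly found opener before looking for the terminator.
            let search_from = if self.in_comment { start } else { start + 2 };
            match line[search_from..].find("*/") {
                Some(i) => {
                    off = search_from + i + 2;
                    ranges.push((start, off));
                    self.in_comment = false;
                }
                None => {
                    // Not closed on this line: highlight the remainder and
                    // remember the state for the next line.
                    if start < line.len() {
                        ranges.push((start, line.len()));
                    }
                    self.in_comment = true;
                    return ranges;
                }
            }
        }
    }
}
```

The explicit flag is the bookkeeping that `await input` removes from a definition: the LSH version simply resumes inside the `loop` on the next line.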

## Control flow

| Expression | Meaning |
|------------|---------|
| `if /pat/ { ... }` | Match `pat` at the current position and enter the block on success |
| `else if /pat/ { ... }` | Try another pattern if the previous one failed |
| `else { ... }` | Fallback branch |
| `loop { ... }` | Repeat the body until `break` or `return` exits the loop |
| `until /pat/ { ... }` | Repeat the body until `pat` matches, then consume the match and exit |
| `break` | Exit the innermost loop |
| `continue` | Restart the innermost loop |
| `return` | Exit the current function |

`until /$/ { ... }` is the usual way to say "keep processing until end-of-line."

## Capture groups

Regexes can have capture groups.
Use `yield $N as <kind>` when only part of the match should receive a specific highlight:

```rs
if /([\w:.-]+)\s*=/ {
    yield $1 as variable;
    yield other;
}
```

The full regex match is still consumed.
Only capture group `$1` receives the `variable` highlight; everything else falls through to the following `yield`.
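
To make the resulting spans concrete, the following Rust sketch hand-rolls the same pattern and returns the two ranges involved. It is illustrative only: `attribute_spans` is a hypothetical helper, it only matches at the start of the line, and it accepts ASCII word characters where `\w` may accept more.

```rust
/// Hand-rolled stand-in for `/([\w:.-]+)\s*=/` matched at the start of `line`.
/// Returns `(start, end, kind)` for capture group 1 and for the rest of the match.
pub fn attribute_spans(line: &str) -> Option<[(usize, usize, &'static str); 2]> {
    let bytes = line.as_bytes();
    // Capture group 1: one or more name characters.
    let name_end = bytes
        .iter()
        .position(|&b| !(b.is_ascii_alphanumeric() || matches!(b, b'_' | b':' | b'.' | b'-')))
        .unwrap_or(bytes.len());
    if name_end == 0 {
        return None;
    }
    // `\s*`: optional whitespace, followed by the literal `=`.
    let mut end = name_end;
    while end < bytes.len() && bytes[end].is_ascii_whitespace() {
        end += 1;
    }
    if bytes.get(end) != Some(&b'=') {
        return None;
    }
    end += 1;
    // `yield $1 as variable` covers the name; `yield other` covers the remainder.
    Some([(0, name_end, "variable"), (name_end, end, "other")])
}
```

For the input `name = 1`, the name occupies bytes `0..4` and the whitespace plus `=` occupy bytes `4..6`; the value after the `=` is not part of the match.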

## Variables and the input position

You can store the current input offset in a variable and compare against it later:

```rs
var indentation = off;
// ...later...
if off <= indentation {
    break;
}
```

`off` is the built-in register for the current position in the line.
[yaml.lsh](yaml.lsh) uses this pattern to detect when a multi-line string ends.
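
The comparison against a saved offset can be illustrated in Rust as an indentation check. This sketch makes several assumptions: `indent_of` and `block_line_count` are hypothetical helpers, indentation is counted in leading spaces only, and blank lines are treated as ending the block, which a real definition may handle differently.

```rust
/// Number of leading spaces; stands in for `off` once the indentation is consumed.
fn indent_of(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Count how many leading `lines` are indented deeper than `indentation`.
/// The block ends at the first line whose indent is less than or equal to it,
/// mirroring `if off <= indentation { break; }`.
pub fn block_line_count(lines: &[&str], indentation: usize) -> usize {
    lines
        .iter()
        .take_while(|line| indent_of(**line) > indentation)
        .count()
}
```

With a saved indentation of 2, two lines indented by 4 spaces belong to the block, and the first line at column 0 ends it.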

## Calling other definitions

Definitions can call helper functions or other definitions.
This is how [markdown.lsh](markdown.lsh) delegates the contents of fenced code blocks:

```rs
if /(?i:json)/ {
    loop {
        await input;
        if /\s*```/ { return; }
        else { json(); if /.*/ {} }
    }
}
```

The `if /.*/ {}` at the end consumes any text that the nested definition did not consume itself.
Without that final match, `await input` would see remaining input on the current line and continue immediately instead of advancing to the next line.
2 changes: 1 addition & 1 deletion crates/lsh/src/lib.rs
@@ -1,7 +1,7 @@
 // Copyright (c) Microsoft Corporation.
 // Licensed under the MIT License.

-//! Welcome to Leonard's Syntax Highlighter (LSH), otherwise known as
+//! Welcome to the Lightweight Syntax Highlighter (LSH), otherwise known as
Comment thread: lhecker marked this conversation as resolved.
 //! Leonard's Shitty Highlighter, which is really what it is.
 //!
 //! ## Architecture