This document explains how the MiniC parser turns source text into an AST. We start with a concrete example, then introduce the tools the parser uses.
Consider the expression 1 + 2. The parser needs to produce the tree:
Add
├── Literal(1)
└── Literal(2)
Here is roughly what happens, step by step:
- The parser calls the
expressionfunction with input"1 + 2". expressioncallsadditive, which callsmultiplicativeto get the left operand.multiplicativecallsunary, which callsprimary, which callsatom.atomtries to match a literal — it sees"1"and returnsLiteral(Int(1)). The remaining input is" + 2".- Back in
additive: remaining input starts with" + ", so it matches the+operator and callsmultiplicativeagain for the right side. multiplicative→unary→primary→atomsees"2"and returnsLiteral(Int(2)). Remaining input is"".additivebuildsExpr::Add(Literal(1), Literal(2))and returns it.
The same recursive logic handles arbitrarily complex expressions like
a * (b + c) - sqrt(d).
MiniC uses a library called nom to build the parser. nom provides
combinators — small functions that each recognise a tiny piece of syntax
and can be composed (combined) into parsers for larger constructs.
Every parser in nom follows the same contract:
- Input: a string slice
&str - Output:
IResult<&str, T>— eitherOk((remaining, value))on success, orErr(...)on failure
remaining is the part of the input that was not yet consumed. This is how
parsers chain: one parser hands its leftover input to the next.
| Combinator | What it does | Example use |
|---|---|---|
tag("x") |
Match the exact string "x" |
tag("if") to match the keyword if |
char('x') |
Match the exact character 'x' |
char('(') for open parenthesis |
digit1 |
Match one or more decimal digits | Integer literals |
alt((a, b, c)) |
Try a; if it fails, try b; then c |
Choice between statement forms |
tuple((a, b, c)) |
Parse a, then b, then c; return all three |
Parse keyword + name + body |
preceded(a, b) |
Parse a (discard), then parse b (keep) |
Skip whitespace before a token |
delimited(a, b, c) |
Parse a, then b, then c; return only b |
( expression ) |
map(p, f) |
Parse with p, transform the result with function f |
Build an AST node from raw text |
opt(p) |
Try p; return Some(result) or None |
Optional else branch |
verify(p, pred) |
Parse with p; fail if pred(result) is false |
Reject reserved words as identifiers |
many0(p) |
Apply p zero or more times; collect results |
Parse all function declarations |
The parser is split into six files, each responsible for one grammatical category. The dependencies flow from simple to complex:
program.rs ← top-level entry point
└── functions.rs ← function declarations
└── statements.rs ← statements (if, while, block, …)
└── expressions.rs ← arithmetic, comparison, …
├── literals.rs ← numbers, strings, booleans
└── identifiers.rs ← variable names
This mirrors how grammars are traditionally presented and makes it easy to find the rule for any construct.
The expression 1 + 2 * 3 must parse as 1 + (2 * 3), not (1 + 2) * 3.
This requires that * "binds tighter" than +.
MiniC encodes precedence through the call chain: each function calls the next-higher-precedence function to get its operands. Higher in the chain = tighter binding.
expression
└── logical_or (lowest precedence: or)
└── logical_and (and)
└── logical_not (!)
└── relational (== != < <= > >=)
└── additive (+ -)
└── multiplicative (* /)
└── unary (unary -)
└── primary (atoms + indexing)
└── atom
When additive needs its right operand, it calls multiplicative. So *
always groups before + — naturally, without any precedence table.
Why does this work? Consider 1 + 2 * 3:
additivereads1(viamultiplicative).- It sees
+and callsmultiplicativeagain for the right side. multiplicativereads2 * 3and returnsMul(2, 3)as a single unit.additivebuildsAdd(1, Mul(2, 3)). ✓
For 1 - 2 - 3, we want (1 - 2) - 3, not 1 - (2 - 3). This is called
left-associativity.
Each level that has left-associative operators uses an accumulator loop instead of recursion:
fn additive(input) {
let (rest, mut acc) = multiplicative(input)?; // left-most operand
loop {
if let Ok((rest, right)) = parse("+" then multiplicative) {
acc = Add(acc, right); // fold left: (acc + right) becomes new acc
} else if let Ok((rest, right)) = parse("-" then multiplicative) {
acc = Sub(acc, right);
} else {
break;
}
}
return acc;
}For 1 - 2 - 3:
- Start:
acc = 1 - Loop 1: see
-2,acc = Sub(1, 2) - Loop 2: see
-3,acc = Sub(Sub(1, 2), 3) - Loop 3: no more
-or+, stop.
Result: (1 - 2) - 3 ✓
No separate lexer. Many compilers have two separate phases: a lexer
that turns characters into tokens (e.g., "42" → Token::Int(42)), then a
parser that processes tokens. MiniC skips the lexer and lets nom work
directly on characters. This is simpler for a small language and keeps the
code in one place.
The parser produces an untyped AST. Every node in the output carries
ty: () — an empty placeholder. Type information is not the parser's
responsibility; it is computed in the separate type-checking stage. Keeping
parsing and type checking apart makes each phase shorter and independently
testable.
What to read next → 05-type-checker.md