Problem
The core fold combinators in lean4-parser — efoldlPAux, foldr, takeUntil, dropUntil, count, and countUntil — are all partial def. This prevents proving any properties about parsers built on top of them.
For a user building a verified parser for a complex grammar, the partial keyword is a hard barrier: Lean treats partial def as an opaque axiom. No theorem can unfold, reason about, or establish contracts on any combinator that transitively depends on a partial def. Since virtually every repetition combinator (foldl, many, manyChars, sepBy, takeMany, etc.) delegates to efoldlP, this effectively blocks formal verification of any parser that uses loops.
Analysis: Why are these combinators partial?
efoldlPAux recurses on the stream state s:
private partial def efoldlPAux [Inhabited ε] [Inhabited σ] [Inhabited β]
(f : β → α → ParserT ε σ τ m β) (p : ParserT ε σ τ m α) (y : β) (s : σ) :
m (Parser.Result ε σ (β × ε × Bool)) :=
let savePos := Stream.getPosition s
p s >>= fun
| .ok s x => f y x s >>= fun
| .ok s y => efoldlPAux f p y s
| .error s e => ...
| .error s e => ...
The recursive call efoldlPAux f p y s receives an updated stream s, but Lean has no proof that s has actually shrunk. The challenge is that:
-
ParserT is polymorphic in the monad m, so the recursive call p s >>= ... returns m (Parser.Result ε σ α) — we cannot pattern-match on the result to extract the new stream state within termination_by since we are inside a monadic bind.
-
There is no field in Parser.Stream that provides a well-founded measure for the stream. Std.Stream gives us next? but no way to assert that the stream is finite or that next? decreases some measure.
Strategies considered
We evaluated four strategies for making these combinators total. They are listed from most pragmatic to most principled.
Strategy 1: Fuel derived from stream length
Add a remaining : σ → Nat field to Parser.Stream that returns an upper bound on the number of tokens left in the stream. This value must strictly decrease when next? returns some. The fold combinators then take a fuel : Nat parameter (defaulting to Stream.remaining s) and recurse structurally on it:
protected class Parser.Stream (σ : Type _) (τ : outParam (Type _)) extends Std.Stream σ τ where
Position : Type _
getPosition : σ → Position
setPosition : σ → Position → σ
remaining : σ → Nat
private def efoldlPAux [Inhabited ε] [Inhabited σ] [Inhabited β]
(f : β → α → ParserT ε σ τ m β) (p : ParserT ε σ τ m α) (y : β) (s : σ)
(fuel : Nat := Stream.remaining s) :
m (Parser.Result ε σ (β × ε × Bool)) :=
match fuel with
| 0 => return .ok s (y, Error.unexpected (Stream.getPosition s) none, false)
| fuel' + 1 =>
let savePos := Stream.getPosition s
p s >>= fun
| .ok s x => f y x s >>= fun
| .ok s y => efoldlPAux f p y s (min fuel' (Stream.remaining s))
| .error s e => return .ok (Stream.setPosition s savePos) (y, e, true)
| .error s e => return .ok (Stream.setPosition s savePos) (y, e, false)
Merits:
- Works for any monad
m, not just Id. The structural recursion on Nat eliminates the need to pattern-match on monadic results.
- Minimal API surface change — one new field with a clear contract.
- Trivial to implement for all standard stream types (
String.Slice → stopPos - startPos, Subarray → stop - start, OfList → next.length, etc.).
- The
fuel parameter defaults to Stream.remaining s, so existing call sites are unchanged.
- The
min fuel' (Stream.remaining s) in the recursive call means fuel is automatically clamped if the stream actually shrinks faster than one token per iteration.
- When fuel runs out without the parser failing, the combinator treats it as if
p failed — preserving the original semantics for well-behaved parsers while guaranteeing termination for all parsers.
- Unlocks verification immediately: downstream projects can prove properties about parsers that call
foldl, many, sepBy, etc., since the definitions are now unfoldable.
Tradeoffs:
- Every
Parser.Stream instance must implement remaining. For custom stream types this is an additional obligation, though typically straightforward (count the elements between the current position and the end).
- The
fuel = 0 branch is semantically imprecise for consuming parsers on non-empty input — it stops early. In practice this path is unreachable when remaining is a faithful bound, but the type does not enforce this statically.
Strategy 2: Specialize to Id for proofs
Keep the combinators partial in the general ParserT case, but provide parallel total definitions specialized to the Id monad (i.e., Parser ε σ τ α). When m = Id, monadic bind is just function application, allowing:
def efoldlPAux_total (f : β → α → Parser ε σ τ β) (p : Parser ε σ τ α) (y : β) (s : σ) :
Parser.Result ε σ (β × ε × Bool) :=
match p s with
| .ok s x => match f y x s with
| .ok s y => efoldlPAux_total f p y s
| .error s e => ...
| .error s e => ...
termination_by remaining s
Merits:
- Zero API changes to
Parser.Stream.
- Pattern matching on
Parser.Result gives direct access to the new stream state, enabling termination_by remaining s if a suitable remaining function exists.
Tradeoffs:
- Two parallel implementations that must be kept in sync.
- Proofs only apply to
Parser (the Id specialization), not to ParserT.
- Requires a decreasing lemma (
remaining s' < remaining s when p s = .ok s' x), which must either be assumed or proven per-parser. This pushes complexity to the call site.
Strategy 3: "Consuming parser" contract
Define a Consuming predicate asserting that a parser always strictly advances the stream on success:
def Consuming (p : ParserT ε σ τ m α) : Prop :=
∀ s s' x, p s = pure (.ok s' x) → remaining s' < remaining s
The fold combinators then take [Consuming p] as a hypothesis and use well-founded recursion.
Merits:
- The most principled approach: it captures exactly the property needed for termination.
- Enables tighter reasoning — you can assert that specific parsers consume input.
Tradeoffs:
- Requires proving
Consuming for every parser passed to a fold combinator. For primitive parsers like anyToken this is straightforward, but for complex composite parsers the proofs can be substantial.
- Interacts poorly with the monadic layer: establishing
Consuming for p >>= f requires reasoning about monadic bind, which in the general m case is essentially impossible.
- Higher barrier to adoption: even non-verification users pay a proof obligation cost.
Strategy 4: Inductive step type (FoldStep)
Replace the Bool continuation flag in efoldlP with a richer inductive type:
inductive FoldStep (β : Type _)
| cont (val : β) -- continue folding
| done (val : β) -- stop, parser succeeded but fold is complete
| fail -- stop, parser failed
This makes the fold's control flow explicit and amenable to well-founded recursion, since each cont can be required to come with evidence of progress.
Merits:
- Clean separation of control flow from error handling.
- The inductive structure can carry proofs.
Tradeoffs:
- Significant API redesign affecting all fold-based combinators.
- The actual termination argument still needs a
remaining-like measure — FoldStep alone doesn't provide one.
- Marginal benefit for downstream proofs: users reason about their parsers that call fold combinators, not about the fold invariant itself.
Recommendation
Strategy 1 is the recommended approach for lean4-parser. It provides the strongest cost-benefit ratio:
| Criterion |
Strategy 1 |
Strategy 2 |
Strategy 3 |
Strategy 4 |
| Works for all monads |
Yes |
No (Id only) |
In theory |
In theory |
| API breakage |
Minimal (1 field) |
None |
Moderate |
Significant |
| Proof obligation for users |
None |
Per-parser lemma |
Per-parser proof |
Per-parser proof |
| Unlocks downstream verification |
Yes |
Partially |
Yes |
Yes |
| Implementation complexity |
Low |
Medium |
High |
High |
The key insight is that Strategy 1 is the only approach that works inside the monadic bind without specializing m. Since ParserT is parameterized over an arbitrary monad, we cannot pattern-match on the result of p s >>= ... to extract the new stream state for a termination_by clause. Structural recursion on a Nat fuel parameter sidesteps this entirely.
Strategies 3 and 4 can be layered on top of Strategy 1 as refinements — for instance, a Consuming typeclass could provide stronger guarantees about specific parsers, and FoldStep could improve ergonomics for custom folds. But neither is necessary to unblock verification, and both impose costs on non-verification users.
Impact on verified parsers
With Strategy 1 in place, a project building a verified parser for a complex grammar gains:
- All fold-based combinators become unfoldable —
foldl, many, manyChars, sepBy, takeMany, etc. are no longer opaque axioms.
termination_by for recursive parsers — mutual/recursive parsers can use Stream.remaining as their well-founded measure, since the fold combinators they call are now total and the remaining field provides the decreasing measure.
- Compositional reasoning — properties proven about sub-parsers compose through the fold combinators, since the fold definitions are transparent Lean terms rather than
partial black boxes.
- No proof obligation at the combinator level — users need only implement
remaining for their stream type (typically a one-liner), and all combinators become total automatically.
Scope of changes
The implementation requires:
Parser/Stream.lean: Add remaining : σ → Nat to the Parser.Stream class. Update all provided instances.
Parser/Parser.lean: Replace partial def efoldlPAux with a total def using fuel.
Parser/Basic.lean: Replace 5 partial defs (foldr, takeUntil, dropUntil, count, countUntil) with total versions using fuel.
- Downstream: Any
Parser.Stream instance must add a remaining implementation.
co-authored with Github Copilot + Claude Opus 4.6
Problem
The core fold combinators in lean4-parser —
efoldlPAux,foldr,takeUntil,dropUntil,count, andcountUntil— are allpartial def. This prevents proving any properties about parsers built on top of them.For a user building a verified parser for a complex grammar, the
partialkeyword is a hard barrier: Lean treatspartial defas an opaque axiom. No theorem can unfold, reason about, or establish contracts on any combinator that transitively depends on apartial def. Since virtually every repetition combinator (foldl,many,manyChars,sepBy,takeMany, etc.) delegates toefoldlP, this effectively blocks formal verification of any parser that uses loops.Analysis: Why are these combinators partial?
efoldlPAuxrecurses on the stream states:The recursive call
efoldlPAux f p y sreceives an updated streams, but Lean has no proof thatshas actually shrunk. The challenge is that:ParserTis polymorphic in the monadm, so the recursive callp s >>= ...returnsm (Parser.Result ε σ α)— we cannot pattern-match on the result to extract the new stream state withintermination_bysince we are inside a monadic bind.There is no field in
Parser.Streamthat provides a well-founded measure for the stream.Std.Streamgives usnext?but no way to assert that the stream is finite or thatnext?decreases some measure.Strategies considered
We evaluated four strategies for making these combinators total. They are listed from most pragmatic to most principled.
Strategy 1: Fuel derived from stream length
Add a
remaining : σ → Natfield toParser.Streamthat returns an upper bound on the number of tokens left in the stream. This value must strictly decrease whennext?returnssome. The fold combinators then take afuel : Natparameter (defaulting toStream.remaining s) and recurse structurally on it:Merits:
m, not justId. The structural recursion onNateliminates the need to pattern-match on monadic results.String.Slice→stopPos - startPos,Subarray→stop - start,OfList→next.length, etc.).fuelparameter defaults toStream.remaining s, so existing call sites are unchanged.min fuel' (Stream.remaining s)in the recursive call means fuel is automatically clamped if the stream actually shrinks faster than one token per iteration.pfailed — preserving the original semantics for well-behaved parsers while guaranteeing termination for all parsers.foldl,many,sepBy, etc., since the definitions are now unfoldable.Tradeoffs:
Parser.Streaminstance must implementremaining. For custom stream types this is an additional obligation, though typically straightforward (count the elements between the current position and the end).fuel = 0branch is semantically imprecise for consuming parsers on non-empty input — it stops early. In practice this path is unreachable whenremainingis a faithful bound, but the type does not enforce this statically.Strategy 2: Specialize to
Idfor proofsKeep the combinators
partialin the generalParserTcase, but provide parallel total definitions specialized to theIdmonad (i.e.,Parser ε σ τ α). Whenm = Id, monadic bind is just function application, allowing:Merits:
Parser.Stream.Parser.Resultgives direct access to the new stream state, enablingtermination_by remaining sif a suitableremainingfunction exists.Tradeoffs:
Parser(theIdspecialization), not toParserT.remaining s' < remaining swhenp s = .ok s' x), which must either be assumed or proven per-parser. This pushes complexity to the call site.Strategy 3: "Consuming parser" contract
Define a
Consumingpredicate asserting that a parser always strictly advances the stream on success:The fold combinators then take
[Consuming p]as a hypothesis and use well-founded recursion.Merits:
Tradeoffs:
Consumingfor every parser passed to a fold combinator. For primitive parsers likeanyTokenthis is straightforward, but for complex composite parsers the proofs can be substantial.Consumingforp >>= frequires reasoning about monadic bind, which in the generalmcase is essentially impossible.Strategy 4: Inductive step type (
FoldStep)Replace the
Boolcontinuation flag inefoldlPwith a richer inductive type:This makes the fold's control flow explicit and amenable to well-founded recursion, since each
contcan be required to come with evidence of progress.Merits:
Tradeoffs:
remaining-like measure —FoldStepalone doesn't provide one.Recommendation
Strategy 1 is the recommended approach for lean4-parser. It provides the strongest cost-benefit ratio:
Idonly)The key insight is that Strategy 1 is the only approach that works inside the monadic bind without specializing
m. SinceParserTis parameterized over an arbitrary monad, we cannot pattern-match on the result ofp s >>= ...to extract the new stream state for atermination_byclause. Structural recursion on aNatfuel parameter sidesteps this entirely.Strategies 3 and 4 can be layered on top of Strategy 1 as refinements — for instance, a
Consumingtypeclass could provide stronger guarantees about specific parsers, andFoldStepcould improve ergonomics for custom folds. But neither is necessary to unblock verification, and both impose costs on non-verification users.Impact on verified parsers
With Strategy 1 in place, a project building a verified parser for a complex grammar gains:
foldl,many,manyChars,sepBy,takeMany, etc. are no longer opaque axioms.termination_byfor recursive parsers — mutual/recursive parsers can useStream.remainingas their well-founded measure, since the fold combinators they call are now total and theremainingfield provides the decreasing measure.partialblack boxes.remainingfor their stream type (typically a one-liner), and all combinators become total automatically.Scope of changes
The implementation requires:
Parser/Stream.lean: Addremaining : σ → Natto theParser.Streamclass. Update all provided instances.Parser/Parser.lean: Replacepartial def efoldlPAuxwith a totaldefusing fuel.Parser/Basic.lean: Replace 5partial defs (foldr,takeUntil,dropUntil,count,countUntil) with total versions using fuel.Parser.Streaminstance must add aremainingimplementation.co-authored with Github Copilot + Claude Opus 4.6