Make fold combinators total to enable verified parsers

## Problem

The core fold combinators in lean4-parser — `efoldlPAux`, `foldr`, `takeUntil`, `dropUntil`, `count`, and `countUntil` — are all `partial def`. This prevents proving any properties about parsers built on top of them.

For a user building a verified parser for a complex grammar, the `partial` keyword is a hard barrier: Lean treats `partial def` as an opaque axiom. No theorem can unfold, reason about, or establish contracts on any combinator that transitively depends on a `partial def`. Since virtually every repetition combinator (`foldl`, `many`, `manyChars`, `sepBy`, `takeMany`, etc.) delegates to `efoldlP`, this effectively blocks formal verification of any parser that uses loops.

## Analysis: Why are these combinators partial?

`efoldlPAux` recurses on the stream state `s`:

```lean
private partial def efoldlPAux [Inhabited ε] [Inhabited σ] [Inhabited β]
  (f : β → α → ParserT ε σ τ m β) (p : ParserT ε σ τ m α) (y : β) (s : σ) :
  m (Parser.Result ε σ (β × ε × Bool)) :=
  let savePos := Stream.getPosition s
  p s >>= fun
    | .ok s x => f y x s >>= fun
      | .ok s y => efoldlPAux f p y s
      | .error s e => ...
    | .error s e => ...
```

The recursive call `efoldlPAux f p y s` receives an updated stream `s`, but Lean has no proof that `s` has actually shrunk. The challenge is that:

1. `ParserT` is polymorphic in the monad `m`, so the recursive call `p s >>= ...` returns `m (Parser.Result ε σ α)` — we cannot pattern-match on the result to extract the new stream state within `termination_by` since we are inside a monadic bind.

2. There is no field in `Parser.Stream` that provides a well-founded measure for the stream. `Std.Stream` gives us `next?` but no way to assert that the stream is finite or that `next?` decreases some measure.

## Strategies considered

We evaluated four strategies for making these combinators total. They are listed from most pragmatic to most principled.

### Strategy 1: Fuel derived from stream length

Add a `remaining : σ → Nat` field to `Parser.Stream` that returns an upper bound on the number of tokens left in the stream. This value must strictly decrease when `next?` returns `some`. The fold combinators then take a `fuel : Nat` parameter (defaulting to `Stream.remaining s`) and recurse structurally on it:

```lean
protected class Parser.Stream (σ : Type _) (τ : outParam (Type _)) extends Std.Stream σ τ where
  Position : Type _
  getPosition : σ → Position
  setPosition : σ → Position → σ
  remaining : σ → Nat

private def efoldlPAux [Inhabited ε] [Inhabited σ] [Inhabited β]
  (f : β → α → ParserT ε σ τ m β) (p : ParserT ε σ τ m α) (y : β) (s : σ)
  (fuel : Nat := Stream.remaining s) :
  m (Parser.Result ε σ (β × ε × Bool)) :=
  match fuel with
  | 0 => return .ok s (y, Error.unexpected (Stream.getPosition s) none, false)
  | fuel' + 1 =>
    let savePos := Stream.getPosition s
    p s >>= fun
      | .ok s x => f y x s >>= fun
        | .ok s y => efoldlPAux f p y s (min fuel' (Stream.remaining s))
        | .error s e => return .ok (Stream.setPosition s savePos) (y, e, true)
      | .error s e => return .ok (Stream.setPosition s savePos) (y, e, false)
```

**Merits:**
- Works for any monad `m`, not just `Id`. The structural recursion on `Nat` eliminates the need to pattern-match on monadic results.
- Minimal API surface change — one new field with a clear contract.
- Trivial to implement for all standard stream types (`String.Slice` → `stopPos - startPos`, `Subarray` → `stop - start`, `OfList` → `next.length`, etc.).
- The `fuel` parameter defaults to `Stream.remaining s`, so existing call sites are unchanged.
- The `min fuel' (Stream.remaining s)` in the recursive call means fuel is automatically clamped if the stream actually shrinks faster than one token per iteration.
- When fuel runs out without the parser failing, the combinator treats it as if `p` failed — preserving the original semantics for well-behaved parsers while guaranteeing termination for all parsers.
- Unlocks verification immediately: downstream projects can prove properties about parsers that call `foldl`, `many`, `sepBy`, etc., since the definitions are now unfoldable.

**Tradeoffs:**
- Every `Parser.Stream` instance must implement `remaining`. For custom stream types this is an additional obligation, though typically straightforward (count the elements between the current position and the end).
- The `fuel = 0` branch is semantically imprecise for *consuming* parsers on non-empty input — it stops early. In practice this path is unreachable when `remaining` is a faithful bound, but the type does not enforce this statically.

### Strategy 2: Specialize to `Id` for proofs

Keep the combinators `partial` in the general `ParserT` case, but provide parallel total definitions specialized to the `Id` monad (i.e., `Parser ε σ τ α`). When `m = Id`, monadic bind is just function application, allowing:

```lean
def efoldlPAux_total (f : β → α → Parser ε σ τ β) (p : Parser ε σ τ α) (y : β) (s : σ) :
  Parser.Result ε σ (β × ε × Bool) :=
  match p s with
  | .ok s x => match f y x s with
    | .ok s y => efoldlPAux_total f p y s
    | .error s e => ...
  | .error s e => ...
termination_by remaining s
```

**Merits:**
- Zero API changes to `Parser.Stream`.
- Pattern matching on `Parser.Result` gives direct access to the new stream state, enabling `termination_by remaining s` if a suitable `remaining` function exists.

**Tradeoffs:**
- Two parallel implementations that must be kept in sync.
- Proofs only apply to `Parser` (the `Id` specialization), not to `ParserT`.
- Requires a decreasing lemma (`remaining s' < remaining s` when `p s = .ok s' x`), which must either be assumed or proven per-parser. This pushes complexity to the call site.

### Strategy 3: "Consuming parser" contract

Define a `Consuming` predicate asserting that a parser always strictly advances the stream on success:

```lean
def Consuming (p : ParserT ε σ τ m α) : Prop :=
  ∀ s s' x, p s = pure (.ok s' x) → remaining s' < remaining s
```

The fold combinators then take `[Consuming p]` as a hypothesis and use well-founded recursion.

**Merits:**
- The most principled approach: it captures exactly the property needed for termination.
- Enables tighter reasoning — you can assert that specific parsers consume input.

**Tradeoffs:**
- Requires proving `Consuming` for every parser passed to a fold combinator. For primitive parsers like `anyToken` this is straightforward, but for complex composite parsers the proofs can be substantial.
- Interacts poorly with the monadic layer: establishing `Consuming` for `p >>= f` requires reasoning about monadic bind, which in the general `m` case is essentially impossible.
- Higher barrier to adoption: even non-verification users pay a proof obligation cost.

### Strategy 4: Inductive step type (`FoldStep`)

Replace the `Bool` continuation flag in `efoldlP` with a richer inductive type:

```lean
inductive FoldStep (β : Type _)
  | cont (val : β)        -- continue folding
  | done (val : β)        -- stop, parser succeeded but fold is complete
  | fail                  -- stop, parser failed
```

This makes the fold's control flow explicit and amenable to well-founded recursion, since each `cont` can be required to come with evidence of progress.

**Merits:**
- Clean separation of control flow from error handling.
- The inductive structure can carry proofs.

**Tradeoffs:**
- Significant API redesign affecting all fold-based combinators.
- The actual termination argument still needs a `remaining`-like measure — `FoldStep` alone doesn't provide one.
- Marginal benefit for downstream proofs: users reason about their parsers that *call* fold combinators, not about the fold invariant itself.

## Recommendation

**Strategy 1** is the recommended approach for lean4-parser. It provides the strongest cost-benefit ratio:

| Criterion | Strategy 1 | Strategy 2 | Strategy 3 | Strategy 4 |
|---|---|---|---|---|
| Works for all monads | Yes | No (`Id` only) | In theory | In theory |
| API breakage | Minimal (1 field) | None | Moderate | Significant |
| Proof obligation for users | None | Per-parser lemma | Per-parser proof | Per-parser proof |
| Unlocks downstream verification | Yes | Partially | Yes | Yes |
| Implementation complexity | Low | Medium | High | High |

The key insight is that **Strategy 1 is the only approach that works *inside* the monadic bind** without specializing `m`. Since `ParserT` is parameterized over an arbitrary monad, we cannot pattern-match on the result of `p s >>= ...` to extract the new stream state for a `termination_by` clause. Structural recursion on a `Nat` fuel parameter sidesteps this entirely.

Strategies 3 and 4 can be layered on top of Strategy 1 as refinements — for instance, a `Consuming` typeclass could provide *stronger* guarantees about specific parsers, and `FoldStep` could improve ergonomics for custom folds. But neither is necessary to unblock verification, and both impose costs on non-verification users.

## Impact on verified parsers

With Strategy 1 in place, a project building a verified parser for a complex grammar gains:

1. **All fold-based combinators become unfoldable** — `foldl`, `many`, `manyChars`, `sepBy`, `takeMany`, etc. are no longer opaque axioms.
2. **`termination_by` for recursive parsers** — mutual/recursive parsers can use `Stream.remaining` as their well-founded measure, since the fold combinators they call are now total and the `remaining` field provides the decreasing measure.
3. **Compositional reasoning** — properties proven about sub-parsers compose through the fold combinators, since the fold definitions are transparent Lean terms rather than `partial` black boxes.
4. **No proof obligation at the combinator level** — users need only implement `remaining` for their stream type (typically a one-liner), and all combinators become total automatically.

## Scope of changes

The implementation requires:

1. **`Parser/Stream.lean`**: Add `remaining : σ → Nat` to the `Parser.Stream` class. Update all provided instances.
2. **`Parser/Parser.lean`**: Replace `partial def efoldlPAux` with a total `def` using fuel.
3. **`Parser/Basic.lean`**: Replace 5 `partial def`s (`foldr`, `takeUntil`, `dropUntil`, `count`, `countUntil`) with total versions using fuel.
4. **Downstream**: Any `Parser.Stream` instance must add a `remaining` implementation.

## co-authored with Github Copilot + Claude Opus 4.6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make fold combinators total to enable verified parsers #95

Problem

Analysis: Why are these combinators partial?

Strategies considered

Strategy 1: Fuel derived from stream length

Strategy 2: Specialize to `Id` for proofs

Strategy 3: "Consuming parser" contract

Strategy 4: Inductive step type (`FoldStep`)

Recommendation

Impact on verified parsers

Scope of changes

co-authored with Github Copilot + Claude Opus 4.6

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Criterion	Strategy 1	Strategy 2	Strategy 3	Strategy 4
Works for all monads	Yes	No (`Id` only)	In theory	In theory
API breakage	Minimal (1 field)	None	Moderate	Significant
Proof obligation for users	None	Per-parser lemma	Per-parser proof	Per-parser proof
Unlocks downstream verification	Yes	Partially	Yes	Yes
Implementation complexity	Low	Medium	High	High

Make fold combinators total to enable verified parsers #95

Description

Problem

Analysis: Why are these combinators partial?

Strategies considered

Strategy 1: Fuel derived from stream length

Strategy 2: Specialize to Id for proofs

Strategy 3: "Consuming parser" contract

Strategy 4: Inductive step type (FoldStep)

Recommendation

Impact on verified parsers

Scope of changes

co-authored with Github Copilot + Claude Opus 4.6

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Strategy 2: Specialize to `Id` for proofs

Strategy 4: Inductive step type (`FoldStep`)