M3: Guaranteed self-tail-call optimization (loop lowering) #37

Open
assapir wants to merge 1 commit into main from worktree-agent-af4713bc66f99f529

assapir (Owner) commented Jun 27, 2026

What

Guaranteed self-tail-call optimization: when a function returns a call to itself in tail position, codegen lowers the recursion to a loop instead of a stack-growing `call` + `ret`. Tail self-recursion therefore runs in constant stack and cannot overflow. The language needs this guarantee because `for` is being removed (in a later M3 wave) and recursion becomes the iteration primitive.

This is codegen-only — there is no surface syntax.

How

  • Analysis (`body_has_self_tail_call` / `expr_has_self_tail_call` / `is_self_tail_call`): a call is in tail position when it is the value the function returns directly. Tail position flows through `?`/`|` match arms, `if`/ternary branches, `< >` block tails, and `|>` pipelines. It does not flow into an operator operand, a call argument, an array element, etc.
  • Transform (`generate_tail_expr` + `generate_tail_if` / `generate_tail_match` / `emit_tail_self_call`): the function's parameter allocas are reused as loop-carried slots. The entry block branches to a loop header; a tail self-call evaluates all its args first (against the current iteration's params, so `f(n-1, acc+n)` is correct), stores them into the slots, and branches (`br`) back to the header. The tail emitter mirrors the existing `generate_if`/`generate_match` shape but threads an `Option` (a tail self-call yields `None`, having branched away).
  • Non-tail self-calls (`n * fact(n-1)`) and calls to other functions stay ordinary calls. General/mutual tail calls (LLVM `musttail`) are explicitly deferred to a follow-up.
  • Designed to be safe-by-degradation: any disagreement between analysis and codegen degrades to "no optimization", never a miscompile. A back-edge is emitted only where `is_self_tail_call` is true at codegen time, and the function is always terminated by the final `ret`.
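
For readers who want a concrete picture of the tail-position rule, here is a minimal sketch on a toy AST. The `Expr` type and the `has_self_tail_call` function are hypothetical stand-ins invented for illustration; they are not the compiler's actual types or the `body_has_self_tail_call` implementation in this PR. The sketch only shows the shape of the rule: tail position passes through branch arms and block tails, and stops at operator operands and call arguments.

```rust
// Hypothetical toy AST (an assumption for illustration; the real
// compiler's expression type is not shown in this PR).
#[allow(dead_code)]
enum Expr {
    Int(i64),
    Var(&'static str),
    Call { callee: &'static str, args: Vec<Expr> },
    BinOp(Box<Expr>, char, Box<Expr>),
    If { cond: Box<Expr>, then_branch: Box<Expr>, else_branch: Box<Expr> },
    // A block's value is its last expression.
    Block(Vec<Expr>),
}

/// Returns true if `expr`, taken as the value the function returns,
/// contains a call to `me` in tail position.
fn has_self_tail_call(expr: &Expr, me: &str) -> bool {
    match expr {
        // A call that is itself the returned value: tail position.
        // Its arguments are deliberately not searched.
        Expr::Call { callee, .. } => *callee == me,
        // Tail position flows into both branches, not into the condition.
        Expr::If { then_branch, else_branch, .. } => {
            has_self_tail_call(then_branch, me) || has_self_tail_call(else_branch, me)
        }
        // Tail position flows into the block's final expression only.
        Expr::Block(items) => items.last().map_or(false, |last| has_self_tail_call(last, me)),
        // Operands of an operator are never in tail position.
        Expr::Int(_) | Expr::Var(_) | Expr::BinOp(..) => false,
    }
}
```

Under this sketch, a body shaped like `if n == 0 then acc else sum(n-1, acc+n)` is reported as having a self tail call, while `if n == 0 then 1 else n * fact(n-1)` is not, because the self-call sits inside an operator operand.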

Tests & docs

  • examples/tail_recursion.ql — recurses 1,000,000 deep (would overflow the stack without TCO), exits 16. Wired into tests/examples_test.rs so it runs under the JIT and native AOT (clang and gcc).
  • tests/tail_call_test.rs — deep ternary/match-arm/block recursion runs in constant stack and computes the right value; arg-before-overwrite ordering; non-tail and cross-function calls unaffected; unconditional / all-arms-recurse modules still pass LLVM verification.
  • LANGUAGE.md — documents the guarantee ("tail self-recursion is optimized to a loop") and adds a feature-matrix row.
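
The arg-before-overwrite ordering case is easiest to see in a hand-written analogue. The sketch below is plain Rust, not compiler output, and all three function names are hypothetical. `sum_loop` mimics the lowering described in this PR (evaluate every argument against the current parameter values, then store, then jump back to the header); `sum_loop_clobbered` shows the bug that the ordering test is meant to rule out.

```rust
/// Tail-recursive form: the self-call is the value returned directly.
fn sum_rec(n: u64, acc: u64) -> u64 {
    if n == 0 { acc } else { sum_rec(n - 1, acc + n) }
}

/// Hand-written analogue of the loop lowering: parameters act as
/// loop-carried slots, and all new argument values are computed
/// before any slot is overwritten.
fn sum_loop(n: u64, acc: u64) -> u64 {
    let (mut n, mut acc) = (n, acc);
    loop {
        // Loop header: the original function body starts here.
        if n == 0 {
            return acc;
        }
        // Evaluate all args against the current iteration's params...
        let next_n = n - 1;
        let next_acc = acc + n;
        // ...then store them and branch back to the header.
        n = next_n;
        acc = next_acc;
    }
}

/// Incorrect variant: `n` is overwritten before `acc + n` is evaluated,
/// so the second argument sees the new `n` instead of the current one.
fn sum_loop_clobbered(n: u64, acc: u64) -> u64 {
    let (mut n, mut acc) = (n, acc);
    loop {
        if n == 0 {
            return acc;
        }
        n = n - 1;
        acc = acc + n; // reads the already-updated slot
    }
}
```

For `n = 10`, `sum_rec` and `sum_loop` both return 55, while `sum_loop_clobbered` returns 45. `sum_loop(1_000_000, 0)` returns 500000500000 without growing the stack.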

Gate

cargo build, cargo test (incl. the native-AOT examples gate, clang+gcc), cargo fmt --all -- --check, and cargo clippy --all-targets --all-features -- -D warnings all pass. Ran /code-review (no surviving findings) and /simplify (one minor consolidation).

🤖 Generated with Claude Code

When a function returns a call to itself in tail position, lower the
recursion to a loop instead of a stack-growing `call` + `ret`, so tail
self-recursion runs in constant stack and cannot overflow — a guarantee
the language needs as `for` is removed and recursion becomes the iteration
primitive.

Codegen-only (no surface syntax). Adds a tail-position analysis
(`body_has_self_tail_call`) and a parallel tail-aware emitter
(`generate_tail_expr` + `generate_tail_if`/`generate_tail_match`/
`emit_tail_self_call`) that share one `is_self_tail_call` predicate. Tail
position flows through `?`/`|` match arms, `if`/ternary branches, `< >`
block tails, and `|>` pipelines. The function's parameter allocas are
reused as loop-carried slots; a tail self-call evaluates all args first
(against the current iteration's params), stores them, and branches back
to a loop header. Non-tail self-calls and calls to other functions stay
ordinary calls (general/mutual tail calls are a deferred follow-up).

- examples/tail_recursion.ql recurses 1,000,000 deep (would overflow
  without TCO) and exits 16; wired into the examples gate (JIT + native
  AOT under clang and gcc).
- tests/tail_call_test.rs: deep ternary/match-arm/block recursion runs in
  constant stack; arg-before-overwrite ordering; non-tail and
  cross-function calls unaffected; all-arms-recurse modules still verify.
- LANGUAGE.md documents the guarantee and adds a feature-matrix row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>