M2: Explicit ad-hoc overloading (operators + Text comparison)#32

Merged
assapir merged 6 commits into main from worktree-agent-aae1581907489b1c9
Jun 27, 2026
@assapir assapir commented Jun 27, 2026

Owner

Summary

Adds explicit ad-hoc overloading as Quilon's only polymorphism (no generics): multiple same-named top-level definitions, each with full parameter type annotations, form an overload set resolved at every call site by exact static argument types (no implicit coercion). Ambiguous and unmatched calls are reported as clear file:line:col diagnostics that list the candidates.
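
The resolution rule above can be sketched in Rust as follows. This is a minimal illustration, not the actual typechecker: `Ty`, `OverloadRegistry`, `define` and `resolve` are hypothetical names, and only the two error names come from this PR. In this sketch ambiguity can only arise from two members with identical parameter lists; the real checker may have other sources.

```rust
use std::collections::HashMap;

// Hypothetical, reduced set of static types for the sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    Num,
    Text,
    Bool,
}

// Error names follow the PR; the payload shape is an assumption.
#[derive(Debug, PartialEq)]
enum ResolveError {
    NoMatchingOverload { name: String, candidates: Vec<Vec<Ty>> },
    AmbiguousOverload { name: String, candidates: Vec<Vec<Ty>> },
}

// Overload sets keyed by name; function names and operator symbols share one map.
#[derive(Default)]
struct OverloadRegistry {
    sets: HashMap<String, Vec<Vec<Ty>>>,
}

impl OverloadRegistry {
    // Registers one fully annotated member of the overload set `name`.
    fn define(&mut self, name: &str, params: &[Ty]) {
        self.sets
            .entry(name.to_string())
            .or_default()
            .push(params.to_vec());
    }

    // Returns the index of the single member whose parameter types equal
    // `args` exactly. No coercion is attempted.
    fn resolve(&self, name: &str, args: &[Ty]) -> Result<usize, ResolveError> {
        let members: &[Vec<Ty>] = match self.sets.get(name) {
            Some(v) => v,
            None => &[],
        };
        let hits: Vec<usize> = members
            .iter()
            .enumerate()
            .filter(|(_, params)| params.as_slice() == args)
            .map(|(index, _)| index)
            .collect();
        match hits.as_slice() {
            [only] => Ok(*only),
            [] => Err(ResolveError::NoMatchingOverload {
                name: name.to_string(),
                candidates: members.to_vec(),
            }),
            _ => Err(ResolveError::AmbiguousOverload {
                name: name.to_string(),
                candidates: members.to_vec(),
            }),
        }
    }
}
```

Registering `+` for (Num, Num) and (Text, Text) and then resolving `+` on (Num, Text) yields `NoMatchingOverload` with both candidates attached, because nothing is coerced.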

  • Operators are user-overloadable (+ - * / % == != < <= > >=) — an operator is just a named overload set. The standard built-ins (+ on Num/Text, the comparisons, print/eprint over Num/Text/Bool) are now visible overloads routed through the same mechanism, not compiler special cases (behavior preserved; all prior tests pass).
  • Text comparison (the gating deliverable): ==/!= (equality) and </<=/>/>= (lexicographic) over Text, via a new __text_cmp runtime intrinsic — typecheck + codegen + tests, end-to-end (JIT + native AOT).
  • < / > vs < > blocks: a > is the block close only when it is the last token on its line; otherwise it is the greater-than operator, so a > b works everywhere.
  • Ok/NotOk over every built-in payload (Num/Text/Bool/$) construct and tag-dispatch.
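
The line-final `>` rule from the list above can be sketched like this. It is a simplification under loud assumptions: tokens are split on whitespace, comments and string literals are ignored, and `AngleKind` and `classify_closing_angles` are hypothetical names (only the `Gt` token name comes from this PR).

```rust
// How a standalone `>` is classified in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AngleKind {
    // `>` is the last token on its line: closes a `< >` block.
    BlockClose,
    // `>` is followed by another token on the same line: greater-than.
    Gt,
}

// Classifies every standalone `>` in `source`, in order of appearance.
// Assumption: tokens are whitespace-separated, so `>=` is never seen as `>`.
fn classify_closing_angles(source: &str) -> Vec<AngleKind> {
    let mut kinds = Vec::new();
    for line in source.lines() {
        let tokens: Vec<&str> = line.split_whitespace().collect();
        for (index, token) in tokens.iter().enumerate() {
            if *token == ">" {
                let is_last_on_line = index + 1 == tokens.len();
                kinds.push(if is_last_on_line {
                    AngleKind::BlockClose
                } else {
                    AngleKind::Gt
                });
            }
        }
    }
    kinds
}
```

Under this rule `a > b` always reads as a comparison, while a `>` alone at the end of a line closes the block.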

Pipeline

  • Lexer: line-final > reclassification → new Gt token.
  • Parser: operator-symbol-named definitions; </Gt as comparison operators; binary-op loops stop at a top-level operator definition (op =).
  • Typechecker: overload-set registry keyed by name (functions AND operator symbols); built-in operator/print overloads; exact-type dispatch; NoMatchingOverload / AmbiguousOverload / OverloadMissingAnnotation errors.
  • Codegen: each overload member mangled to a distinct symbol; calls/operators lowered to the resolved member; Text comparisons via __text_cmp; records GC-allocated so a record returned from a function/operator survives its frame; Bool ==/!= lowered as integer compares; match payload bindings carry their declared type so an overloaded call on a concrete sum payload dispatches by that type.
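
To illustrate "each overload member mangled to a distinct symbol", here is one possible scheme. The PR does not show the real mangling, so the `ql.` prefix, the `$XX` escape and the function name `mangle_overload` are all assumptions made for this sketch; parameter type names are assumed to be plain identifiers.

```rust
// Builds a symbol from the overload name and its parameter type names.
// Characters outside [A-Za-z0-9_] (such as operator symbols) are escaped as
// `$` plus two or more uppercase hex digits, so `==` and a function named
// `eq` can never collide.
fn mangle_overload(name: &str, param_types: &[&str]) -> String {
    let mut symbol = String::from("ql.");
    for ch in name.chars() {
        if ch.is_ascii_alphanumeric() || ch == '_' {
            symbol.push(ch);
        } else {
            symbol.push_str(&format!("${:02X}", ch as u32));
        }
    }
    // One `.`-separated segment per parameter type keeps members distinct.
    for ty in param_types {
        symbol.push('.');
        symbol.push_str(ty);
    }
    symbol
}
```

With this scheme a call or operator lowers to whichever symbol the typechecker resolved, for example `ql.$3D$3D.Text.Text` for `==` over two Text values.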

Example (mandatory, wired into the gate)

examples/overloading.ql — a user function overload set + a user operator overload (== on a record) + Text comparison; ^ exit code 161. Listed in tests/examples_test.rs (compiles + runs under JIT and native AOT, both linkers) and referenced from LANGUAGE.md.

Tests

Unit + integration + negative coverage: overload resolution by type, operator overload on a user type, Text ==/</> ordering, Bool ==, Ok($)/Ok(Text)/Ok(Num) dispatch, a user sum's concrete (Num and Bool) payload dispatching an overload, and negative cases (ambiguous / no-match / missing annotation → compile error; the > footguns → parse error).

Review fixes included

Found via /code-review:

  • BUG1: codegen now reads each non-overloaded function's declared return type, so a call result feeding an overloaded operator mangles correctly.
  • BUG2: the parser no longer absorbs a following operator definition into the previous expression body.
  • BUG4 (plus the Bool-payload miscompile): loop-variable and sum-payload bindings carry their concrete type for overloaded dispatch.

Found via /simplify: shared is_operator_symbol + FunctionDecl::is_inert_io_placeholder in the AST crate; lazy error-candidate construction.

Deferred (separate PR)

Concrete sum-payload typing: routing a non-numeric Result payload (Ok("x")) through an overload falls under the existing non-numeric-payload limitation (documented in LANGUAGE.md). For overload resolution a Result payload always resolves as Num, so numeric payloads work end-to-end. Concrete payloads of user sum types already dispatch correctly.

🤖 Generated with Claude Code

assapir and others added 6 commits June 27, 2026 16:36
Adds explicit ad-hoc overloading as the language's only polymorphism (no
generics): multiple same-named top-level definitions with full parameter type
annotations form an overload set, resolved at each call site by EXACT static
argument types (no implicit coercion); ambiguity / no-match are clear
file:line:col diagnostics listing the candidates.

Operators are user-overloadable (`+ - * / % == != < <= > >=`) because an
operator is just a named overload set. The standard built-ins (`+` on Num/Text,
the comparisons, `print`/`eprint` over Num/Text/Bool) are now VISIBLE overloads
routed through the same mechanism, not compiler special-cases.

Concrete deliverable: Text comparison overloads — `==`/`!=` (equality) and
`<`/`<=`/`>`/`>=` (lexicographic) over Text, via a new `__text_cmp` runtime
intrinsic. `<`/`>` are disambiguated from `< >` block delimiters by a lexer
rule: a `>` is the block close only when it is the last token on its line;
otherwise it is the greater-than operator (so `a > b` works everywhere).
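
A rough sketch of how six Text operators can share one three-way intrinsic. The real `__text_cmp` ABI is not shown here, so `text_cmp` and `lower_text_comparison` are hypothetical stand-ins, and byte-wise ordering is an assumption (for UTF-8 it coincides with code-point order).

```rust
use std::cmp::Ordering;

// Stand-in for the runtime intrinsic: three-way, byte-wise lexicographic
// comparison returning -1, 0 or 1.
fn text_cmp(a: &str, b: &str) -> i32 {
    match a.as_bytes().cmp(b.as_bytes()) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

// Lowers a Text comparison operator to an integer compare of the intrinsic's
// result against zero. Returns None for anything that is not a comparison.
fn lower_text_comparison(op: &str, a: &str, b: &str) -> Option<bool> {
    let c = text_cmp(a, b);
    Some(match op {
        "==" => c == 0,
        "!=" => c != 0,
        "<" => c < 0,
        "<=" => c <= 0,
        ">" => c > 0,
        ">=" => c >= 0,
        _ => return None,
    })
}
```

The appeal of this shape is that codegen needs one runtime call plus an ordinary integer compare per operator, rather than six separate intrinsics.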

Pipeline:
- Lexer: `>` line-final reclassification to a new `Gt` token.
- Parser: operator-symbol-named definitions; `<`/`Gt` as comparison operators;
  binary-op loops stop at a top-level operator definition (`op =`).
- Typechecker: overload-set registry keyed by name (functions AND operator
  symbols); built-in operator/print overloads; exact-type dispatch; new
  NoMatchingOverload / AmbiguousOverload / OverloadMissingAnnotation errors.
- Codegen: each overload member mangled to a distinct symbol; calls/operators
  lowered to the resolved member; Text comparisons lowered via `__text_cmp`;
  records GC-allocated so a record returned from a function/operator survives;
  Bool `==`/`!=` lowered as integer compares; match payload bindings carry their
  declared type so an overloaded call on a concrete sum payload dispatches right.

Ships `examples/overloading.ql` (wired into the examples gate, JIT + native AOT,
exit 161), unit + integration + negative tests, and LANGUAGE.md updates
(overloading + operator-overloading sections, feature matrix, `>` rule, the
non-numeric-Result-payload-through-overload limitation cross-ref).

Deferred (separate PR): concrete sum-payload typing — a non-numeric `Result`
payload (`Ok("x")`) routed through an overload is part of the existing
non-numeric-payload limitation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Share `is_operator_symbol` and `FunctionDecl::is_inert_io_placeholder` from
`ast/nodes.rs` instead of duplicating them verbatim in the type checker and code
generator, so the two passes can never disagree on which names are operators or
which `print`/`eprint` placeholder to skip. Also build overload-resolution error
`candidates` lazily (only when an error is actually raised). No behavior change.
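
The lazy `candidates` construction can be pictured roughly as below. `resolve_or_report` is a hypothetical helper, not the checker's actual function; the point is only that the candidate list is built inside a closure that runs on the error path.

```rust
// Returns the resolved member index, or the candidate list as the error.
// `build_candidates` is only invoked when `hit` is None, so successful
// resolutions never pay for formatting the candidates.
fn resolve_or_report<F>(hit: Option<usize>, build_candidates: F) -> Result<usize, Vec<String>>
where
    F: FnOnce() -> Vec<String>,
{
    hit.ok_or_else(build_candidates)
}
```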

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`quilon build` linked the runtime staticlib with a plain `-lquilon_rt`, which
only pulls archive members that resolve an already-undefined symbol, in a single
pass whose order depends on the archive layout. The Rust staticlib splits the
`#[unsafe(no_mangle)]` intrinsics across codegen-unit objects, so depending on
the (unspecified) CU split a referenced intrinsic could sit in a member the pass
never pulls — surfacing as `undefined reference to __text_cmp` in CI's AOT link
while the JIT (which maps symbols directly) worked. Local builds happened to
co-locate the text intrinsics in one object, hiding it.

Wrap `-lquilon_rt` in `-Wl,--whole-archive`/`--no-whole-archive` so every runtime
object is included deterministically, regardless of CU split or archive order
(GNU ld syntax, honored by both clang and gcc). Verified `examples/overloading.ql`
builds and runs to 161 natively under BOTH linkers, including with a forced
256-codegen-unit split of quilon-rt.
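
For reference, the wrapped link line can be assembled along these lines. The flags are the ones named above; the helper name `runtime_link_args` and the `-L` search-path argument are assumptions for the sketch, not the actual `quilon build` code.

```rust
// Builds the linker-driver arguments that pull in every object of the
// runtime archive. `lib_dir` is a hypothetical directory holding
// libquilon_rt.a.
fn runtime_link_args(lib_dir: &str) -> Vec<String> {
    vec![
        format!("-L{}", lib_dir),
        // Everything between the two -Wl flags is included whole.
        "-Wl,--whole-archive".to_string(),
        "-lquilon_rt".to_string(),
        // Restore normal on-demand archive handling for later libraries.
        "-Wl,--no-whole-archive".to_string(),
    ]
}
```

Order matters: the `--no-whole-archive` flag has to follow `-lquilon_rt`, otherwise every later archive on the command line would be included whole as well.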

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…son overloads

AOT link fix (cont.): pin every `#[no_mangle]` runtime intrinsic with a `#[used]`
reachability table in quilon-rt, so the staticlib link can never dead-strip a
symbol that is only ever called from generated LLVM IR (never from Rust) — the
root of CI's `undefined reference to __text_cmp` during native linking. Combined
with the `--whole-archive` wrap of `-lquilon_rt`, the intrinsics are guaranteed
both present in the archive and pulled into the executable, regardless of
codegen-unit layout, archive order, or linker GC.
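
A minimal sketch of what such a `#[used]` table can look like. The two `demo_*` intrinsics and the `PINNED_INTRINSICS` name are invented for illustration (the real table in quilon-rt is not shown in this PR), and both share one signature only to keep the example short.

```rust
// Hypothetical intrinsics that are only ever called from generated code.
#[unsafe(no_mangle)]
pub extern "C" fn demo_intrinsic_add(a: i64, b: i64) -> i64 {
    a.wrapping_add(b)
}

#[unsafe(no_mangle)]
pub extern "C" fn demo_intrinsic_max(a: i64, b: i64) -> i64 {
    if a >= b { a } else { b }
}

// Shared signature of the pinned intrinsics in this sketch.
type Intrinsic = extern "C" fn(i64, i64) -> i64;

// `#[used]` tells the compiler to keep this static in the object file even
// though no Rust code reads it; the static in turn references every
// intrinsic, so none of them looks dead from the Rust side.
#[used]
static PINNED_INTRINSICS: [Intrinsic; 2] = [demo_intrinsic_add, demo_intrinsic_max];
```

`#[used]` keeps the table in the emitted object but does not by itself force the linker to pull that object out of an archive, which is why it is paired with the `--whole-archive` wrap.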

Also (user-confirmed language rule): a comparison/equality operator overload
(`== != < <= > >=`) must return `Bool` — these are predicates feeding `?`/`|`
matching and conditionals; a non-Bool return is a clear compile error
(ComparisonOverloadNotBool). Arithmetic operators stay unconstrained (`Vec * Num
-> Vec`, dot-product `Vec * Vec -> Num`, etc.). Tests (negative + positive +
arithmetic-unconstrained) and a LANGUAGE.md note added.
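
The rule amounts to a check of roughly this shape at overload definition time. Only the error name ComparisonOverloadNotBool comes from this change; `Ty`, `is_comparison_operator` and `check_operator_return` are hypothetical names for the sketch.

```rust
// Hypothetical, reduced set of return types; `User` stands for a record type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Num,
    Text,
    Bool,
    User(&'static str),
}

// Error raised when a comparison overload does not return Bool.
#[derive(Debug, PartialEq, Eq)]
struct ComparisonOverloadNotBool {
    op: String,
    found: Ty,
}

// The six comparison/equality operators that must be predicates.
fn is_comparison_operator(op: &str) -> bool {
    matches!(op, "==" | "!=" | "<" | "<=" | ">" | ">=")
}

// Comparison operators must return Bool; arithmetic operators may return
// any type.
fn check_operator_return(op: &str, ret: Ty) -> Result<(), ComparisonOverloadNotBool> {
    if is_comparison_operator(op) && ret != Ty::Bool {
        return Err(ComparisonOverloadNotBool {
            op: op.to_string(),
            found: ret,
        });
    }
    Ok(())
}
```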

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…roof)

Root cause of CI's `undefined reference to __text_cmp` (JIT fine, AOT broken):
neither `cargo test` nor `cargo build --all-targets` emits the `staticlib`
artifact (nothing in those target sets consumes it as a staticlib), so the
`libquilon_rt.a` in `target/` is whatever a prior build left — and CI's
`actions/cache` restores a STALE copy from before `__text_cmp` existed. The
program references the intrinsic; the cached archive doesn't define it.

`ensure_runtime_lib` only rebuilt the `.a` when ABSENT, and a plain
`cargo build -p quilon-rt` won't re-emit it when the crate fingerprint is already
fresh (even if the on-disk `.a` is stale/missing). So build `quilon-rt` into a
DEDICATED, cache-free `--target-dir` (forcing a fresh staticlib emit every run)
and copy that `.a` next to the `quilon` binary. Verified by injecting a corrupted
stale `.a` and watching the native-AOT parity gate regenerate it and pass under
both linkers.
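
The rebuild-and-copy step can be sketched as below. This is not the actual `ensure_runtime_lib`: the function names, the `--release` profile and the `release/` output subdirectory are assumptions, and the sketch relies on the caller passing a target directory that CI never restores from cache.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

// Cargo arguments that build quilon-rt into a dedicated target directory.
fn runtime_build_args(fresh_target_dir: &Path) -> Vec<String> {
    vec![
        "build".to_string(),
        "-p".to_string(),
        "quilon-rt".to_string(),
        "--release".to_string(), // assumption: release profile
        "--target-dir".to_string(),
        fresh_target_dir.display().to_string(),
    ]
}

// Where cargo places the staticlib for a release build in that directory.
fn built_archive_path(fresh_target_dir: &Path) -> PathBuf {
    fresh_target_dir.join("release").join("libquilon_rt.a")
}

// Rebuilds the runtime and copies the archive next to the `quilon` binary,
// overwriting whatever (possibly stale) copy was there.
fn refresh_runtime_lib(fresh_target_dir: &Path, quilon_bin_dir: &Path) -> io::Result<PathBuf> {
    let status = Command::new("cargo")
        .args(runtime_build_args(fresh_target_dir))
        .status()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "building quilon-rt failed",
        ));
    }
    let dest = quilon_bin_dir.join("libquilon_rt.a");
    fs::copy(built_archive_path(fresh_target_dir), &dest)?;
    Ok(dest)
}
```

Copying unconditionally, rather than only when the archive is absent, is what closes the stale-cache hole described above.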

(Keeps the prior belt-and-suspenders: `--whole-archive` around `-lquilon_rt` and
the `#[used]` intrinsic-retention table — both correct, but the stale staticlib
was the actual culprit.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@assapir assapir merged commit d5a52dc into main Jun 27, 2026
2 checks passed
@assapir assapir deleted the worktree-agent-aae1581907489b1c9 branch June 27, 2026 14:03