M2: Explicit ad-hoc overloading (operators + Text comparison) #32
Merged
Adds explicit ad-hoc overloading as the language's only polymorphism (no
generics): multiple same-named top-level definitions with full parameter type
annotations form an overload set, resolved at each call site by EXACT static
argument types (no implicit coercion); ambiguity / no-match are clear
file:line:col diagnostics listing the candidates.
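The resolution rule above can be sketched in Rust as follows. This is a minimal illustration, not the PR's typechecker: the `Ty`, `Overload`, `Registry`, and `ResolveError` names and shapes are assumptions chosen for the example, and only the two error names `NoMatchingOverload` and `AmbiguousOverload` come from the PR text. In this sketch, ambiguity can only arise when two members declare identical parameter lists, since matching is exact.

```rust
use std::collections::HashMap;

/// Static types used for dispatch. `Record` is a hypothetical stand-in
/// for user-defined record types, identified here by name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Ty {
    Num,
    Text,
    Bool,
    Record(String),
}

/// One member of an overload set: fully annotated parameters plus a return type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Overload {
    pub params: Vec<Ty>,
    pub ret: Ty,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    NoMatchingOverload { name: String, candidates: Vec<Overload> },
    AmbiguousOverload { name: String, candidates: Vec<Overload> },
}

/// Overload sets keyed by name; the key may be a function name or an operator symbol.
#[derive(Default)]
pub struct Registry {
    sets: HashMap<String, Vec<Overload>>,
}

impl Registry {
    pub fn define(&mut self, name: &str, params: Vec<Ty>, ret: Ty) {
        self.sets
            .entry(name.to_string())
            .or_default()
            .push(Overload { params, ret });
    }

    /// Resolve by exact static argument types: no coercion, no ranking.
    pub fn resolve(&self, name: &str, args: &[Ty]) -> Result<&Overload, ResolveError> {
        let set: &[Overload] = self.sets.get(name).map(Vec::as_slice).unwrap_or(&[]);
        let matches: Vec<&Overload> = set
            .iter()
            .filter(|o| o.params.as_slice() == args)
            .collect();
        match matches.len() {
            1 => Ok(matches[0]),
            // Candidates are only cloned on the error paths.
            0 => Err(ResolveError::NoMatchingOverload {
                name: name.to_string(),
                candidates: set.to_vec(),
            }),
            _ => Err(ResolveError::AmbiguousOverload {
                name: name.to_string(),
                candidates: matches.into_iter().cloned().collect(),
            }),
        }
    }
}
```

With `+` defined over `(Num, Num)` and `(Text, Text)`, resolving `+` for `[Text, Text]` picks the second member, while `[Num, Text]` yields `NoMatchingOverload` with both candidates listed.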
Operators are user-overloadable (`+ - * / % == != < <= > >=`) because an
operator is just a named overload set. The standard built-ins (`+` on Num/Text,
the comparisons, `print`/`eprint` over Num/Text/Bool) are now VISIBLE overloads
routed through the same mechanism, not compiler special-cases.
Concrete deliverable: Text comparison overloads — `==`/`!=` (equality) and
`<`/`<=`/`>`/`>=` (lexicographic) over Text, via a new `__text_cmp` runtime
intrinsic. `<`/`>` are disambiguated from `< >` block delimiters by a lexer
rule: a `>` is the block close only when it is the last token on its line;
otherwise it is the greater-than operator (so `a > b` works everywhere).
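The line-final rule for `>` can be illustrated with a small Rust sketch. It assumes tokens are separated by whitespace and ignores comments and string literals, which the real lexer would have to handle; `Tok` and `classify_line` are hypothetical names, and only the `Gt` token name comes from the PR.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    /// `>` as the last token on its line: closes a `< >` block.
    BlockClose,
    /// `>` anywhere else on the line: the greater-than operator.
    Gt,
    /// Any other token, kept verbatim (simplification for this sketch).
    Other(String),
}

/// Classify the tokens of a single source line, applying the line-final rule.
pub fn classify_line(line: &str) -> Vec<Tok> {
    let words: Vec<&str> = line.split_whitespace().collect();
    let last = words.len().saturating_sub(1);
    words
        .iter()
        .enumerate()
        .map(|(i, w)| match *w {
            ">" if i == last => Tok::BlockClose,
            ">" => Tok::Gt,
            other => Tok::Other(other.to_string()),
        })
        .collect()
}
```

Under this sketch, `a > b` yields a `Gt` in the middle, while a line consisting only of `>` yields a single `BlockClose`.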
Pipeline:
- Lexer: `>` line-final reclassification to a new `Gt` token.
- Parser: operator-symbol-named definitions; `<`/`Gt` as comparison operators;
binary-op loops stop at a top-level operator definition (`op =`).
- Typechecker: overload-set registry keyed by name (functions AND operator
symbols); built-in operator/print overloads; exact-type dispatch; new
NoMatchingOverload / AmbiguousOverload / OverloadMissingAnnotation errors.
- Codegen: each overload member mangled to a distinct symbol; calls/operators
lowered to the resolved member; Text comparisons lowered via `__text_cmp`;
records GC-allocated so a record returned from a function/operator survives;
Bool `==`/`!=` lowered as integer compares; match payload bindings carry their
declared type so an overloaded call on a concrete sum payload dispatches right.
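One way to picture the Text-comparison lowering in the Codegen step is a single three-way intrinsic whose result each of the six operators tests against zero. The sketch below models that in plain Rust; the three-way contract, the byte-wise ordering, and the `CmpOp`, `text_cmp`, and `lower_text_compare` names are assumptions for illustration, since the PR only states that the comparisons are lexicographic and go through `__text_cmp`.

```rust
use std::cmp::Ordering;

/// The six comparison operators the PR overloads over Text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CmpOp {
    Eq,
    Ne,
    Lt,
    Le,
    Gt,
    Ge,
}

/// Stand-in for the runtime intrinsic: -1, 0, or 1.
/// Assumption: ordering is lexicographic over UTF-8 bytes.
pub fn text_cmp(a: &str, b: &str) -> i32 {
    match a.as_bytes().cmp(b.as_bytes()) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

/// Each operator becomes one intrinsic call plus an integer test against zero.
pub fn lower_text_compare(op: CmpOp, a: &str, b: &str) -> bool {
    let c = text_cmp(a, b);
    match op {
        CmpOp::Eq => c == 0,
        CmpOp::Ne => c != 0,
        CmpOp::Lt => c < 0,
        CmpOp::Le => c <= 0,
        CmpOp::Gt => c > 0,
        CmpOp::Ge => c >= 0,
    }
}
```

A single intrinsic keeps the runtime surface small: adding a seventh comparison would only need a new integer test, not a new runtime symbol.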
Ships `examples/overloading.ql` (wired into the examples gate, JIT + native AOT,
exit 161), unit + integration + negative tests, and LANGUAGE.md updates
(overloading + operator-overloading sections, feature matrix, `>` rule, the
non-numeric-Result-payload-through-overload limitation cross-ref).
Deferred (separate PR): concrete sum-payload typing — a non-numeric `Result`
payload (`Ok("x")`) routed through an overload is part of the existing
non-numeric-payload limitation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Share `is_operator_symbol` and `FunctionDecl::is_inert_io_placeholder` from `ast/nodes.rs` instead of duplicating them verbatim in the type checker and code generator, so the two passes can never disagree on which names are operators or which `print`/`eprint` placeholder to skip. Also build overload-resolution error `candidates` lazily (only when an error is actually raised). No behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`quilon build` linked the runtime staticlib with a plain `-lquilon_rt`, which only pulls archive members that resolve an already-undefined symbol, in a single pass whose order depends on the archive layout. The Rust staticlib splits the `#[unsafe(no_mangle)]` intrinsics across codegen-unit objects, so depending on the (unspecified) CU split a referenced intrinsic could sit in a member the pass never pulls, surfacing as `undefined reference to __text_cmp` in CI's AOT link while the JIT (which maps symbols directly) worked. Local builds happened to co-locate the text intrinsics in one object, hiding it.
Wrap `-lquilon_rt` in `-Wl,--whole-archive`/`--no-whole-archive` so every runtime object is included deterministically, regardless of CU split or archive order (GNU ld syntax, honored by both clang and gcc).
Verified `examples/overloading.ql` builds and runs to 161 natively under BOTH linkers, including with a forced 256-codegen-unit split of quilon-rt.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
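The wrapped link line described in this commit could be assembled roughly as below. This is a sketch only: `runtime_link_args` and its `lib_dir` parameter are hypothetical, and the three flags around the library are the ones named in the commit message, passed through the compiler driver with `-Wl,`.

```rust
/// Build the linker arguments for the runtime staticlib.
/// `lib_dir` (hypothetical parameter) is the directory holding `libquilon_rt.a`.
pub fn runtime_link_args(lib_dir: &str) -> Vec<String> {
    vec![
        format!("-L{}", lib_dir),
        // Include every archive member, not just those resolving an
        // already-undefined symbol.
        "-Wl,--whole-archive".to_string(),
        "-lquilon_rt".to_string(),
        // Restore default archive handling for anything linked afterwards.
        "-Wl,--no-whole-archive".to_string(),
    ]
}
```

Closing the wrap immediately after `-lquilon_rt` matters: leaving `--whole-archive` active would also force in every member of any later archive on the command line.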
…son overloads
AOT link fix (cont.): pin every `#[no_mangle]` runtime intrinsic with a `#[used]` reachability table in quilon-rt, so the staticlib link can never dead-strip a symbol that is only ever called from generated LLVM IR (never from Rust), which is the root of CI's `undefined reference to __text_cmp` during native linking. Combined with the `--whole-archive` wrap of `-lquilon_rt`, the intrinsics are guaranteed both present in the archive and pulled into the executable, regardless of codegen-unit layout, archive order, or linker GC.
Also (user-confirmed language rule): a comparison/equality operator overload (`== != < <= > >=`) must return `Bool`. These are predicates feeding `?`/`|` matching and conditionals; a non-Bool return is a clear compile error (ComparisonOverloadNotBool). Arithmetic operators stay unconstrained (`Vec * Num -> Vec`, dot-product `Vec * Vec -> Num`, etc.).
Tests (negative + positive + arithmetic-unconstrained) and a LANGUAGE.md note added.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
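A `#[used]` retention table of the kind this commit describes might look like the following sketch. The signature of `__text_cmp` (pointer and length pairs in, three-way `i32` out) is an assumption made for the example, as are the `TextCmpFn` and `RETAINED_INTRINSICS` names; the PR does not show the real table. The `#[unsafe(no_mangle)]` spelling needs a reasonably recent Rust toolchain.

```rust
/// Assumed ABI for this sketch: two (pointer, length) byte ranges, three-way result.
type TextCmpFn = unsafe extern "C" fn(*const u8, usize, *const u8, usize) -> i32;

/// Runtime intrinsic called only from generated code, never from Rust.
///
/// # Safety
/// Both (pointer, length) pairs must describe valid, readable byte ranges.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn __text_cmp(
    a: *const u8,
    a_len: usize,
    b: *const u8,
    b_len: usize,
) -> i32 {
    // SAFETY: guaranteed by the caller, per the contract above.
    let (a, b) = unsafe {
        (
            std::slice::from_raw_parts(a, a_len),
            std::slice::from_raw_parts(b, b_len),
        )
    };
    // `Ordering` casts to -1, 0, or 1.
    a.cmp(b) as i32
}

/// `#[used]` tells the compiler to keep this static in the object file even
/// though no Rust code reads it; the table in turn references each intrinsic.
#[used]
pub static RETAINED_INTRINSICS: [TextCmpFn; 1] = [__text_cmp];
```

Intrinsics with different signatures would need one table per function-pointer type, or a struct with one field per intrinsic, since a single array has one element type.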
…roof)
Root cause of CI's `undefined reference to __text_cmp` (JIT fine, AOT broken): neither `cargo test` nor `cargo build --all-targets` emits the `staticlib` artifact (nothing in those target sets consumes it as a staticlib), so the `libquilon_rt.a` in `target/` is whatever a prior build left, and CI's `actions/cache` restores a STALE copy from before `__text_cmp` existed. The program references the intrinsic; the cached archive doesn't define it.
`ensure_runtime_lib` only rebuilt the `.a` when ABSENT, and a plain `cargo build -p quilon-rt` won't re-emit it when the crate fingerprint is already fresh (even if the on-disk `.a` is stale/missing). So build `quilon-rt` into a DEDICATED, cache-free `--target-dir` (forcing a fresh staticlib emit every run) and copy that `.a` next to the `quilon` binary.
Verified by injecting a corrupted stale `.a` and watching the native-AOT parity gate regenerate it and pass under both linkers.
(Keeps the prior belt-and-suspenders: `--whole-archive` around `-lquilon_rt` and the `#[used]` intrinsic-retention table. Both are correct, but the stale staticlib was the actual culprit.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
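The dedicated-target-dir rebuild can be sketched as below. This is not the actual `ensure_runtime_lib`: `fresh_runtime_build` and `staticlib_path` are hypothetical helpers, and the `debug/` subdirectory assumes cargo's default dev-profile layout. The `-p` and `--target-dir` flags are standard `cargo build` options.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Build `quilon-rt` into its own target directory so the staticlib is
/// emitted fresh instead of being reused from a cached `target/`.
pub fn fresh_runtime_build(target_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "-p", "quilon-rt", "--target-dir"])
        .arg(target_dir);
    cmd
}

/// Where the archive is expected afterwards.
/// Assumption: dev profile, so cargo writes under `<target-dir>/debug/`.
pub fn staticlib_path(target_dir: &Path) -> PathBuf {
    target_dir.join("debug").join("libquilon_rt.a")
}
```

A caller would run the command, check its exit status, and then copy the file at `staticlib_path(..)` next to the `quilon` binary, for example with `std::fs::copy`.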
Summary
Adds explicit ad-hoc overloading as Quilon's only polymorphism (no generics): multiple same-named top-level definitions, each with full parameter type annotations, form an overload set resolved at every call site by exact static argument types (no implicit coercion). Ambiguity / no-match are clear `file:line:col` diagnostics that list the candidates.
- Operators are user-overloadable (`+ - * / % == != < <= > >=`): an operator is just a named overload set. The standard built-ins (`+` on Num/Text, the comparisons, `print`/`eprint` over Num/Text/Bool) are now visible overloads routed through the same mechanism, not compiler special cases (behavior preserved; all prior tests pass).
- Text comparison overloads: `==`/`!=` (equality) and `<`/`<=`/`>`/`>=` (lexicographic) over `Text`, via a new `__text_cmp` runtime intrinsic. Covers typecheck + codegen + tests, end-to-end (JIT + native AOT).
- `<`/`>` vs `< >` blocks: a `>` is the block close only when it is the last token on its line; otherwise it is the greater-than operator, so `a > b` works everywhere.
- `Ok`/`NotOk` over every built-in payload (`Num`/`Text`/`Bool`/`$`) construct and tag-dispatch.

Pipeline
- Lexer: line-final `>` reclassification → new `Gt` token.
- Parser: operator-symbol-named definitions; `<`/`Gt` as comparison operators; binary-op loops stop at a top-level operator definition (`op =`).
- Typechecker: overload-set registry keyed by name (functions and operator symbols); built-in operator/`print` overloads; exact-type dispatch; `NoMatchingOverload` / `AmbiguousOverload` / `OverloadMissingAnnotation` errors.
- Codegen: each overload member mangled to a distinct symbol; calls/operators lowered to the resolved member; Text comparisons lowered via `__text_cmp`; records GC-allocated so a record returned from a function/operator survives its frame; `Bool` `==`/`!=` lowered as integer compares; match payload bindings carry their declared type so an overloaded call on a concrete sum payload dispatches by that type.

Example (mandatory, wired into the gate)
`examples/overloading.ql`: a user function overload set + a user operator overload (`==` on a record) + Text comparison; `^` exit code 161. Listed in `tests/examples_test.rs` (compiles + runs under JIT and native AOT, both linkers) and referenced from `LANGUAGE.md`.

Tests
Unit + integration + negative coverage: overload resolution by type, operator overload on a user type, `Text` `==`/`<`/`>` ordering, `Bool ==`, `Ok($)`/`Ok(Text)`/`Ok(Num)` dispatch, a user sum's concrete (Num and Bool) payload dispatching an overload, and negative cases (ambiguous / no-match / missing annotation → compile error; the `>` footguns → parse error).

Review fixes included
- Found via `/code-review`: codegen now reads each non-overloaded function's declared return type so a call result feeding an overloaded operator mangles correctly (BUG1); the parser no longer absorbs a following operator definition into the previous expression body (BUG2); loop-variable and sum-payload bindings carry their concrete type for overloaded dispatch (BUG4 + the Bool-payload miscompile).
- `/simplify`: shared `is_operator_symbol` + `FunctionDecl::is_inert_io_placeholder` in the AST crate; lazy error-candidate construction.

Deferred (separate PR)
Concrete sum-payload typing: a non-numeric `Result` payload (`Ok("x")`) routed through an overload is part of the existing non-numeric-payload limitation (documented in `LANGUAGE.md`). A `Result` payload resolves deterministically as `Num` for overloads (numeric payloads work end-to-end). User sum types' concrete payloads dispatch correctly.

🤖 Generated with Claude Code