M3: Text-in-composite codegen (type-oracle side-table) #35
Merged
Codegen recovered LLVM types from runtime BasicValueEnum::get_type() at
every READ site and hardcoded f64, corrupting any non-f64 value nested in
a composite: Text in a record/array, nested arrays, and Ok(text)/NotOk(text).
Fix (the start of M4 "authoritative types in codegen"): the type checker now
stashes a per-expression type side-table (the "type oracle", HashMap<Span,
Type>, returned from check_program). Codegen consults it at the three read
sites instead of assuming f64:
- array index GEP/load uses the index expression's recorded element type
- record field access/GEP rebuilds the real struct type from the record's
declared field types (record ABI unchanged: var alloca holds a pointer to
the struct)
- match-result alloca/load uses the match's recorded result type
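The lookup pattern behind those three read sites can be sketched as follows. This is a minimal illustration, not the code in this PR: `Span` and `Type` are simplified stand-ins, `expr_type` takes a span directly rather than an expression node, and `read_type` is a hypothetical helper that shows the `f64` fallback when the table has no entry.

```rust
use std::collections::HashMap;

/// Byte range of an expression in the source (simplified stand-in).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Simplified stand-in for the checker's type representation.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Number,
    Text,
    Array(Box<Type>),
}

/// The side-table the checker hands to codegen.
pub type TypeTable = HashMap<Span, Type>;

/// Wraps the table behind a single lookup primitive.
#[derive(Default)]
pub struct TypeOracle {
    table: TypeTable,
}

impl TypeOracle {
    pub fn new(table: TypeTable) -> Self {
        Self { table }
    }

    /// Recorded type of the node at `span`, if the checker recorded one.
    pub fn expr_type(&self, span: Span) -> Option<&Type> {
        self.table.get(&span)
    }

    /// Hypothetical helper: the type a read site lowers with.
    /// Falls back to the numeric type when nothing was recorded.
    pub fn read_type(&self, span: Span) -> Type {
        self.expr_type(span).cloned().unwrap_or(Type::Number)
    }
}
```

An empty oracle (`TypeOracle::default()`) answers every lookup with the numeric fallback, which mirrors the behaviour described for codegen that runs without a type table.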
A value_repr_type() helper maps a declared Type to its in-composite value
representation (arrays -> {ptr,i64} struct inline; records -> pointer;
unresolved Generic -> f64), kept distinct from type_to_llvm()'s by-reference
lowering. The checker also specializes a constructed Result variant's generic
payload to the concrete arg type (so Ok("x") carries Text) and prefers the
concrete arm type as a match result; types_compatible() keeps differently-
specialized values of the same sum type compatible.
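As a rough sketch of the mapping that `value_repr_type()` performs, the function below returns a small `Repr` enum instead of real LLVM types. `Repr`, `record_slots`, and the shape of `Type` are assumptions made for illustration, and treating `Text` as a pointer is also an assumption; only the array, record, and generic cases come from the description above.

```rust
/// Simplified stand-in for the checker's declared types.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Number,
    Text,
    Array(Box<Type>),
    Record(Vec<(String, Type)>),
    Generic(String),
}

/// Hypothetical in-composite representations (stand-ins for LLVM types).
#[derive(Debug, Clone, PartialEq)]
pub enum Repr {
    /// A plain f64 slot.
    F64,
    /// A pointer slot.
    Ptr,
    /// The inline {ptr, i64} array value struct.
    ArrayValue,
}

/// Maps a declared type to how its value is stored inside a composite.
pub fn value_repr_type(ty: &Type) -> Repr {
    match ty {
        Type::Number => Repr::F64,
        // Assumption: Text values are stored as a pointer.
        Type::Text => Repr::Ptr,
        // Arrays are stored inline as a {ptr, i64} struct.
        Type::Array(_) => Repr::ArrayValue,
        // Records are stored as a pointer to their struct (record ABI).
        Type::Record(_) => Repr::Ptr,
        // An unresolved generic payload falls back to f64.
        Type::Generic(_) => Repr::F64,
    }
}

/// Hypothetical helper: slot layout of a record, in declaration order.
pub fn record_slots(fields: &[(String, Type)]) -> Vec<Repr> {
    fields.iter().map(|(_, ty)| value_repr_type(ty)).collect()
}
```

Keeping this mapping separate from the by-reference lowering means a read site and the matching construction site can agree on one slot layout per declared type.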
Also fixes two latent bugs surfaced in review:
- named-type constructor fields are reordered to declaration order before
lowering, so out-of-order construction can't mis-GEP a field slot
- a match whose result type came from a never-constructed generic arm no
longer hard-errors in codegen
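The first fix amounts to normalizing constructor arguments before any slot index is computed. A minimal sketch of that reordering, assuming unique declared field names and a hypothetical helper name `reorder_to_declaration` (the PR's actual function is not shown here):

```rust
/// Reorders constructor arguments, given as (field name, value) pairs,
/// into the declaration order of the record's fields.
///
/// Returns `None` when the supplied fields do not match the declaration
/// (a field is missing, unknown, or duplicated). Assumes the declared
/// names are unique.
pub fn reorder_to_declaration<V: Clone>(
    declared: &[&str],
    supplied: &[(&str, V)],
) -> Option<Vec<V>> {
    if declared.len() != supplied.len() {
        return None;
    }
    declared
        .iter()
        .map(|name| {
            supplied
                .iter()
                .find(|(n, _)| *n == *name)
                .map(|(_, value)| value.clone())
        })
        // Collecting into Option<Vec<_>> yields None if any lookup failed.
        .collect()
}
```

After this step, the value at position `i` always belongs to the `i`-th declared field, so a field-slot index derived from the declaration is valid regardless of the order the caller wrote the fields in.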
Ships examples/composites.ql (Text record field + array of Text + nested
array, exit 12), wired into the examples gate (JIT + native AOT clang & gcc),
plus tests/composite_text_test.rs. Flips the Text-in-composite / sum-payload
rows in LANGUAGE.md to done.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	LANGUAGE.md
#	src/codegen/generator.rs
#	src/typechecker/checker.rs
What & why
Codegen recovered LLVM types from runtime `BasicValueEnum::get_type()` at every READ site and hardcoded `f64`. That lost the element/field/payload type and corrupted any non-`f64` value nested in a composite: `Text` inside a record or array, nested arrays, and `Ok(text)`/`NotOk(text)`. Construction was already correct (`generate_array`/`generate_record` build real struct types); the bugs were all at reads.

This is the FOUNDATION of M3 and the start of the M4 "authoritative types in codegen" direction: the type checker now produces a side-table of inferred types, and codegen consults it at read sites instead of guessing `f64`.

The type-oracle side-table API (for downstream M3 waves)
- `typechecker::TypeTable` = `HashMap<Span, Type>`: every expression's inferred type, keyed by its source span. `infer_expr` records each node's result type as a side effect; `TypeChecker::check_program` now returns the table on success (`Result<TypeTable, TypeError>`).
- `codegen::TypeOracle` wraps the table with a single primitive: `expr_type(expr) -> Option<&Type>`. The checker records the result type of every node, so:
  - `arr[i]` = `expr_type(<the Index node>)`
  - `rec.field` = `expr_type(<the FieldAccess node>)`
  - `match` = `expr_type(<the Match node>)`

  i.e. a read site asks for the type of the whole node it's lowering, with no per-shape accessors.
- Codegen installs the oracle via `CodeGenerator::with_oracle(ctx, name, program)` (or `set_type_table` if you already hold a table); without it the oracle is empty and read sites fall back to `f64` (this is what the IR-only codegen tests rely on).
- `value_repr_type(&Type)` maps a declared type to its in-composite value representation (what `generate_expr` materializes and stores inline). It diverges from `type_to_llvm` for: `Array` → `{ptr,i64}` struct (inline value form), `Record`/`Named` → pointer (record ABI), `Generic` → `f64` (unresolved payload fallback).

Limitation (tracked): `Span` is a byte range with no module identity, and `<<` imports lex each module independently, so spans can collide across modules. Today's imported modules are numeric helpers/intrinsics with no composite reads, so this is latent; the robust fix is a stable per-node id. Documented in code + the oracle doc-comment.

Record ABI (unchanged, documented for consumers)
A record/named-type instance is stored as a pointer to its struct: the variable's alloca holds a pointer, and the struct lives behind it. Field reads/writes load that pointer then `GEP` the field. The struct's field types are the declared field types mapped through `value_repr_type`, in declaration order; `record_field_pointer` rebuilds exactly that, and constructor calls are reordered to declaration order before lowering so out-of-order construction can't mis-GEP a slot.

Verification
- `examples/composites.ql`: a record with a `Text` field + an array of `Text` + a nested array, exercised together; deterministic exit 12. Wired into `tests/examples_test.rs`, which runs it under JIT and native AOT (clang AND gcc).
- `tests/composite_text_test.rs`: 12 end-to-end run tests (Text record field, mixed fields, array of Text, nested arrays, `Ok(text)`/`NotOk(text)`, user sum-type Text payloads, plus two regression tests).

Green gate
- `cargo build`: clean
- `cargo test`: 321 passed, 0 failed (incl. the native-AOT examples gate: JIT + clang + gcc all agree)
- `cargo fmt --check`: clean
- `cargo clippy --all-targets -- -D warnings`: clean

Reviewed with `/code-review` and `/simplify`; findings addressed (consolidated the oracle to one primitive, removed a double field-type lookup, made `specialize_variant` mutate in place, made `check_match` prefer the concrete arm type, fixed the two latent bugs above).

🤖 Generated with Claude Code