M3: Text-in-composite codegen (type-oracle side-table)#35

Merged
assapir merged 2 commits into main from m3-text-composite-oracle
Jun 27, 2026
@assapir assapir commented Jun 27, 2026

What & why

Codegen recovered LLVM types from runtime BasicValueEnum::get_type() at every READ site and hardcoded f64. That lost the element/field/payload type and corrupted any non-f64 value nested in a composite: Text inside a record or array, nested arrays, and Ok(text)/NotOk(text). Construction was already correct (generate_array/generate_record build real struct types); the bugs were all at reads.

This is the FOUNDATION of M3 and the start of the M4 "authoritative types in codegen" direction: the type checker now produces a side-table of inferred types, and codegen consults it at read sites instead of guessing f64.

The type-oracle side-table API (for downstream M3 waves)

  • typechecker::TypeTable = HashMap<Span, Type> — every expression's inferred type, keyed by its source span. infer_expr records each node's result type as a side effect; TypeChecker::check_program now returns the table on success (Result<TypeTable, TypeError>).

  • codegen::TypeOracle wraps the table with a single primitive: expr_type(expr) -> Option<&Type>. The checker records the result type of every node, so:

    • element type of arr[i] = expr_type(<the Index node>)
    • type of rec.field = expr_type(<the FieldAccess node>)
    • result of a match = expr_type(<the Match node>)

    i.e. a read site asks for the type of the whole node it's lowering — no per-shape accessors. Codegen installs the oracle via CodeGenerator::with_oracle(ctx, name, program) (or set_type_table if you already hold a table); without it the oracle is empty and read sites fall back to f64 (this is what the IR-only codegen tests rely on).

  • value_repr_type(&Type) maps a declared type to its in-composite value representation (what generate_expr materializes and stores inline). It diverges from type_to_llvm for: Array → {ptr,i64} struct (inline value form), Record/Named → pointer (record ABI), Generic → f64 (unresolved payload fallback).
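A minimal sketch of the lookup described in the list above, for readers building on the oracle. Only the names TypeTable, TypeOracle, and expr_type come from this PR; the Span, Type, and Expr definitions and the read_site_type helper are simplified, hypothetical stand-ins, not the repository's actual code.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the compiler's real types; only the shape matters.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Number,
    Text,
    Array(Box<Type>),
}

#[derive(Debug)]
pub enum Expr {
    Var { name: String, span: Span },
    Index { array: Box<Expr>, index: Box<Expr>, span: Span },
}

impl Expr {
    /// Source span of the whole node, used as the side-table key.
    pub fn span(&self) -> Span {
        match self {
            Expr::Var { span, .. } | Expr::Index { span, .. } => *span,
        }
    }
}

/// Every expression's inferred type, keyed by its source span.
pub type TypeTable = HashMap<Span, Type>;

#[derive(Default)]
pub struct TypeOracle {
    table: TypeTable,
}

impl TypeOracle {
    pub fn new(table: TypeTable) -> Self {
        Self { table }
    }

    /// The single primitive: the recorded type of the whole node, if any.
    pub fn expr_type(&self, expr: &Expr) -> Option<&Type> {
        self.table.get(&expr.span())
    }
}

/// Hypothetical read-site helper: ask the oracle about the node being
/// lowered and fall back to Number (f64) when the oracle has no entry.
pub fn read_site_type(oracle: &TypeOracle, node: &Expr) -> Type {
    oracle.expr_type(node).cloned().unwrap_or(Type::Number)
}
```

In this sketch, lowering names[i] asks for the type of the Index node itself and gets Text when the checker recorded it, while an empty oracle yields the f64 fallback that the IR-only tests depend on.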

Limitation (tracked): Span is a byte range with no module identity, and << imports lex each module independently, so spans can collide across modules. Today's imported modules are numeric helpers/intrinsics with no composite reads, so this is latent; the robust fix is a stable per-node id. Documented in code + the oracle doc-comment.
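To make the collision concrete, here is a small illustration under assumed types. Span and Type are hypothetical stand-ins, and NodeId is only one possible shape for the "stable per-node id" mentioned above; none of this is the repository's code.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; a Span is just a byte range with no module identity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

// Assumed shape of a stable per-node id, unique across all modules.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Number,
    Text,
}

/// Span-keyed table: entries from different modules that share a byte
/// range land on the same key, and the later insert overwrites the earlier.
pub fn record_by_span(entries: &[(Span, Type)]) -> HashMap<Span, Type> {
    entries.iter().cloned().collect()
}

/// Id-keyed table: each node gets a fresh id, so nothing is overwritten.
pub fn record_by_id(types: &[Type]) -> HashMap<NodeId, Type> {
    types
        .iter()
        .cloned()
        .enumerate()
        .map(|(i, ty)| (NodeId(i as u32), ty))
        .collect()
}
```

Two expressions at bytes 0..5 in different modules end up as one entry in the span-keyed table but as two entries in the id-keyed one.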

Record ABI (unchanged, documented for consumers)

A record/named-type instance is stored as a pointer to its struct: the variable's alloca holds a pointer, and the struct lives behind it. Field reads/writes load that pointer then GEP the field. The struct's field types are the declared field types mapped through value_repr_type, in declaration order; record_field_pointer rebuilds exactly that, and constructor calls are reordered to declaration order before lowering so out-of-order construction can't mis-GEP a slot.
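The layout rule above can be sketched without LLVM. The Type and Repr enums, struct_layout, and reorder_to_declaration below are hypothetical and heavily simplified; only the name value_repr_type and the Array / Record / Named / Generic mappings come from this PR. Mapping Text to a pointer is an assumption made for illustration.

```rust
// Hypothetical, simplified model of declared types.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Number,
    Text,
    Array(Box<Type>),
    Record(Vec<(String, Type)>),
    Named(String),
    Generic(String),
}

// Hypothetical model of the in-composite value representation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Repr {
    F64,
    Ptr,
    ArrayStruct, // the inline {ptr, i64} struct
}

pub fn value_repr_type(ty: &Type) -> Repr {
    match ty {
        Type::Number => Repr::F64,
        Type::Text => Repr::Ptr, // ASSUMPTION: Text is stored as a pointer
        Type::Array(_) => Repr::ArrayStruct,
        Type::Record(_) | Type::Named(_) => Repr::Ptr, // record ABI
        Type::Generic(_) => Repr::F64, // unresolved payload fallback
    }
}

/// Struct field representations, in declaration order.
pub fn struct_layout(declared: &[(&str, Type)]) -> Vec<Repr> {
    declared.iter().map(|(_, ty)| value_repr_type(ty)).collect()
}

/// Reorder named constructor arguments into declaration order, so slot i
/// of the struct always receives the value for declared field i.
pub fn reorder_to_declaration<T>(
    declared: &[(&str, Type)],
    mut args: Vec<(String, T)>,
) -> Result<Vec<T>, String> {
    if args.len() != declared.len() {
        return Err(format!(
            "expected {} fields, got {}",
            declared.len(),
            args.len()
        ));
    }
    let mut ordered = Vec::with_capacity(declared.len());
    for (field, _) in declared {
        let pos = args
            .iter()
            .position(|(name, _)| name.as_str() == *field)
            .ok_or_else(|| format!("missing field `{field}`"))?;
        ordered.push(args.swap_remove(pos).1);
    }
    Ok(ordered)
}
```

Because both the layout and the argument order derive from the same declaration list, a GEP to field index i reads the value that was written for declared field i, regardless of the order used at the construction site.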

Verification

  • examples/composites.ql — a record with a Text field + an array of Text + a nested array, exercised together; deterministic exit 12. Wired into tests/examples_test.rs, which runs it under JIT and native AOT (clang AND gcc).
  • tests/composite_text_test.rs — 12 end-to-end run tests (Text record field, mixed fields, array of Text, nested arrays, Ok(text)/NotOk(text), user sum-type Text payloads, plus two regression tests).
  • LANGUAGE.md: flipped the Text-in-composite / Text-sum-payload rows to ✅; removed the matching "Known limitation".

Green gate

  • cargo build — clean
  • cargo test: 321 passed, 0 failed (incl. the native-AOT examples gate: JIT + clang + gcc all agree)
  • cargo fmt --check — clean
  • cargo clippy --all-targets -- -D warnings — clean

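Reviewed with /code-review and /simplify; findings addressed (consolidated the oracle to one primitive, removed a double field-type lookup, made specialize_variant mutate in place, made check_match prefer the concrete arm type, and fixed two latent bugs: out-of-order named-type construction could mis-GEP a field slot, and a match whose result type came from a never-constructed generic arm hard-errored in codegen).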

Note: branched from current main (436cd50); the brief's d5a52dc was stale.

🤖 Generated with Claude Code

assapir and others added 2 commits June 27, 2026 17:35
Codegen recovered LLVM types from runtime BasicValueEnum::get_type() at
every READ site and hardcoded f64, corrupting any non-f64 value nested in
a composite: Text in a record/array, nested arrays, and Ok(text)/NotOk(text).

Fix (the start of M4 "authoritative types in codegen"): the type checker now
stashes a per-expression type side-table (the "type oracle", HashMap<Span,
Type>, returned from check_program). Codegen consults it at the three read
sites instead of assuming f64:
- array index GEP/load uses the index expression's recorded element type
- record field access/GEP rebuilds the real struct type from the record's
  declared field types (record ABI unchanged: var alloca holds a pointer to
  the struct)
- match-result alloca/load uses the match's recorded result type

A value_repr_type() helper maps a declared Type to its in-composite value
representation (arrays -> {ptr,i64} struct inline; records -> pointer;
unresolved Generic -> f64), kept distinct from type_to_llvm()'s by-reference
lowering. The checker also specializes a constructed Result variant's generic
payload to the concrete arg type (so Ok("x") carries Text) and prefers the
concrete arm type as a match result; types_compatible() keeps differently-
specialized values of the same sum type compatible.

Also fixes two latent bugs surfaced in review:
- named-type constructor fields are reordered to declaration order before
  lowering, so out-of-order construction can't mis-GEP a field slot
- a match whose result type came from a never-constructed generic arm no
  longer hard-errors in codegen

Ships examples/composites.ql (Text record field + array of Text + nested
array, exit 12), wired into the examples gate (JIT + native AOT clang & gcc),
plus tests/composite_text_test.rs. Flips the Text-in-composite / sum-payload
rows in LANGUAGE.md to done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	LANGUAGE.md
#	src/codegen/generator.rs
#	src/typechecker/checker.rs
@assapir assapir merged commit d2ef3e1 into main Jun 27, 2026
2 checks passed
@assapir assapir deleted the m3-text-composite-oracle branch June 27, 2026 14:48