From 6138ba201873db1122d1b38aef06bd683d6cc3db Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Wed, 20 May 2026 19:40:52 +0200 Subject: [PATCH 01/33] docs: defer WASM target after Cranelift spike outcome MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A 2026-05-20 spike confirmed Cranelift 0.131 has no WASM emission backend — `isa::lookup("wasm32-wasip1")` returns `Unsupported`. The `cranelift-wasm` crate is a WASM consumer (used by wasmtime to JIT-execute WASM), not a producer. The "Cranelift-emits-WASM" assumption that drove the earlier plan was wrong. Reframes wasm_target.md as a deferred proposal under docs/dev/proposals/ with a realistic revival path (parallel TIR→WASM backend via bytecodealliance's wasm-encoder crate, ~4–6 weeks). Drops the attempted M8d milestone from the roadmap and its design spec. Fixes the false "Cranelift supports WebAssembly" claim across the spec, docs/index.md, and the landing page. Updates three cross-referencing dev docs for the new path and to correct prior factual claims. --- docs/dev/arc_optimizer.md | 2 +- docs/dev/concurrency_loom_kt.md | 6 +- docs/dev/memory_model_comparison.md | 6 +- docs/dev/proposals/wasm_target.md | 140 +++++++++++++++++ docs/dev/wasm_target.md | 223 ---------------------------- docs/index.md | 2 +- docs/specification.md | 2 +- landing/index.html | 9 -- 8 files changed, 149 insertions(+), 241 deletions(-) create mode 100644 docs/dev/proposals/wasm_target.md delete mode 100644 docs/dev/wasm_target.md diff --git a/docs/dev/arc_optimizer.md b/docs/dev/arc_optimizer.md index e779377..be6cd61 100644 --- a/docs/dev/arc_optimizer.md +++ b/docs/dev/arc_optimizer.md @@ -74,7 +74,7 @@ The ARC optimizer runs after the ownership pass for two reasons: A reasonable implementation strategy is to **share the dataflow framework** between the two passes — same lattice scaffolding, different transfer functions. 
The ARC optimizer is then "compose witness intervals over the same TIR" rather than a separate pass with its own dataflow library. -**Target-agnostic.** The pass operates on TIR before backend lowering and is unaffected by the choice of native (Cranelift → object) vs WASM (Cranelift → `wasm-ld`). On WASM, the elided retains/releases are WASM atomic ops when the `+atomics` feature is enabled, and ordinary loads/stores otherwise — the pass doesn't need to care which. See [wasm_target.md](wasm_target.md) for backend-specific concerns. +**Target-agnostic.** The pass operates on TIR before backend lowering, so it is unaffected by the choice of native (Cranelift → object) vs any future WASM backend. If a WASM emission path is ever built (see [proposals/wasm_target.md](proposals/wasm_target.md) — currently deferred), the elided retains/releases would be WASM atomic ops when the `+atomics` feature is enabled and ordinary loads/stores otherwise; the pass would not need to care which. ## The pass, sketched diff --git a/docs/dev/concurrency_loom_kt.md b/docs/dev/concurrency_loom_kt.md index 7f5c052..b5396a2 100644 --- a/docs/dev/concurrency_loom_kt.md +++ b/docs/dev/concurrency_loom_kt.md @@ -13,7 +13,7 @@ This doc is structured as a **delta against [`concurrency.md`](concurrency.md)** This proposal does **not** change the user-facing spec. Ryo remains colorless. No `suspend` keyword. One new user-facing primitive (`with_pool`, borrowed from Kotlin's `withContext`). -> **Sibling reference docs:** [`memory_model_comparison.md`](memory_model_comparison.md), [`rust_reference.md`](rust_reference.md), [`mojo_reference.md`](mojo_reference.md), [`arc_optimizer.md`](arc_optimizer.md), [`wasm_target.md`](wasm_target.md). 
+> **Sibling reference docs:** [`memory_model_comparison.md`](memory_model_comparison.md), [`rust_reference.md`](rust_reference.md), [`mojo_reference.md`](mojo_reference.md), [`arc_optimizer.md`](arc_optimizer.md), [`proposals/wasm_target.md`](proposals/wasm_target.md). --- @@ -368,7 +368,7 @@ This is a meaningful detail. The proposal is **not a rewrite of the concurrency 7. **WASM target: corosensei or stack-switching proposal?** - corosensei supports WASM (via Emscripten / wasm32 targets, with limitations) but the story is rougher than native. - The WebAssembly stack-switching proposal, when stable, gives true Loom semantics on WASM at no implementation cost to Ryo. - - **Recommendation:** ship WASM concurrency on corosensei initially (matches native), then migrate to stack-switching proposal when it stabilizes. Wrapped in [wasm_target.md](wasm_target.md)'s feature-flagged WASM path. + - **Recommendation:** ship WASM concurrency on corosensei initially (matches native), then migrate to stack-switching proposal when it stabilizes. Subject to the WASM target itself being revived — see [proposals/wasm_target.md](proposals/wasm_target.md). --- @@ -389,7 +389,7 @@ Until then, [concurrency.md](concurrency.md) stays as the committed plan. This d - Plan being compared against: [concurrency.md](concurrency.md) (Go-style M:N green threads via `corosensei`). - Spec: [`docs/specification.md`](../specification.md) Section 9 (Concurrency). -- Sibling design docs: [`memory_model_comparison.md`](memory_model_comparison.md), [`rust_reference.md`](rust_reference.md), [`mojo_reference.md`](mojo_reference.md), [`arc_optimizer.md`](arc_optimizer.md), [`wasm_target.md`](wasm_target.md). +- Sibling design docs: [`memory_model_comparison.md`](memory_model_comparison.md), [`rust_reference.md`](rust_reference.md), [`mojo_reference.md`](mojo_reference.md), [`arc_optimizer.md`](arc_optimizer.md), [`proposals/wasm_target.md`](proposals/wasm_target.md). 
- Upstream prior art: - [JEP 444: Virtual Threads (Java 21 GA)](https://openjdk.org/jeps/444) — Loom (inspiration for the FFI ergonomics goal). - [Loom OpenJDK wiki](https://wiki.openjdk.org/display/loom/Main). diff --git a/docs/dev/memory_model_comparison.md b/docs/dev/memory_model_comparison.md index 05a2c8e..5fb66ba 100644 --- a/docs/dev/memory_model_comparison.md +++ b/docs/dev/memory_model_comparison.md @@ -127,8 +127,8 @@ Companion to [`rust_reference.md`](rust_reference.md), [`mojo_reference.md`](moj | **Ownership / borrow check** | `rustc_borrowck` (NLL + Polonius) on MIR | `CheckLifetime` on MLIR | Exclusivity + ARC check on SIL | `OwnershipPass` on TIR | | **ARC elision** | LLVM `ObjCArcOpts` (limited) | (limited) | SIL `ARCOptimizer` (aggressive) | `arc_optimizer.rs` (planned) | | **Native backend** | LLVM | LLVM | LLVM | Cranelift | -| **WASM backend** | LLVM (`wasm32-*`) | Not yet supported | LLVM (`swiftwasm` fork) | Cranelift (`wasm32-wasip1`, post-Phase-D) | -| **WASM runtime model** | Rust std on `wasm32-wasi` uses wasi-libc | N/A | Swift runtime ported (large) | Go-style: bundled `dlmalloc-rs` + direct WASI imports, no wasi-libc — see [wasm_target.md](wasm_target.md) | +| **WASM backend** | LLVM (`wasm32-*`) | Not yet supported | LLVM (`swiftwasm` fork) | None today — Cranelift does not emit WASM. Future work would build a parallel `wasm-encoder` backend — see [proposals/wasm_target.md](proposals/wasm_target.md). | +| **WASM runtime model** | Rust std on `wasm32-wasi` uses wasi-libc | N/A | Swift runtime ported (large) | TBD — earlier "Go-style bundled `dlmalloc-rs` + direct WASI imports" plan was tied to the dropped Cranelift-emits-WASM design; the runtime model will be revisited if/when the second backend is funded. 
| ## Philosophical positioning @@ -167,6 +167,6 @@ Ryo deliberately picks: Mojo's argument conventions, Mojo's ASAP destruction, Sw - Dev: `docs/dev/arc_optimizer.md` (Swift-style refcount elision design) - Dev: `docs/dev/borrow_checker.md` (Ryo's algorithm sketch) - Dev: `docs/dev/concurrency.md` (Ryo's concurrency model) -- Dev: `docs/dev/wasm_target.md` (WASM compilation, Go-style runtime, L2 bundled allocator) +- Dev: `docs/dev/proposals/wasm_target.md` (WASM target — deferred proposal) - Milestone: `docs/dev/implementation_roadmap.md` - Upstream: , , diff --git a/docs/dev/proposals/wasm_target.md b/docs/dev/proposals/wasm_target.md new file mode 100644 index 0000000..6cd65bc --- /dev/null +++ b/docs/dev/proposals/wasm_target.md @@ -0,0 +1,140 @@ +# WASM Target — Proposal (Deferred) + +**Status:** Proposal — Deferred post-v0.1.0 (revival requires a strategic commitment to a second backend) + +This document describes the design space for a WebAssembly compilation target. As of 2026-05-20, the spike (see [m8d-cranelift-wasm-spike.md](../../analysis/m8d-cranelift-wasm-spike.md)) showed that **Cranelift does not emit WASM**, so the original "thin plumbing" plan collapsed. Reviving WASM requires building a second backend. This proposal sketches what that would entail and what remains valid. + +## Changelog + +- **v0.4 (2026-05-20):** Reframed as a deferred proposal. The 2026-05-20 spike confirmed that Cranelift 0.131 has no WASM emission backend — its ISAs are `x86`/`arm64`/`s390x`/`riscv64` plus the Pulley interpreter. `cranelift-wasm` is the *consumer* crate (translates WASM → Cranelift IR for execution by wasmtime/wasmer), not a producer. The "Cranelift-emits-WASM" architecture in v0.2/v0.3 was built on a false premise. New direction: if/when WASM becomes a priority, build a parallel TIR→WASM backend using bytecodealliance's `wasm-encoder` crate (the actual emission layer underneath wasm-bindgen and walrus). 
+- **v0.3 (2026-05-12):** Switched runtime model from "lean on wasi-libc" (L1) to "bundled Rust allocator + direct WASI imports" (L2). Aligned with M8.1's `ryo_rt` runtime crate. *(Now superseded — the wasm-encoder direction reshapes the runtime story anyway.)*
+- **v0.2 (initial):** Cranelift WASM backend assumed. *Replaced in v0.4.*
+
+---
+
+## Why this is deferred
+
+The 2026-05-20 spike (see [m8d-cranelift-wasm-spike.md](../../analysis/m8d-cranelift-wasm-spike.md)) established that:
+
+- `cranelift_codegen::isa::lookup(Triple::from_str("wasm32-wasip1"))` returns `LookupError::Unsupported`.
+- Cranelift 0.131's available ISAs are `x86`, `arm64`, `s390x`, `riscv64`, and `pulley` (a portable interpreter). There is no wasm32 backend, with or without feature flags.
+- `cranelift-wasm` (last published 0.112.3, no longer co-released with main Cranelift) is described as "Translator from WebAssembly to Cranelift IR" — input direction, not output.
+- In the Cranelift ecosystem, Cranelift is a WASM **consumer** (used by wasmtime/wasmer to JIT-execute WASM modules), never a WASM **producer**.
+
+The original "WASM Target Plumbing" milestone was sold as "1 week of plumbing" on top of an existing Cranelift WASM backend. With no such backend, every option for shipping WASM requires meaningful new code. None of them are 1-week scope.
+
+---
+
+## The real cost: a second backend
+
+If WASM is revived, the realistic path is **Option α — `wasm-encoder` parallel codegen**:
+
+- [`wasm-encoder`](https://github.com/bytecodealliance/wasm-tools/tree/main/crates/wasm-encoder) (bytecodealliance) is the production-grade low-level WASM module builder. It is the emission layer underneath wasm-bindgen, walrus, and most production WASM toolchains. Mature, idiomatic Rust, well-shaped API (function/section/data builders, encode-to-bytes finalization).
+- Ryo would gain a second backend (e.g. `src/codegen_wasm.rs`) parallel to the existing Cranelift backend in [codegen.rs](../../../src/codegen.rs). Both backends consume the same TIR; the dispatch happens at codegen entry.
+
+### Architectural implications
+
+- **Stack machine vs SSA.** Cranelift IR is SSA: `let v = ins().iadd(a, b)`. WASM is a structured stack machine: `local.get a; local.get b; i32.add`. The two backends are different mental models for codegen, not different syntaxes for the same thing. TIR→WASM lowering is its own discipline.
+- **No relocatable-object → linker stage.** `wasm-encoder` produces a complete module directly. The current [linker.rs](../../../src/linker.rs) abstraction (`zig cc` linking native objects) doesn't apply on WASM — the pipeline branches earlier into "emit module bytes; done."
+- **No JIT story on WASM.** Acceptable; embedding wasmtime to JIT-execute WASM is a separable later decision.
+- **Static data layout differs.** String literals currently lower to `.rodata` via Cranelift's `DataDescription`. On WASM they live in a memory data segment, addressed via `wasm-encoder`'s `MemorySection` / `DataSection` builders. Parallel infrastructure.
+- **WASI imports replace libc calls.** `print()` on native lowers to a `write()` libc call; on WASM it imports `wasi_snapshot_preview1::fd_write` and writes through a wasm-encoder-declared import.
+
+### Scope estimate
+
+- **Initial backend** for the current language surface (M8c complete: int, bool, float, arithmetic, comparisons, if/else, while, for-range, function calls, returns, read-only strings, top-level `print` via WASI `fd_write`): ~800–1500 LOC. Comparable to the current [codegen.rs](../../../src/codegen.rs).
+- **Per future language feature:** roughly 1.5× codegen work going forward — every new `TirTag` needs handling in both backends. Heap allocation (M8.1), fat pointers, ownership intrinsics, structs (M9), enums (M11) — all parallel implementations.
+- **CI:** new job that builds programs for WASM, executes under `wasmtime`, asserts behavioral parity with the native backend. New dep on `wasmtime` in the CI environment.
+- **Testing:** cross-backend behavioral tests for every language feature. The existing integration suite effectively runs twice (once per backend), with a thin parity-checking wrapper.
+
+### Timeline
+
+At the project's current ~8 hr/week pace, a realistic estimate is **4–6 weeks** for initial parity on the existing language surface, plus ongoing parallel work as new features land. This is a strategic commitment, not a tactical pull-forward.
+
+---
+
+## What revival would look like
+
+If the team decides WASM is worth a second backend:
+
+1. **Confirm the strategic priority.** WASM is on the long-term vision ([spec §1](../../specification.md)), but post-spike it costs roughly the same as building a second compiler backend. That's a real choice, not a free win. Don't revive WASM as a side quest.
+2. **New milestone, not a phase.** Replace any references to "M8d Phase A plumbing" with a properly-sized milestone (e.g. "WASM Backend"). Slot it after the language surface is stable (probably post-M27, "Core Language Complete").
+3. **Backend abstraction first.** Before writing TIR→WASM, factor the existing TIR→Cranelift backend behind a `Backend` trait so dispatch is explicit. This is the genuine plumbing work — and unlike the original M8d, it is both useful (forces the contract between TIR and codegen to be named) and small (~few hundred LOC).
+4. **Spike `wasm-encoder` for real.** Emit a minimal `.wasm` module via `wasm-encoder` (constant function, no imports), validate with `wasm-tools validate`, run under `wasmtime`. Confirm the output is genuinely loadable. ~1 day.
+5. **Then design the parallel codegen.** TIR→WASM design doc, scope, phasing, CI integration. That is what the *next* version of this proposal looks like.
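To make step 4 concrete, here is a minimal sketch of what "a constant function, no imports" amounts to at the byte level. It deliberately does not use `wasm-encoder`: it hand-rolls the binary format with the standard library only, so it shows the shape of the output rather than the crate's API. The names `uleb128`, `sleb128`, `section`, and `const_module` are hypothetical and not part of the Ryo codebase.

```rust
/// Unsigned LEB128, used by the WASM binary format for sizes, counts, and indices.
fn uleb128(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80); // continuation bit: more bytes follow
    }
}

/// Signed LEB128, used for `i32.const` immediates.
fn sleb128(mut value: i32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7; // arithmetic shift preserves the sign
        let sign_bit_clear = byte & 0x40 == 0;
        if (value == 0 && sign_bit_clear) || (value == -1 && !sign_bit_clear) {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Append a section: id byte, payload length, payload.
fn section(id: u8, payload: &[u8], out: &mut Vec<u8>) {
    out.push(id);
    uleb128(payload.len() as u32, out);
    out.extend_from_slice(payload);
}

/// Build a module exporting `name: () -> i32` that returns `value`.
/// Hypothetical helper for illustration only.
fn const_module(name: &str, value: i32) -> Vec<u8> {
    let mut module = b"\0asm".to_vec(); // magic
    module.extend_from_slice(&[1, 0, 0, 0]); // binary format version 1

    // Type section (id 1): one function type, () -> i32.
    section(1, &[0x01, 0x60, 0x00, 0x01, 0x7f], &mut module);
    // Function section (id 3): function 0 has type 0.
    section(3, &[0x01, 0x00], &mut module);

    // Export section (id 7): export function 0 under `name`.
    let mut exports = vec![0x01];
    uleb128(name.len() as u32, &mut exports);
    exports.extend_from_slice(name.as_bytes());
    exports.extend_from_slice(&[0x00, 0x00]); // kind = func, index = 0
    section(7, &exports, &mut module);

    // Code section (id 10): no locals; `i32.const value`; `end`.
    let mut body = vec![0x00, 0x41];
    sleb128(value, &mut body);
    body.push(0x0b);
    let mut code = vec![0x01];
    uleb128(body.len() as u32, &mut code);
    code.extend_from_slice(&body);
    section(10, &code, &mut module);

    module
}

fn main() {
    let bytes = const_module("answer", 42);
    println!("module is {} bytes", bytes.len());
}
```

Writing the returned bytes to a file gives something that can be passed to `wasm-tools validate` and then loaded by `wasmtime`, as the step describes. A real spike would presumably build the same four sections through `wasm-encoder`'s section builders instead of raw bytes; the point here is only that the module is produced directly, with no object file or linker stage.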
+
+### Alternatives considered
+
+- **β — `walrus`.** Higher-level transform library, not really a primary codegen target. Wrong tool for emitting from scratch.
+- **γ — Switch to LLVM.** Mature WASM backend, but a backend-replacement decision dwarfs the WASM target by itself. Defer unless a separate decision motivates an LLVM move.
+- **δ — Stay deferred.** Current state. Zero engineering. Honest about current capability. Revisit when (a) Cranelift adds WASM emission upstream, or (b) the team explicitly accepts the cost of a second backend.
+
+---
+
+## What stays valid
+
+The following sections describe target tradeoffs that hold under any emission backend. Preserved from earlier versions because they are still useful when revival is on the table.
+
+### Goals (if pursued)
+
+1. `ryo build --target wasm32-wasip1 ` produces a `.wasm` module that runs under `wasmtime`, `wasmer`, and Node.js (via WASI shim).
+2. All synchronous Ryo programs through Milestone 27 (Core Language Complete) compile and run on WASM with identical observable behavior to native targets.
+3. The implementation does not regress native AOT or JIT pipelines.
+
+### Non-Goals
+
+- Browser target (`wasm32-unknown-unknown` with JS host). Needs a separate binding strategy.
+- WASI preview 2 / component model. Wait for stabilization.
+- Concurrency, threads, async I/O. Blocked on WasmFX + WASI 0.3 — see [concurrency.md](../concurrency.md).
+- JIT execution of WASM (`ryo run --target wasm…`). AOT only.
+- Source-level debugging (DWARF-in-WASM). Out of scope.
+
+### Type System Adjustments
+
+| Type | Native | WASM | Action |
+|---|---|---|---|
+| `int` (default) | i64 | i64 | unchanged — WASM has native i64 |
+| `uint` | u64 | u64 | unchanged |
+| `f32` / `f64` | as-is | as-is | unchanged |
+| `i8` / `u8` / `i16` / `u16` | native | stored as i32, narrowed at boundaries | the TIR→WASM backend must emit the narrowing explicitly; `wasm-encoder` only encodes the instructions it is given |
+| `i128` / `u128` | software | software (slower) | acceptable; document |
+| pointer-sized | 64-bit | 32-bit | a `ptr_type` distinct from Ryo `int` must be threaded through TIR; see the spec-required `int = i64` rule in [specification.md §4.2](../../specification.md) and the audit discussion in [implementation_roadmap.md](../implementation_roadmap.md) M8.1+ |
+| `str` / slice fat pointers | `(ptr: i64, len: i64, cap: i64)` | `(ptr: i32, len: i32, cap: i32)` | layout depends on the `ptr_type`-vs-`int_type` split |
+
+The pointer-width audit motivation outlives the WASM target — it applies to any 32-bit future target (embedded RISC-V, microcontrollers, eventual WASM). If desired, it can land as its own small refactor on its own merits, independent of this proposal.
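The `ptr_type`-vs-`int_type` split in the table can be illustrated with a small, self-contained sketch. It is not taken from the Ryo compiler: `Scalar`, `Target`, `TargetTypes`, `target_types`, and `str_fat_pointer_bytes` are hypothetical names chosen for illustration, and the only facts assumed are the ones in the table (Ryo `int` is always i64, pointers are 64-bit on native and 32-bit on WASM, and a `str` fat pointer has three pointer-width fields).

```rust
/// Machine-level scalar widths a backend can lower to (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Scalar {
    I32,
    I64,
}

impl Scalar {
    /// Width of the scalar in bytes.
    fn bytes(self) -> u32 {
        match self {
            Scalar::I32 => 4,
            Scalar::I64 => 8,
        }
    }
}

/// Targets distinguished only by pointer width for this sketch.
#[derive(Clone, Copy, Debug)]
enum Target {
    Native64,
    Wasm32,
}

/// Per-target lowering choices: Ryo `int` stays i64, only pointers vary.
#[derive(Debug)]
struct TargetTypes {
    int_type: Scalar,
    ptr_type: Scalar,
}

fn target_types(target: Target) -> TargetTypes {
    let ptr_type = match target {
        Target::Native64 => Scalar::I64,
        Target::Wasm32 => Scalar::I32,
    };
    // `int = i64` is fixed by the spec rule cited in the table.
    TargetTypes { int_type: Scalar::I64, ptr_type }
}

/// Size of a `str` fat pointer `(ptr, len, cap)`: three pointer-width fields.
fn str_fat_pointer_bytes(types: &TargetTypes) -> u32 {
    3 * types.ptr_type.bytes()
}

fn main() {
    for target in [Target::Native64, Target::Wasm32] {
        let types = target_types(target);
        println!(
            "{:?}: int = {:?}, ptr = {:?}, str fat pointer = {} bytes",
            target,
            types.int_type,
            types.ptr_type,
            str_fat_pointer_bytes(&types)
        );
    }
}
```

Keeping `int_type` and `ptr_type` as separate fields means any code that reaches for `int_type` where it meant "pointer-sized" shows up as a reviewable choice rather than an implicit 64-bit assumption, which is the kind of bug the audit is meant to surface.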
+
+### Features That Will Not Work on WASM
+
+#### Hard Blocks (no path forward without new WASM proposals)
+
+| Feature | Reason | Unblocks when |
+|---|---|---|
+| `task.run`, `future[T]`, `.await` | Requires stack switching; standard WASM has no accessible execution stack | WasmFX (Phase 4) reaches stable runtimes |
+| Channels (`chan[T]`) | Built on the task runtime | Same as above |
+| `task.delay` / timers | Need scheduler suspension points | Same as above |
+| Real OS threads / parallelism | Needs `wasi-threads` or `SharedArrayBuffer` | Host-dependent |
+| `Mutex` / `RwLock` with blocking | Single-threaded WASM cannot block usefully | Threaded WASM |
+| Stack-overflow recovery as a Ryo error | WASM has no guard pages; host traps the module | WasmFX or host trap handlers |
+
+#### Soft Blocks (degraded but possible)
+
+| Feature | Status on WASM |
+|---|---|
+| Networking (`std.net`) | Not in WASI preview 1. Available in preview 2 (`wasi:sockets`); revisit when the component model stabilizes. |
+| Subprocess / `std.process.spawn` | Not in WASI. Will return `Unsupported` at runtime. |
+| Filesystem | Works, but limited to **preopened** directories (WASI sandbox). No raw `/`-rooted access. |
+| Environment variables | Read-only, host-controlled. |
+| Panics with stack traces | No DWARF; only message + `proc_exit`. Backtraces deferred. |
+| `unsafe` raw pointers | Pointers are 32-bit indices into linear memory; FFI to native libraries is impossible. WASM imports replace C FFI but need a separate binding model. |
+| 128-bit ints | Software-emulated, slower. |
+| SIMD | Requires the WASM SIMD proposal; emit only when explicitly enabled.
| + +--- + +## References + +- Spike outcome (load-bearing): [docs/analysis/m8d-cranelift-wasm-spike.md](../../analysis/m8d-cranelift-wasm-spike.md) +- Spec: [§1](../../specification.md) (Target Domains — mentions Wasm), [§17](../../specification.md) (Tooling), [§19](../../specification.md) (Future Work — WebAssembly Target Details) +- Dev: [concurrency.md](../concurrency.md) — WasmFX future direction for concurrency-on-WASM +- Dev: [arc_optimizer.md](../arc_optimizer.md) — target-agnostic ARC pass remains correct under any backend +- External: [wasm-encoder source](https://github.com/bytecodealliance/wasm-tools/tree/main/crates/wasm-encoder), [WASI preview 1](https://github.com/WebAssembly/WASI/tree/main/legacy/preview1), [WasmFX](https://wasmfx.dev/) diff --git a/docs/dev/wasm_target.md b/docs/dev/wasm_target.md deleted file mode 100644 index 6f38575..0000000 --- a/docs/dev/wasm_target.md +++ /dev/null @@ -1,223 +0,0 @@ -# WASM Target Implementation Plan (Draft — v0.3) - -**Status:** Design (v0.3) - -This document describes the plan for adding a WebAssembly compilation target to the Ryo compiler. It is the actionable counterpart to the long-term WasmFX discussion in [concurrency.md](concurrency.md#future-wasm-target-via-wasmfx). - -The plan is intentionally scoped to what is achievable **today** with Cranelift's stable WASM backend and WASI preview 1, and explicitly defers everything that depends on stack switching, threads, or async I/O. - -## Changelog - -- **v0.3 (2026-05-12):** Switched the runtime model from "lean on wasi-libc" (L1) to **"bundled Rust allocator + direct WASI imports"** (L2). This matches Go's no-external-runtime-dependency model on WASM and is consistent with M8.1's commitment to a Go-style embedded runtime crate (`ryo_rt`). Removes the wasi-libc dependency and shrinks the import surface to 3–6 well-known WASI functions per binary. Reflects the existence of the M8.1 runtime crate, which earlier drafts noted as "not yet existing." 
-- **v0.2 (initial):** Cranelift WASM backend, wasi-libc allocator, wasm-ld via Zig toolchain. Replaced in v0.3. - ---- - -## Goals - -1. `ryo build --target wasm32-wasip1 ` produces a `.wasm` module that runs under `wasmtime`, `wasmer`, and Node.js (via WASI shim). -2. All synchronous Ryo programs through Milestone 27 (Core Language Complete) compile and run on WASM with identical observable behavior to native targets. -3. The implementation does not regress native AOT or JIT pipelines. -4. The target is gated behind a feature flag until validated, so an incomplete WASM backend cannot break `cargo test` for native users. - -## Non-Goals (current stage) - -- Browser target (`wasm32-unknown-unknown` with JS host). Tracked separately; needs a binding strategy. -- WASI preview 2 / component model. Wait for stabilization. -- Concurrency, threads, async I/O. Blocked on WasmFX + WASI 0.3 — see [concurrency.md](concurrency.md). -- JIT execution of WASM (`ryo run --target wasm…`). AOT only. -- Source-level debugging (DWARF-in-WASM). Out of scope for v0.2. - ---- - -## Current Compiler State (relevant facts) - -- Backend: **Cranelift 0.130** (`src/codegen.rs`, ~625 LOC). IR is largely target-agnostic. -- Linker: `zig cc` driver (`src/linker.rs`, `src/toolchain.rs`). Zig ships `wasm-ld` out of the box. -- Pipeline: lex → parse → AST → semantic → UIR → TIR → **Ownership pass (M8.1+)** → Cranelift IR → object → link (`src/pipeline.rs`). -- **Runtime crate (`ryo_rt`)** lands in M8.1 — see [M8.1 design doc](../superpowers/specs/2026-05-11-milestone-8.1-heap-str-and-move-semantics-design.md). It produces three artifacts (`staticlib` for native, `cdylib` for WASM, `rlib` for compiler-internal JIT) and bundles its own allocator so neither libc nor wasi-libc is a hard dependency at the runtime layer. -- Concurrency runtime (Phase 5, Milestones 32–34) is **not yet implemented**, so there is no green-thread code to port. This is the easiest possible moment to add WASM support. 
- ---- - -## Architecture - -### Target Selection - -Add a `--target` flag to `ryo build` parsed via `target-lexicon` (already a dependency). Recognized values for v0.2: - -| Triple | Status | -|---|---| -| `` (default) | Existing native AOT path | -| `wasm32-wasip1` | New — primary WASM target | -| `wasm32-unknown-unknown` | Stretch — emits a `.wasm` with no syscalls; useful for pure-compute libraries | - -Internally, plumb a `TargetSpec` struct from `main.rs` → `pipeline.rs` → `codegen.rs` → `linker.rs`. Replace any implicit host-triple usage in `codegen.rs` with the resolved `TargetSpec`. - -### Codegen - -Cranelift already supports `wasm32` as an ISA target. Concretely: - -1. In `codegen.rs`, replace `cranelift_native::builder()` with a triple-aware builder that returns the `wasm32` ISA when targeting WASM. -2. Pointer width becomes 32 bits. Replace any hardcoded `types::I64` for pointer-sized values with `module.target_config().pointer_type()`. Audit: - - struct field offsets in `tir.rs` and `uir.rs` - - any `usize`/`isize` lowering in `sema.rs` / `types.rs` -3. Calling convention: Cranelift emits the standard WASM CC automatically. No ABI work needed for the v0.2 type set. -4. Object output: switch from `cranelift-object` (ELF/Mach-O/COFF) to **`cranelift-object` with the WASM target** — Cranelift produces a relocatable WASM object that `wasm-ld` consumes. Verify this against the current Cranelift version; if WASM object emission lags, fall back to writing a single-module `.wasm` directly via `cranelift-wasm` helpers. - -### Linker - -Replace `zig cc` with `zig wasm-ld` for the WASM target. Critically, **no `-lc` flag** — the `ryo_rt` artifact brings its own allocator and calls WASI imports directly, so there is no need to link wasi-libc. - -```shell -zig wasm-ld -o out.wasm user.o ryo_rt.wasm \ - --no-entry --export=_start -``` - -Add a `Linker::link_wasm` path in `src/linker.rs` parallel to the existing native linker. 
The Zig toolchain manager (`src/toolchain.rs`) already downloads Zig, so no new toolchain dependency is introduced. - -### Runtime / Stdlib Shims - -The `ryo_rt` runtime crate (introduced in M8.1) does the heavy lifting on every target. For WASM specifically, the crate bundles its own allocator and calls WASI imports directly — the **L2 Go-style model** in [M8.1's runtime section](../superpowers/specs/2026-05-11-milestone-8.1-heap-str-and-move-semantics-design.md). This replaces the earlier "lean on wasi-libc" plan (L1) so that Ryo WASM modules have no system-library dependency beyond the WASI host imports they actually use. - -| Need | Implementation | -|---|---| -| Program entry | Emit `_start` (WASI convention) instead of `main`. Map Ryo's top-level / `main` to `_start`, returning `()` (host gets exit code 0) or an `i32` exit code. | -| `panic` / abort | Lower panics to a call into a tiny runtime wrapper that writes the message via the imported `fd_write` to fd 2 and calls the imported `proc_exit(1)`. | -| Allocator | Runtime bundles `dlmalloc-rs` (or a similar small allocator) as `#[global_allocator]` under `cfg(target_arch = "wasm32")`. All `ryo_str_alloc` / `ryo_str_free` / future container allocations route through it. **No wasi-libc dependency.** | -| stdout / print | When stdlib `print` lands (Milestone 24), route it to the imported `fd_write` directly. The runtime declares the WASI imports with `#[link(wasm_import_module = "wasi_snapshot_preview1")]`. | -| File I/O | Same model — direct WASI imports (`path_open`, `fd_read`, `fd_write`, `fd_close`) declared as `extern "C"` in the runtime, no C shim. | -| Time / random | Direct imports of `clock_time_get` and `random_get`. | -| Atomics (post-M11 `shared[T]`) | Use Rust's `std::sync::atomic`, which compiles to WASM atomics when the `+atomics` feature is enabled. Single-threaded WASM builds use non-atomic loads/stores; semantics are preserved either way. 
| - -Net: the runtime crate is **the only Rust→WASM artifact** linked into a user binary. The output `.wasm` declares 3–6 host imports (the WASI functions actually used) and no other external dependencies. This is the WASM equivalent of Go's "all you need is `chmod +x` and run." - -### Type System Adjustments - -| Type | Native | WASM | Action | -|---|---|---|---| -| `int` (default) | i64 | i64 | unchanged — WASM has native i64 | -| `uint` | u64 | u64 | unchanged | -| `f32` / `f64` | as-is | as-is | unchanged | -| `i8` / `u8` / `i16` / `u16` | native | stored as i32, narrowed at boundaries | Cranelift handles automatically; verify bool layout | -| `i128` / `u128` | software | software (slower) | acceptable, document | -| pointer-sized | 64-bit | 32-bit | **Audit required** — see Codegen step 2 | -| `str` / slice fat pointers | `(ptr: u64, len: u64, cap: u64)` | `(ptr: u32, len: u32, cap: u32)` | layout already abstracted via `pointer_type()` if step 2 done correctly. The runtime's `RyoStr` uses `usize` so the Rust source compiles to both layouts. See [M8.1 Cranelift Representation](../superpowers/specs/2026-05-11-milestone-8.1-heap-str-and-move-semantics-design.md). | - -### CLI Surface - -``` -ryo build --target wasm32-wasip1 hello.ryo # → hello.wasm -ryo build --target wasm32-wasip1 --release hello.ryo -ryo target list # show available targets -``` - -`ryo run` against a WASM target is **not** implemented in v0.2. Users invoke `wasmtime out.wasm` themselves. A future `ryo run --target wasm…` could shell out to `wasmtime` if it is on `PATH`. - ---- - -## Features That Will Not Work on WASM - -This list MUST be reflected in user-facing docs (`docs/installation.md` or a new `docs/targets.md`) when the target ships. It mirrors and extends the table in [concurrency.md](concurrency.md#what-is-explicitly-out-of-scope). 
- -### Hard Blocks (no path forward without new WASM proposals) - -| Feature | Reason | Unblocks when | -|---|---|---| -| `task.run`, `future[T]`, `.await` | Requires stack switching; standard WASM has no accessible execution stack | WasmFX (Phase 4) reaches stable runtimes | -| Channels (`chan[T]`) | Built on the task runtime | Same as above | -| `task.delay` / timers | Need scheduler suspension points | Same as above | -| Real OS threads / parallelism | Needs `wasi-threads` or `SharedArrayBuffer` | Host-dependent | -| `Mutex` / `RwLock` with blocking | Single-threaded WASM cannot block usefully | Threaded WASM | -| Stack-overflow recovery as a Ryo error | WASM has no guard pages; host traps the module | WasmFX or host trap handlers | - -### Soft Blocks (degraded but possible) - -| Feature | Status on WASM | -|---|---| -| Networking (`std.net`) | Not in WASI preview 1. Available in preview 2 (`wasi:sockets`); revisit when the component model stabilizes. | -| Subprocess / `std.process.spawn` | Not in WASI. Will return `Unsupported` at runtime. | -| Filesystem | Works, but limited to **preopened** directories (WASI sandbox). No raw `/`-rooted access. | -| Environment variables | Read-only, host-controlled. | -| Panics with stack traces | No DWARF; only message + `proc_exit`. Backtraces deferred. | -| `unsafe` raw pointers | Pointers are 32-bit indices into linear memory; FFI to native libraries is impossible. WASM imports replace C FFI but need a separate binding model (out of scope v0.2). | -| 128-bit ints | Software-emulated, slower. | -| SIMD | Requires the WASM SIMD proposal; emit only when `--features +simd128` is set. v0.2: disabled. | - -### Compile-Time Detection - -Programs that use unavailable features should fail **at compile time** with a clear diagnostic, not at runtime. Mechanism (design only — implementation deferred): - -- Add a `target` predicate usable in `cfg`-style attributes (mirroring Rust's `#[cfg(target_family = "wasm")]`). 
The exact syntax is a separate spec proposal. -- The stdlib marks symbols like `task.run`, `std.net.*`, `std.process.spawn` with a `#[unavailable(target = "wasm32-*")]` attribute. The semantic analyzer rejects calls to such symbols when the active target matches. -- Until the attribute system exists, document the gaps in `docs/targets.md` and rely on link-time errors (missing WASI imports) as a backstop. - ---- - -## Phased Implementation - -Each phase ends in a green CI run and a demo. Estimates assume the same ~8 hr/week pace as the main roadmap. - -### Phase A — Plumbing (1 week) - -- Add `--target` CLI flag, `TargetSpec` struct, target-aware ISA builder. -- Audit pointer-width assumptions; replace with `pointer_type()`. -- Feature-gate the WASM path behind `cargo build --features wasm` so partial work cannot break native CI. -- **Demo:** `ryo build --target wasm32-wasip1 trivial.ryo` produces a WASM object file (not yet linked). - -### Phase B — Runtime Crate WASM Build + Linking + Hello Exit Code (1–2 weeks) - -- Extend `runtime/Cargo.toml` to add `cdylib` to `crate-type` and conditional `dlmalloc` dep under `cfg(target_arch = "wasm32")`. -- Add the bundled allocator module (`runtime/src/alloc_wasm.rs`) with `#[global_allocator]` set to `dlmalloc::GlobalDlmalloc`. -- Declare WASI imports (`fd_write`, `proc_exit`, `clock_time_get`, `random_get`) as `extern "C"` in the runtime under the same cfg gate. -- Extend the compiler's `build.rs` to invoke `cargo build -p ryo_rt --target wasm32-wasip1 --release` alongside the native build; embed both artifacts via `include_bytes!`. -- Implement `Linker::link_wasm` invoking `wasm-ld` via the Zig toolchain. Pass `ryo_rt.wasm` from the embedded blob, **no `-lc`**. -- Map top-level / `main` to `_start`; the runtime provides the WASI exit wrapper. -- **Demo:** `wasmtime hello.wasm; echo $?` returns the value of a Ryo program that evaluates to an integer (Milestone 3 parity, on WASM). 
Output `.wasm` declares 1–2 WASI imports and no others. - -### Phase C — Core Language Coverage (1–2 weeks) - -- Run the existing Milestone 4–14 integration tests through the WASM target. Fix any 32-bit pointer or i32/i64 narrowing bugs surfaced. -- Add a `cargo test --features wasm` job that runs the WASM suite under `wasmtime` (CI: install `wasmtime` via release tarball, ~3 MB). -- **Demo:** factorial, fizzbuzz, struct/enum examples from `docs/examples/` all run identically on native and WASM. - -### Phase D — Stdlib Plumbing (concurrent with Milestone 24) - -- When `print`, file I/O, and core stdlib symbols land, ensure they route through the runtime's WASI imports (not wasi-libc). -- Document gaps in `docs/targets.md`. -- **Demo:** `hello_world.ryo` prints "Hello, world!" under `wasmtime --dir=.`. - -### Phase E — Polish - -- `ryo run --target wasm32-wasip1` shells out to `wasmtime` if available. -- `ryo target list` / `ryo target add` (matches the proposal in [proposals.md](proposals.md)). -- Size optimization pass: `--release` invokes `wasm-opt` (from `binaryen`) if present. - -### Out of Phase (deferred) - -- Browser target (needs JS bindings, no `_start`, no WASI). -- Concurrency on WASM — covered by the WasmFX section of [concurrency.md](concurrency.md). -- WASI preview 2 / component model. -- DWARF debug info and source-mapped backtraces. - ---- - -## Risks & Open Questions - -1. **Cranelift WASM object emission maturity.** `cranelift-object` targeting `wasm32` may not be a fully supported configuration in 0.130. Mitigation: spike during Phase A; if blocked, fall back to single-module emission via `cranelift-wasm`. -2. **Pointer-width audit completeness.** Any silently-baked 64-bit assumption in `tir.rs` / `uir.rs` will produce subtly wrong code. The M8.1 runtime uses `usize` / `*mut u8` so the runtime side is safe, but compiler-side codegen needs a focused review. 
Mitigation: run every existing integration test under the WASM target before declaring Phase C done. -3. **Zig bundled `wasm-ld` invocation surface.** Verify that the Zig version pinned by `src/toolchain.rs` exposes `zig wasm-ld` cleanly. If not, document a `wasm-ld` system dependency and fall back gracefully. -4. **Allocator choice.** `dlmalloc-rs` is well-maintained and matches Rust's own `wasm32-unknown-unknown` default, but it adds ~10 KB to the WASM module. For size-critical use cases (sub-10 KB binaries) consider `lol_alloc` or a bump allocator. Decision deferable until benchmarks exist. -5. **Future concurrency divergence.** When the green-thread runtime lands (M32+), a WASM build must produce a clear "feature not available on this target" error rather than silently linking nothing. Phase A's feature-gating gives us the hook to enforce this. -6. **Atomics availability.** `shared[T]`'s atomic refcount operations require the WASM atomics proposal (`+atomics` feature). For Phase B–D, atomics are unused (single-threaded execution). When `shared[T]` lands (post-M11) the WASM build must explicitly enable atomics; alternatively, single-threaded WASM builds can use non-atomic ops since contention is impossible. - ---- - -## References - -- Spec: §1 (Target Domains, mentions Wasm), §17 (Implementation Strategy — Cranelift WebAssembly support) -- Dev: [concurrency.md](concurrency.md#future-wasm-target-via-wasmfx) (WasmFX deferral), [compilation_pipeline.md](compilation_pipeline.md), [proposals.md](proposals.md) (cross-compilation CLI) -- Dev: [M8.1 design doc](../superpowers/specs/2026-05-11-milestone-8.1-heap-str-and-move-semantics-design.md) (runtime crate, dual-artifact build, pointer-width handling) -- Dev: [arc_optimizer.md](arc_optimizer.md) (target-agnostic refcount elision; runs on TIR before backend lowering) -- Milestone: Slot between Milestones 27 (Core Language Complete) and 28+; adds no language features. 
To be linked from `implementation_roadmap.md` once approved. -- External: [Cranelift WASM docs](https://github.com/bytecodealliance/wasmtime/tree/main/cranelift), [WASI preview 1 spec](https://github.com/WebAssembly/WASI/tree/main/legacy/preview1), [`dlmalloc-rs`](https://crates.io/crates/dlmalloc), Go's WASI runtime as conceptual prior art. diff --git a/docs/index.md b/docs/index.md index 5f3f932..6a6dcb6 100644 --- a/docs/index.md +++ b/docs/index.md @@ -38,7 +38,7 @@ hide: | **Concurrency** | Green threads + Task/Future/Channel (colorless functions) | | **Types** | Static with bidirectional inference | | **Null Safety** | Optional types (`?T`) with `?.` chaining and `orelse` | -| **Performance** | Native code via Cranelift (AOT, JIT, WebAssembly) | +| **Performance** | Native code via Cranelift (AOT and JIT) | --- diff --git a/docs/specification.md b/docs/specification.md index bd5c909..536e87d 100644 --- a/docs/specification.md +++ b/docs/specification.md @@ -3779,7 +3779,7 @@ Ryo includes a first-class testing framework. * **Linker/Driver:** **Zig (`zig cc`)** is the mandatory linker and driver. * *Rationale:* Enables easy cross-compilation (e.g., `ryo build --target x86_64-linux-musl`) and seamless C interop. -* **Compiler Backend:** **Cranelift**. Supports AOT, JIT, WebAssembly. *(Rationale: Good balance of performance, compile speed, JIT/Wasm support).* +* **Compiler Backend:** **Cranelift**. Supports AOT and JIT. *(Rationale: Good balance of performance and compile speed; provides both AOT and JIT from a single backend.)* A WebAssembly target is a future possibility but is not provided by Cranelift; it would require a parallel backend — see [§19](#19-missing-elements--future-work). * **Tools:** `ryo` package manager integrated, `ryo-bindgen` for automatic C FFI binding generation, `ryo` REPL (using JIT), Integrated Testing (`ryo test`). LSP future goal. ## 17. 
FFI & `unsafe` diff --git a/landing/index.html b/landing/index.html index 7024a4e..48c183d 100644 --- a/landing/index.html +++ b/landing/index.html @@ -396,15 +396,6 @@

From 4088621280aa6e96cd3336a2a68cdca65ff3f42c Mon Sep 17 00:00:00 2001
From: Pepe Navarro
Date: Wed, 20 May 2026 20:03:35 +0200
Subject: [PATCH 02/33] docs: align wasm proposal with dev-doc conventions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three review fixes to docs/dev/proposals/wasm_target.md:
- Heading uses the `(Draft — v0.4)` marker per docs/dev/CLAUDE.md
  draft-marking rule (pre-approval content)
- Status header uses the canonical enum value `Design (v0.2+)`; the
  deferral context moves to a sub-header line so the
  convention-mandated values stay clean
- Drop three broken links to `docs/analysis/m8d-cranelift-wasm-spike.md`
  (the analysis dir is gitignored, so the file isn't in the repo from a
  fresh-clone perspective). The spike substance is already inlined
  under "Why this is deferred."
- Add the missing `Milestone:` entry to the References footer per the
  Spec/Dev/Milestone convention
- Side fix: correct §17 → §16 for the Tooling section reference
---
 docs/dev/proposals/wasm_target.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/docs/dev/proposals/wasm_target.md b/docs/dev/proposals/wasm_target.md
index 6cd65bc..f5e6034 100644
--- a/docs/dev/proposals/wasm_target.md
+++ b/docs/dev/proposals/wasm_target.md
@@ -1,8 +1,10 @@
-# WASM Target — Proposal (Deferred)
+# WASM Target (Draft — v0.4)
 
-**Status:** Proposal — Deferred post-v0.1.0 (revival requires a strategic commitment to a second backend)
+**Status:** Design (v0.2+)
 
-This document describes the design space for a WebAssembly compilation target. As of 2026-05-20, the spike (see [m8d-cranelift-wasm-spike.md](../../analysis/m8d-cranelift-wasm-spike.md)) showed that **Cranelift does not emit WASM**, so the original "thin plumbing" plan collapsed. Reviving WASM requires building a second backend. This proposal sketches what that would entail and what remains valid.
+Deferred post-v0.1.0; revival requires a strategic commitment to a second backend (see "Why this is deferred" below). + +This document describes the design space for a WebAssembly compilation target. A 2026-05-20 verification spike showed that **Cranelift does not emit WASM**, so the original "thin plumbing" plan collapsed. Reviving WASM requires building a second backend. This proposal sketches what that would entail and what remains valid. ## Changelog @@ -14,7 +16,7 @@ This document describes the design space for a WebAssembly compilation target. A ## Why this is deferred -The 2026-05-20 spike (see [m8d-cranelift-wasm-spike.md](../../analysis/m8d-cranelift-wasm-spike.md)) established that: +The 2026-05-20 spike established that: - `cranelift_codegen::isa::lookup(Triple::from_str("wasm32-wasip1"))` returns `LookupError::Unsupported`. - Cranelift 0.131's available ISAs are `x86`, `arm64`, `s390x`, `riscv64`, and `pulley` (a portable interpreter). There is no wasm32 backend, with or without feature flags. @@ -133,8 +135,9 @@ The pointer-width audit motivation outlives the WASM target — it applies to an ## References -- Spike outcome (load-bearing): [docs/analysis/m8d-cranelift-wasm-spike.md](../../analysis/m8d-cranelift-wasm-spike.md) -- Spec: [§1](../../specification.md) (Target Domains — mentions Wasm), [§17](../../specification.md) (Tooling), [§19](../../specification.md) (Future Work — WebAssembly Target Details) +- Spec: [§1](../../specification.md) (Target Domains — mentions Wasm), [§16](../../specification.md) (Tooling), [§19](../../specification.md) (Future Work — WebAssembly Target Details) - Dev: [concurrency.md](../concurrency.md) — WasmFX future direction for concurrency-on-WASM - Dev: [arc_optimizer.md](../arc_optimizer.md) — target-agnostic ARC pass remains correct under any backend +- Milestone: None active. 
Proposal is deferred; revival would slot as a new "WASM Backend" milestone after M27 (Core Language Complete) in [implementation_roadmap.md](../implementation_roadmap.md). +- Spike record: working notes kept locally under `docs/analysis/` (gitignored per [docs/CLAUDE.md](../../CLAUDE.md) — scratch/uncommitted); the substantive findings are inlined above under "Why this is deferred." - External: [wasm-encoder source](https://github.com/bytecodealliance/wasm-tools/tree/main/crates/wasm-encoder), [WASI preview 1](https://github.com/WebAssembly/WASI/tree/main/legacy/preview1), [WasmFX](https://wasmfx.dev/) From 49308878babfa7ac828ee05f99b627ecc7100a87 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Wed, 20 May 2026 23:42:11 +0200 Subject: [PATCH 03/33] feat(runtime): workspace skeleton + alloc/free/realloc Adds ryo-runtime crate as a staticlib workspace member. Implements ryo_str_alloc, ryo_str_free, ryo_str_realloc with C ABI. Null/zero-cap free is a no-op per spec. --- Cargo.lock | 4 ++ Cargo.toml | 13 +++++ runtime/Cargo.toml | 8 ++++ runtime/src/lib.rs | 117 +++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 142 insertions(+) create mode 100644 runtime/Cargo.toml create mode 100644 runtime/src/lib.rs diff --git a/Cargo.lock b/Cargo.lock index 5646765..b944f96 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -1002,6 +1002,10 @@ dependencies = [ "xz2", ] +[[package]] +name = "ryo-runtime" +version = "0.1.0" + [[package]] name = "semver" version = "1.0.28" diff --git a/Cargo.toml b/Cargo.toml index 0d3a475..4d96421 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -1,3 +1,16 @@ +[workspace] +members = ["runtime"] + +# Without `panic = "abort"`, Rust emits unwinding metadata (`_Unwind_*` +# symbols) that zig cc cannot resolve when linking user binaries. +# These settings live at the workspace root because profile configuration +# isn't permitted in workspace member manifests. 
+[profile.release] +panic = "abort" + +[profile.dev] +panic = "abort" + [package] name = "ryo" version = "0.1.0" diff --git a/runtime/Cargo.toml b/runtime/Cargo.toml new file mode 100644 index 0000000..7b03f25 --- /dev/null +++ b/runtime/Cargo.toml @@ -0,0 +1,8 @@ +[package] +name = "ryo-runtime" +version = "0.1.0" +edition = "2024" + +[lib] +crate-type = ["staticlib", "rlib"] +# rlib is needed for `cargo test` to work (staticlib alone doesn't support test harness) diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs new file mode 100644 index 0000000..78fc6c3 --- /dev/null +++ b/runtime/src/lib.rs @@ -0,0 +1,117 @@ +use std::alloc::{Layout, alloc, dealloc, realloc}; + +#[repr(C)] +pub struct RyoStrFat { + pub ptr: *mut u8, + pub len: u64, + pub cap: u64, +} + +#[unsafe(no_mangle)] +pub extern "C" fn ryo_str_alloc(cap: u64) -> *mut u8 { + if cap == 0 { + return std::ptr::null_mut(); + } + let layout = layout_for(cap); + let ptr = unsafe { alloc(layout) }; + if ptr.is_null() { + oom_abort(); + } + ptr +} + +/// # Safety +/// `ptr` must have been returned by `ryo_str_alloc` or `ryo_str_realloc` +/// with the given `cap`, or be null. +#[unsafe(no_mangle)] +pub unsafe extern "C" fn ryo_str_free(ptr: *mut u8, cap: u64) { + if ptr.is_null() || cap == 0 { + return; + } + let layout = layout_for(cap); + unsafe { dealloc(ptr, layout) }; +} + +/// # Safety +/// `ptr` must have been returned by `ryo_str_alloc` or `ryo_str_realloc` +/// with the given `old_cap`, or be null. 
+#[unsafe(no_mangle)] +pub unsafe extern "C" fn ryo_str_realloc(ptr: *mut u8, old_cap: u64, new_cap: u64) -> *mut u8 { + if ptr.is_null() || old_cap == 0 { + return ryo_str_alloc(new_cap); + } + if new_cap == 0 { + unsafe { ryo_str_free(ptr, old_cap) }; + return std::ptr::null_mut(); + } + let layout = layout_for(old_cap); + let new_ptr = unsafe { realloc(ptr, layout, new_cap as usize) }; + if new_ptr.is_null() { + oom_abort(); + } + new_ptr +} + +fn layout_for(cap: u64) -> Layout { + Layout::from_size_align(cap as usize, 1).unwrap_or_else(|_| oom_abort()) +} + +fn oom_abort() -> ! { + eprintln!("ryo: out of memory"); + std::process::abort(); +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_alloc_and_free() { + unsafe { + let ptr = ryo_str_alloc(16); + assert!(!ptr.is_null()); + ryo_str_free(ptr, 16); + } + } + + #[test] + fn test_alloc_zero_returns_null() { + let ptr = ryo_str_alloc(0); + assert!(ptr.is_null()); + } + + #[test] + fn test_free_null_is_noop() { + unsafe { ryo_str_free(std::ptr::null_mut(), 0) }; + } + + #[test] + fn test_realloc_grow() { + unsafe { + let ptr = ryo_str_alloc(8); + assert!(!ptr.is_null()); + let ptr2 = ryo_str_realloc(ptr, 8, 32); + assert!(!ptr2.is_null()); + ryo_str_free(ptr2, 32); + } + } + + #[test] + fn test_realloc_from_null() { + unsafe { + let ptr = ryo_str_realloc(std::ptr::null_mut(), 0, 16); + assert!(!ptr.is_null()); + ryo_str_free(ptr, 16); + } + } + + #[test] + fn test_realloc_to_zero() { + unsafe { + let ptr = ryo_str_alloc(16); + assert!(!ptr.is_null()); + let ptr2 = ryo_str_realloc(ptr, 16, 0); + assert!(ptr2.is_null()); + } + } +} From f1bcb1a17e9af54c94ffc81ee52707c7bc19582a Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Wed, 20 May 2026 23:48:44 +0200 Subject: [PATCH 04/33] feat(runtime): from_literal, concat, eq Implements ryo_str_from_literal (heap-copies literal bytes), ryo_str_concat (allocates l_len+r_len), ryo_str_eq (byte-wise). All handle empty/null cases per spec contracts. 
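As a rough illustration of the empty/null contract described in this message, the standalone sketch below mirrors the shape of `ryo_str_concat` and `ryo_str_eq` with safe slices at the boundary. `FatStr`, `concat_sketch`, `eq_sketch`, and `free_sketch` are hypothetical names invented for this note and are not part of the runtime crate; the checked addition on the combined length is also an assumption of the sketch, not something the patch does.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical stand-in for the runtime's `RyoStrFat` (ptr, len, cap).
#[repr(C)]
pub struct FatStr {
    pub ptr: *mut u8,
    pub len: u64,
    pub cap: u64,
}

impl FatStr {
    /// The empty string: null pointer, zero length, zero capacity.
    pub fn empty() -> Self {
        FatStr { ptr: std::ptr::null_mut(), len: 0, cap: 0 }
    }

    /// View the bytes; the empty case never dereferences the null pointer.
    pub fn as_bytes(&self) -> &[u8] {
        if self.len == 0 {
            &[]
        } else {
            unsafe { std::slice::from_raw_parts(self.ptr, self.len as usize) }
        }
    }
}

/// Allocate exactly `l.len() + r.len()` bytes and copy both halves in.
/// A zero-length result is represented without allocating.
pub fn concat_sketch(l: &[u8], r: &[u8]) -> FatStr {
    // Assumption of this sketch: reject length overflow explicitly.
    let total = l.len().checked_add(r.len()).expect("length overflow");
    if total == 0 {
        return FatStr::empty();
    }
    let layout = Layout::from_size_align(total, 1).expect("invalid layout");
    let ptr = unsafe { alloc(layout) };
    assert!(!ptr.is_null(), "out of memory");
    unsafe {
        std::ptr::copy_nonoverlapping(l.as_ptr(), ptr, l.len());
        std::ptr::copy_nonoverlapping(r.as_ptr(), ptr.add(l.len()), r.len());
    }
    FatStr { ptr, len: total as u64, cap: total as u64 }
}

/// Byte-wise equality; two empty strings compare equal.
pub fn eq_sketch(a: &FatStr, b: &FatStr) -> bool {
    a.as_bytes() == b.as_bytes()
}

/// Freeing a null pointer or a zero-capacity string is a no-op.
pub fn free_sketch(s: FatStr) {
    if s.ptr.is_null() || s.cap == 0 {
        return;
    }
    let layout = Layout::from_size_align(s.cap as usize, 1).expect("invalid layout");
    unsafe { dealloc(s.ptr, layout) };
}
```

The point of the sketch is only that every entry point agrees on one representation of "empty", so callers never have to special-case it.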
--- runtime/src/lib.rs | 183 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 183 insertions(+) diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index 78fc6c3..418965d 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -61,6 +61,84 @@ fn oom_abort() -> ! { std::process::abort(); } +/// # Safety +/// `data` must point to `len` readable bytes. `out` must point to a valid `RyoStrFat`. +#[unsafe(no_mangle)] +pub unsafe extern "C" fn ryo_str_from_literal(data: *const u8, len: u64, out: *mut RyoStrFat) { + unsafe { + if len == 0 { + (*out).ptr = std::ptr::null_mut(); + (*out).len = 0; + (*out).cap = 0; + return; + } + let ptr = ryo_str_alloc(len); + if ptr.is_null() { + oom_abort(); + } + core::ptr::copy_nonoverlapping(data, ptr, len as usize); + (*out).ptr = ptr; + (*out).len = len; + (*out).cap = len; + } +} + +/// # Safety +/// `l_ptr` must point to `l_len` readable bytes (or be null if l_len==0). +/// Same for `r_ptr`/`r_len`. `out` must point to a valid `RyoStrFat`. +#[unsafe(no_mangle)] +pub unsafe extern "C" fn ryo_str_concat( + l_ptr: *const u8, + l_len: u64, + r_ptr: *const u8, + r_len: u64, + out: *mut RyoStrFat, +) { + unsafe { + let total = l_len + r_len; + if total == 0 { + (*out).ptr = std::ptr::null_mut(); + (*out).len = 0; + (*out).cap = 0; + return; + } + let ptr = ryo_str_alloc(total); + if ptr.is_null() { + oom_abort(); + } + if l_len > 0 { + core::ptr::copy_nonoverlapping(l_ptr, ptr, l_len as usize); + } + if r_len > 0 { + core::ptr::copy_nonoverlapping(r_ptr, ptr.add(l_len as usize), r_len as usize); + } + (*out).ptr = ptr; + (*out).len = total; + (*out).cap = total; + } +} + +/// # Safety +/// `a_ptr` must point to `a_len` readable bytes (or be null/dangling if a_len==0). +/// Same for `b_ptr`/`b_len`. 
+#[unsafe(no_mangle)] +pub unsafe extern "C" fn ryo_str_eq( + a_ptr: *const u8, + a_len: u64, + b_ptr: *const u8, + b_len: u64, +) -> u8 { + if a_len != b_len { + return 0; + } + if a_len == 0 { + return 1; + } + let a_slice = unsafe { core::slice::from_raw_parts(a_ptr, a_len as usize) }; + let b_slice = unsafe { core::slice::from_raw_parts(b_ptr, b_len as usize) }; + if a_slice == b_slice { 1 } else { 0 } +} + #[cfg(test)] mod tests { use super::*; @@ -114,4 +192,109 @@ mod tests { assert!(ptr2.is_null()); } } + + #[test] + fn test_from_literal_nonempty() { + unsafe { + let data = b"hello"; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_str_from_literal(data.as_ptr(), 5, &mut out); + assert!(!out.ptr.is_null()); + assert_eq!(out.len, 5); + assert_eq!(out.cap, 5); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"hello"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_from_literal_empty() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_str_from_literal(b"".as_ptr(), 0, &mut out); + assert!(out.ptr.is_null()); + assert_eq!(out.len, 0); + assert_eq!(out.cap, 0); + } + } + + #[test] + fn test_concat_two_strings() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_str_concat(b"Hello, ".as_ptr(), 7, b"World!".as_ptr(), 6, &mut out); + assert_eq!(out.len, 13); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"Hello, World!"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_concat_empty_left() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_str_concat(b"".as_ptr(), 0, b"abc".as_ptr(), 3, &mut out); + assert_eq!(out.len, 3); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"abc"); + ryo_str_free(out.ptr, out.cap); + } + } + 
+ #[test] + fn test_concat_both_empty() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_str_concat(std::ptr::null(), 0, std::ptr::null(), 0, &mut out); + assert!(out.ptr.is_null()); + assert_eq!(out.len, 0); + assert_eq!(out.cap, 0); + } + } + + #[test] + fn test_eq_same_content() { + let result = unsafe { ryo_str_eq(b"hello".as_ptr(), 5, b"hello".as_ptr(), 5) }; + assert_eq!(result, 1); + } + + #[test] + fn test_eq_different_content() { + let result = unsafe { ryo_str_eq(b"hello".as_ptr(), 5, b"world".as_ptr(), 5) }; + assert_eq!(result, 0); + } + + #[test] + fn test_eq_both_empty() { + let result = unsafe { ryo_str_eq(std::ptr::null(), 0, std::ptr::null(), 0) }; + assert_eq!(result, 1); + } + + #[test] + fn test_eq_different_lengths() { + let result = unsafe { ryo_str_eq(b"hi".as_ptr(), 2, b"hello".as_ptr(), 5) }; + assert_eq!(result, 0); + } } From 3e50211c091eef6f1f630cc1c36f90b27a63b3be Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Wed, 20 May 2026 23:50:35 +0200 Subject: [PATCH 05/33] feat(runtime): int_to_str, float_to_str, bool_to_str formatters No-std implementations using stack buffers. Float uses 6 decimal places with trailing zero trimming. --- runtime/src/lib.rs | 187 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 187 insertions(+) diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index 418965d..bf9c845 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -139,6 +139,126 @@ pub unsafe extern "C" fn ryo_str_eq( if a_slice == b_slice { 1 } else { 0 } } +/// # Safety +/// `out` must point to a valid `RyoStrFat`. 
+#[unsafe(no_mangle)]
+pub unsafe extern "C" fn ryo_int_to_str(value: i64, out: *mut RyoStrFat) {
+    let mut buf = [0u8; 20];
+    let negative = value < 0;
+    // `unsigned_abs` is exact for every input, including `i64::MIN`, where
+    // `wrapping_neg` returns `i64::MIN` again and the digit loop below
+    // would emit no digits at all.
+    let mut n = value.unsigned_abs();
+    let mut pos = buf.len();
+    if n == 0 {
+        pos -= 1;
+        buf[pos] = b'0';
+    } else {
+        while n > 0 {
+            pos -= 1;
+            buf[pos] = b'0' + (n % 10) as u8;
+            n /= 10;
+        }
+    }
+    if negative {
+        pos -= 1;
+        buf[pos] = b'-';
+    }
+    let len = (buf.len() - pos) as u64;
+    let ptr = ryo_str_alloc(len);
+    if ptr.is_null() {
+        oom_abort();
+    }
+    unsafe {
+        core::ptr::copy_nonoverlapping(buf.as_ptr().add(pos), ptr, len as usize);
+        (*out).ptr = ptr;
+        (*out).len = len;
+        (*out).cap = len;
+    }
+}
+
+/// # Safety
+/// `out` must point to a valid `RyoStrFat`.
+#[unsafe(no_mangle)]
+pub unsafe extern "C" fn ryo_float_to_str(value: f64, out: *mut RyoStrFat) {
+    let mut buf = [0u8; 64];
+    let mut pos = 0usize;
+
+    let negative = value < 0.0;
+    let abs_val = if negative { -value } else { value };
+
+    if negative {
+        buf[pos] = b'-';
+        pos += 1;
+    }
+
+    let mut int_part = abs_val as u64;
+    let mut frac_part = ((abs_val - int_part as f64) * 1_000_000.0 + 0.5) as u64;
+    if frac_part >= 1_000_000 { int_part = int_part.saturating_add(1); frac_part -= 1_000_000; } // carry from rounding
+    // Write integer part
+    let mut int_buf = [0u8; 20];
+    let mut int_pos = int_buf.len();
+    if int_part == 0 {
+        int_pos -= 1;
+        int_buf[int_pos] = b'0';
+    } else {
+        let mut n = int_part;
+        while n > 0 {
+            int_pos -= 1;
+            int_buf[int_pos] = b'0' + (n % 10) as u8;
+            n /= 10;
+        }
+    }
+    let int_len = int_buf.len() - int_pos;
+    buf[pos..pos + int_len].copy_from_slice(&int_buf[int_pos..]);
+    pos += int_len;
+
+    // Write fractional part (trim trailing zeros)
+    buf[pos] = b'.';
+    pos += 1;
+    let mut frac_buf = [0u8; 6];
+    let mut f = frac_part;
+    for i in (0..6).rev() {
+        frac_buf[i] = b'0' + (f % 10) as u8;
+        f /= 10;
+    }
+    let mut frac_len = 6;
+    while frac_len > 1 && frac_buf[frac_len - 1] == b'0' {
+        frac_len -= 1;
+    }
+    buf[pos..pos + frac_len].copy_from_slice(&frac_buf[..frac_len]);
+    pos += frac_len;
+
+    let len = pos as u64;
+ let ptr = ryo_str_alloc(len); + if ptr.is_null() { + oom_abort(); + } + unsafe { + core::ptr::copy_nonoverlapping(buf.as_ptr(), ptr, len as usize); + (*out).ptr = ptr; + (*out).len = len; + (*out).cap = len; + } +} + +/// # Safety +/// `out` must point to a valid `RyoStrFat`. +#[unsafe(no_mangle)] +pub unsafe extern "C" fn ryo_bool_to_str(value: u8, out: *mut RyoStrFat) { + let s: &[u8] = if value != 0 { b"true" } else { b"false" }; + let ptr = ryo_str_alloc(s.len() as u64); + if ptr.is_null() { + oom_abort(); + } + unsafe { + core::ptr::copy_nonoverlapping(s.as_ptr(), ptr, s.len()); + (*out).ptr = ptr; + (*out).len = s.len() as u64; + (*out).cap = s.len() as u64; + } +} + #[cfg(test)] mod tests { use super::*; @@ -297,4 +417,71 @@ mod tests { let result = unsafe { ryo_str_eq(b"hi".as_ptr(), 2, b"hello".as_ptr(), 5) }; assert_eq!(result, 0); } + + #[test] + fn test_int_to_str_positive() { + unsafe { + let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + ryo_int_to_str(42, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"42"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_int_to_str_negative() { + unsafe { + let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + ryo_int_to_str(-123, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"-123"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_int_to_str_zero() { + unsafe { + let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + ryo_int_to_str(0, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"0"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_float_to_str() { + unsafe { + let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + ryo_float_to_str(2.75, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + let s 
= core::str::from_utf8(slice).unwrap(); + assert!(s.starts_with("2.75"), "got: {}", s); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_bool_to_str_true() { + unsafe { + let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + ryo_bool_to_str(1, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"true"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_bool_to_str_false() { + unsafe { + let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + ryo_bool_to_str(0, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"false"); + ryo_str_free(out.ptr, out.cap); + } + } } From e58882e9a16ebca0d2f9bcd2bfc903166c4cd405 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Wed, 20 May 2026 23:56:39 +0200 Subject: [PATCH 06/33] refactor(codegen): introduce ValueRepr enum for multi-value str Migrates inst_values from HashMap to HashMap. All existing paths use the Scalar variant. Prepares for fat-pointer str triple. --- src/codegen.rs | 75 ++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 69 insertions(+), 6 deletions(-) diff --git a/src/codegen.rs b/src/codegen.rs index a51124c..cf76b63 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -15,7 +15,7 @@ //! instructions (e.g. `IAdd %3, %5` materializes `%3` and `%5` //! first). Cranelift always needs nested values; doing it //! through `TirRef` indexing is the point. -//! 2. The `eval_inst` memoization map (`HashMap`) +//! 2. The `eval_inst` memoization map (`HashMap`) //! so a shared sub-expression isn't re-emitted. TIR today is //! tree-shaped (one parent per inst) so this is purely //! 
defensive — but it's the right invariant before lazy sema @@ -87,6 +87,29 @@ struct LoopContext { continue_target: Block, } +#[derive(Debug, Clone, Copy)] +enum ValueRepr { + Scalar(Value), + #[allow(dead_code)] + Str { ptr: Value, len: Value, cap: Value }, +} + +impl ValueRepr { + fn expect_scalar(self) -> Value { + match self { + ValueRepr::Scalar(v) => v, + ValueRepr::Str { .. } => panic!("expected Scalar, got Str"), + } + } +} + +#[derive(Debug, Clone, Copy)] +#[allow(dead_code)] +enum LocalVar { + Scalar(Variable), + Str { ptr: Variable, len: Variable, cap: Variable }, +} + /// Per-function emission state. Lives only for the duration of one /// `compile_function` call; reset between functions because /// Cranelift `Variable` ids and the `TirRef → Value` memo are both @@ -102,10 +125,10 @@ struct FunctionContext<'a, M: Module> { tir: &'a Tir, locals: HashMap, func_ids: &'a HashMap, - /// `TirRef → Value` memo. Materializing the same instruction + /// `TirRef → ValueRepr` memo. Materializing the same instruction /// twice in one function would either duplicate side effects /// (calls) or waste Cranelift IR; both are cheap-but-wrong. 
-    inst_values: HashMap<TirRef, Value>,
+    inst_values: HashMap<TirRef, ValueRepr>,
     loop_stack: Vec,
 }
 
@@ -751,8 +774,8 @@ impl Codegen {
         ctx: &mut FunctionContext<'_, M>,
         r: TirRef,
     ) -> Result {
-        if let Some(&v) = ctx.inst_values.get(&r) {
-            return Ok(v);
+        if let Some(repr) = ctx.inst_values.get(&r) {
+            return Ok(repr.expect_scalar());
         }
         let inst = ctx.tir.inst(r);
         let value = match inst.tag {
@@ -931,7 +954,7 @@ impl Codegen {
                 ));
             }
         };
-        ctx.inst_values.insert(r, value);
+        ctx.inst_values.insert(r, ValueRepr::Scalar(value));
         Ok(value)
     }
 
@@ -1171,3 +1194,43 @@ fn no_unreachable_in(tirs: &[Tir]) -> bool {
     }
     true
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use cranelift::codegen::ir::Value as ClifValue;
+
+    #[test]
+    fn value_repr_scalar_roundtrip() {
+        let v = ClifValue::from_u32(1);
+        let repr = ValueRepr::Scalar(v);
+        assert_eq!(repr.expect_scalar(), v);
+    }
+
+    #[test]
+    fn value_repr_str_fields() {
+        let repr = ValueRepr::Str {
+            ptr: ClifValue::from_u32(1),
+            len: ClifValue::from_u32(2),
+            cap: ClifValue::from_u32(3),
+        };
+        match repr {
+            ValueRepr::Str { ptr, len, cap } => {
+                assert_ne!(ptr, len);
+                assert_ne!(len, cap);
+            }
+            _ => panic!("expected Str"),
+        }
+    }
+
+    #[test]
+    #[should_panic(expected = "expected Scalar, got Str")]
+    fn value_repr_expect_scalar_panics_on_str() {
+        let repr = ValueRepr::Str {
+            ptr: ClifValue::from_u32(1),
+            len: ClifValue::from_u32(2),
+            cap: ClifValue::from_u32(3),
+        };
+        repr.expect_scalar();
+    }
+}

From 1b4f7596eb2577f70cdc00524c443d055ae7212e Mon Sep 17 00:00:00 2001
From: Pepe Navarro
Date: Thu, 21 May 2026 00:02:12 +0200
Subject: [PATCH 07/33] refactor(codegen): cranelift_type_for audit + is_str_type guard
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Makes cranelift_type_for(Str) panic. All callers now gated through
is_str_type. Str paths marked with todo!() — they become reachable in
subsequent tasks.
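The gating pattern this message describes can be sketched in isolation, without Cranelift. The toy below is only an approximation under stated assumptions: `Kind`, `Slot`, `is_str`, `scalar_slot`, and `param_slots` are hypothetical names standing in for `TypeKind`, Cranelift IR types, `is_str_type`, `cranelift_type_for`, and the parameter loop in `build_signature`.

```rust
/// Hypothetical stand-in for the compiler's `TypeKind`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Kind {
    Int,
    Bool,
    Float,
    Str,
}

/// Hypothetical stand-in for a single Cranelift IR type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Slot {
    I64,
    I8,
    F64,
}

/// Gate: true only for the multi-value string type.
pub fn is_str(kind: Kind) -> bool {
    matches!(kind, Kind::Str)
}

/// Single-slot mapping. Panics on `Str`, because a fat pointer has no
/// single-slot representation; callers must check `is_str` first.
pub fn scalar_slot(kind: Kind) -> Slot {
    match kind {
        Kind::Int => Slot::I64,
        Kind::Bool => Slot::I8,
        Kind::Float => Slot::F64,
        Kind::Str => panic!("str is multi-value; gate with is_str first"),
    }
}

/// Expand a parameter list: each `Str` becomes three slots (ptr, len, cap),
/// every other kind becomes exactly one.
pub fn param_slots(params: &[Kind]) -> Vec<Slot> {
    let mut slots = Vec::new();
    for &kind in params {
        if is_str(kind) {
            slots.extend([Slot::I64, Slot::I64, Slot::I64]);
        } else {
            slots.push(scalar_slot(kind));
        }
    }
    slots
}
```

Making the single-slot mapping panic turns a forgotten gate into an immediate, loud failure instead of a silently truncated string.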
--- src/codegen.rs | 47 ++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 40 insertions(+), 7 deletions(-) diff --git a/src/codegen.rs b/src/codegen.rs index cf76b63..3686149 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -25,7 +25,7 @@ use crate::ast::CompoundOp; use crate::tir::{Tir, TirData, TirRef, TirTag}; use crate::types::{InternPool, StringId, TypeId, TypeKind}; -use cranelift::codegen::ir::{BlockArg, FuncRef}; +use cranelift::codegen::ir::{ArgumentPurpose, BlockArg, FuncRef}; use cranelift::codegen::isa; use cranelift::codegen::settings::{self, Configurable}; use cranelift::prelude::*; @@ -35,16 +35,25 @@ use cranelift_object::{ObjectBuilder, ObjectModule}; use std::collections::HashMap; use target_lexicon::Triple; +/// Returns `true` if `ty` resolves to `Str` in the pool. +/// +/// Callers use this to gate multi-value (fat-pointer) paths before +/// reaching `cranelift_type_for`, which panics on `Str`. +fn is_str_type(ty: TypeId, pool: &InternPool) -> bool { + matches!(pool.kind(ty), TypeKind::Str) +} + /// Map a TIR type to the corresponding Cranelift IR type. /// /// `Int` uses the target's pointer-sized integer (i64 on 64-bit). /// `Bool` uses I8 (matches Cranelift's `icmp` result width and Rust's bool layout). -/// `Str` is represented as a pointer (pointer-sized integer). +/// `Str` is a fat pointer (ptr, len, cap) — it cannot map to a single type; +/// callers must gate with `is_str_type` before reaching this function. /// `Void` has no Cranelift representation and should not be mapped here. 
fn cranelift_type_for(ty: TypeId, pool: &InternPool, pointer_ty: types::Type) -> types::Type { match pool.kind(ty) { TypeKind::Int => pointer_ty, - TypeKind::Str => pointer_ty, + TypeKind::Str => panic!("cranelift_type_for: str is multi-value; use is_str_type gate"), TypeKind::Bool => types::I8, TypeKind::Float => types::F64, // Dead code after trap, but Cranelift needs a concrete type for every SSA value @@ -319,8 +328,14 @@ impl Codegen { fn build_signature(&self, tir: &Tir, pool: &InternPool) -> Signature { let mut sig = self.module.make_signature(); for param in &tir.params { - let cl_ty = cranelift_type_for(param.ty, pool, self.int_type); - sig.params.push(AbiParam::new(cl_ty)); + if is_str_type(param.ty, pool) { + sig.params.push(AbiParam::new(self.int_type)); // ptr + sig.params.push(AbiParam::new(types::I64)); // len + sig.params.push(AbiParam::new(types::I64)); // cap + } else { + let cl_ty = cranelift_type_for(param.ty, pool, self.int_type); + sig.params.push(AbiParam::new(cl_ty)); + } } // C-ABI shim for `main`: Ryo's `fn main()` is void, but the // host C runtime (crt0 via zig cc, or our JIT trampoline) @@ -331,8 +346,16 @@ impl Codegen { if is_main { sig.returns.push(AbiParam::new(self.int_type)); } else if tir.return_type != pool.void() { - let cl_ty = cranelift_type_for(tir.return_type, pool, self.int_type); - sig.returns.push(AbiParam::new(cl_ty)); + if is_str_type(tir.return_type, pool) { + // sret: hidden pointer prepended to regular params, no IR-level return. 
+ sig.params.insert( + 0, + AbiParam::special(self.int_type, ArgumentPurpose::StructReturn), + ); + } else { + let cl_ty = cranelift_type_for(tir.return_type, pool, self.int_type); + sig.returns.push(AbiParam::new(cl_ty)); + } } sig } @@ -359,6 +382,12 @@ impl Codegen { let int_type = self.int_type; let mut locals: HashMap = HashMap::new(); + for param in tir.params.iter() { + assert!( + !is_str_type(param.ty, pool), + "str parameters not yet supported (landing in Task 15)" + ); + } for (i, param) in tir.params.iter().enumerate() { let cl_ty = cranelift_type_for(param.ty, pool, int_type); let var = builder.declare_var(cl_ty); @@ -449,6 +478,10 @@ impl Codegen { match inst.tag { TirTag::VarDecl => { let view = ctx.tir.var_decl_view(r); + if is_str_type(inst.ty, ctx.pool) { + // str VarDecl will be handled in Task 6 + todo!("str VarDecl codegen"); + } let val = Self::eval_inst(builder, ctx, view.initializer)?; // The variable's resolved type lives in the VarDecl // inst's `ty` slot directly — no side-table lookup. From 688c3780e2f6b1ae48fc65b7d4e462ebfe76ffc7 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 00:16:54 +0200 Subject: [PATCH 08/33] feat(codegen): StrConst lowering via ryo_str_from_literal Adds build.rs + justfile for two-step runtime-then-compiler build. Introduces emit_str_literal_fat producing ValueRepr::Str triples, eval_inst_str for str-typed materialisation, StrLocals for multi-variable bindings, and declare_runtime_fn helper. 
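Reviewer note: the out-parameter convention this patch relies on can be sketched in plain Rust. This is a minimal illustration, not the `ryo-runtime` source. `StrFat`, `sketch_str_from_literal`, `load_u64_at`, and `sketch_take` are hypothetical names, the `u64` field types are assumed from the `I64` loads in codegen, and the offsets 8 and 16 assume a 64-bit target.

```rust
use std::mem::ManuallyDrop;

/// Assumed layout of the (ptr, len, cap) triple: 24 bytes on a 64-bit target.
#[repr(C)]
pub struct StrFat {
    pub ptr: *mut u8,
    pub len: u64,
    pub cap: u64,
}

/// Hypothetical stand-in for `ryo_str_from_literal`: copies `len` bytes from
/// read-only data into a fresh heap buffer and writes the triple through `out`.
pub unsafe extern "C" fn sketch_str_from_literal(data: *const u8, len: u64, out: *mut StrFat) {
    unsafe {
        let src = std::slice::from_raw_parts(data, len as usize);
        // ManuallyDrop keeps the buffer alive after this function returns.
        let mut heap = ManuallyDrop::new(src.to_vec());
        out.write(StrFat {
            ptr: heap.as_mut_ptr(),
            len: heap.len() as u64,
            cap: heap.capacity() as u64,
        });
    }
}

/// Mirrors what the generated loads do: read an 8-byte field at a byte offset.
pub fn load_u64_at(fat: &StrFat, offset: usize) -> u64 {
    assert!(offset % 8 == 0 && offset + 8 <= std::mem::size_of::<StrFat>());
    unsafe { (fat as *const StrFat as *const u8).add(offset).cast::<u64>().read() }
}

/// Reclaims the buffer so the sketch does not leak.
pub unsafe fn sketch_take(fat: StrFat) -> Vec<u8> {
    unsafe { Vec::from_raw_parts(fat.ptr, fat.len as usize, fat.cap as usize) }
}
```

The caller owns the 24-byte slot (a stack slot in the generated code) and reads the three fields back after the call, which is why the runtime function itself returns nothing.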
--- build.rs | 26 +++++++ justfile | 24 +++++++ src/codegen.rs | 168 +++++++++++++++++++++++++++++++++++++++++++-- src/main.rs | 2 + src/runtime_lib.rs | 20 ++++++ 5 files changed, 235 insertions(+), 5 deletions(-) create mode 100644 justfile create mode 100644 src/runtime_lib.rs diff --git a/build.rs b/build.rs index afd3ccf..397a59e 100644 --- a/build.rs +++ b/build.rs @@ -14,6 +14,32 @@ fn main() { _ => pkg_version, }; println!("cargo:rustc-env=RYO_VERSION={version}"); + + // Runtime archive path — set by `just build` or default location. + let runtime_path = env::var("RYO_RUNTIME_LIB").unwrap_or_else(|_| { + let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap(); + let target_dir = + env::var("CARGO_TARGET_DIR").unwrap_or_else(|_| format!("{manifest_dir}/target")); + let path = std::path::PathBuf::from(&target_dir) + .join("release") + .join("libryo_runtime.a"); + if !path.exists() { + panic!( + "\n\nlibryo_runtime.a not found at {}\n\ + Run `just build` or `cargo build -p ryo-runtime --release` first.\n\n", + path.display() + ); + } + path.to_str().unwrap().to_string() + }); + + println!("cargo:rustc-env=RYO_RUNTIME_LIB={runtime_path}"); + println!("cargo:rerun-if-env-changed=RYO_RUNTIME_LIB"); + println!("cargo:rerun-if-changed={runtime_path}"); + + let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap(); + let runtime_src = std::path::PathBuf::from(&manifest_dir).join("runtime/src"); + println!("cargo:rerun-if-changed={}", runtime_src.display()); } fn resolve_git_ref() -> Option { diff --git a/justfile b/justfile new file mode 100644 index 0000000..1bc8ef9 --- /dev/null +++ b/justfile @@ -0,0 +1,24 @@ +# Default recipe: build runtime then compiler +build: + cargo build -p ryo-runtime --release + cargo build + +# Release build +build-release: + cargo build -p ryo-runtime --release + cargo build --release + +# Run all tests (builds runtime first) +test: + cargo build -p ryo-runtime --release + cargo test + +# Run clippy on everything +lint: + cargo 
clippy -p ryo-runtime --all-targets + cargo build -p ryo-runtime --release + cargo clippy --all-targets + +# Format check +fmt: + cargo fmt --check diff --git a/src/codegen.rs b/src/codegen.rs index 3686149..20f0eae 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -100,7 +100,11 @@ struct LoopContext { enum ValueRepr { Scalar(Value), #[allow(dead_code)] - Str { ptr: Value, len: Value, cap: Value }, + Str { + ptr: Value, + len: Value, + cap: Value, + }, } impl ValueRepr { @@ -116,7 +120,17 @@ impl ValueRepr { #[allow(dead_code)] enum LocalVar { Scalar(Variable), - Str { ptr: Variable, len: Variable, cap: Variable }, + Str { + ptr: Variable, + len: Variable, + cap: Variable, + }, +} + +struct StrLocals { + ptr: Variable, + len: Variable, + cap: Variable, } /// Per-function emission state. Lives only for the duration of one @@ -139,6 +153,7 @@ struct FunctionContext<'a, M: Module> { /// (calls) or waste Cranelift IR; both are cheap-but-wrong. inst_values: HashMap, loop_stack: Vec, + str_locals: HashMap, } impl Codegen { @@ -408,6 +423,7 @@ impl Codegen { func_ids, inst_values: HashMap::new(), loop_stack: Vec::new(), + str_locals: HashMap::new(), }; let has_return = Self::emit_body(&mut builder, &mut ctx, &tir.body_stmts())?; @@ -479,8 +495,27 @@ impl Codegen { TirTag::VarDecl => { let view = ctx.tir.var_decl_view(r); if is_str_type(inst.ty, ctx.pool) { - // str VarDecl will be handled in Task 6 - todo!("str VarDecl codegen"); + let repr = Self::eval_inst_str(builder, ctx, view.initializer)?; + match repr { + ValueRepr::Str { ptr, len, cap } => { + let var_ptr = builder.declare_var(ctx.int_type); + let var_len = builder.declare_var(types::I64); + let var_cap = builder.declare_var(types::I64); + builder.def_var(var_ptr, ptr); + builder.def_var(var_len, len); + builder.def_var(var_cap, cap); + ctx.str_locals.insert( + view.name, + StrLocals { + ptr: var_ptr, + len: var_len, + cap: var_cap, + }, + ); + } + _ => unreachable!("str-typed initializer should produce 
ValueRepr::Str"), + } + return Ok(false); } let val = Self::eval_inst(builder, ctx, view.initializer)?; // The variable's resolved type lives in the VarDecl @@ -825,7 +860,16 @@ impl Codegen { _ => unreachable!("FloatConst must carry TirData::Float"), }, TirTag::StrConst => match inst.data { - TirData::Str(id) => emit_str_literal(builder, ctx, id)?, + TirData::Str(id) => { + // Returns the raw .rodata pointer — used by __ryo_panic + // which takes (ptr, len) scalars. For fat-pointer str + // materialisation, callers use eval_inst_str instead. + let content = ctx.pool.str(id); + let data_id = + store_string(id, content, ctx.module, ctx.data_ctx, ctx.string_data)?; + let data_ref = ctx.module.declare_data_in_func(data_id, builder.func); + builder.ins().global_value(ctx.int_type, data_ref) + } _ => unreachable!("StrConst must carry TirData::Str"), }, TirTag::Var => match inst.data { @@ -991,6 +1035,119 @@ impl Codegen { Ok(value) } + /// Declare an external runtime function by name and return a + /// `FuncRef` usable in the current function being built. + fn declare_runtime_fn( + module: &mut M, + builder: &mut FunctionBuilder, + name: &str, + params: &[types::Type], + returns: &[types::Type], + ) -> Result { + let mut sig = module.make_signature(); + for &p in params { + sig.params.push(AbiParam::new(p)); + } + for &r in returns { + sig.returns.push(AbiParam::new(r)); + } + let func_id = module + .declare_function(name, Linkage::Import, &sig) + .map_err(|e| format!("Failed to declare {}: {}", name, e))?; + Ok(module.declare_func_in_func(func_id, builder.func)) + } + + /// Materialize a str-typed TIR instruction, returning a + /// `ValueRepr::Str` triple. Falls back to scalar `eval_inst` + /// for non-str instructions. 
+ fn eval_inst_str( + builder: &mut FunctionBuilder, + ctx: &mut FunctionContext<'_, M>, + r: TirRef, + ) -> Result { + if let Some(repr) = ctx.inst_values.get(&r) { + return Ok(*repr); + } + let inst = ctx.tir.inst(r); + let repr = match inst.tag { + TirTag::StrConst => { + let id = match inst.data { + TirData::Str(id) => id, + _ => unreachable!(), + }; + Self::emit_str_literal_fat(builder, ctx, id)? + } + TirTag::Var => { + let name = match inst.data { + TirData::Var(name) => name, + _ => unreachable!(), + }; + if let Some(locals) = ctx.str_locals.get(&name) { + ValueRepr::Str { + ptr: builder.use_var(locals.ptr), + len: builder.use_var(locals.len), + cap: builder.use_var(locals.cap), + } + } else { + // Not a str local — fall through to scalar + let val = Self::eval_inst(builder, ctx, r)?; + return Ok(ValueRepr::Scalar(val)); + } + } + _ => { + // Delegate to scalar eval_inst for non-str instructions + let val = Self::eval_inst(builder, ctx, r)?; + return Ok(ValueRepr::Scalar(val)); + } + }; + ctx.inst_values.insert(r, repr); + Ok(repr) + } + + /// Emit a string literal as a fat pointer triple (ptr, len, cap) + /// by calling `ryo_str_from_literal` at runtime. 
+ fn emit_str_literal_fat( + builder: &mut FunctionBuilder, + ctx: &mut FunctionContext<'_, M>, + id: StringId, + ) -> Result { + let content = ctx.pool.str(id); + let data_id = store_string(id, content, ctx.module, ctx.data_ctx, ctx.string_data)?; + let data_ref = ctx.module.declare_data_in_func(data_id, builder.func); + let rodata_ptr = builder.ins().global_value(ctx.int_type, data_ref); + let lit_len = builder.ins().iconst(types::I64, content.len() as i64); + + // Allocate 24-byte stack slot for out parameter (8-byte aligned) + let slot = + builder.create_sized_stack_slot(StackSlotData::new(StackSlotKind::ExplicitSlot, 24, 3)); + let out_ptr = builder.ins().stack_addr(ctx.int_type, slot, 0); + + // Call ryo_str_from_literal(data, len, out) + let from_literal_ref = Self::declare_runtime_fn( + ctx.module, + builder, + "ryo_str_from_literal", + &[ctx.int_type, types::I64, ctx.int_type], + &[], + )?; + builder + .ins() + .call(from_literal_ref, &[rodata_ptr, lit_len, out_ptr]); + + // Load the triple back from the stack slot + let ptr = builder + .ins() + .load(ctx.int_type, MemFlags::trusted(), out_ptr, 0); + let len = builder + .ins() + .load(types::I64, MemFlags::trusted(), out_ptr, 8); + let cap = builder + .ins() + .load(types::I64, MemFlags::trusted(), out_ptr, 16); + + Ok(ValueRepr::Str { ptr, len, cap }) + } + fn emit_call( builder: &mut FunctionBuilder, ctx: &mut FunctionContext<'_, M>, @@ -1173,6 +1330,7 @@ fn declare_write( /// out of the `Codegen` impl so it can be called without juggling /// `&mut self` borrows alongside the `FunctionContext`'s mutable /// references to the same fields. 
+#[allow(dead_code)] fn emit_str_literal<M: Module>( builder: &mut FunctionBuilder, ctx: &mut FunctionContext<'_, M>, id: StringId, diff --git a/src/main.rs b/src/main.rs index e27c8ee..5e1a413 100644 --- a/src/main.rs +++ b/src/main.rs @@ -12,6 +12,8 @@ mod lexer; mod linker; mod parser; mod pipeline; +#[allow(dead_code)] +mod runtime_lib; mod sema; mod tir; mod toolchain; diff --git a/src/runtime_lib.rs b/src/runtime_lib.rs new file mode 100644 index 0000000..65f27fb --- /dev/null +++ b/src/runtime_lib.rs @@ -0,0 +1,20 @@ +use std::fs; +use std::io; +use std::path::PathBuf; + +const RYO_RUNTIME_LIB: &[u8] = include_bytes!(env!("RYO_RUNTIME_LIB")); + +pub fn extract_runtime_to_temp() -> Result<PathBuf, io::Error> { + let dir = std::env::temp_dir().join(format!("ryo-runtime-{}", std::process::id())); + fs::create_dir_all(&dir)?; + let path = dir.join("libryo_runtime.a"); + fs::write(&path, RYO_RUNTIME_LIB)?; + Ok(path) +} + +pub fn cleanup_runtime_temp(path: &PathBuf) { + let _ = fs::remove_file(path); + if let Some(parent) = path.parent() { + let _ = fs::remove_dir(parent); + } +} From 4214d677abc8673d17d6afc0366284446a40984a Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 00:20:23 +0200 Subject: [PATCH 09/33] feat(linker): link libryo_runtime.a into every binary Extracts embedded runtime archive to a temp file per invocation, passes it to zig cc. Cleanup on completion. 
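Reviewer note: as far as the diff shows, cleanup runs only after a successful link, because the `?` on `link_executable` returns early on failure. One possible alternative is a drop guard, sketched below. `TempArchive` and its methods are hypothetical names, and the bytes are passed in rather than embedded with `include_bytes!` so the sketch stays self-contained.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard: owns the extracted archive and removes it on drop,
/// so cleanup also happens when the link step returns an error.
pub struct TempArchive {
    path: PathBuf,
}

impl TempArchive {
    /// Writes `bytes` to `<temp>/<dir_name>/libryo_runtime.a`.
    pub fn extract(bytes: &[u8], dir_name: &str) -> io::Result<Self> {
        let dir = std::env::temp_dir().join(dir_name);
        fs::create_dir_all(&dir)?;
        let path = dir.join("libryo_runtime.a");
        fs::write(&path, bytes)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempArchive {
    fn drop(&mut self) {
        // Best-effort cleanup, matching the patch's ignore-errors behaviour.
        let _ = fs::remove_file(&self.path);
        if let Some(parent) = self.path.parent() {
            let _ = fs::remove_dir(parent);
        }
    }
}
```

With a guard like this, the caller would pass `archive.path()` to the linker and let scope exit handle removal on both the success and error paths.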
--- src/linker.rs | 12 ++++++++++-- src/pipeline.rs | 9 ++++++++- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/src/linker.rs b/src/linker.rs index e1a3548..8a74f6b 100644 --- a/src/linker.rs +++ b/src/linker.rs @@ -1,12 +1,20 @@ use crate::errors::CompilerError; use crate::toolchain; +use std::path::Path; use std::process::Command; -pub(crate) fn link_executable(obj_file: &str, exe_file: &str) -> Result<(), CompilerError> { +pub(crate) fn link_executable( + obj_file: &str, + exe_file: &str, + runtime_lib: &Path, +) -> Result<(), CompilerError> { let zig_path = toolchain::ensure_zig()?; let output = Command::new(&zig_path) - .args(["cc", "-o", exe_file, obj_file]) + .args([ + "cc", "-o", exe_file, obj_file, + runtime_lib.to_str().unwrap_or("libryo_runtime.a"), + ]) .output() .map_err(|e| CompilerError::LinkError(format!("Failed to run zig cc: {e}")))?; diff --git a/src/pipeline.rs b/src/pipeline.rs index b3f27ea..110f692 100644 --- a/src/pipeline.rs +++ b/src/pipeline.rs @@ -7,6 +7,7 @@ use crate::errors::CompilerError; use crate::lexer::{self, Token}; use crate::linker; use crate::parser::program_parser; +use crate::runtime_lib; use crate::sema; use crate::tir::{self, Tir}; use crate::types::InternPool; @@ -457,7 +458,13 @@ pub(crate) fn build_file(file: &Path) -> Result<(), CompilerError> { fs::write(&obj_filename, obj_bytes).map_err(CompilerError::from)?; println!("Generated object file: {}", obj_filename); - linker::link_executable(&obj_filename, &exe_filename)?; + // Extract embedded runtime archive and link + let runtime_path = runtime_lib::extract_runtime_to_temp() + .map_err(|e| CompilerError::LinkError(format!("Failed to extract runtime: {e}")))?; + + linker::link_executable(&obj_filename, &exe_filename, &runtime_path)?; + + runtime_lib::cleanup_runtime_temp(&runtime_path); let _ = fs::remove_file(&obj_filename); println!("Built: {}", exe_filename); From 526359e75adf892e2df67e861d16c82bd8d1ccae Mon Sep 17 00:00:00 2001 From: Pepe Navarro 
Date: Thu, 21 May 2026 00:32:08 +0200 Subject: [PATCH 10/33] feat(codegen): print() accepts heap-allocated str values generate_print_call now uses eval_inst_str to materialize any str-typed argument. Sema relaxed from literal-only to any str expression. Enables print(variable). Also registers ryo_str_from_literal with the JIT builder so runtime symbols resolve in JIT mode. --- Cargo.lock | 1 + Cargo.toml | 1 + runtime/src/lib.rs | 36 ++++++++++++++++++++++----- src/codegen.rs | 51 ++++++++++++++++---------------------- src/linker.rs | 5 +++- src/sema.rs | 26 +++++++++++-------- tests/integration_tests.rs | 24 ++++++++++++++++++ 7 files changed, 96 insertions(+), 48 deletions(-) diff --git a/Cargo.lock b/Cargo.lock index b944f96..afb46dc 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -995,6 +995,7 @@ dependencies = [ "dirs", "hashbrown 0.17.1", "logos", + "ryo-runtime", "tar", "target-lexicon", "tempfile", diff --git a/Cargo.toml b/Cargo.toml index 4d96421..f58f9c7 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -19,6 +19,7 @@ edition = "2024" [build-dependencies] [dependencies] +ryo-runtime = { path = "runtime" } ariadne = "0.6" chumsky = "0.12" diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index bf9c845..20cb63e 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -421,7 +421,11 @@ mod tests { #[test] fn test_int_to_str_positive() { unsafe { - let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; ryo_int_to_str(42, &mut out); let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); assert_eq!(slice, b"42"); @@ -432,7 +436,11 @@ mod tests { #[test] fn test_int_to_str_negative() { unsafe { - let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; ryo_int_to_str(-123, &mut out); let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); 
assert_eq!(slice, b"-123"); @@ -443,7 +451,11 @@ mod tests { #[test] fn test_int_to_str_zero() { unsafe { - let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; ryo_int_to_str(0, &mut out); let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); assert_eq!(slice, b"0"); @@ -454,7 +466,11 @@ mod tests { #[test] fn test_float_to_str() { unsafe { - let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; ryo_float_to_str(2.75, &mut out); let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); let s = core::str::from_utf8(slice).unwrap(); @@ -466,7 +482,11 @@ mod tests { #[test] fn test_bool_to_str_true() { unsafe { - let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; ryo_bool_to_str(1, &mut out); let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); assert_eq!(slice, b"true"); @@ -477,7 +497,11 @@ mod tests { #[test] fn test_bool_to_str_false() { unsafe { - let mut out = RyoStrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 }; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; ryo_bool_to_str(0, &mut out); let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); assert_eq!(slice, b"false"); diff --git a/src/codegen.rs b/src/codegen.rs index 20f0eae..a7938e6 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -210,9 +210,18 @@ impl Codegen { impl Codegen { pub fn new_jit() -> Result { - let jit_builder = JITBuilder::new(cranelift_module::default_libcall_names()) + let mut jit_builder = JITBuilder::new(cranelift_module::default_libcall_names()) .map_err(|e| format!("Failed to create JIT builder: {}", e))?; + // Register runtime symbols so the JIT can resolve them. 
+ jit_builder.symbols([ + ( + "ryo_str_from_literal", + ryo_runtime::ryo_str_from_literal as *const u8, + ), + ("ryo_str_alloc", ryo_runtime::ryo_str_alloc as *const u8), + ]); + Ok(Self::from_module( JITModule::new(jit_builder), Triple::host(), @@ -1205,40 +1214,22 @@ impl Codegen { ctx: &mut FunctionContext<'_, M>, args: &[TirRef], ) -> Result<(), String> { - // Sema has already validated arity and the string-literal - // constraint (see `sema::check_builtin_call`). The matches - // below are therefore infallible. debug_assert_eq!(args.len(), 1, "sema should reject print() arity errors"); - let string_id = match ctx.tir.inst(args[0]).data { - TirData::Str(id) => id, - other => unreachable!( - "sema should reject non-literal print() args, got {:?}", - other - ), - }; - let string_content = ctx.pool.str(string_id); - - let data_id = store_string( - string_id, - string_content, - ctx.module, - ctx.data_ctx, - ctx.string_data, - )?; - let data_ref = ctx.module.declare_data_in_func(data_id, builder.func); - let string_ptr = builder.ins().global_value(ctx.int_type, data_ref); + debug_assert!( + is_str_type(ctx.tir.inst(args[0]).ty, ctx.pool), + "sema should reject non-str print() args", + ); - let string_len = builder - .ins() - .iconst(ctx.int_type, string_content.len() as i64); - let fd = builder.ins().iconst(types::I32, 1); + let repr = Self::eval_inst_str(builder, ctx, args[0])?; + let (ptr, len) = match repr { + ValueRepr::Str { ptr, len, .. 
} => (ptr, len), + _ => unreachable!("str-typed arg produced Scalar"), + }; check_platform_support(ctx.triple)?; - + let fd = builder.ins().iconst(types::I32, 1); let write_ref = declare_write(ctx.module, builder, ctx.int_type)?; - let call_inst = builder.ins().call(write_ref, &[fd, string_ptr, string_len]); - let _bytes_written = builder.inst_results(call_inst)[0]; - + builder.ins().call(write_ref, &[fd, ptr, len]); Ok(()) } } diff --git a/src/linker.rs b/src/linker.rs index 8a74f6b..41d9f45 100644 --- a/src/linker.rs +++ b/src/linker.rs @@ -12,7 +12,10 @@ pub(crate) fn link_executable( let output = Command::new(&zig_path) .args([ - "cc", "-o", exe_file, obj_file, + "cc", + "-o", + exe_file, + obj_file, runtime_lib.to_str().unwrap_or("libryo_runtime.a"), ]) .output() diff --git a/src/sema.rs b/src/sema.rs index c416603..deaf42e 100644 --- a/src/sema.rs +++ b/src/sema.rs @@ -1301,7 +1301,7 @@ fn emit_builtin_call( let name = sema.pool.str(view.name); match name { "print" => { - if !check_print_args(sema, view, span) { + if !check_print_args(sema, fcx, view, arg_tirs, span) { return fcx.builder.unreachable(sema.pool.error_type(), span); } let ret_ty = builtin.return_type(sema.pool); @@ -1428,7 +1428,13 @@ fn build_panic_call( .call(panic_name, &[str_ref, len_ref], sema.pool.never(), span) } -fn check_print_args(sema: &mut Sema<'_>, view: &CallView, span: Span) -> bool { +fn check_print_args( + sema: &mut Sema<'_>, + fcx: &FuncCtx, + view: &CallView, + arg_tirs: &[TirRef], + span: Span, +) -> bool { if view.args.len() != 1 { sema.sink.emit(Diag::error( span, @@ -1437,11 +1443,15 @@ fn check_print_args(sema: &mut Sema<'_>, view: &CallView, span: Span) -> bool { )); return false; } - if !matches!(sema.uir.inst(view.args[0]).tag, InstTag::StrLiteral) { + let arg_ty = fcx.builder.ty_of(arg_tirs[0]); + if !matches!(sema.pool.kind(arg_ty), TypeKind::Str) { sema.sink.emit(Diag::error( sema.uir.span(view.args[0]), - DiagCode::BuiltinArgKind, - "print() argument must be a 
string literal", + DiagCode::TypeMismatch, + format!( + "print() argument must be str, got {}", + sema.pool.display(arg_ty) + ), )); return false; } @@ -1750,12 +1760,6 @@ mod tests { assert!(matches!(main.inst(v.initializer).data, TirData::Bool(true))); } - #[test] - fn print_with_non_literal_rejected_in_sema() { - let diags = run("x = \"hi\"\nprint(x)").unwrap_err(); - assert!(any_code(&diags, DiagCode::BuiltinArgKind)); - } - #[test] fn print_arity_rejected_in_sema() { let diags = run("print(\"a\", \"b\")").unwrap_err(); diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index 30da4b4..10370f6 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -1904,3 +1904,27 @@ fn test_for_body_return() { String::from_utf8_lossy(&output.stderr) ); } + +#[test] +fn test_str_variable_print() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "fn main():\n\tname: str = \"Hello\"\n\tprint(name)\n"; + let test_file = create_test_file(temp_dir.path(), "str_var_print.ryo", code); + + let output = run_ryo_command(&["run", "str_var_print.ryo"], &test_file) + .expect("Failed to run ryo command"); + + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); + + let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("Hello"), + "Output should contain 'Hello', got: {}", + stdout + ); + assert!(stdout.contains("[Result] => 0"), "Should exit with code 0"); +} From e1871f9cd9702d0cc1538f7e9a3c507753fc240d Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 00:38:48 +0200 Subject: [PATCH 11/33] feat: str + str concatenation via ryo_str_concat Adds TirTag::StrConcat. Sema accepts (str, str) for +. Codegen emits out-parameter call to ryo_str_concat and loads the resulting triple. 
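Reviewer note: the call shape `(l_ptr, l_len, r_ptr, r_len, out)` can be illustrated with a small stand-in. This is a sketch under assumptions, not the runtime's implementation. `StrFat`, `sketch_str_concat`, `sketch_take`, and `concat` are hypothetical names, and the `u64` fields are assumed from the `I64` loads in codegen.

```rust
use std::mem::ManuallyDrop;

/// Assumed (ptr, len, cap) layout shared with the generated code.
#[repr(C)]
pub struct StrFat {
    pub ptr: *mut u8,
    pub len: u64,
    pub cap: u64,
}

/// Hypothetical stand-in for `ryo_str_concat`: allocates a new buffer holding
/// the left bytes followed by the right bytes, then writes the triple to `out`.
pub unsafe extern "C" fn sketch_str_concat(
    l_ptr: *const u8,
    l_len: u64,
    r_ptr: *const u8,
    r_len: u64,
    out: *mut StrFat,
) {
    let mut buf: Vec<u8> = Vec::with_capacity((l_len + r_len) as usize);
    unsafe {
        buf.extend_from_slice(std::slice::from_raw_parts(l_ptr, l_len as usize));
        buf.extend_from_slice(std::slice::from_raw_parts(r_ptr, r_len as usize));
        // Hand ownership of the buffer to the caller through the triple.
        let mut buf = ManuallyDrop::new(buf);
        out.write(StrFat {
            ptr: buf.as_mut_ptr(),
            len: buf.len() as u64,
            cap: buf.capacity() as u64,
        });
    }
}

/// Reclaims the buffer so the sketch does not leak.
pub unsafe fn sketch_take(fat: StrFat) -> Vec<u8> {
    unsafe { Vec::from_raw_parts(fat.ptr, fat.len as usize, fat.cap as usize) }
}

/// Safe wrapper used only to exercise the sketch.
pub fn concat(a: &[u8], b: &[u8]) -> Vec<u8> {
    let mut out = StrFat { ptr: std::ptr::null_mut(), len: 0, cap: 0 };
    unsafe {
        sketch_str_concat(a.as_ptr(), a.len() as u64, b.as_ptr(), b.len() as u64, &mut out);
        sketch_take(out)
    }
}
```

A chained expression such as `"a" + "b" + "c"` would then correspond to nested calls, with the result of the inner call feeding the left operand of the outer one.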
--- src/codegen.rs | 56 ++++++++++++++++++++++++++++++++++++++ src/sema.rs | 15 ++++++++++ src/tir.rs | 5 ++++ tests/integration_tests.rs | 49 +++++++++++++++++++++++++++++++++ 4 files changed, 125 insertions(+) diff --git a/src/codegen.rs b/src/codegen.rs index a7938e6..bc3f23c 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -220,6 +220,7 @@ impl Codegen { ryo_runtime::ryo_str_from_literal as *const u8, ), ("ryo_str_alloc", ryo_runtime::ryo_str_alloc as *const u8), + ("ryo_str_concat", ryo_runtime::ryo_str_concat as *const u8), ]); Ok(Self::from_module( @@ -1027,6 +1028,9 @@ impl Codegen { Self::generate_if_stmt(builder, ctx, r)?; builder.ins().iconst(ctx.int_type, 0) } + TirTag::StrConcat => { + return Err("StrConcat must be materialized through eval_inst_str".to_string()); + } TirTag::Unreachable => { return Err( "codegen reached an Unreachable TIR inst — sema must have errored".to_string(), @@ -1103,6 +1107,58 @@ impl Codegen { return Ok(ValueRepr::Scalar(val)); } } + TirTag::StrConcat => { + let (lhs, rhs) = match inst.data { + TirData::BinOp { lhs, rhs } => (lhs, rhs), + _ => unreachable!(), + }; + let l_repr = Self::eval_inst_str(builder, ctx, lhs)?; + let r_repr = Self::eval_inst_str(builder, ctx, rhs)?; + let (l_ptr, l_len) = match l_repr { + ValueRepr::Str { ptr, len, .. } => (ptr, len), + _ => unreachable!(), + }; + let (r_ptr, r_len) = match r_repr { + ValueRepr::Str { ptr, len, .. 
} => (ptr, len), + _ => unreachable!(), + }; + + let slot = builder.create_sized_stack_slot(StackSlotData::new( + StackSlotKind::ExplicitSlot, + 24, + 3, + )); + let out_ptr = builder.ins().stack_addr(ctx.int_type, slot, 0); + + let concat_ref = Self::declare_runtime_fn( + ctx.module, + builder, + "ryo_str_concat", + &[ + ctx.int_type, + types::I64, + ctx.int_type, + types::I64, + ctx.int_type, + ], + &[], + )?; + builder + .ins() + .call(concat_ref, &[l_ptr, l_len, r_ptr, r_len, out_ptr]); + + let ptr = builder + .ins() + .load(ctx.int_type, MemFlags::trusted(), out_ptr, 0); + let len = builder + .ins() + .load(types::I64, MemFlags::trusted(), out_ptr, 8); + let cap = builder + .ins() + .load(types::I64, MemFlags::trusted(), out_ptr, 16); + + ValueRepr::Str { ptr, len, cap } + } _ => { // Delegate to scalar eval_inst for non-str instructions let val = Self::eval_inst(builder, ctx, r)?; diff --git a/src/sema.rs b/src/sema.rs index deaf42e..229b523 100644 --- a/src/sema.rs +++ b/src/sema.rs @@ -1176,6 +1176,21 @@ fn check_binary_op( fcx.builder .binary(tir_tag, sema.pool.float(), lhs, rhs, span) } + TypeKind::Str => { + if tag != InstTag::Add { + sema.sink.emit(Diag::error( + span, + DiagCode::UnsupportedOperator, + format!( + "arithmetic operator '{}' not supported for type 'str'", + bin_op_symbol(tag), + ), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + fcx.builder + .binary(TirTag::StrConcat, sema.pool.str_(), lhs, rhs, span) + } TypeKind::Error => fcx.builder.unreachable(sema.pool.error_type(), span), _ => { sema.sink.emit(Diag::error( diff --git a/src/tir.rs b/src/tir.rs index 7c2bd8d..14573f1 100644 --- a/src/tir.rs +++ b/src/tir.rs @@ -135,6 +135,9 @@ pub enum TirTag { ICmpGt, ICmpGe, + // String concatenation. + StrConcat, + // Float arithmetic / comparison. 
FAdd, FSub, @@ -474,6 +477,7 @@ impl TirBuilder { | TirTag::ICmpLe | TirTag::ICmpGt | TirTag::ICmpGe + | TirTag::StrConcat | TirTag::FAdd | TirTag::FSub | TirTag::FMul @@ -1083,6 +1087,7 @@ fn bin_op_name(t: TirTag) -> &'static str { TirTag::FCmpLe => "fcmp_le", TirTag::FCmpGt => "fcmp_gt", TirTag::FCmpGe => "fcmp_ge", + TirTag::StrConcat => "str_concat", TirTag::BoolAnd => "bool_and", TirTag::BoolOr => "bool_or", _ => "?bin", diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index 10370f6..fc82990 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -1928,3 +1928,52 @@ fn test_str_variable_print() { ); assert!(stdout.contains("[Result] => 0"), "Should exit with code 0"); } + +#[test] +fn test_str_concat() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = + "fn main():\n\ta: str = \"Hello, \"\n\tb: str = \"World!\"\n\tc: str = a + b\n\tprint(c)\n"; + let test_file = create_test_file(temp_dir.path(), "str_concat.ryo", code); + + let output = + run_ryo_command(&["run", "str_concat.ryo"], &test_file).expect("Failed to run ryo command"); + + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); + + let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("Hello, World!"), + "Output should contain 'Hello, World!', got: {}", + stdout + ); + assert!(stdout.contains("[Result] => 0"), "Should exit with code 0"); +} + +#[test] +fn test_str_concat_chained() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "fn main():\n\tresult: str = \"a\" + \"b\" + \"c\"\n\tprint(result)\n"; + let test_file = create_test_file(temp_dir.path(), "str_concat_chained.ryo", code); + + let output = run_ryo_command(&["run", "str_concat_chained.ryo"], &test_file) + .expect("Failed to run ryo command"); + + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); + + 
let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("abc"), + "Output should contain 'abc', got: {}", + stdout + ); + assert!(stdout.contains("[Result] => 0"), "Should exit with code 0"); +} From 94573108326a8dac07727d20cb58e0ae10f352b6 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 00:43:58 +0200 Subject: [PATCH 12/33] feat: str == / != equality via ryo_str_eq Adds StrCmpEq/StrCmpNe TIR tags. Sema accepts (str, str) for == and !=. Codegen calls ryo_str_eq, inverts for !=. --- src/codegen.rs | 34 ++++++++++++++++++++++++++++++++++ src/sema.rs | 28 ++++++++++++++++------------ src/tir.rs | 8 ++++++++ tests/integration_tests.rs | 12 ++++++++++++ 4 files changed, 70 insertions(+), 12 deletions(-) diff --git a/src/codegen.rs b/src/codegen.rs index bc3f23c..1dd09d5 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -221,6 +221,7 @@ impl Codegen { ), ("ryo_str_alloc", ryo_runtime::ryo_str_alloc as *const u8), ("ryo_str_concat", ryo_runtime::ryo_str_concat as *const u8), + ("ryo_str_eq", ryo_runtime::ryo_str_eq as *const u8), ]); Ok(Self::from_module( @@ -1028,6 +1029,39 @@ impl Codegen { Self::generate_if_stmt(builder, ctx, r)?; builder.ins().iconst(ctx.int_type, 0) } + TirTag::StrCmpEq | TirTag::StrCmpNe => { + let (lhs, rhs) = match inst.data { + TirData::BinOp { lhs, rhs } => (lhs, rhs), + _ => unreachable!(), + }; + let l_repr = Self::eval_inst_str(builder, ctx, lhs)?; + let r_repr = Self::eval_inst_str(builder, ctx, rhs)?; + let (l_ptr, l_len) = match l_repr { + ValueRepr::Str { ptr, len, .. } => (ptr, len), + _ => unreachable!(), + }; + let (r_ptr, r_len) = match r_repr { + ValueRepr::Str { ptr, len, .. 
} => (ptr, len), + _ => unreachable!(), + }; + + let eq_ref = Self::declare_runtime_fn( + ctx.module, + builder, + "ryo_str_eq", + &[ctx.int_type, types::I64, ctx.int_type, types::I64], + &[types::I8], + )?; + let call = builder.ins().call(eq_ref, &[l_ptr, l_len, r_ptr, r_len]); + let result = builder.inst_results(call)[0]; + + if inst.tag == TirTag::StrCmpNe { + let one = builder.ins().iconst(types::I8, 1); + builder.ins().bxor(result, one) + } else { + result + } + } TirTag::StrConcat => { return Err("StrConcat must be materialized through eval_inst_str".to_string()); } diff --git a/src/sema.rs b/src/sema.rs index 229b523..80f1974 100644 --- a/src/sema.rs +++ b/src/sema.rs @@ -1061,15 +1061,13 @@ fn check_binary_op( } TypeKind::Error => fcx.builder.unreachable(sema.pool.error_type(), span), TypeKind::Str => { - sema.sink.emit(Diag::error( - span, - DiagCode::UnsupportedOperator, - format!( - "equality operator '{}' not supported for type 'str' (yet)", - bin_op_symbol(tag), - ), - )); - fcx.builder.unreachable(sema.pool.error_type(), span) + let tir_tag = match tag { + InstTag::Eq => TirTag::StrCmpEq, + InstTag::NotEq => TirTag::StrCmpNe, + _ => unreachable!(), + }; + fcx.builder + .binary(tir_tag, sema.pool.bool_(), lhs, rhs, span) } TypeKind::Void | TypeKind::Never | TypeKind::Tuple => { sema.sink.emit(Diag::error( @@ -1748,9 +1746,15 @@ mod tests { } #[test] - fn string_equality_rejected() { - let diags = run("x = \"a\" == \"b\"").unwrap_err(); - assert!(any_code(&diags, DiagCode::UnsupportedOperator)); + fn string_equality_accepted() { + let (tirs, _pool) = run("x = \"a\" == \"b\"").unwrap(); + // The equality produces a bool-typed StrCmpEq instruction. 
+ let body = &tirs[0]; + let has_str_eq = body + .instructions + .iter() + .any(|i| i.tag == TirTag::StrCmpEq); + assert!(has_str_eq, "expected StrCmpEq in TIR"); } #[test] diff --git a/src/tir.rs b/src/tir.rs index 14573f1..30d5242 100644 --- a/src/tir.rs +++ b/src/tir.rs @@ -138,6 +138,10 @@ pub enum TirTag { // String concatenation. StrConcat, + // String equality. + StrCmpEq, + StrCmpNe, + // Float arithmetic / comparison. FAdd, FSub, @@ -478,6 +482,8 @@ impl TirBuilder { | TirTag::ICmpGt | TirTag::ICmpGe | TirTag::StrConcat + | TirTag::StrCmpEq + | TirTag::StrCmpNe | TirTag::FAdd | TirTag::FSub | TirTag::FMul @@ -1088,6 +1094,8 @@ fn bin_op_name(t: TirTag) -> &'static str { TirTag::FCmpGt => "fcmp_gt", TirTag::FCmpGe => "fcmp_ge", TirTag::StrConcat => "str_concat", + TirTag::StrCmpEq => "str_eq", + TirTag::StrCmpNe => "str_ne", TirTag::BoolAnd => "bool_and", TirTag::BoolOr => "bool_or", _ => "?bin", diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index fc82990..97012a5 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -1977,3 +1977,15 @@ fn test_str_concat_chained() { ); assert!(stdout.contains("[Result] => 0"), "Should exit with code 0"); } + +#[test] +fn test_str_equality() { + let code = "fn main():\n\ta: str = \"hello\"\n\tb: str = \"hello\"\n\tassert(a == b, \"equal strings should be equal\")\n"; + assert_ryo_runs!("str_equality.ryo", code); +} + +#[test] +fn test_str_inequality() { + let code = "fn main():\n\ta: str = \"hello\"\n\tb: str = \"world\"\n\tassert(a != b, \"different strings should not be equal\")\n"; + assert_ryo_runs!("str_inequality.ryo", code); +} From 7270ef726af0753e8504056a6db7d9f8a985b64d Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 00:52:32 +0200 Subject: [PATCH 13/33] feat: int_to_str, float_to_str, bool_to_str builtins Registers three new builtins returning str. Codegen emits out-parameter calls to the corresponding runtime functions. 
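Reviewer note: the mapping from builtin name to runtime symbol and argument type could also be expressed as a single table. The sketch below is illustrative only. `ParamTy`, `FORMATTERS`, and `formatter_target` are hypothetical names, and `ParamTy` stands in for the Cranelift types used in codegen (pointer-sized int, `F64`, `I8`).

```rust
/// Hypothetical stand-in for the Cranelift argument type of each formatter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParamTy {
    Int,
    F64,
    I8,
}

/// (builtin name, runtime symbol, argument type), mirroring the pairs
/// that codegen selects for the three formatter builtins.
const FORMATTERS: &[(&str, &str, ParamTy)] = &[
    ("int_to_str", "ryo_int_to_str", ParamTy::Int),
    ("float_to_str", "ryo_float_to_str", ParamTy::F64),
    ("bool_to_str", "ryo_bool_to_str", ParamTy::I8),
];

/// Returns the runtime symbol and argument type for a formatter builtin,
/// or `None` when the name is not a formatter.
pub fn formatter_target(builtin: &str) -> Option<(&'static str, ParamTy)> {
    FORMATTERS
        .iter()
        .find(|(name, _, _)| *name == builtin)
        .map(|&(_, symbol, ty)| (symbol, ty))
}
```

A lookup like this keeps the name check and the symbol selection in one place, so adding a fourth formatter would touch a single table row.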
--- src/builtins.rs | 14 ++++++ src/codegen.rs | 88 ++++++++++++++++++++++++++++++++++++++ src/sema.rs | 86 +++++++++++++++++++++++++++++++++++-- tests/integration_tests.rs | 69 ++++++++++++++++++++++++++++++ 4 files changed, 253 insertions(+), 4 deletions(-) diff --git a/src/builtins.rs b/src/builtins.rs index c81a646..6588d47 100644 --- a/src/builtins.rs +++ b/src/builtins.rs @@ -10,6 +10,7 @@ pub struct BuiltinFunction { enum BuiltinReturn { Void, Never, + Str, } impl BuiltinFunction { @@ -17,6 +18,7 @@ impl BuiltinFunction { match self.return_ty { BuiltinReturn::Void => pool.void(), BuiltinReturn::Never => pool.never(), + BuiltinReturn::Str => pool.str_(), } } } @@ -34,6 +36,18 @@ pub const BUILTINS: &[BuiltinFunction] = &[ name: "panic", return_ty: BuiltinReturn::Never, }, + BuiltinFunction { + name: "int_to_str", + return_ty: BuiltinReturn::Str, + }, + BuiltinFunction { + name: "float_to_str", + return_ty: BuiltinReturn::Str, + }, + BuiltinFunction { + name: "bool_to_str", + return_ty: BuiltinReturn::Str, + }, ]; pub fn lookup(name: &str) -> Option<&'static BuiltinFunction> { diff --git a/src/codegen.rs b/src/codegen.rs index 1dd09d5..6ab5978 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -222,6 +222,12 @@ impl Codegen { ("ryo_str_alloc", ryo_runtime::ryo_str_alloc as *const u8), ("ryo_str_concat", ryo_runtime::ryo_str_concat as *const u8), ("ryo_str_eq", ryo_runtime::ryo_str_eq as *const u8), + ("ryo_int_to_str", ryo_runtime::ryo_int_to_str as *const u8), + ( + "ryo_float_to_str", + ryo_runtime::ryo_float_to_str as *const u8, + ), + ("ryo_bool_to_str", ryo_runtime::ryo_bool_to_str as *const u8), ]); Ok(Self::from_module( @@ -1141,6 +1147,55 @@ impl Codegen { return Ok(ValueRepr::Scalar(val)); } } + TirTag::Call => { + let view = ctx.tir.call_view(r); + let name_str = ctx.pool.str(view.name); + if name_str == "int_to_str" + || name_str == "float_to_str" + || name_str == "bool_to_str" + { + let arg_val = Self::eval_inst(builder, ctx, view.args[0])?; + + 
let slot = builder.create_sized_stack_slot(StackSlotData::new( + StackSlotKind::ExplicitSlot, + 24, + 3, + )); + let out_ptr = builder.ins().stack_addr(ctx.int_type, slot, 0); + + let (fn_name, param_ty) = match name_str { + "int_to_str" => ("ryo_int_to_str", ctx.int_type), + "float_to_str" => ("ryo_float_to_str", types::F64), + "bool_to_str" => ("ryo_bool_to_str", types::I8), + _ => unreachable!(), + }; + + let func_ref = Self::declare_runtime_fn( + ctx.module, + builder, + fn_name, + &[param_ty, ctx.int_type], + &[], + )?; + builder.ins().call(func_ref, &[arg_val, out_ptr]); + + let ptr = builder + .ins() + .load(ctx.int_type, MemFlags::trusted(), out_ptr, 0); + let len = builder + .ins() + .load(types::I64, MemFlags::trusted(), out_ptr, 8); + let cap = builder + .ins() + .load(types::I64, MemFlags::trusted(), out_ptr, 16); + + ValueRepr::Str { ptr, len, cap } + } else { + // Non-formatter call — delegate to scalar + let val = Self::eval_inst(builder, ctx, r)?; + return Ok(ValueRepr::Scalar(val)); + } + } TirTag::StrConcat => { let (lhs, rhs) = match inst.data { TirData::BinOp { lhs, rhs } => (lhs, rhs), @@ -1263,6 +1318,39 @@ impl Codegen { return Ok(builder.ins().iconst(ctx.int_type, 0)); } + // Formatter builtins — when called as a bare statement (result + // discarded), we still emit the call but throw away the output. + // The primary path is eval_inst_str (used when result is assigned + // to a str variable or passed to print). 
+ if name_str == "int_to_str" || name_str == "float_to_str" || name_str == "bool_to_str" { + let arg_val = Self::eval_inst(builder, ctx, view.args[0])?; + + let slot = builder.create_sized_stack_slot(StackSlotData::new( + StackSlotKind::ExplicitSlot, + 24, + 3, + )); + let out_ptr = builder.ins().stack_addr(ctx.int_type, slot, 0); + + let (fn_name, param_ty) = match name_str { + "int_to_str" => ("ryo_int_to_str", ctx.int_type), + "float_to_str" => ("ryo_float_to_str", types::F64), + "bool_to_str" => ("ryo_bool_to_str", types::I8), + _ => unreachable!(), + }; + + let func_ref = Self::declare_runtime_fn( + ctx.module, + builder, + fn_name, + &[param_ty, ctx.int_type], + &[], + )?; + builder.ins().call(func_ref, &[arg_val, out_ptr]); + + return Ok(builder.ins().iconst(ctx.int_type, 0)); + } + let callee_id = *ctx .func_ids .get(&name_id) diff --git a/src/sema.rs b/src/sema.rs index 80f1974..2bf6acf 100644 --- a/src/sema.rs +++ b/src/sema.rs @@ -1322,6 +1322,87 @@ fn emit_builtin_call( } "panic" => emit_panic(sema, fcx, view, span), "assert" => emit_assert(sema, fcx, view, arg_tirs, span), + "int_to_str" => { + if view.args.len() != 1 { + sema.sink.emit(Diag::error( + span, + DiagCode::ArityMismatch, + format!( + "int_to_str() takes exactly 1 argument, got {}", + view.args.len() + ), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + let arg_ty = fcx.builder.ty_of(arg_tirs[0]); + if !matches!(sema.pool.kind(arg_ty), TypeKind::Int) { + sema.sink.emit(Diag::error( + sema.uir.span(view.args[0]), + DiagCode::TypeMismatch, + format!( + "int_to_str() argument must be int, got {}", + sema.pool.display(arg_ty) + ), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + let ret_ty = builtin.return_type(sema.pool); + fcx.builder.call(view.name, arg_tirs, ret_ty, span) + } + "float_to_str" => { + if view.args.len() != 1 { + sema.sink.emit(Diag::error( + span, + DiagCode::ArityMismatch, + format!( + "float_to_str() takes exactly 1 
argument, got {}", + view.args.len() + ), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + let arg_ty = fcx.builder.ty_of(arg_tirs[0]); + if !matches!(sema.pool.kind(arg_ty), TypeKind::Float) { + sema.sink.emit(Diag::error( + sema.uir.span(view.args[0]), + DiagCode::TypeMismatch, + format!( + "float_to_str() argument must be float, got {}", + sema.pool.display(arg_ty) + ), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + let ret_ty = builtin.return_type(sema.pool); + fcx.builder.call(view.name, arg_tirs, ret_ty, span) + } + "bool_to_str" => { + if view.args.len() != 1 { + sema.sink.emit(Diag::error( + span, + DiagCode::ArityMismatch, + format!( + "bool_to_str() takes exactly 1 argument, got {}", + view.args.len() + ), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + let arg_ty = fcx.builder.ty_of(arg_tirs[0]); + if !matches!(sema.pool.kind(arg_ty), TypeKind::Bool) { + sema.sink.emit(Diag::error( + sema.uir.span(view.args[0]), + DiagCode::TypeMismatch, + format!( + "bool_to_str() argument must be bool, got {}", + sema.pool.display(arg_ty) + ), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + let ret_ty = builtin.return_type(sema.pool); + fcx.builder.call(view.name, arg_tirs, ret_ty, span) + } _ => { let ret_ty = builtin.return_type(sema.pool); fcx.builder.call(view.name, arg_tirs, ret_ty, span) @@ -1750,10 +1831,7 @@ mod tests { let (tirs, _pool) = run("x = \"a\" == \"b\"").unwrap(); // The equality produces a bool-typed StrCmpEq instruction. 
let body = &tirs[0]; - let has_str_eq = body - .instructions - .iter() - .any(|i| i.tag == TirTag::StrCmpEq); + let has_str_eq = body.instructions.iter().any(|i| i.tag == TirTag::StrCmpEq); assert!(has_str_eq, "expected StrCmpEq in TIR"); } diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index 97012a5..d5da934 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -1989,3 +1989,72 @@ fn test_str_inequality() { let code = "fn main():\n\ta: str = \"hello\"\n\tb: str = \"world\"\n\tassert(a != b, \"different strings should not be equal\")\n"; assert_ryo_runs!("str_inequality.ryo", code); } + +#[test] +fn test_int_to_str_builtin() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "fn main():\n\ts: str = int_to_str(42)\n\tprint(s)\n"; + let test_file = create_test_file(temp_dir.path(), "int_to_str.ryo", code); + + let output = + run_ryo_command(&["run", "int_to_str.ryo"], &test_file).expect("Failed to run ryo command"); + + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); + + let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("42"), + "Output should contain '42', got: {}", + stdout + ); +} + +#[test] +fn test_float_to_str_builtin() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "fn main():\n\ts: str = float_to_str(2.75)\n\tprint(s)\n"; + let test_file = create_test_file(temp_dir.path(), "float_to_str.ryo", code); + + let output = run_ryo_command(&["run", "float_to_str.ryo"], &test_file) + .expect("Failed to run ryo command"); + + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); + + let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("2.75"), + "Output should contain '2.75', got: {}", + stdout + ); +} + +#[test] +fn test_bool_to_str_builtin() { + let temp_dir = 
TempDir::new().expect("Failed to create temp directory"); + let code = "fn main():\n\ts: str = bool_to_str(true)\n\tprint(s)\n"; + let test_file = create_test_file(temp_dir.path(), "bool_to_str.ryo", code); + + let output = run_ryo_command(&["run", "bool_to_str.ryo"], &test_file) + .expect("Failed to run ryo command"); + + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); + + let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("true"), + "Output should contain 'true', got: {}", + stdout + ); +} From 99189ff72026bb724da9dfe5d80f7d5df3b6c6dc Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 01:04:47 +0200 Subject: [PATCH 14/33] feat(parser): dot-method-call syntax (expr.name(args)) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds Token::Dot, ExprKind::MethodCall, and postfix parsing loop. First postfix operator — M9/M22 will add field access and indexing arms. --- src/ast.rs | 16 +++++++++++++++ src/astgen.rs | 4 ++++ src/lexer.rs | 5 +++++ src/parser.rs | 55 +++++++++++++++++++++++++++++++++++++++------------ 4 files changed, 67 insertions(+), 13 deletions(-) diff --git a/src/ast.rs b/src/ast.rs index b31027d..f9f2c62 100644 --- a/src/ast.rs +++ b/src/ast.rs @@ -337,6 +337,9 @@ impl Expression { ExprKind::BinaryOp(_, op, _) => format!("BinaryOp({})", op), ExprKind::UnaryOp(op, _) => format!("UnaryOp({})", op), ExprKind::Call(name, _) => format!("Call({})", pool.str(*name)), + ExprKind::MethodCall { method, .. } => { + format!("MethodCall(.{})", pool.str(*method)) + } }; println!( @@ -361,6 +364,14 @@ impl Expression { arg.pretty_print(&format!("{}{}", new_prefix, prefix_char), pool); } } + ExprKind::MethodCall { receiver, args, .. 
} => { + receiver.pretty_print(&format!("{}├── recv: ", new_prefix), pool); + for (i, arg) in args.iter().enumerate() { + let is_last = i == args.len() - 1; + let prefix_char = if is_last { "└── " } else { "├── " }; + arg.pretty_print(&format!("{}{}", new_prefix, prefix_char), pool); + } + } } } } @@ -372,6 +383,11 @@ pub enum ExprKind { BinaryOp(Box<Expression>, BinaryOperator, Box<Expression>), UnaryOp(UnaryOperator, Box<Expression>), Call(StringId, Vec<Expression>), + MethodCall { + receiver: Box<Expression>, + method: StringId, + args: Vec<Expression>, + }, } #[derive(Debug, Clone, Copy, PartialEq)] diff --git a/src/astgen.rs b/src/astgen.rs index 439f1a1..efa0340 100644 --- a/src/astgen.rs +++ b/src/astgen.rs @@ -386,6 +386,10 @@ fn gen_expr(b: &mut UirBuilder, expr: &ast::Expression) -> InstRef { let arg_refs: Vec<InstRef> = args.iter().map(|a| gen_expr(b, a)).collect(); b.call(*name, &arg_refs, span) } + ast::ExprKind::MethodCall { .. } => { + // TODO(task-13): lower method calls to UIR + todo!("MethodCall lowering lands in Task 13") + } } } diff --git a/src/lexer.rs b/src/lexer.rs index 4934264..c2ea72f 100644 --- a/src/lexer.rs +++ b/src/lexer.rs @@ -90,6 +90,7 @@ pub enum Token { LBrace, RBrace, Comma, + Dot, // Newline + indentation tokens (post-processed by `indent`).
Newline, @@ -153,6 +154,7 @@ impl fmt::Display for Token { Self::LBrace => write!(f, "{{"), Self::RBrace => write!(f, "}}"), Self::Comma => write!(f, ","), + Self::Dot => write!(f, "."), Self::Newline => write!(f, ""), Self::Indent => write!(f, ""), Self::Dedent => write!(f, ""), @@ -268,6 +270,8 @@ pub(crate) enum RawToken<'a> { RBrace, #[token(",")] Comma, + #[token(".")] + Dot, #[regex(r"\n[ \t]*")] Newline(&'a str), @@ -458,6 +462,7 @@ fn intern_token(raw: RawToken<'_>, span: Span, pool: &mut InternPool) -> Result< RawToken::LBrace => Token::LBrace, RawToken::RBrace => Token::RBrace, RawToken::Comma => Token::Comma, + RawToken::Dot => Token::Dot, RawToken::Newline(_) => Token::Newline, RawToken::Indent => Token::Indent, diff --git a/src/parser.rs b/src/parser.rs index bb739de..08bb6ec 100644 --- a/src/parser.rs +++ b/src/parser.rs @@ -406,28 +406,57 @@ where let ident_expr = select! { Token::Ident(name) => name } .map_with(|name, e| Expression::new(ExprKind::Ident(name), e.span())); - let parenthesized = expr.delimited_by(just(Token::LParen), just(Token::RParen)); + let parenthesized = expr + .clone() + .delimited_by(just(Token::LParen), just(Token::RParen)); call.or(ident_expr).or(literal).or(parenthesized) }; + let postfix = atom + .foldl( + just(Token::Dot) + .ignore_then(select! 
{ Token::Ident(name) => name }) + .then( + expr.clone() + .separated_by(just(Token::Comma)) + .allow_trailing() + .collect::<Vec<_>>() + .delimited_by(just(Token::LParen), just(Token::RParen)), + ) + .map_with(|(method, args), e| (method, args, e.span())) + .repeated(), + |receiver, (method, args, span): (_, _, SimpleSpan)| { + let start = receiver.span.start; + let end = span.end; + Expression::new( + ExprKind::MethodCall { + receiver: Box::new(receiver), + method, + args, + }, + SimpleSpan::new((), start..end), + ) + }, + ) + .boxed(); + let unary_op = choice(( just(Token::Sub).to(UnaryOperator::Neg), just(Token::Not).to(UnaryOperator::Not), )); - let unary = - unary_op - .repeated() - .collect::<Vec<_>>() - .then(atom) - .map_with(|(ops, expr), e| { - let mut result = expr; - for op in ops.into_iter().rev() { - result = Expression::new(ExprKind::UnaryOp(op, Box::new(result)), e.span()); - } - result - }); + let unary = unary_op + .repeated() + .collect::<Vec<_>>() + .then(postfix) + .map_with(|(ops, expr), e| { + let mut result = expr; + for op in ops.into_iter().rev() { + result = Expression::new(ExprKind::UnaryOp(op, Box::new(result)), e.span()); + } + result + }); let term = unary.clone().foldl( choice(( From 2eb7b1004548efa7de309e473281975855b7b9a8 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 09:05:19 +0200 Subject: [PATCH 15/33] feat: .len() and .is_empty() method calls on str MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds dot-method-call lowering through UIR → Sema → TIR → Codegen. StrLen reads the fat-pointer len field directly (no runtime call). StrIsEmpty compares len to 0.
--- src/astgen.rs | 11 +++-- src/codegen.rs | 24 +++++++++++ src/sema.rs | 63 ++++++++++++++++++++++++++- src/tir.rs | 14 ++++++ src/uir.rs | 87 ++++++++++++++++++++++++++++++++++++++ tests/integration_tests.rs | 56 ++++++++++++++++++++++++ 6 files changed, 251 insertions(+), 4 deletions(-) diff --git a/src/astgen.rs b/src/astgen.rs index efa0340..81d2c48 100644 --- a/src/astgen.rs +++ b/src/astgen.rs @@ -386,9 +386,14 @@ fn gen_expr(b: &mut UirBuilder, expr: &ast::Expression) -> InstRef { let arg_refs: Vec = args.iter().map(|a| gen_expr(b, a)).collect(); b.call(*name, &arg_refs, span) } - ast::ExprKind::MethodCall { .. } => { - // TODO(task-13): lower method calls to UIR - todo!("MethodCall lowering lands in Task 13") + ast::ExprKind::MethodCall { + receiver, + method, + args, + } => { + let receiver_ref = gen_expr(b, receiver); + let arg_refs: Vec = args.iter().map(|a| gen_expr(b, a)).collect(); + b.method_call(receiver_ref, *method, &arg_refs, span) } } } diff --git a/src/codegen.rs b/src/codegen.rs index 6ab5978..3310be8 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -1035,6 +1035,30 @@ impl Codegen { Self::generate_if_stmt(builder, ctx, r)?; builder.ins().iconst(ctx.int_type, 0) } + TirTag::StrLen => { + let operand = match inst.data { + TirData::UnOp(r) => r, + _ => unreachable!("StrLen must carry TirData::UnOp"), + }; + let repr = Self::eval_inst_str(builder, ctx, operand)?; + match repr { + ValueRepr::Str { len, .. } => len, + _ => unreachable!("StrLen operand must produce ValueRepr::Str"), + } + } + TirTag::StrIsEmpty => { + let operand = match inst.data { + TirData::UnOp(r) => r, + _ => unreachable!("StrIsEmpty must carry TirData::UnOp"), + }; + let repr = Self::eval_inst_str(builder, ctx, operand)?; + let len_val = match repr { + ValueRepr::Str { len, .. 
} => len, + _ => unreachable!("StrIsEmpty operand must produce ValueRepr::Str"), + }; + let zero = builder.ins().iconst(types::I64, 0); + builder.ins().icmp(IntCC::Equal, len_val, zero) + } TirTag::StrCmpEq | TirTag::StrCmpNe => { let (lhs, rhs) = match inst.data { TirData::BinOp { lhs, rhs } => (lhs, rhs), diff --git a/src/sema.rs b/src/sema.rs index 2bf6acf..5241344 100644 --- a/src/sema.rs +++ b/src/sema.rs @@ -48,7 +48,7 @@ use crate::ast::CompoundOp; use crate::builtins; use crate::diag::{Diag, DiagCode, DiagSink}; -use crate::tir::{Tir, TirBuilder, TirParam, TirRef, TirTag}; +use crate::tir::{Tir, TirBuilder, TirData, TirParam, TirRef, TirTag}; use crate::types::{InternPool, StringId, TypeId, TypeKind}; use crate::uir::{CallView, FuncBody, InstData, InstRef, InstTag, Span, Uir, VarDeclView}; use std::collections::{HashMap, VecDeque}; @@ -965,6 +965,67 @@ fn analyze_expr(sema: &mut Sema<'_>, fcx: &mut FuncCtx, scope: &Scope, r: InstRe } check_call(sema, fcx, &view, &arg_tirs, span) } + InstTag::MethodCall => { + let view = sema.uir.method_call_view(r); + let receiver_tir = analyze_expr(sema, fcx, scope, view.receiver); + let receiver_ty = fcx.builder.ty_of(receiver_tir); + let method_name = sema.pool.str(view.name).to_string(); + + // For now, only str has methods + if sema.pool.kind(receiver_ty) != TypeKind::Str { + if !sema.pool.is_error(receiver_ty) { + sema.sink.emit(Diag::error( + span, + DiagCode::TypeMismatch, + format!("type '{}' has no methods", sema.pool.display(receiver_ty)), + )); + } + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + + match method_name.as_str() { + "len" => { + if !view.args.is_empty() { + sema.sink.emit(Diag::error( + span, + DiagCode::ArityMismatch, + "str.len() takes no arguments".to_string(), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + fcx.builder.push_typed( + TirTag::StrLen, + TirData::UnOp(receiver_tir), + sema.pool.int(), + span, + ) + } + "is_empty" => { + if 
!view.args.is_empty() { + sema.sink.emit(Diag::error( + span, + DiagCode::ArityMismatch, + "str.is_empty() takes no arguments".to_string(), + )); + return fcx.builder.unreachable(sema.pool.error_type(), span); + } + fcx.builder.push_typed( + TirTag::StrIsEmpty, + TirData::UnOp(receiver_tir), + sema.pool.bool_(), + span, + ) + } + _ => { + sema.sink.emit(Diag::error( + span, + DiagCode::UndefinedFunction, + format!("str has no method '{}'", method_name), + )); + fcx.builder.unreachable(sema.pool.error_type(), span) + } + } + } other => panic!( "analyze_expr: instruction at %{} is not an expression (tag={:?})", r.index(), diff --git a/src/tir.rs b/src/tir.rs index 30d5242..98a8047 100644 --- a/src/tir.rs +++ b/src/tir.rs @@ -142,6 +142,11 @@ pub enum TirTag { StrCmpEq, StrCmpNe, + /// Read the `len` field of a str fat pointer. Operand in `TirData::UnOp`. + StrLen, + /// `str.is_empty()` — compares len to 0. Operand in `TirData::UnOp`. + StrIsEmpty, + // Float arithmetic / comparison. FAdd, FSub, @@ -508,6 +513,13 @@ impl TirBuilder { self.push(TirTag::Unreachable, ty, TirData::None, span) } + /// General-purpose instruction emit for tags that don't fit the + /// `unary` / `binary` debug-assert gates. Sema uses this for + /// method-call lowerings like `StrLen` / `StrIsEmpty`. + pub fn push_typed(&mut self, tag: TirTag, data: TirData, ty: TypeId, span: Span) -> TirRef { + self.push(tag, ty, data, span) + } + fn extra_offset(&self) -> u32 { u32::try_from(self.extra.len()).expect("TIR extra arena exceeded u32::MAX words") } @@ -1108,6 +1120,8 @@ fn un_op_name(t: TirTag) -> &'static str { TirTag::BoolNot => "bool_not", TirTag::Return => "ret", TirTag::ExprStmt => "expr_stmt", + TirTag::StrLen => "str_len", + TirTag::StrIsEmpty => "str_is_empty", _ => "?un", } } diff --git a/src/uir.rs b/src/uir.rs index 869192f..5664eec 100644 --- a/src/uir.rs +++ b/src/uir.rs @@ -200,6 +200,9 @@ pub enum InstTag { /// `continue` statement. Continue, + + /// Method call (e.g. 
`receiver.name(args)`). Variable payload in `extra` — see [`method_call_extra`]. + MethodCall, // Reserved for the comptime milestone: // ComptimeBlock, Decl. } @@ -378,6 +381,21 @@ pub mod compound_assign_extra { pub const LEN: usize = 3; } +/// Layout in `extra` for [`InstTag::MethodCall`]: +/// +/// ```text +/// [0] receiver: InstRef.raw() +/// [1] name: StringId.raw() +/// [2] argc: u32 +/// [3..3+argc] args: InstRef.raw() +/// ``` +pub mod method_call_extra { + pub const RECEIVER: usize = 0; + pub const NAME: usize = 1; + pub const ARGC: usize = 2; + pub const ARGS: usize = 3; +} + /// Layout in `extra` for [`InstTag::WhileLoop`]: /// /// ```text @@ -710,6 +728,29 @@ impl UirBuilder { pub fn continue_stmt(&mut self, span: Span) -> InstRef { self.push(InstTag::Continue, InstData::None, span) } + + /// Emits a `MethodCall` with receiver, name, and arg list packed into `extra`. + pub fn method_call( + &mut self, + receiver: InstRef, + name: StringId, + args: &[InstRef], + span: Span, + ) -> InstRef { + let offset = self.extra_offset(); + self.uir.extra.push(receiver.raw()); + self.uir.extra.push(name.raw()); + self.uir.extra.push(Self::len_u32(args.len())); + for arg in args { + self.uir.extra.push(arg.raw()); + } + let len = Self::len_u32(method_call_extra::ARGS + args.len()); + self.push( + InstTag::MethodCall, + InstData::Extra(ExtraRange { offset, len }), + span, + ) + } } // ---------- Read-side helpers ---------- @@ -752,6 +793,13 @@ pub struct ForRangeView { pub body: Vec, } +/// Decoded view of an [`InstTag::MethodCall`] payload. 
+pub struct MethodCallView { + pub receiver: InstRef, + pub name: StringId, + pub args: Vec, +} + pub struct ElifView { pub cond: InstRef, pub body: Vec, @@ -876,6 +924,29 @@ impl Uir { } } + pub fn method_call_view(&self, r: InstRef) -> MethodCallView { + let inst = self.inst(r); + debug_assert!(matches!(inst.tag, InstTag::MethodCall)); + let range = match inst.data { + InstData::Extra(rng) => rng, + _ => unreachable!("MethodCall must carry InstData::Extra"), + }; + let slice = &self.extra[range.as_range()]; + let receiver = InstRef::from_raw(slice[method_call_extra::RECEIVER]); + let name = StringId::from_raw(slice[method_call_extra::NAME]); + let argc = slice[method_call_extra::ARGC] as usize; + let args = slice[method_call_extra::ARGS..method_call_extra::ARGS + argc] + .iter() + .copied() + .map(InstRef::from_raw) + .collect(); + MethodCallView { + receiver, + name, + args, + } + } + pub fn if_stmt_view(&self, r: InstRef) -> IfStmtView { let inst = self.inst(r); debug_assert!(matches!(inst.tag, InstTag::IfStmt)); @@ -1104,6 +1175,22 @@ fn write_inst( body_refs.join(", ") ) } + (InstTag::MethodCall, InstData::Extra(_)) => { + let view = uir.method_call_view(r); + write!( + f, + "method_call %{}.{}(", + view.receiver.index(), + pool.str(view.name) + )?; + for (i, a) in view.args.iter().enumerate() { + if i > 0 { + write!(f, ", ")?; + } + write!(f, "%{}", a.index())?; + } + writeln!(f, ")") + } (InstTag::Break, InstData::None) => writeln!(f, "break"), (InstTag::Continue, InstData::None) => writeln!(f, "continue"), (tag, data) => writeln!(f, "", tag, data), diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index d5da934..ef89c03 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -2058,3 +2058,59 @@ fn test_bool_to_str_builtin() { stdout ); } + +// ---- str.len() and str.is_empty() method calls ---- + +#[test] +fn test_str_len() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "s: 
str = \"hello\"\nassert(s.len() == 5, \"len should be 5\")"; + let test_file = create_test_file(temp_dir.path(), "str_len.ryo", code); + let output = run_ryo_command(&["run", "str_len.ryo"], &test_file).expect("Failed to run"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} + +#[test] +fn test_str_is_empty() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "s: str = \"\"\nassert(s.is_empty(), \"empty string should be empty\")"; + let test_file = create_test_file(temp_dir.path(), "str_empty.ryo", code); + let output = run_ryo_command(&["run", "str_empty.ryo"], &test_file).expect("Failed to run"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} + +#[test] +fn test_str_is_empty_false() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = + "s: str = \"hi\"\nassert(not s.is_empty(), \"non-empty string should not be empty\")"; + let test_file = create_test_file(temp_dir.path(), "str_not_empty.ryo", code); + let output = run_ryo_command(&["run", "str_not_empty.ryo"], &test_file).expect("Failed to run"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} + +#[test] +fn test_str_len_concat() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "s: str = \"ab\" + \"cde\"\nassert(s.len() == 5, \"concat len should be 5\")"; + let test_file = create_test_file(temp_dir.path(), "str_len_concat.ryo", code); + let output = + run_ryo_command(&["run", "str_len_concat.ryo"], &test_file).expect("Failed to run"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} From 00a3b36e5f39c4cc1bdb6d0ad475e1955dac8045 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 09:11:38 +0200 Subject: [PATCH 16/33] test: comprehensive str integration tests + 
edge cases Covers empty-string concat, equality, str + int_to_str chaining, and .len()/.is_empty() on empty strings. Function param/return tests are #[ignore] pending sret plumbing (Task 15). --- tests/integration_tests.rs | 98 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 98 insertions(+) diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index ef89c03..8ca3b4d 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -2114,3 +2114,101 @@ fn test_str_len_concat() { String::from_utf8_lossy(&output.stderr) ); } + +#[test] +fn test_str_empty_concat_left() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = + "s: str = \"\" + \"hello\"\nassert(s.len() == 5, \"empty + hello should have len 5\")"; + let test_file = create_test_file(temp_dir.path(), "empty_left.ryo", code); + let output = run_ryo_command(&["run", "empty_left.ryo"], &test_file).expect("Failed"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} + +#[test] +fn test_str_empty_concat_both() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "s: str = \"\" + \"\"\nassert(s.is_empty(), \"empty + empty should be empty\")"; + let test_file = create_test_file(temp_dir.path(), "empty_both.ryo", code); + let output = run_ryo_command(&["run", "empty_both.ryo"], &test_file).expect("Failed"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} + +#[test] +fn test_str_empty_equality() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = + "a: str = \"\"\nb: str = \"\"\nassert(a == b, \"two empty strings should be equal\")"; + let test_file = create_test_file(temp_dir.path(), "empty_eq.ryo", code); + let output = run_ryo_command(&["run", "empty_eq.ryo"], &test_file).expect("Failed"); + assert!( + output.status.success(), + "STDERR: {}", + 
String::from_utf8_lossy(&output.stderr) + ); +} + +#[test] +fn test_str_concat_with_to_str() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "n: int = 42\ns: str = \"value = \" + int_to_str(n)\nprint(s)"; + let test_file = create_test_file(temp_dir.path(), "concat_int.ryo", code); + let output = run_ryo_command(&["run", "concat_int.ryo"], &test_file).expect("Failed"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} + +#[test] +fn test_str_empty_len_zero() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "s: str = \"\"\nassert(s.len() == 0, \"empty string len should be 0\")"; + let test_file = create_test_file(temp_dir.path(), "empty_len.ryo", code); + let output = run_ryo_command(&["run", "empty_len.ryo"], &test_file).expect("Failed"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} + +// Requires Task 15: sret plumbing +#[test] +#[ignore] +fn test_str_passed_to_function() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "fn greet(name: str):\n\tprint(name)\n\ngreet(\"Alice\")"; + let test_file = create_test_file(temp_dir.path(), "str_param.ryo", code); + let output = run_ryo_command(&["run", "str_param.ryo"], &test_file).expect("Failed"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); +} + +// Requires Task 15: sret plumbing +#[test] +#[ignore] +fn test_str_returned_from_function() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = + "fn make_greeting() -> str:\n\treturn \"Hello!\"\n\ns: str = make_greeting()\nprint(s)"; + let test_file = create_test_file(temp_dir.path(), "str_return.ryo", code); + let output = run_ryo_command(&["run", "str_return.ryo"], &test_file).expect("Failed"); + assert!( + output.status.success(), + "STDERR: {}", + 
String::from_utf8_lossy(&output.stderr) + ); +} From c78fa06de82b06d8a1bcebf31c65cb26ec2eee19 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 09:26:24 +0200 Subject: [PATCH 17/33] feat(codegen): str function parameters and returns via sret MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Str params expand to 3 block params (ptr/len/cap). Str returns use the standard C ABI sret convention — a hidden first parameter points to a caller-allocated 24-byte buffer, the callee writes through it, the caller reads back the triple. Matches platform ABI on SysV x86_64, Windows x64, and AArch64; unifies user-fn ABI with the runtime call convention used for ryo_str_concat. --- src/codegen.rs | 181 +++++++++++++++++++++++++++++-------- tests/integration_tests.rs | 4 - 2 files changed, 144 insertions(+), 41 deletions(-) diff --git a/src/codegen.rs b/src/codegen.rs index 3310be8..5599645 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -108,6 +108,7 @@ enum ValueRepr { } impl ValueRepr { + #[allow(dead_code)] fn expect_scalar(self) -> Value { match self { ValueRepr::Scalar(v) => v, @@ -154,6 +155,9 @@ struct FunctionContext<'a, M: Module> { inst_values: HashMap, loop_stack: Vec, str_locals: HashMap, + /// For str-returning functions: the hidden sret pointer (first block param) + /// through which the callee writes the (ptr, len, cap) triple. 
+ sret_ptr: Option, } impl Codegen { @@ -414,18 +418,41 @@ impl Codegen { let int_type = self.int_type; let mut locals: HashMap = HashMap::new(); + let is_main = pool.str(tir.name) == "main"; + let returns_str = !is_main && is_str_type(tir.return_type, pool); + let mut block_idx: usize = if returns_str { 1 } else { 0 }; + let sret_ptr = if returns_str { + Some(builder.block_params(entry_block)[0]) + } else { + None + }; + + let mut str_param_locals: HashMap = HashMap::new(); + for param in tir.params.iter() { - assert!( - !is_str_type(param.ty, pool), - "str parameters not yet supported (landing in Task 15)" - ); - } - for (i, param) in tir.params.iter().enumerate() { - let cl_ty = cranelift_type_for(param.ty, pool, int_type); - let var = builder.declare_var(cl_ty); - let param_val = builder.block_params(entry_block)[i]; - builder.def_var(var, param_val); - locals.insert(param.name, var); + if is_str_type(param.ty, pool) { + let var_ptr = builder.declare_var(int_type); + let var_len = builder.declare_var(types::I64); + let var_cap = builder.declare_var(types::I64); + builder.def_var(var_ptr, builder.block_params(entry_block)[block_idx]); + builder.def_var(var_len, builder.block_params(entry_block)[block_idx + 1]); + builder.def_var(var_cap, builder.block_params(entry_block)[block_idx + 2]); + str_param_locals.insert( + param.name, + StrLocals { + ptr: var_ptr, + len: var_len, + cap: var_cap, + }, + ); + block_idx += 3; + } else { + let cl_ty = cranelift_type_for(param.ty, pool, int_type); + let var = builder.declare_var(cl_ty); + builder.def_var(var, builder.block_params(entry_block)[block_idx]); + locals.insert(param.name, var); + block_idx += 1; + } } let mut ctx: FunctionContext<'_, M> = FunctionContext { @@ -440,23 +467,21 @@ impl Codegen { func_ids, inst_values: HashMap::new(), loop_stack: Vec::new(), - str_locals: HashMap::new(), + str_locals: str_param_locals, + sret_ptr, }; let has_return = Self::emit_body(&mut builder, &mut ctx, &tir.body_stmts())?; if 
!has_return { - let is_main = pool.str(tir.name) == "main"; - if is_main || tir.return_type != pool.void() { - // `main` always returns int 0 to the OS even - // when Ryo declares it void; non-main - // non-void functions also fall through to a - // zero return today (sema accepts missing - // returns; control-flow analysis lands in M8b). + if is_main { let zero = builder.ins().iconst(int_type, 0); builder.ins().return_(&[zero]); - } else { + } else if returns_str || tir.return_type == pool.void() { builder.ins().return_(&[]); + } else { + let zero = builder.ins().iconst(int_type, 0); + builder.ins().return_(&[zero]); } } @@ -548,8 +573,21 @@ impl Codegen { TirData::UnOp(o) => o, _ => unreachable!("Return must carry TirData::UnOp"), }; - let val = Self::eval_inst(builder, ctx, operand)?; - builder.ins().return_(&[val]); + if is_str_type(ctx.tir.return_type, ctx.pool) { + let sret = ctx.sret_ptr.expect("str-returning fn must have sret_ptr"); + let repr = Self::eval_inst_str(builder, ctx, operand)?; + let (ptr, len, cap) = match repr { + ValueRepr::Str { ptr, len, cap } => (ptr, len, cap), + _ => unreachable!("str return must produce ValueRepr::Str"), + }; + builder.ins().store(MemFlags::trusted(), ptr, sret, 0); + builder.ins().store(MemFlags::trusted(), len, sret, 8); + builder.ins().store(MemFlags::trusted(), cap, sret, 16); + builder.ins().return_(&[]); + } else { + let val = Self::eval_inst(builder, ctx, operand)?; + builder.ins().return_(&[val]); + } Ok(true) } TirTag::ReturnVoid => { @@ -860,7 +898,13 @@ impl Codegen { r: TirRef, ) -> Result { if let Some(repr) = ctx.inst_values.get(&r) { - return Ok(repr.expect_scalar()); + return Ok(match repr { + ValueRepr::Scalar(v) => *v, + // str-returning calls cache ValueRepr::Str; return the ptr + // component as the scalar stand-in (callers that need the + // full triple use eval_inst_str). + ValueRepr::Str { ptr, .. 
} => *ptr, + }); } let inst = ctx.tir.inst(r); let value = match inst.tag { @@ -1108,7 +1152,8 @@ impl Codegen { )); } }; - ctx.inst_values.insert(r, ValueRepr::Scalar(value)); + // Don't overwrite if emit_call already cached a Str repr (sret convention). + ctx.inst_values.entry(r).or_insert(ValueRepr::Scalar(value)); Ok(value) } @@ -1215,9 +1260,14 @@ impl Codegen { ValueRepr::Str { ptr, len, cap } } else { - // Non-formatter call — delegate to scalar - let val = Self::eval_inst(builder, ctx, r)?; - return Ok(ValueRepr::Scalar(val)); + // User call — eval_inst triggers emit_call which handles + // sret for str-returning calls and caches ValueRepr::Str. + Self::eval_inst(builder, ctx, r)?; + if let Some(repr) = ctx.inst_values.get(&r) { + return Ok(*repr); + } + // Fallback for non-str returning calls + return Ok(ValueRepr::Scalar(builder.ins().iconst(ctx.int_type, 0))); } } TirTag::StrConcat => { @@ -1335,8 +1385,28 @@ impl Codegen { let name_id = view.name; let name_str = ctx.pool.str(name_id); - // print is the only builtin with custom codegen (inline syscall). - // __ryo_panic and user functions go through the normal call path. + // print and __ryo_panic have custom codegen (inline syscall / raw scalar ABI). + // They do NOT use the str-triple expansion that user functions use. + if name_str == "__ryo_panic" { + // __ryo_panic(ptr, len) takes raw scalars — the StrConst .rodata + // pointer and an int len — NOT the str-triple ABI. 
+ let mut arg_values = Vec::with_capacity(view.args.len()); + for arg in &view.args { + arg_values.push(Self::eval_inst(builder, ctx, *arg)?); + } + let callee_id = *ctx + .func_ids + .get(&name_id) + .ok_or_else(|| format!("Undefined function: '{}'", name_str))?; + let callee_ref = ctx.module.declare_func_in_func(callee_id, builder.func); + builder.ins().call(callee_ref, &arg_values); + builder.ins().trap(TrapCode::user(1).unwrap()); + let dead = builder.create_block(); + builder.seal_block(dead); + builder.switch_to_block(dead); + return Ok(builder.ins().iconst(types::I8, 0)); + } + if name_str == "print" { Self::generate_print_call(builder, ctx, &view.args)?; return Ok(builder.ins().iconst(ctx.int_type, 0)); @@ -1380,22 +1450,32 @@ impl Codegen { .get(&name_id) .ok_or_else(|| format!("Undefined function: '{}'", name_str))?; - let mut arg_values = Vec::with_capacity(view.args.len()); + let mut arg_values = Vec::with_capacity(view.args.len() * 3 + 1); for arg in &view.args { - arg_values.push(Self::eval_inst(builder, ctx, *arg)?); + let arg_ty = ctx.tir.inst(*arg).ty; + if is_str_type(arg_ty, ctx.pool) { + let repr = Self::eval_inst_str(builder, ctx, *arg)?; + match repr { + ValueRepr::Str { ptr, len, cap } => { + arg_values.push(ptr); + arg_values.push(len); + arg_values.push(cap); + } + _ => unreachable!("str-typed arg must produce ValueRepr::Str"), + } + } else { + arg_values.push(Self::eval_inst(builder, ctx, *arg)?); + } } let callee_ref = ctx.module.declare_func_in_func(callee_id, builder.func); - let call = builder.ins().call(callee_ref, &arg_values); - let results = builder.inst_results(call); + + let ret_ty = ctx.tir.inst(r).ty; // If the callee returns never (e.g. __ryo_panic), the call is // a terminator. Emit a trap + dead block for subsequent IR. - // The dead block needs no explicit terminator — compile_function's - // fallthrough `return 0` provides one. Cranelift verifier is - // happy as long as every block has exactly one terminator. 
- let ret_ty = ctx.tir.inst(r).ty; if ctx.pool.is_never(ret_ty) { + builder.ins().call(callee_ref, &arg_values); builder.ins().trap(TrapCode::user(1).unwrap()); let dead = builder.create_block(); builder.seal_block(dead); @@ -1404,6 +1484,33 @@ impl Codegen { return Ok(builder.ins().iconst(dummy_ty, 0)); } + if is_str_type(ret_ty, ctx.pool) { + // sret: allocate 24-byte slot, prepend pointer to args + let slot = builder.create_sized_stack_slot(StackSlotData::new( + StackSlotKind::ExplicitSlot, + 24, + 3, + )); + let out = builder.ins().stack_addr(ctx.int_type, slot, 0); + + let mut all_args = Vec::with_capacity(arg_values.len() + 1); + all_args.push(out); + all_args.extend(arg_values); + + builder.ins().call(callee_ref, &all_args); + + let ptr = builder + .ins() + .load(ctx.int_type, MemFlags::trusted(), out, 0); + let len = builder.ins().load(types::I64, MemFlags::trusted(), out, 8); + let cap = builder.ins().load(types::I64, MemFlags::trusted(), out, 16); + ctx.inst_values.insert(r, ValueRepr::Str { ptr, len, cap }); + return Ok(ptr); // dummy scalar — consumers use eval_inst_str + } + + let call = builder.ins().call(callee_ref, &arg_values); + let results = builder.inst_results(call); + if results.is_empty() { Ok(builder.ins().iconst(ctx.int_type, 0)) } else { diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index 8ca3b4d..02289bf 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -2182,9 +2182,7 @@ fn test_str_empty_len_zero() { ); } -// Requires Task 15: sret plumbing #[test] -#[ignore] fn test_str_passed_to_function() { let temp_dir = TempDir::new().expect("Failed to create temp directory"); let code = "fn greet(name: str):\n\tprint(name)\n\ngreet(\"Alice\")"; @@ -2197,9 +2195,7 @@ fn test_str_passed_to_function() { ); } -// Requires Task 15: sret plumbing #[test] -#[ignore] fn test_str_returned_from_function() { let temp_dir = TempDir::new().expect("Failed to create temp directory"); let code = From 
ace8934857931602fc05493d190683dbb45e9e6e Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 09:46:25 +0200 Subject: [PATCH 18/33] fix: runtime correctness bugs and dead code cleanup Fix ryo_int_to_str producing "-" for i64::MIN by using u64 magnitude. Fix ryo_float_to_str mishandling NaN/Infinity with proper guards. Replace silent iconst(0) fallback in eval_inst_str with unreachable!(). Remove dead emit_str_literal, LocalVar enum, and redundant OOM checks. Fix cleanup_runtime_temp to accept &Path instead of &PathBuf. --- runtime/src/lib.rs | 121 +++++++++++++++++++++++++++++++++++---------- src/codegen.rs | 39 ++------------- src/runtime_lib.rs | 2 +- 3 files changed, 98 insertions(+), 64 deletions(-) diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index 20cb63e..cbe83b5 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -52,6 +52,20 @@ pub unsafe extern "C" fn ryo_str_realloc(ptr: *mut u8, old_cap: u64, new_cap: u6 new_ptr } +/// Helper for fixed-string results (nan, inf, etc.) +/// +/// # Safety +/// `out` must point to a valid `RyoStrFat`. 
+unsafe fn write_str_result(s: &[u8], out: *mut RyoStrFat) { + let ptr = ryo_str_alloc(s.len() as u64); + unsafe { + core::ptr::copy_nonoverlapping(s.as_ptr(), ptr, s.len()); + (*out).ptr = ptr; + (*out).len = s.len() as u64; + (*out).cap = s.len() as u64; + } +} + fn layout_for(cap: u64) -> Layout { Layout::from_size_align(cap as usize, 1).unwrap_or_else(|_| oom_abort()) } @@ -73,9 +87,6 @@ pub unsafe extern "C" fn ryo_str_from_literal(data: *const u8, len: u64, out: *m return; } let ptr = ryo_str_alloc(len); - if ptr.is_null() { - oom_abort(); - } core::ptr::copy_nonoverlapping(data, ptr, len as usize); (*out).ptr = ptr; (*out).len = len; @@ -103,9 +114,6 @@ pub unsafe extern "C" fn ryo_str_concat( return; } let ptr = ryo_str_alloc(total); - if ptr.is_null() { - oom_abort(); - } if l_len > 0 { core::ptr::copy_nonoverlapping(l_ptr, ptr, l_len as usize); } @@ -144,11 +152,14 @@ pub unsafe extern "C" fn ryo_str_eq( #[unsafe(no_mangle)] pub unsafe extern "C" fn ryo_int_to_str(value: i64, out: *mut RyoStrFat) { let mut buf = [0u8; 20]; - let mut n = value; - let negative = n < 0; - if negative { - n = n.wrapping_neg(); - } + let negative = value < 0; + // Work with unsigned magnitude to handle i64::MIN correctly + // (i64::MIN.wrapping_neg() overflows back to i64::MIN). + let mut n: u64 = if negative { + (value as u64).wrapping_neg() + } else { + value as u64 + }; let mut pos = buf.len(); if n == 0 { pos -= 1; @@ -166,9 +177,6 @@ pub unsafe extern "C" fn ryo_int_to_str(value: i64, out: *mut RyoStrFat) { } let len = (buf.len() - pos) as u64; let ptr = ryo_str_alloc(len); - if ptr.is_null() { - oom_abort(); - } unsafe { core::ptr::copy_nonoverlapping(buf.as_ptr().add(pos), ptr, len as usize); (*out).ptr = ptr; @@ -181,6 +189,17 @@ pub unsafe extern "C" fn ryo_int_to_str(value: i64, out: *mut RyoStrFat) { /// `out` must point to a valid `RyoStrFat`. 
#[unsafe(no_mangle)] pub unsafe extern "C" fn ryo_float_to_str(value: f64, out: *mut RyoStrFat) { + if value.is_nan() { + return unsafe { write_str_result(b"nan", out) }; + } + if value.is_infinite() { + if value < 0.0 { + return unsafe { write_str_result(b"-inf", out) }; + } else { + return unsafe { write_str_result(b"inf", out) }; + } + } + let mut buf = [0u8; 64]; let mut pos = 0usize; @@ -231,9 +250,6 @@ pub unsafe extern "C" fn ryo_float_to_str(value: f64, out: *mut RyoStrFat) { let len = pos as u64; let ptr = ryo_str_alloc(len); - if ptr.is_null() { - oom_abort(); - } unsafe { core::ptr::copy_nonoverlapping(buf.as_ptr(), ptr, len as usize); (*out).ptr = ptr; @@ -247,16 +263,7 @@ pub unsafe extern "C" fn ryo_float_to_str(value: f64, out: *mut RyoStrFat) { #[unsafe(no_mangle)] pub unsafe extern "C" fn ryo_bool_to_str(value: u8, out: *mut RyoStrFat) { let s: &[u8] = if value != 0 { b"true" } else { b"false" }; - let ptr = ryo_str_alloc(s.len() as u64); - if ptr.is_null() { - oom_abort(); - } - unsafe { - core::ptr::copy_nonoverlapping(s.as_ptr(), ptr, s.len()); - (*out).ptr = ptr; - (*out).len = s.len() as u64; - (*out).cap = s.len() as u64; - } + unsafe { write_str_result(s, out) }; } #[cfg(test)] @@ -463,6 +470,66 @@ mod tests { } } + #[test] + fn test_int_to_str_min() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_int_to_str(i64::MIN, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"-9223372036854775808"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_float_to_str_nan() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_float_to_str(f64::NAN, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"nan"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_float_to_str_inf() { + unsafe { + let mut out = RyoStrFat { + ptr: 
std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_float_to_str(f64::INFINITY, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"inf"); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_float_to_str_neg_inf() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_float_to_str(f64::NEG_INFINITY, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"-inf"); + ryo_str_free(out.ptr, out.cap); + } + } + #[test] fn test_float_to_str() { unsafe { diff --git a/src/codegen.rs b/src/codegen.rs index 5599645..5100f6b 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -99,16 +99,11 @@ struct LoopContext { #[derive(Debug, Clone, Copy)] enum ValueRepr { Scalar(Value), - #[allow(dead_code)] - Str { - ptr: Value, - len: Value, - cap: Value, - }, + Str { ptr: Value, len: Value, cap: Value }, } impl ValueRepr { - #[allow(dead_code)] + #[cfg(test)] fn expect_scalar(self) -> Value { match self { ValueRepr::Scalar(v) => v, @@ -117,17 +112,6 @@ impl ValueRepr { } } -#[derive(Debug, Clone, Copy)] -#[allow(dead_code)] -enum LocalVar { - Scalar(Variable), - Str { - ptr: Variable, - len: Variable, - cap: Variable, - }, -} - struct StrLocals { ptr: Variable, len: Variable, @@ -1266,8 +1250,7 @@ impl Codegen { if let Some(repr) = ctx.inst_values.get(&r) { return Ok(*repr); } - // Fallback for non-str returning calls - return Ok(ValueRepr::Scalar(builder.ins().iconst(ctx.int_type, 0))); + unreachable!("str-returning user call must cache ValueRepr::Str via emit_call"); } } TirTag::StrConcat => { @@ -1626,22 +1609,6 @@ fn declare_write( Ok(module.declare_func_in_func(func_id, builder.func)) } -/// Materialize a string literal pointer into the function. Pulled -/// out of the `Codegen` impl so it can be called without juggling -/// `&mut self` borrows alongside the `FunctionContext`'s mutable -/// references to the same fields. 
-#[allow(dead_code)] -fn emit_str_literal( - builder: &mut FunctionBuilder, - ctx: &mut FunctionContext<'_, M>, - id: StringId, -) -> Result { - let content = ctx.pool.str(id); - let data_id = store_string(id, content, ctx.module, ctx.data_ctx, ctx.string_data)?; - let data_ref = ctx.module.declare_data_in_func(data_id, builder.func); - Ok(builder.ins().global_value(ctx.int_type, data_ref)) -} - fn store_string( content_id: StringId, content: &str, diff --git a/src/runtime_lib.rs b/src/runtime_lib.rs index 65f27fb..f683791 100644 --- a/src/runtime_lib.rs +++ b/src/runtime_lib.rs @@ -12,7 +12,7 @@ pub fn extract_runtime_to_temp() -> Result { Ok(path) } -pub fn cleanup_runtime_temp(path: &PathBuf) { +pub fn cleanup_runtime_temp(path: &std::path::Path) { let _ = fs::remove_file(path); if let Some(parent) = path.parent() { let _ = fs::remove_dir(parent); From d67ddac105b92fbeb1a6e2f69854c53f50511357 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 09:47:21 +0200 Subject: [PATCH 19/33] docs: update roadmap ownership lattice and CLAUDE.md emphasis --- CLAUDE.md | 2 +- docs/dev/implementation_roadmap.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 875a836..9354c8f 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -65,7 +65,7 @@ GitHub Actions runs on pushes to `main` and PRs targeting `main`: `cargo fmt --c **Commit prefixes:** `feat:`, `fix:`, `docs:`, `spec:`, `dev:`, `roadmap:`, `test:`, `chore:`, `refactor:`. Keep subjects under 72 chars. Add body for non-obvious changes. -Never author Claude on commits nor PRs. +IMPORTANT: Never author Claude on commits nor PRs. 
--- diff --git a/docs/dev/implementation_roadmap.md b/docs/dev/implementation_roadmap.md index 028e8bc..f609c57 100644 --- a/docs/dev/implementation_roadmap.md +++ b/docs/dev/implementation_roadmap.md @@ -903,8 +903,8 @@ Every downstream milestone in Phase 2 (structs, tuples, enums, pattern matching, - **F-strings (`f"Value: {x}"`) are deferred to v0.2** — see Phase 5: F-strings & String Interpolation. v0.1 uses `+` concatenation with the `*_to_str` helpers above. - Parser/AST: accept the **`move` keyword** as a prefix on parameter declarations (`fn consume(move s: str)`). Without `move`, parameters borrow. Sema records the convention on the function signature (type-only; ownership lives elsewhere). - Add a **new pipeline stage `src/ownership.rs`** between Sema and Codegen — modeled on Mojo's MLIR-based lifetime/ASAP-destruction passes (Zig stops being a useful compiler reference for the borrow checker; see [mojo_reference.md](mojo_reference.md)). The pass mutates each `Tir` in place: inserts `TirTag::Free`, tracks per-`TirRef` ownership state, and reports diagnostics. - - Per-`TirRef` (SSA value) state lattice: `NotTracked` / `Valid` / `Moved { moved_at, moved_via }` - - `current_owner: HashMap` shadow table for named bindings + - Per-`TirRef` (SSA value) state lattice: `NotTracked` / `Valid` / `Borrowed` / `Moved { moved_at, kind }` + - `current_owner: HashMap` shadow table for named bindings (resolves binding-read sites and feeds diagnostics) - Implicit immutable borrow for function parameters (Rule 2); `move` opts into ownership transfer (Rule 4) - Standard forward dataflow with CFG-join merges; loop fixed-point (typically converges in 2 iterations) - Reassignment of `mut` move-typed bindings frees the prior buffer @@ -952,7 +952,7 @@ fn main(): - Move tracking covers **named bindings and anonymous owned temporaries** in this milestone. 
Explicit `&T` / `inout T` borrow syntax arrives in M8.2 / M8.3; field-by-field move tracking (partial moves out of structs/tuples) follows naturally because the same dataflow analysis is reused. - `str` deallocation follows hybrid eager destruction (spec Section 5.4) — `Free` is inserted after the binding's last use, after the old buffer when a `mut` binding is reassigned over a `Valid` slot, and at the end of the enclosing statement for anonymous owned temporaries. Lexical scope-exit RAII would be too late and would leak intermediate buffers in concat chains. User-extensible cleanup via the `drop` method lands in M23. - Allocator failure surfaces as a panic in v0.1; allocation-fallible APIs ship alongside error unions (M13). -- Detailed design: see [2026-05-11-milestone-8.1-heap-str-and-move-semantics-design.md](../superpowers/specs/2026-05-11-milestone-8.1-heap-str-and-move-semantics-design.md). +- Detailed design: see [2026-05-20-milestone-8.1-heap-str-and-move-semantics-design.md](../superpowers/specs/2026-05-20-milestone-8.1-heap-str-and-move-semantics-design.md). - Dependencies: Milestone 8 (control flow blocks shape the dataflow regions the move tracker walks). ### Milestone 8.2: Immutable Borrows (`&T`) [alpha] From 6fb04f4b579b108b18091528c2d3968771784cd2 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 10:41:16 +0200 Subject: [PATCH 20/33] =?UTF-8?q?perf(runtime):=20static=20string=20optimi?= =?UTF-8?q?zation=20=E2=80=94=20literals=20avoid=20heap=20allocation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ryo_str_from_literal now returns the input pointer directly with cap=0 as a sentinel meaning 'static, don't free'. This eliminates a heap allocation + memcpy for every string literal evaluation. 
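A minimal sketch of the `cap == 0` sentinel described above, under stated assumptions: `FatStr`, `from_static`, `from_heap`, and `release` are hypothetical names for illustration, not the runtime's actual `RyoStrFat` / `ryo_str_*` API. It only shows the idea that a literal can hand out its own pointer with `cap = 0`, and that the free path checks `cap` before touching the allocator.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical stand-in for a (ptr, len, cap) fat string.
#[derive(Debug, Clone, Copy)]
pub struct FatStr {
    pub ptr: *mut u8,
    pub len: u64,
    pub cap: u64,
}

/// Point straight at the literal's bytes; cap == 0 marks "static, never free".
pub fn from_static(data: &'static [u8]) -> FatStr {
    FatStr {
        ptr: data.as_ptr() as *mut u8,
        len: data.len() as u64,
        cap: 0,
    }
}

/// Copy the bytes into a fresh heap buffer; cap > 0 marks "owned".
pub fn from_heap(data: &[u8]) -> FatStr {
    if data.is_empty() {
        return FatStr { ptr: std::ptr::null_mut(), len: 0, cap: 0 };
    }
    let layout = Layout::from_size_align(data.len(), 1).expect("valid layout");
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        std::ptr::copy_nonoverlapping(data.as_ptr(), ptr, data.len());
        FatStr { ptr, len: data.len() as u64, cap: data.len() as u64 }
    }
}

/// Release only owned buffers. Returns true if memory was actually freed.
///
/// # Safety
/// `s` must come from `from_static` or `from_heap` and must not be released twice.
pub unsafe fn release(s: FatStr) -> bool {
    if s.cap == 0 {
        return false; // static (or empty): nothing to free
    }
    let layout = Layout::from_size_align(s.cap as usize, 1).expect("valid layout");
    unsafe { dealloc(s.ptr, layout) };
    true
}
```

One consequence of this design is that any code that mutates or grows a string in place has to check for `cap == 0` first, since the pointer may refer to read-only data.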
--- runtime/src/lib.rs | 40 ++++++++++++++++++++++++++++++++++------ 1 file changed, 34 insertions(+), 6 deletions(-) diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index cbe83b5..f168119 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -86,11 +86,10 @@ pub unsafe extern "C" fn ryo_str_from_literal(data: *const u8, len: u64, out: *m (*out).cap = 0; return; } - let ptr = ryo_str_alloc(len); - core::ptr::copy_nonoverlapping(data, ptr, len as usize); - (*out).ptr = ptr; + // Return pointer directly to rodata with cap=0 as static sentinel + (*out).ptr = data as *mut u8; (*out).len = len; - (*out).cap = len; + (*out).cap = 0; } } @@ -330,11 +329,40 @@ mod tests { cap: 0, }; ryo_str_from_literal(data.as_ptr(), 5, &mut out); - assert!(!out.ptr.is_null()); + assert_eq!(out.ptr as *const u8, data.as_ptr()); assert_eq!(out.len, 5); - assert_eq!(out.cap, 5); + assert_eq!(out.cap, 0); let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); assert_eq!(slice, b"hello"); + } + } + + #[test] + fn test_from_literal_returns_static_pointer() { + unsafe { + let data = b"hello"; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_str_from_literal(data.as_ptr(), 5, &mut out); + assert_eq!(out.ptr as *const u8, data.as_ptr()); + assert_eq!(out.len, 5); + assert_eq!(out.cap, 0); + } + } + + #[test] + fn test_free_static_str_is_noop() { + unsafe { + let data = b"hello"; + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_str_from_literal(data.as_ptr(), 5, &mut out); ryo_str_free(out.ptr, out.cap); } } From 46da72a43e9c8167ff744cdf131be6d5df10c852 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 10:54:17 +0200 Subject: [PATCH 21/33] test(runtime): verify static/heap string interaction safety Confirms that concat with static (cap=0) inputs produces heap results, and that ryo_str_free is a safe noop on static strings. 
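The property this commit tests can be sketched as follows. This is an illustration only: `Fat`, `concat`, and `free` are hypothetical names, and the allocation goes through `Box<[u8]>` rather than the runtime's own allocator. The point is that the result of a concat is always a fresh owned buffer (`cap > 0`), whatever the provenance of the inputs, while freeing a `cap == 0` value does nothing.

```rust
/// Hypothetical (ptr, len, cap) triple; cap == 0 means "static, do not free".
#[derive(Clone, Copy)]
pub struct Fat {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Concatenate two strings into a newly allocated buffer.
///
/// # Safety
/// `l.ptr` and `r.ptr` must be valid for reads of `l.len` and `r.len` bytes.
pub unsafe fn concat(l: Fat, r: Fat) -> Fat {
    let total = l.len + r.len;
    if total == 0 {
        return Fat { ptr: std::ptr::null_mut(), len: 0, cap: 0 };
    }
    let mut buf: Vec<u8> = Vec::with_capacity(total);
    unsafe {
        if l.len > 0 {
            buf.extend_from_slice(std::slice::from_raw_parts(l.ptr, l.len));
        }
        if r.len > 0 {
            buf.extend_from_slice(std::slice::from_raw_parts(r.ptr, r.len));
        }
    }
    // A boxed slice has len == cap, so (ptr, cap) is enough to free it later.
    let ptr = Box::into_raw(buf.into_boxed_slice()) as *mut u8;
    Fat { ptr, len: total, cap: total }
}

/// Free an owned buffer; a no-op for static (cap == 0) values.
///
/// # Safety
/// `s` must be a static value or an unfreed result of `concat`.
pub unsafe fn free(s: Fat) {
    if s.cap == 0 {
        return;
    }
    unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(s.ptr, s.cap))) };
}
```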
--- runtime/src/lib.rs | 47 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index f168119..d19897b 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -603,4 +603,51 @@ mod tests { ryo_str_free(out.ptr, out.cap); } } + + #[test] + fn test_concat_static_left_heap_right() { + unsafe { + // Simulate: "Hello, " + heap_string + let left = b"Hello, "; + let left_fat = RyoStrFat { + ptr: left.as_ptr() as *mut u8, + len: 7, + cap: 0, // static + }; + + // Create a heap string for the right side + let mut right_fat = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + let right_data = b"World!"; + let right_ptr = ryo_str_alloc(6); + core::ptr::copy_nonoverlapping(right_data.as_ptr(), right_ptr, 6); + right_fat.ptr = right_ptr; + right_fat.len = 6; + right_fat.cap = 6; + + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_str_concat( + left_fat.ptr, left_fat.len, + right_fat.ptr, right_fat.len, + &mut out, + ); + + assert_eq!(out.len, 13); + assert!(out.cap > 0); // heap-allocated result + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + assert_eq!(slice, b"Hello, World!"); + + // Free: static left is safe (cap=0 → noop), heap right and result freed + ryo_str_free(left_fat.ptr, left_fat.cap); + ryo_str_free(right_fat.ptr, right_fat.cap); + ryo_str_free(out.ptr, out.cap); + } + } } From 103b96bea1af9b798565aaadf9dc92976cbc73ab Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 10:57:12 +0200 Subject: [PATCH 22/33] fix(runtime): replace naive float_to_str with ryu for correctness The hand-rolled algorithm used `as u64` which saturates for values >= 2^64 and was limited to 6 fractional digits. ryu provides correct shortest-representation formatting with round-trip fidelity. 
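The two failure modes named above can be reproduced with the standard library alone. This is a sketch under assumptions: the helper names are hypothetical, `six_digit` only approximates the replaced code, and `std`'s `Display` for `f64` is used as a stand-in for ryu (both aim for round-trip-safe digits, though the exact text they produce may differ, for example in exponent notation).

```rust
/// Integer part computed the way the old code did: `as u64` saturates
/// at u64::MAX for inputs at or above 2^64.
pub fn naive_int_part(v: f64) -> u64 {
    v.abs() as u64
}

/// Fixed six fractional digits, similar in spirit to the replaced formatter.
pub fn six_digit(v: f64) -> String {
    format!("{:.6}", v)
}

/// Round-trip-safe formatting via std's Display (stand-in for ryu).
pub fn shortest(v: f64) -> String {
    format!("{}", v)
}

/// True if parsing `s` yields exactly `v` again.
pub fn round_trips(s: &str, v: f64) -> bool {
    s.parse::<f64>().map(|p| p == v).unwrap_or(false)
}
```

For instance, `naive_int_part(1e20)` collapses to `u64::MAX`, and `six_digit(0.1 + 0.2)` parses back to `0.3` rather than the original sum, whereas `shortest` round-trips both inputs.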
--- Cargo.lock | 9 +++++ runtime/Cargo.toml | 3 ++ runtime/src/lib.rs | 90 +++++++++++++++++++++------------------------- 3 files changed, 52 insertions(+), 50 deletions(-) diff --git a/Cargo.lock b/Cargo.lock index afb46dc..dd00e7b 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -1006,6 +1006,15 @@ dependencies = [ [[package]] name = "ryo-runtime" version = "0.1.0" +dependencies = [ + "ryu", +] + +[[package]] +name = "ryu" +version = "1.0.23" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9774ba4a74de5f7b1c1451ed6cd5285a32eddb5cccb8cc655a4e50009e06477f" [[package]] name = "semver" diff --git a/runtime/Cargo.toml b/runtime/Cargo.toml index 7b03f25..b20de59 100644 --- a/runtime/Cargo.toml +++ b/runtime/Cargo.toml @@ -6,3 +6,6 @@ edition = "2024" [lib] crate-type = ["staticlib", "rlib"] # rlib is needed for `cargo test` to work (staticlib alone doesn't support test harness) + +[dependencies] +ryu = "1" diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index d19897b..5592b98 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -199,58 +199,13 @@ pub unsafe extern "C" fn ryo_float_to_str(value: f64, out: *mut RyoStrFat) { } } - let mut buf = [0u8; 64]; - let mut pos = 0usize; - - let negative = value < 0.0; - let abs_val = if negative { -value } else { value }; - - if negative { - buf[pos] = b'-'; - pos += 1; - } - - let int_part = abs_val as u64; - let frac_part = ((abs_val - int_part as f64) * 1_000_000.0 + 0.5) as u64; - - // Write integer part - let mut int_buf = [0u8; 20]; - let mut int_pos = int_buf.len(); - if int_part == 0 { - int_pos -= 1; - int_buf[int_pos] = b'0'; - } else { - let mut n = int_part; - while n > 0 { - int_pos -= 1; - int_buf[int_pos] = b'0' + (n % 10) as u8; - n /= 10; - } - } - let int_len = int_buf.len() - int_pos; - buf[pos..pos + int_len].copy_from_slice(&int_buf[int_pos..]); - pos += int_len; - - // Write fractional part (trim trailing zeros) - buf[pos] = b'.'; - pos += 1; - let mut frac_buf = [0u8; 
6]; - let mut f = frac_part; - for i in (0..6).rev() { - frac_buf[i] = b'0' + (f % 10) as u8; - f /= 10; - } - let mut frac_len = 6; - while frac_len > 1 && frac_buf[frac_len - 1] == b'0' { - frac_len -= 1; - } - buf[pos..pos + frac_len].copy_from_slice(&frac_buf[..frac_len]); - pos += frac_len; - - let len = pos as u64; + let mut buf = ryu::Buffer::new(); + let s = buf.format(value); + let bytes = s.as_bytes(); + let len = bytes.len() as u64; let ptr = ryo_str_alloc(len); unsafe { - core::ptr::copy_nonoverlapping(buf.as_ptr(), ptr, len as usize); + core::ptr::copy_nonoverlapping(bytes.as_ptr(), ptr, len as usize); (*out).ptr = ptr; (*out).len = len; (*out).cap = len; @@ -574,6 +529,41 @@ mod tests { } } + #[test] + fn test_float_to_str_large_value() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + // Large value just below u64::MAX (about 1.8447e19); the old `as u64` path saturated at and above 2^64 + ryo_float_to_str(1.8e19, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + let s = core::str::from_utf8(slice).unwrap(); + let parsed: f64 = s.parse().unwrap(); + assert_eq!(parsed, 1.8e19); + ryo_str_free(out.ptr, out.cap); + } + } + + #[test] + fn test_float_to_str_precision() { + unsafe { + let mut out = RyoStrFat { + ptr: std::ptr::null_mut(), + len: 0, + cap: 0, + }; + ryo_float_to_str(0.1 + 0.2, &mut out); + let slice = core::slice::from_raw_parts(out.ptr, out.len as usize); + let s = core::str::from_utf8(slice).unwrap(); + let parsed: f64 = s.parse().unwrap(); + assert_eq!(parsed, 0.1 + 0.2); + ryo_str_free(out.ptr, out.cap); + } + } + #[test] fn test_bool_to_str_true() { unsafe { From b065a590f5a817022f4e687ad59e4e1ede3f6f3e Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 11:00:08 +0200 Subject: [PATCH 23/33] perf(build): cache runtime archive with content hash Eliminates redundant filesystem I/O on every AOT build.
The embedded libryo_runtime.a is extracted once to ~/.ryo/cache/.a and reused across builds until the runtime binary changes. --- Cargo.toml | 1 + src/runtime_lib.rs | 35 +++++++++++++++++++++++++++-------- 2 files changed, 28 insertions(+), 8 deletions(-) diff --git a/Cargo.toml b/Cargo.toml index f58f9c7..834ddc9 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -40,6 +40,7 @@ tar = "0.4" target-lexicon = "0.13.5" ureq = "3" dirs = "6" +sha2 = "0.10" [dev-dependencies] tempfile = "3.27" diff --git a/src/runtime_lib.rs b/src/runtime_lib.rs index f683791..413575e 100644 --- a/src/runtime_lib.rs +++ b/src/runtime_lib.rs @@ -1,20 +1,39 @@ +use sha2::{Digest, Sha256}; use std::fs; use std::io; use std::path::PathBuf; const RYO_RUNTIME_LIB: &[u8] = include_bytes!(env!("RYO_RUNTIME_LIB")); +fn cache_dir() -> PathBuf { + dirs::home_dir() + .expect("cannot determine home directory") + .join(".ryo") + .join("cache") +} + +fn content_hash() -> String { + let hash = Sha256::digest(RYO_RUNTIME_LIB); + format!("{:x}", hash)[..16].to_string() +} + pub fn extract_runtime_to_temp() -> Result { - let dir = std::env::temp_dir().join(format!("ryo-runtime-{}", std::process::id())); + let dir = cache_dir(); + let hash = content_hash(); + let path = dir.join(format!("libryo_runtime-{}.a", hash)); + + if path.exists() { + return Ok(path); + } + fs::create_dir_all(&dir)?; - let path = dir.join("libryo_runtime.a"); - fs::write(&path, RYO_RUNTIME_LIB)?; + // Write to a temp name and rename for atomicity + let tmp_path = dir.join(format!("libryo_runtime-{}.a.tmp.{}", hash, std::process::id())); + fs::write(&tmp_path, RYO_RUNTIME_LIB)?; + fs::rename(&tmp_path, &path)?; Ok(path) } -pub fn cleanup_runtime_temp(path: &std::path::Path) { - let _ = fs::remove_file(path); - if let Some(parent) = path.parent() { - let _ = fs::remove_dir(parent); - } +pub fn cleanup_runtime_temp(_path: &std::path::Path) { + // Cached — no cleanup needed. The file persists for future builds. 
} From 82ffee54099901d828f415502467546dc17a99db Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 11:04:38 +0200 Subject: [PATCH 24/33] test: integration tests for float_to_str edge cases Verifies round-trip fidelity for large floats (1.8e19, just below 2^64) and small decimals through the full JIT pipeline. --- tests/integration_tests.rs | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index 02289bf..bee165e 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -2036,6 +2036,37 @@ fn test_float_to_str_builtin() { ); } +#[test] +fn test_float_to_str_large_number() { + let dir = tempfile::tempdir().unwrap(); + // 18000000000000000000.0 is a large number (1.8e19) + let src = create_test_file(dir.path(), "large_float.ryo", "fn main():\n\tprint(float_to_str(18000000000000000000.0))\n"); + let output = run_ryo_command(&["run", "large_float.ryo"], &src); + let output = output.unwrap(); + assert!(output.status.success(), "stderr: {}", String::from_utf8_lossy(&output.stderr)); + let stdout = String::from_utf8_lossy(&output.stdout); + // Extract the float value: it's after "[Codegen]" and before "[Result]" + let after_codegen = stdout.split("[Codegen]").nth(1).unwrap(); + let float_str = after_codegen.split("[Result]").next().unwrap().trim(); + let parsed: f64 = float_str.parse().unwrap(); + assert_eq!(parsed, 1.8e19); +} + +#[test] +fn test_float_to_str_small_decimal() { + let dir = tempfile::tempdir().unwrap(); + let src = create_test_file(dir.path(), "small_float.ryo", "fn main():\n\tprint(float_to_str(0.1))\n"); + let output = run_ryo_command(&["run", "small_float.ryo"], &src); + let output = output.unwrap(); + assert!(output.status.success(), "stderr: {}", String::from_utf8_lossy(&output.stderr)); + let stdout = String::from_utf8_lossy(&output.stdout); + // Extract the float value: it's after "[Codegen]" and before "[Result]" + let after_codegen =
stdout.split("[Codegen]").nth(1).unwrap(); + let float_str = after_codegen.split("[Result]").next().unwrap().trim(); + let parsed: f64 = float_str.parse().unwrap(); + assert_eq!(parsed, 0.1); +} + #[test] fn test_bool_to_str_builtin() { let temp_dir = TempDir::new().expect("Failed to create temp directory"); From af4079a8aa8cab452ef7d0f6451a4d0ce949690f Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 11:06:47 +0200 Subject: [PATCH 25/33] style: cargo fmt --- Cargo.lock | 72 ++++++++++++++++++++++++++++++++++++++ runtime/src/lib.rs | 6 ++-- src/runtime_lib.rs | 6 +++- tests/integration_tests.rs | 24 ++++++++++--- 4 files changed, 101 insertions(+), 7 deletions(-) diff --git a/Cargo.lock b/Cargo.lock index dd00e7b..d6358b8 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -122,6 +122,15 @@ version = "2.11.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c4512299f36f043ab09a583e57bceb5a5aab7a73db1805848e8fef3c9e8c78b3" +[[package]] +name = "block-buffer" +version = "0.10.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3078c7629b62d3f0439517fa394996acacc5cbc91c5a20d8c658e77abd503a71" +dependencies = [ + "generic-array", +] + [[package]] name = "bumpalo" version = "3.20.2" @@ -213,6 +222,15 @@ version = "1.0.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1d07550c9036bf2ae0c684c4297d503f838287c83c53686d05370d0e139ae570" +[[package]] +name = "cpufeatures" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "59ed5838eebb26a2bb2e58f6d5b5316989ae9d08bab10e0e6d103e656d1b0280" +dependencies = [ + "libc", +] + [[package]] name = "cranelift" version = "0.131.1" @@ -415,6 +433,26 @@ dependencies = [ "cfg-if", ] +[[package]] +name = "crypto-common" +version = "0.1.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "78c8292055d1c1df0cce5d180393dc8cce0abec0a7102adb6c7b1eef6016d60a" +dependencies = [ 
+ "generic-array", + "typenum", +] + +[[package]] +name = "digest" +version = "0.10.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292" +dependencies = [ + "block-buffer", + "crypto-common", +] + [[package]] name = "dirs" version = "6.0.0" @@ -503,6 +541,16 @@ version = "0.2.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "77ce24cb58228fbb8aa041425bb1050850ac19177686ea6e0f41a70416f56fdb" +[[package]] +name = "generic-array" +version = "0.14.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a" +dependencies = [ + "typenum", + "version_check", +] + [[package]] name = "getrandom" version = "0.2.17" @@ -996,6 +1044,7 @@ dependencies = [ "hashbrown 0.17.1", "logos", "ryo-runtime", + "sha2", "tar", "target-lexicon", "tempfile", @@ -1065,6 +1114,17 @@ dependencies = [ "zmij", ] +[[package]] +name = "sha2" +version = "0.10.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a7507d819769d01a365ab707794a4084392c824f54a7a6a7862f8c3d0892b283" +dependencies = [ + "cfg-if", + "cpufeatures", + "digest", +] + [[package]] name = "shlex" version = "1.3.0" @@ -1175,6 +1235,12 @@ dependencies = [ "syn", ] +[[package]] +name = "typenum" +version = "1.20.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "40ce102ab67701b8526c123c1bab5cbe42d7040ccfd0f64af1a385808d2f43de" + [[package]] name = "unicode-ident" version = "1.0.24" @@ -1246,6 +1312,12 @@ version = "0.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821" +[[package]] +name = "version_check" +version = "0.9.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a" + 
[[package]] name = "wasi" version = "0.11.1+wasi-snapshot-preview1" diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index 5592b98..c4a0699 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -624,8 +624,10 @@ mod tests { cap: 0, }; ryo_str_concat( - left_fat.ptr, left_fat.len, - right_fat.ptr, right_fat.len, + left_fat.ptr, + left_fat.len, + right_fat.ptr, + right_fat.len, &mut out, ); diff --git a/src/runtime_lib.rs b/src/runtime_lib.rs index 413575e..0cfd87c 100644 --- a/src/runtime_lib.rs +++ b/src/runtime_lib.rs @@ -28,7 +28,11 @@ pub fn extract_runtime_to_temp() -> Result { fs::create_dir_all(&dir)?; // Write to a temp name and rename for atomicity - let tmp_path = dir.join(format!("libryo_runtime-{}.a.tmp.{}", hash, std::process::id())); + let tmp_path = dir.join(format!( + "libryo_runtime-{}.a.tmp.{}", + hash, + std::process::id() + )); fs::write(&tmp_path, RYO_RUNTIME_LIB)?; fs::rename(&tmp_path, &path)?; Ok(path) diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index bee165e..2d05061 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -2040,10 +2040,18 @@ fn test_float_to_str_builtin() { fn test_float_to_str_large_number() { let dir = tempfile::tempdir().unwrap(); // 18000000000000000000.0 is a large number (1.8e19) - let src = create_test_file(dir.path(), "large_float.ryo", "fn main():\n\tprint(float_to_str(18000000000000000000.0))\n"); + let src = create_test_file( + dir.path(), + "large_float.ryo", + "fn main():\n\tprint(float_to_str(18000000000000000000.0))\n", + ); let output = run_ryo_command(&["run", "large_float.ryo"], &src); let output = output.unwrap(); - assert!(output.status.success(), "stderr: {}", String::from_utf8_lossy(&output.stderr)); + assert!( + output.status.success(), + "stderr: {}", + String::from_utf8_lossy(&output.stderr) + ); let stdout = String::from_utf8_lossy(&output.stdout); // Extract the float value: it's after "[Codegen]" and before "[Result]" let after_codegen = 
stdout.split("[Codegen]").nth(1).unwrap(); @@ -2055,10 +2063,18 @@ fn test_float_to_str_large_number() { #[test] fn test_float_to_str_small_decimal() { let dir = tempfile::tempdir().unwrap(); - let src = create_test_file(dir.path(), "small_float.ryo", "fn main():\n\tprint(float_to_str(0.1))\n"); + let src = create_test_file( + dir.path(), + "small_float.ryo", + "fn main():\n\tprint(float_to_str(0.1))\n", + ); let output = run_ryo_command(&["run", "small_float.ryo"], &src); let output = output.unwrap(); - assert!(output.status.success(), "stderr: {}", String::from_utf8_lossy(&output.stderr)); + assert!( + output.status.success(), + "stderr: {}", + String::from_utf8_lossy(&output.stderr) + ); let stdout = String::from_utf8_lossy(&output.stdout); // Extract the float value: it's after "[Codegen]" and before "[Result]" let after_codegen = stdout.split("[Codegen]").nth(1).unwrap(); From dd3a3358f25724dea7321874d22db0ca8998bcf7 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 11:22:47 +0200 Subject: [PATCH 26/33] fix(ci): build runtime before clippy and test The build script requires libryo_runtime.a at target/release/. Previously this only worked due to rust-cache containing a stale artifact. New deps (sha2, ryu) invalidated the cache, exposing the missing build step. 
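For reference, a minimal sketch of the kind of existence check the build script relied on before this fix, assuming the archive lives under `<target>/release/`. The helper names `runtime_archive_path` and `require_runtime_archive` are hypothetical and exist only for illustration; the real `build.rs` inlines this logic and panics instead of returning an error.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the location where the build script expects the
/// prebuilt runtime archive (assumes the `release` profile directory).
fn runtime_archive_path(target_dir: &Path) -> PathBuf {
    target_dir.join("release").join("libryo_runtime.a")
}

/// Hypothetical helper: return the archive path if the file exists,
/// otherwise an error message naming the command that produces it.
fn require_runtime_archive(target_dir: &Path) -> Result<PathBuf, String> {
    let path = runtime_archive_path(target_dir);
    if path.exists() {
        Ok(path)
    } else {
        Err(format!(
            "libryo_runtime.a not found at {}; run `cargo build -p ryo-runtime --release` first",
            path.display()
        ))
    }
}
```

With a warm rust-cache the file is already present and the check passes silently; on a cold cache it fails, which is why the workflow now builds `ryo-runtime` explicitly before clippy and test.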
--- .github/workflows/ci.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 44437a5..a3c835d 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -36,6 +36,7 @@ jobs: - uses: Swatinem/rust-cache@v2 with: prefix-key: clippy + - run: cargo build -p ryo-runtime --release - run: cargo clippy --all-targets test: @@ -65,4 +66,5 @@ jobs: - name: Install Zig toolchain if: steps.zig-cache.outputs.cache-hit != 'true' run: cargo run -- toolchain install + - run: cargo build -p ryo-runtime --release - run: cargo test From 50814c9a3664e68e79ee746273a88cd4c8ba17db Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 11:35:48 +0200 Subject: [PATCH 27/33] fix: address code review findings across runtime, codegen, and sema - runtime: checked_add in ryo_str_concat to prevent overflow/wrap - codegen: save/restore str_locals in emit_scoped_body to prevent leaks - codegen: handle string reassignment in TirTag::Assign - runtime_lib: cache_dir returns Result instead of panicking - sema: suppress cascading TypeMismatch on error-typed builtin args - sema: eagerly analyze MethodCall args for diagnostics --- runtime/src/lib.rs | 5 ++++- src/codegen.rs | 16 ++++++++++++++++ src/runtime_lib.rs | 9 ++++----- src/sema.rs | 16 ++++++++++++++++ 4 files changed, 40 insertions(+), 6 deletions(-) diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index c4a0699..d23d771 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -105,7 +105,10 @@ pub unsafe extern "C" fn ryo_str_concat( out: *mut RyoStrFat, ) { unsafe { - let total = l_len + r_len; + let total = match l_len.checked_add(r_len) { + Some(t) => t, + None => oom_abort(), + }; if total == 0 { (*out).ptr = std::ptr::null_mut(); (*out).len = 0; diff --git a/src/codegen.rs b/src/codegen.rs index 5100f6b..b3f3dc6 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -112,6 +112,7 @@ impl ValueRepr { } } +#[derive(Clone)] struct StrLocals { ptr: 
Variable, len: Variable, @@ -503,8 +504,10 @@ impl Codegen { stmts: &[TirRef], ) -> Result { let saved_locals = ctx.locals.clone(); + let saved_str_locals = ctx.str_locals.clone(); let block_terminated = Self::emit_body(builder, ctx, stmts)?; ctx.locals = saved_locals; + ctx.str_locals = saved_str_locals; Ok(block_terminated) } @@ -597,6 +600,19 @@ impl Codegen { TirTag::IfStmt => Self::generate_if_stmt(builder, ctx, r), TirTag::Assign => { let view = ctx.tir.assign_view(r); + if ctx.str_locals.contains_key(&view.name) { + let repr = Self::eval_inst_str(builder, ctx, view.value)?; + match repr { + ValueRepr::Str { ptr, len, cap } => { + let locals = ctx.str_locals.get(&view.name).unwrap(); + builder.def_var(locals.ptr, ptr); + builder.def_var(locals.len, len); + builder.def_var(locals.cap, cap); + } + _ => unreachable!("str-typed assign should produce ValueRepr::Str"), + } + return Ok(false); + } let val = Self::eval_inst(builder, ctx, view.value)?; let var = ctx.locals.get(&view.name).ok_or_else(|| { format!( diff --git a/src/runtime_lib.rs b/src/runtime_lib.rs index 0cfd87c..13671b4 100644 --- a/src/runtime_lib.rs +++ b/src/runtime_lib.rs @@ -5,11 +5,10 @@ use std::path::PathBuf; const RYO_RUNTIME_LIB: &[u8] = include_bytes!(env!("RYO_RUNTIME_LIB")); -fn cache_dir() -> PathBuf { +fn cache_dir() -> Result { dirs::home_dir() - .expect("cannot determine home directory") - .join(".ryo") - .join("cache") + .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "cannot determine home directory")) + .map(|h| h.join(".ryo").join("cache")) } fn content_hash() -> String { @@ -18,7 +17,7 @@ fn content_hash() -> String { } pub fn extract_runtime_to_temp() -> Result { - let dir = cache_dir(); + let dir = cache_dir()?; let hash = content_hash(); let path = dir.join(format!("libryo_runtime-{}.a", hash)); diff --git a/src/sema.rs b/src/sema.rs index 5241344..2d9d08d 100644 --- a/src/sema.rs +++ b/src/sema.rs @@ -971,6 +971,10 @@ fn analyze_expr(sema: &mut Sema<'_>, fcx: &mut 
FuncCtx, scope: &Scope, r: InstRe let receiver_ty = fcx.builder.ty_of(receiver_tir); let method_name = sema.pool.str(view.name).to_string(); + for &arg in &view.args { + analyze_expr(sema, fcx, scope, arg); + } + // For now, only str has methods if sema.pool.kind(receiver_ty) != TypeKind::Str { if !sema.pool.is_error(receiver_ty) { @@ -1396,6 +1400,9 @@ fn emit_builtin_call( return fcx.builder.unreachable(sema.pool.error_type(), span); } let arg_ty = fcx.builder.ty_of(arg_tirs[0]); + if sema.pool.is_error(arg_ty) { + return fcx.builder.unreachable(sema.pool.error_type(), span); + } if !matches!(sema.pool.kind(arg_ty), TypeKind::Int) { sema.sink.emit(Diag::error( sema.uir.span(view.args[0]), @@ -1423,6 +1430,9 @@ fn emit_builtin_call( return fcx.builder.unreachable(sema.pool.error_type(), span); } let arg_ty = fcx.builder.ty_of(arg_tirs[0]); + if sema.pool.is_error(arg_ty) { + return fcx.builder.unreachable(sema.pool.error_type(), span); + } if !matches!(sema.pool.kind(arg_ty), TypeKind::Float) { sema.sink.emit(Diag::error( sema.uir.span(view.args[0]), @@ -1450,6 +1460,9 @@ fn emit_builtin_call( return fcx.builder.unreachable(sema.pool.error_type(), span); } let arg_ty = fcx.builder.ty_of(arg_tirs[0]); + if sema.pool.is_error(arg_ty) { + return fcx.builder.unreachable(sema.pool.error_type(), span); + } if !matches!(sema.pool.kind(arg_ty), TypeKind::Bool) { sema.sink.emit(Diag::error( sema.uir.span(view.args[0]), @@ -1599,6 +1612,9 @@ fn check_print_args( return false; } let arg_ty = fcx.builder.ty_of(arg_tirs[0]); + if sema.pool.is_error(arg_ty) { + return false; + } if !matches!(sema.pool.kind(arg_ty), TypeKind::Str) { sema.sink.emit(Diag::error( sema.uir.span(view.args[0]), From 51608f4f1ea56523cefa8c5895835d2d03ce4cef Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 11:52:39 +0200 Subject: [PATCH 28/33] fix(linker): link libunwind on Linux to resolve _Unwind_* symbols Rust's staticlib bundles precompiled std objects that reference _Unwind_* 
symbols (from backtrace support) even with panic=abort. On macOS the system provides these; on Linux we must explicitly link against zig's bundled libunwind. --- src/linker.rs | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/src/linker.rs b/src/linker.rs index 41d9f45..b60f9e4 100644 --- a/src/linker.rs +++ b/src/linker.rs @@ -10,14 +10,24 @@ pub(crate) fn link_executable( ) -> Result<(), CompilerError> { let zig_path = toolchain::ensure_zig()?; + let mut args = vec![ + "cc", + "-o", + exe_file, + obj_file, + runtime_lib.to_str().unwrap_or("libryo_runtime.a"), + ]; + + // Rust's staticlib bundles precompiled std objects that reference + // _Unwind_* symbols even with panic=abort (from backtrace support). + // On macOS the system libunwind satisfies them; on Linux we must + // explicitly link zig's bundled libunwind. + if cfg!(target_os = "linux") { + args.push("-lunwind"); + } + let output = Command::new(&zig_path) - .args([ - "cc", - "-o", - exe_file, - obj_file, - runtime_lib.to_str().unwrap_or("libryo_runtime.a"), - ]) + .args(&args) .output() .map_err(|e| CompilerError::LinkError(format!("Failed to run zig cc: {e}")))?; From 903410b270f5244a16435920d6f50f83b5f94c7e Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 11:57:28 +0200 Subject: [PATCH 29/33] docs: add I-043 (runtime no_std migration) and issue tracking convention Add ISSUES.md entry for migrating ryo-runtime to #![no_std] to eliminate the -lunwind workaround on Linux. Document in CLAUDE.md that non-immediate architectural issues should be tracked in ISSUES.md rather than GitHub Issues. --- CLAUDE.md | 8 ++++++++ ISSUES.md | 5 +++++ 2 files changed, 13 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index 9354c8f..a402986 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -69,6 +69,14 @@ IMPORTANT: Never author Claude on commits nor PRs. 
--- +## Issue Tracking + +Non-immediate issues that affect architecture, correctness, or long-term code health go in `ISSUES.md`. Create an entry when you identify a problem that won't be resolved in the current session but must be addressed for better architecture or sustainability. Use the next sequential `I-XXX` number, pick the appropriate severity (Blocking / Correctness / Cleanup), and include Files, Summary, and Resolution fields. + +Do **not** create issues for things you're fixing right now — just fix them. Do **not** use GitHub Issues for these; `ISSUES.md` is the single source of truth. + +--- + ## Design Change Escalation Ryo is pre-alpha. Design changes to the language specification require explicit human approval. Coherence fixes (resolving contradictions, filling documented gaps, tightening phrasing) can proceed as normal work, but anything that adds, removes, or alters a language feature stops for review. diff --git a/ISSUES.md b/ISSUES.md index 3dbc17a..86f0bd3 100644 --- a/ISSUES.md +++ b/ISSUES.md @@ -206,6 +206,11 @@ Resolved entries are removed (not kept around as a changelog). Look at `git log` **Summary:** Currently, `for-range` loops have bespoke code generation that manually emits basic blocks, jump instructions, and raw counter increments. When general iterators are added, loops should be desugared during the AST-to-UIR phase into standard `while` loops that call `.next()`. **Resolution:** Once iterators land, remove the `generate_for_range` codegen entirely and rely on standard `while` codegen to emit loops. +### I-043 — Migrate `ryo-runtime` to `#![no_std]` +**Files:** `runtime/src/lib.rs`, `runtime/Cargo.toml`, `src/linker.rs` +**Summary:** The runtime staticlib only uses `std::alloc`, `std::process::abort()`, and `eprintln!`, yet linking against precompiled `std` bundles objects with `_Unwind_*` symbol references. This forces the linker to pass `-lunwind` on Linux (workaround in `src/linker.rs`). 
Migrating to `#![no_std]` with `extern crate alloc` eliminates the dependency entirely. +**Resolution:** Replace `std::alloc` with `alloc::alloc` (identical API). Replace `eprintln!` + `process::abort()` with `extern "C" { fn abort() -> !; }`. Add `#[panic_handler]` that aborts. Keep the `rlib` crate-type for `cargo test` via a `#[cfg(test)]` std gate. `ryu` already supports `no_std`. Benefits: smaller archive, faster link times, no hidden unwind dependency, simpler cross-compilation. + --- ## Cross-References From 957d43ed3850a739dbbab04f6c1632f3b740d8fb Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Thu, 21 May 2026 13:45:49 +0200 Subject: [PATCH 30/33] =?UTF-8?q?fix:=20pass=20runtime=20path=20as=20OsStr?= =?UTF-8?q?=20and=20validate=20u64=E2=86=92usize=20in=20str=5Fconcat?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - linker: use Command builder with as_os_str() instead of to_str() with a silent fallback, preserving non-UTF-8 paths verbatim - runtime: validate l_len/r_len/total fit in usize via try_into before pointer arithmetic and copy operations --- runtime/src/lib.rs | 11 +++++++---- src/linker.rs | 16 ++++++---------- 2 files changed, 13 insertions(+), 14 deletions(-) diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index d23d771..82ee206 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -115,12 +115,15 @@ pub unsafe extern "C" fn ryo_str_concat( (*out).cap = 0; return; } + let l_sz: usize = l_len.try_into().unwrap_or_else(|_| oom_abort()); + let r_sz: usize = r_len.try_into().unwrap_or_else(|_| oom_abort()); + let _: usize = total.try_into().unwrap_or_else(|_| oom_abort()); let ptr = ryo_str_alloc(total); - if l_len > 0 { - core::ptr::copy_nonoverlapping(l_ptr, ptr, l_len as usize); + if l_sz > 0 { + core::ptr::copy_nonoverlapping(l_ptr, ptr, l_sz); } - if r_len > 0 { - core::ptr::copy_nonoverlapping(r_ptr, ptr.add(l_len as usize), r_len as usize); + if r_sz > 0 { + 
core::ptr::copy_nonoverlapping(r_ptr, ptr.add(l_sz), r_sz); } (*out).ptr = ptr; (*out).len = total; diff --git a/src/linker.rs b/src/linker.rs index b60f9e4..f9d4627 100644 --- a/src/linker.rs +++ b/src/linker.rs @@ -1,5 +1,6 @@ use crate::errors::CompilerError; use crate::toolchain; +use std::ffi::OsStr; use std::path::Path; use std::process::Command; @@ -10,24 +11,19 @@ pub(crate) fn link_executable( ) -> Result<(), CompilerError> { let zig_path = toolchain::ensure_zig()?; - let mut args = vec![ - "cc", - "-o", - exe_file, - obj_file, - runtime_lib.to_str().unwrap_or("libryo_runtime.a"), - ]; + let mut cmd = Command::new(&zig_path); + cmd.args(["cc", "-o", exe_file, obj_file]); + cmd.arg(runtime_lib.as_os_str()); // Rust's staticlib bundles precompiled std objects that reference // _Unwind_* symbols even with panic=abort (from backtrace support). // On macOS the system libunwind satisfies them; on Linux we must // explicitly link zig's bundled libunwind. if cfg!(target_os = "linux") { - args.push("-lunwind"); + cmd.arg(OsStr::new("-lunwind")); } - let output = Command::new(&zig_path) - .args(&args) + let output = cmd .output() .map_err(|e| CompilerError::LinkError(format!("Failed to run zig cc: {e}")))?; From 5fa2ed460f2fca500f153fd32c6f83c8013578f6 Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Wed, 3 Jun 2026 12:26:11 +0200 Subject: [PATCH 31/33] fix: implement PR #74 review comments & auto-bootstrap runtime - build.rs: automatically compile ryo-runtime on demand respecting active profile and avoiding cargo locks - codegen: gate str Assign path on sema-resolved type to support shadowing - sema/tir: simplify TIR by removing StrIsEmpty and lowering s.is_empty() to StrLen == 0 in sema - runtime: add // SAFETY: comments to unsafe blocks in runtime/src/lib.rs - cleanup: remove redundant justfile and references, standard cargo works out-of-the-box --- CLAUDE.md | 12 ++++---- Cargo.toml | 2 +- build.rs | 58 ++++++++++++++++++++++++++++++++++---- justfile | 24 
---------------- runtime/src/lib.rs | 18 +++++++++++- src/codegen.rs | 36 +++++++++-------------- src/runtime_lib.rs | 4 +-- src/sema.rs | 12 ++++++-- src/tir.rs | 5 +--- tests/integration_tests.rs | 16 +++++++++++ 10 files changed, 118 insertions(+), 69 deletions(-) delete mode 100644 justfile diff --git a/CLAUDE.md b/CLAUDE.md index a402986..50959de 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -36,16 +36,18 @@ All Ryo code examples **must** use Python-style colons and indentation, **NOT** ## Build & Test Commands +Standard cargo commands work fully out-of-the-box (even on a clean checkout) because `build.rs` automatically compiles the `ryo-runtime` static library in a separate target directory if it isn't found. + ```bash -cargo fmt # Auto-format (CI runs --check with -Dwarnings) -cargo clippy --all-targets # Lint (CI enforces, warnings are errors) -cargo check # Check for errors -cargo build [--release] # Build debug or release +cargo build # Automatically builds the runtime (if missing) and then compiles the compiler +cargo check # Check compiler for errors +cargo test # Run all unit + integration tests cargo run -- run # JIT compile and execute cargo run -- build # AOT compile to binary -cargo test # Run tests cargo run -- toolchain install # Download Zig linker cargo run -- toolchain status # Check Zig status +cargo clippy --all-targets # Lint (warnings are errors) +cargo fmt --check # Check code formatting style ``` **File extensions:** `.ryo` (source), `.md` (docs), `.rs` (Rust), `.o`/`.obj` (generated) diff --git a/Cargo.toml b/Cargo.toml index 834ddc9..3ec916f 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -17,6 +17,7 @@ version = "0.1.0" edition = "2024" [build-dependencies] +sha2 = "0.10" [dependencies] ryo-runtime = { path = "runtime" } @@ -40,7 +41,6 @@ tar = "0.4" target-lexicon = "0.13.5" ureq = "3" dirs = "6" -sha2 = "0.10" [dev-dependencies] tempfile = "3.27" diff --git a/build.rs b/build.rs index 397a59e..e62482f 100644 --- a/build.rs +++ b/build.rs @@ 
-1,3 +1,4 @@ +use sha2::{Digest, Sha256}; use std::env; fn main() { @@ -15,28 +16,73 @@ fn main() { }; println!("cargo:rustc-env=RYO_VERSION={version}"); - // Runtime archive path — set by `just build` or default location. + // Runtime archive path. Honor RYO_RUNTIME_LIB if set (used by downstream + // packagers). Otherwise build it on demand using the current cargo profile + // in a separate target directory to avoid cargo lock deadlocks. let runtime_path = env::var("RYO_RUNTIME_LIB").unwrap_or_else(|_| { let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap(); let target_dir = env::var("CARGO_TARGET_DIR").unwrap_or_else(|_| format!("{manifest_dir}/target")); - let path = std::path::PathBuf::from(&target_dir) - .join("release") + let raw_profile = env::var("PROFILE").unwrap_or_else(|_| "debug".to_string()); + let profile = match raw_profile.as_str() { + "release" | "production" | "prod" => "release", + "debug" | "dev" => "debug", + _ => { + let opt_level = env::var("OPT_LEVEL").unwrap_or_else(|_| "0".to_string()); + if opt_level != "0" { + "release" + } else { + "debug" + } + } + }; + let mut path = std::path::PathBuf::from(&target_dir) + .join(&profile) .join("libryo_runtime.a"); + if !path.exists() { + // Build the runtime archive in-process in a separate target directory to avoid deadlocks. 
+ let custom_target_dir = std::path::PathBuf::from(&manifest_dir).join("target/runtime-build"); + let cargo = env::var("CARGO").unwrap_or_else(|_| "cargo".to_string()); + let mut cmd = std::process::Command::new(&cargo); + cmd.arg("build") + .arg("-p") + .arg("ryo-runtime") + .arg("--target-dir") + .arg(custom_target_dir.to_str().unwrap()); + if profile == "release" { + cmd.arg("--release"); + } + let status = cmd + .status() + .unwrap_or_else(|e| panic!("failed to spawn `cargo build -p ryo-runtime`: {e}")); + if status.success() { + path = custom_target_dir.join(&profile).join("libryo_runtime.a"); + } + } if !path.exists() { panic!( - "\n\nlibryo_runtime.a not found at {}\n\ - Run `just build` or `cargo build -p ryo-runtime --release` first.\n\n", + "libryo_runtime.a still missing at {} after build attempt", path.display() ); } - path.to_str().unwrap().to_string() + path.to_str() + .expect("RYO_RUNTIME_LIB path is non-UTF-8; set RYO_RUNTIME_LIB explicitly") + .to_string() }); println!("cargo:rustc-env=RYO_RUNTIME_LIB={runtime_path}"); println!("cargo:rerun-if-env-changed=RYO_RUNTIME_LIB"); println!("cargo:rerun-if-changed={runtime_path}"); + let runtime_bytes = std::fs::read(&runtime_path).unwrap_or_else(|e| { + panic!("failed to read runtime lib at {}: {}", runtime_path, e); + }); + let mut hasher = Sha256::new(); + hasher.update(&runtime_bytes); + let hash_result = hasher.finalize(); + let hash_string = format!("{:x}", hash_result); + println!("cargo:rustc-env=RYO_RUNTIME_HASH={hash_string}"); + let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap(); let runtime_src = std::path::PathBuf::from(&manifest_dir).join("runtime/src"); println!("cargo:rerun-if-changed={}", runtime_src.display()); diff --git a/justfile b/justfile deleted file mode 100644 index 1bc8ef9..0000000 --- a/justfile +++ /dev/null @@ -1,24 +0,0 @@ -# Default recipe: build runtime then compiler -build: - cargo build -p ryo-runtime --release - cargo build - -# Release build -build-release: - 
cargo build -p ryo-runtime --release - cargo build --release - -# Run all tests (builds runtime first) -test: - cargo build -p ryo-runtime --release - cargo test - -# Run clippy on everything -lint: - cargo clippy -p ryo-runtime --all-targets - cargo build -p ryo-runtime --release - cargo clippy --all-targets - -# Format check -fmt: - cargo fmt --check diff --git a/runtime/src/lib.rs b/runtime/src/lib.rs index 82ee206..ae00762 100644 --- a/runtime/src/lib.rs +++ b/runtime/src/lib.rs @@ -13,6 +13,7 @@ pub extern "C" fn ryo_str_alloc(cap: u64) -> *mut u8 { return std::ptr::null_mut(); } let layout = layout_for(cap); + // SAFETY: layout has nonzero size (cap > 0 checked above) and align 1 is valid for u8. let ptr = unsafe { alloc(layout) }; if ptr.is_null() { oom_abort(); @@ -29,6 +30,7 @@ pub unsafe extern "C" fn ryo_str_free(ptr: *mut u8, cap: u64) { return; } let layout = layout_for(cap); + // SAFETY: caller contract — ptr came from ryo_str_alloc/realloc with this exact cap. unsafe { dealloc(ptr, layout) }; } @@ -41,10 +43,12 @@ pub unsafe extern "C" fn ryo_str_realloc(ptr: *mut u8, old_cap: u64, new_cap: u6 return ryo_str_alloc(new_cap); } if new_cap == 0 { + // SAFETY: ptr/old_cap came from a prior alloc per our # Safety doc. unsafe { ryo_str_free(ptr, old_cap) }; return std::ptr::null_mut(); } let layout = layout_for(old_cap); + // SAFETY: ptr/old_cap pair from prior alloc; new_cap > 0 checked above; layout matches old_cap. let new_ptr = unsafe { realloc(ptr, layout, new_cap as usize) }; if new_ptr.is_null() { oom_abort(); @@ -58,6 +62,7 @@ pub unsafe extern "C" fn ryo_str_realloc(ptr: *mut u8, old_cap: u64, new_cap: u6 /// `out` must point to a valid `RyoStrFat`. unsafe fn write_str_result(s: &[u8], out: *mut RyoStrFat) { let ptr = ryo_str_alloc(s.len() as u64); + // SAFETY: both ptr and s.as_ptr() are valid for s.len() bytes, and copy_nonoverlapping is safe. 
unsafe { core::ptr::copy_nonoverlapping(s.as_ptr(), ptr, s.len()); (*out).ptr = ptr; @@ -79,6 +84,7 @@ fn oom_abort() -> ! { /// `data` must point to `len` readable bytes. `out` must point to a valid `RyoStrFat`. #[unsafe(no_mangle)] pub unsafe extern "C" fn ryo_str_from_literal(data: *const u8, len: u64, out: *mut RyoStrFat) { + // SAFETY: caller contract — out points to a valid RyoStrFat. unsafe { if len == 0 { (*out).ptr = std::ptr::null_mut(); @@ -104,6 +110,7 @@ pub unsafe extern "C" fn ryo_str_concat( r_len: u64, out: *mut RyoStrFat, ) { + // SAFETY: caller contract — out points to a valid RyoStrFat and input buffers are valid for reading. unsafe { let total = match l_len.checked_add(r_len) { Some(t) => t, @@ -120,9 +127,11 @@ pub unsafe extern "C" fn ryo_str_concat( let _: usize = total.try_into().unwrap_or_else(|_| oom_abort()); let ptr = ryo_str_alloc(total); if l_sz > 0 { + debug_assert!(!l_ptr.is_null()); core::ptr::copy_nonoverlapping(l_ptr, ptr, l_sz); } if r_sz > 0 { + debug_assert!(!r_ptr.is_null()); core::ptr::copy_nonoverlapping(r_ptr, ptr.add(l_sz), r_sz); } (*out).ptr = ptr; @@ -147,6 +156,7 @@ pub unsafe extern "C" fn ryo_str_eq( if a_len == 0 { return 1; } + // SAFETY: caller contract — a_ptr/a_len and b_ptr/b_len describe valid byte ranges. let a_slice = unsafe { core::slice::from_raw_parts(a_ptr, a_len as usize) }; let b_slice = unsafe { core::slice::from_raw_parts(b_ptr, b_len as usize) }; if a_slice == b_slice { 1 } else { 0 } @@ -156,7 +166,7 @@ pub unsafe extern "C" fn ryo_str_eq( /// `out` must point to a valid `RyoStrFat`. #[unsafe(no_mangle)] pub unsafe extern "C" fn ryo_int_to_str(value: i64, out: *mut RyoStrFat) { - let mut buf = [0u8; 20]; + let mut buf = [0u8; 32]; let negative = value < 0; // Work with unsigned magnitude to handle i64::MIN correctly // (i64::MIN.wrapping_neg() overflows back to i64::MIN). 
@@ -182,6 +192,7 @@ pub unsafe extern "C" fn ryo_int_to_str(value: i64, out: *mut RyoStrFat) { } let len = (buf.len() - pos) as u64; let ptr = ryo_str_alloc(len); + // SAFETY: ptr is newly allocated for len bytes, out is valid to write. unsafe { core::ptr::copy_nonoverlapping(buf.as_ptr().add(pos), ptr, len as usize); (*out).ptr = ptr; @@ -195,12 +206,15 @@ pub unsafe extern "C" fn ryo_int_to_str(value: i64, out: *mut RyoStrFat) { #[unsafe(no_mangle)] pub unsafe extern "C" fn ryo_float_to_str(value: f64, out: *mut RyoStrFat) { if value.is_nan() { + // SAFETY: write_str_result safely writes Nan string through out. return unsafe { write_str_result(b"nan", out) }; } if value.is_infinite() { if value < 0.0 { + // SAFETY: write_str_result safely writes -inf string through out. return unsafe { write_str_result(b"-inf", out) }; } else { + // SAFETY: write_str_result safely writes inf string through out. return unsafe { write_str_result(b"inf", out) }; } } @@ -210,6 +224,7 @@ pub unsafe extern "C" fn ryo_float_to_str(value: f64, out: *mut RyoStrFat) { let bytes = s.as_bytes(); let len = bytes.len() as u64; let ptr = ryo_str_alloc(len); + // SAFETY: ptr is newly allocated for len bytes, out is valid to write. unsafe { core::ptr::copy_nonoverlapping(bytes.as_ptr(), ptr, len as usize); (*out).ptr = ptr; @@ -223,6 +238,7 @@ pub unsafe extern "C" fn ryo_float_to_str(value: f64, out: *mut RyoStrFat) { #[unsafe(no_mangle)] pub unsafe extern "C" fn ryo_bool_to_str(value: u8, out: *mut RyoStrFat) { let s: &[u8] = if value != 0 { b"true" } else { b"false" }; + // SAFETY: write_str_result safely writes through out. 
unsafe { write_str_result(s, out) }; } diff --git a/src/codegen.rs b/src/codegen.rs index b3f3dc6..c3801ae 100644 --- a/src/codegen.rs +++ b/src/codegen.rs @@ -600,17 +600,20 @@ impl Codegen { TirTag::IfStmt => Self::generate_if_stmt(builder, ctx, r), TirTag::Assign => { let view = ctx.tir.assign_view(r); - if ctx.str_locals.contains_key(&view.name) { + if is_str_type(inst.ty, ctx.pool) { let repr = Self::eval_inst_str(builder, ctx, view.value)?; - match repr { - ValueRepr::Str { ptr, len, cap } => { - let locals = ctx.str_locals.get(&view.name).unwrap(); - builder.def_var(locals.ptr, ptr); - builder.def_var(locals.len, len); - builder.def_var(locals.cap, cap); - } - _ => unreachable!("str-typed assign should produce ValueRepr::Str"), - } + let ValueRepr::Str { ptr, len, cap } = repr else { + unreachable!("str-typed assign should produce ValueRepr::Str"); + }; + let locals = ctx.str_locals.get(&view.name).ok_or_else(|| { + format!( + "Undefined string variable in assign: '{}'", + ctx.pool.str(view.name) + ) + })?; + builder.def_var(locals.ptr, ptr); + builder.def_var(locals.len, len); + builder.def_var(locals.cap, cap); return Ok(false); } let val = Self::eval_inst(builder, ctx, view.value)?; @@ -1090,19 +1093,6 @@ impl Codegen { _ => unreachable!("StrLen operand must produce ValueRepr::Str"), } } - TirTag::StrIsEmpty => { - let operand = match inst.data { - TirData::UnOp(r) => r, - _ => unreachable!("StrIsEmpty must carry TirData::UnOp"), - }; - let repr = Self::eval_inst_str(builder, ctx, operand)?; - let len_val = match repr { - ValueRepr::Str { len, .. 
} => len, - _ => unreachable!("StrIsEmpty operand must produce ValueRepr::Str"), - }; - let zero = builder.ins().iconst(types::I64, 0); - builder.ins().icmp(IntCC::Equal, len_val, zero) - } TirTag::StrCmpEq | TirTag::StrCmpNe => { let (lhs, rhs) = match inst.data { TirData::BinOp { lhs, rhs } => (lhs, rhs), diff --git a/src/runtime_lib.rs b/src/runtime_lib.rs index 13671b4..f328e53 100644 --- a/src/runtime_lib.rs +++ b/src/runtime_lib.rs @@ -1,4 +1,3 @@ -use sha2::{Digest, Sha256}; use std::fs; use std::io; use std::path::PathBuf; @@ -12,8 +11,7 @@ fn cache_dir() -> Result { } fn content_hash() -> String { - let hash = Sha256::digest(RYO_RUNTIME_LIB); - format!("{:x}", hash)[..16].to_string() + env!("RYO_RUNTIME_HASH")[..16].to_string() } pub fn extract_runtime_to_temp() -> Result { diff --git a/src/sema.rs b/src/sema.rs index 2d9d08d..3dcf964 100644 --- a/src/sema.rs +++ b/src/sema.rs @@ -1013,10 +1013,18 @@ fn analyze_expr(sema: &mut Sema<'_>, fcx: &mut FuncCtx, scope: &Scope, r: InstRe )); return fcx.builder.unreachable(sema.pool.error_type(), span); } - fcx.builder.push_typed( - TirTag::StrIsEmpty, + let len_tir = fcx.builder.push_typed( + TirTag::StrLen, TirData::UnOp(receiver_tir), + sema.pool.int(), + span, + ); + let zero = fcx.builder.int_const(0, sema.pool.int(), span); + fcx.builder.binary( + TirTag::ICmpEq, sema.pool.bool_(), + len_tir, + zero, span, ) } diff --git a/src/tir.rs b/src/tir.rs index 98a8047..e7e4696 100644 --- a/src/tir.rs +++ b/src/tir.rs @@ -144,8 +144,6 @@ pub enum TirTag { /// Read the `len` field of a str fat pointer. Operand in `TirData::UnOp`. StrLen, - /// `str.is_empty()` — compares len to 0. Operand in `TirData::UnOp`. - StrIsEmpty, // Float arithmetic / comparison. FAdd, @@ -515,7 +513,7 @@ impl TirBuilder { /// General-purpose instruction emit for tags that don't fit the /// `unary` / `binary` debug-assert gates. Sema uses this for - /// method-call lowerings like `StrLen` / `StrIsEmpty`. 
+ /// method-call lowerings like `StrLen`. pub fn push_typed(&mut self, tag: TirTag, data: TirData, ty: TypeId, span: Span) -> TirRef { self.push(tag, ty, data, span) } @@ -1121,7 +1119,6 @@ fn un_op_name(t: TirTag) -> &'static str { TirTag::Return => "ret", TirTag::ExprStmt => "expr_stmt", TirTag::StrLen => "str_len", - TirTag::StrIsEmpty => "str_is_empty", _ => "?un", } } diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index 2d05061..f22ec2e 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -2255,3 +2255,19 @@ fn test_str_returned_from_function() { String::from_utf8_lossy(&output.stderr) ); } + +#[test] +fn test_str_shadowed_by_int_assignment_does_not_panic() { + let temp_dir = TempDir::new().expect("Failed to create temp directory"); + let code = "fn main():\n\tmut s: str = \"hello\"\n\tif true:\n\t\tmut s: int = 1\n\t\ts = 2\n\t\tprint(int_to_str(s))\n\tprint(s)\n"; + let test_file = create_test_file(temp_dir.path(), "str_shadow.ryo", code); + let output = run_ryo_command(&["run", "str_shadow.ryo"], &test_file).expect("Failed"); + assert!( + output.status.success(), + "STDERR: {}", + String::from_utf8_lossy(&output.stderr) + ); + let stdout = String::from_utf8_lossy(&output.stdout); + assert!(stdout.contains("2"), "Output should contain '2', got: {}", stdout); + assert!(stdout.contains("hello"), "Output should contain 'hello', got: {}", stdout); +} From eaabab16b5c63ffb17220d34ad90376cc0cfc88f Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Wed, 3 Jun 2026 12:28:35 +0200 Subject: [PATCH 32/33] fix: resolve clippy warnings in build.rs and format workspace - build.rs: remove needless borrows for generic args (profile) flagged by clippy - formatting: run cargo fmt across sema and integration tests --- build.rs | 13 +++++-------- src/sema.rs | 9 ++------- tests/integration_tests.rs | 12 ++++++++++-- 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/build.rs b/build.rs index e62482f..b8bf0e2 100644 --- 
a/build.rs +++ b/build.rs @@ -29,19 +29,16 @@ fn main() { "debug" | "dev" => "debug", _ => { let opt_level = env::var("OPT_LEVEL").unwrap_or_else(|_| "0".to_string()); - if opt_level != "0" { - "release" - } else { - "debug" - } + if opt_level != "0" { "release" } else { "debug" } } }; let mut path = std::path::PathBuf::from(&target_dir) - .join(&profile) + .join(profile) .join("libryo_runtime.a"); if !path.exists() { // Build the runtime archive in-process in a separate target directory to avoid deadlocks. - let custom_target_dir = std::path::PathBuf::from(&manifest_dir).join("target/runtime-build"); + let custom_target_dir = + std::path::PathBuf::from(&manifest_dir).join("target/runtime-build"); let cargo = env::var("CARGO").unwrap_or_else(|_| "cargo".to_string()); let mut cmd = std::process::Command::new(&cargo); cmd.arg("build") @@ -56,7 +53,7 @@ fn main() { .status() .unwrap_or_else(|e| panic!("failed to spawn `cargo build -p ryo-runtime`: {e}")); if status.success() { - path = custom_target_dir.join(&profile).join("libryo_runtime.a"); + path = custom_target_dir.join(profile).join("libryo_runtime.a"); } } if !path.exists() { diff --git a/src/sema.rs b/src/sema.rs index 3dcf964..6ad4edc 100644 --- a/src/sema.rs +++ b/src/sema.rs @@ -1020,13 +1020,8 @@ fn analyze_expr(sema: &mut Sema<'_>, fcx: &mut FuncCtx, scope: &Scope, r: InstRe span, ); let zero = fcx.builder.int_const(0, sema.pool.int(), span); - fcx.builder.binary( - TirTag::ICmpEq, - sema.pool.bool_(), - len_tir, - zero, - span, - ) + fcx.builder + .binary(TirTag::ICmpEq, sema.pool.bool_(), len_tir, zero, span) } _ => { sema.sink.emit(Diag::error( diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index f22ec2e..07cf35f 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -2268,6 +2268,14 @@ fn test_str_shadowed_by_int_assignment_does_not_panic() { String::from_utf8_lossy(&output.stderr) ); let stdout = String::from_utf8_lossy(&output.stdout); - 
assert!(stdout.contains("2"), "Output should contain '2', got: {}", stdout); - assert!(stdout.contains("hello"), "Output should contain 'hello', got: {}", stdout); + assert!( + stdout.contains("2"), + "Output should contain '2', got: {}", + stdout + ); + assert!( + stdout.contains("hello"), + "Output should contain 'hello', got: {}", + stdout + ); } From 488487c515c270cc7ac60784f900ea364e3ae07f Mon Sep 17 00:00:00 2001 From: Pepe Navarro Date: Wed, 3 Jun 2026 12:38:03 +0200 Subject: [PATCH 33/33] fix(build): avoid string panics in build.rs & document profile logic - Pass custom_target_dir as reference directly to Command arg, avoiding Path to String conversions and unwraps - Handle non-UTF-8 path resolution in final env-export logic safely with clear panic message - Add comprehensive inline documentation for PROFILE-to-OPT_LEVEL mapping rules and custom optimized profiles fallback --- build.rs | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/build.rs b/build.rs index b8bf0e2..0090267 100644 --- a/build.rs +++ b/build.rs @@ -24,6 +24,14 @@ fn main() { let target_dir = env::var("CARGO_TARGET_DIR").unwrap_or_else(|_| format!("{manifest_dir}/target")); let raw_profile = env::var("PROFILE").unwrap_or_else(|_| "debug".to_string()); + // Mapping rules for Cargo profile resolution: + // - Known "release", "production", and "prod" profiles map to "release". + // - Known "debug" and "dev" profiles map to "debug". + // - For unrecognized/custom profiles, we consult OPT_LEVEL and treat any + // non-"0" optimization level as "release" (since custom optimized profiles + // typically build under optimized target layouts). + // NOTE: Custom profiles with debug = true but OPT_LEVEL > 0 (e.g., opt-level = 1, 2, 3) + // will be classified as "release", avoiding build directory mismatch surprises. 
let profile = match raw_profile.as_str() { "release" | "production" | "prod" => "release", "debug" | "dev" => "debug", @@ -45,7 +53,7 @@ fn main() { .arg("-p") .arg("ryo-runtime") .arg("--target-dir") - .arg(custom_target_dir.to_str().unwrap()); + .arg(&custom_target_dir); if profile == "release" { cmd.arg("--release"); } @@ -62,9 +70,17 @@ fn main() { path.display() ); } - path.to_str() - .expect("RYO_RUNTIME_LIB path is non-UTF-8; set RYO_RUNTIME_LIB explicitly") - .to_string() + // Safely check if path contains non-UTF-8 characters, providing clear instructions if so. + match path.to_str() { + Some(s) => s.to_string(), + None => { + panic!( + "The resolved runtime library path at '{}' contains non-UTF-8 characters. \ + Please set the RYO_RUNTIME_LIB environment variable explicitly to override it.", + path.display() + ); + } + } }); println!("cargo:rustc-env=RYO_RUNTIME_LIB={runtime_path}");
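The profile resolution and path handling touched by this patch can be sketched as standalone functions, which makes the mapping rules easy to check in isolation. This is only an illustrative sketch, not code from the repository: the function names `resolve_profile`, `runtime_archive`, `runtime_build_command`, and `path_to_utf8` are hypothetical, and the sketch assumes the values of `PROFILE` and `OPT_LEVEL` are passed in as plain strings rather than read from the environment.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical helper restating the mapping from the patch comment:
/// known names map directly, anything else falls back to OPT_LEVEL.
fn resolve_profile(raw_profile: &str, opt_level: &str) -> &'static str {
    match raw_profile {
        "release" | "production" | "prod" => "release",
        "debug" | "dev" => "debug",
        // Unrecognized/custom profile: any non-"0" optimization level
        // is treated as "release".
        _ => {
            if opt_level != "0" {
                "release"
            } else {
                "debug"
            }
        }
    }
}

/// Hypothetical helper: where the runtime archive is expected to live.
fn runtime_archive(target_dir: &Path, profile: &str) -> PathBuf {
    target_dir.join(profile).join("libryo_runtime.a")
}

/// Builds (but does not spawn) the cargo invocation. The target directory
/// is passed as a `&Path`; `Command::arg` accepts any `AsRef<OsStr>`, so no
/// `to_str().unwrap()` round trip is needed.
fn runtime_build_command(cargo: &str, custom_target_dir: &Path, profile: &str) -> Command {
    let mut cmd = Command::new(cargo);
    cmd.arg("build")
        .arg("-p")
        .arg("ryo-runtime")
        .arg("--target-dir")
        .arg(custom_target_dir);
    if profile == "release" {
        cmd.arg("--release");
    }
    cmd
}

/// Converts a path to an owned UTF-8 string, returning a descriptive error
/// instead of panicking when the path is not valid UTF-8.
fn path_to_utf8(path: &Path) -> Result<String, String> {
    path.to_str().map(str::to_owned).ok_or_else(|| {
        format!(
            "path '{}' contains non-UTF-8 characters; set RYO_RUNTIME_LIB explicitly",
            path.display()
        )
    })
}
```

In a build script, the `Err` case of `path_to_utf8` would typically be turned into a `panic!` carrying the same message, which is the behaviour the patch implements inline.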