test(DON'T MERGE): develop-v2.1.0 rv64 by shuklaayush · Pull Request #2719 · openvm-org/openvm

shuklaayush · 2026-04-20T17:21:05Z

No description provided.

github-actions · 2026-04-20T17:21:28Z

Claude finished @shuklaayush's task in 5m 13s —— View job

Code Review: RV64 Support PR

Gather context and understand PR scope
Review new decoder crate
Review toolchain changes (platform, openvm, transpiler)
Review RV64 circuit/extension changes
Review test changes
Review Cargo.toml / dependency changes
Post final review

Overview

This PR adds RV64IM support to OpenVM across 302 files. The major changes are:

New openvm-decoder crate — RISC-V instruction decoder forked from rrs-lib, extended with RV64 instructions (LD/SD/LWU, *W-suffix ops, M-extension W ops)
Rename rv32im → riscv — The extension crate moves from extensions/rv32im/ to extensions/riscv/
64-bit register width — Registers go from 4 limbs (32-bit) to 8 limbs (64-bit), adapters and chips updated accordingly
New *_W chips — base_alu_w, shift_w, mul_w, divrem_w for RV64 word-width (32-bit) operations with sign-extension
Toolchain updates — Platform, openvm, serializer, transpiler all updated for 64-bit words
New transpiler tests — RV64 ELF decode and transpile tests with hand-written assembly programs

Bugs

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

extensions/riscv/circuit/src/lib.rs:299 — The Rv64ImGpuBuilder::create_chip_complex references config.rv32i, but the field was renamed to rv64i on line 103. This will fail to compile when the cuda feature is enabled.

Same issue in test files:

extensions/riscv/circuit/src/mul/tests.rs:289
extensions/riscv/circuit/src/mulh/tests.rs:530
extensions/riscv/circuit/src/jalr/tests.rs:586
extensions/riscv/circuit/src/mul_w/tests.rs:419

Fix this →

2. Hint stream length encoding mismatch (Semantic bug)

extensions/riscv/circuit/src/extension/mod.rs:988-996 — Rv32HintInputSubEx writes the hint length as 4 bytes (u32::to_le_bytes()) and pads data to 4-byte alignment. But on the guest side (crates/toolchain/openvm/src/io/mod.rs:39-46), read_u64() consumes 8 bytes from the hint stream. The host emulation in host.rs:47 correctly writes 8 bytes (u64::to_le_bytes()).

In zkVM execution, read_u64() would read the 4-byte length + the first 4 bytes of actual data as a single u64, corrupting both the length and the data stream.

Fix: Change line 989 to (hint.len() as u64).to_le_bytes() and line 995 to .div_ceil(8) * 8 for 8-byte alignment.

Fix this →

3. CUDA extension not ported (Compilation error under `cuda` feature)

extensions/riscv/circuit/src/extension/cuda.rs still references all Rv32* types (Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, etc.) which no longer exist in this crate. The entire file will fail to compile when cuda feature is enabled. It also doesn't register the new *_W chips.

This is likely intentional for this development branch (given it's marked "DON'T MERGE"), but should be tracked.

Code Quality Issues

4. Phantom sub-executor naming inconsistency

Rv32HintInputSubEx, Rv32HintRandomSubEx, Rv32PrintStrSubEx — These are used by the RV64 extension but still carry the Rv32 prefix. They should be renamed to Rv64* or a generic name (e.g., HintInputSubEx) since they now operate on RV64 register widths (note: Rv32HintRandomSubEx calls read_rv64_register).

5. AoT code uses `RV32_REGISTER_AS` / `RV32_MEMORY_AS`

extensions/riscv/circuit/src/common/mod.rs:16,107-116 — The AoT assembly generation code references RV32_REGISTER_AS and RV32_MEMORY_AS. These happen to have the same numeric values as their RV64_* counterparts (both are 1 and 2), so this is functionally correct but misleading.

6. Duplicate `openvm-riscv-guest` Cargo.toml

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The workspace correctly points to the outer one, but the inner directory (guest/guest/) appears to be a leftover from the RV32 → RV64 migration. The inner guest/guest/ uses the old cfg(not(target_os = "zkvm")) guard while the outer uses the correct cfg(not(any(target_arch = "riscv64"))). Consider removing the inner directory to avoid confusion.

7. Commented-out hintstore module

extensions/riscv/circuit/src/lib.rs:29 — The hintstore module is commented out (// mod hintstore;), and the Rv64Io extension has empty trait implementations. This means hint store operations (writing multiple words from the hint stream) won't work in the zkVM. This is presumably known, but it blocks end-to-end guest program execution since read_vec() depends on hint_buffer_chunked → hint_buffer_u64! → HINT_FUNCT3 opcode which needs hintstore.

Architecture Observations

The adapter/core pattern is well maintained through the migration. New *_W adapters (Rv64BaseAluWAdapter, Rv64MultWAdapter) correctly expose only the lower 32-bit limbs to core AIRs while handling full-width register reads/writes with sign extension.
The imm_to_u64 and imm_to_bytes functions in adapters/mod.rs properly handle 12-bit signed immediate sign-extension to 64 bits via the 24-bit encoding.
The new decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6 instead of RV32's 5-bit funct7), and correctly validates shamt < 32 for *W shift instructions.
The from_load and from_s_type transpiler utilities correctly encode load/store instructions with the 16-bit immediate format needed by the loadstore adapter.
Test coverage for the new chips is thorough — each *_W module has extensive tests covering edge cases (sign extension, overflow, boundary values).

Summary

The core RV64 architecture changes are solid — adapters, chips, and transpiler are well-structured. The main actionable items are:

Fix config.rv32i → config.rv64i in 5 locations (blocks compilation of tests and CUDA builds)
Fix hint stream length encoding to use u64 instead of u32 (blocks correct zkVM execution of IO)
Track CUDA port as a separate work item

github-actions · 2026-04-20T18:55:57Z

Claude finished @shuklaayush's task in 4m 48s —— View job

Code Review: RV64 Support PR (Re-review after sync)

Gather context — check new commits since last review
Review fixes for previously reported bugs
Review decoder crate
Review toolchain changes (platform, openvm, transpiler, IO)
Review RV64 circuit/extension changes
Post final review

Overview

Re-reviewing after the latest fix: preserve rc.1 evolution + apply rv64-specific transformations commit. The PR is at 304 files changed, 15505 insertions, 6686 deletions across 14 commits.

Previous Bug #2 (hint stream length encoding) was fixed — now correctly uses u64::to_le_bytes() and 8-byte alignment. Previous Bug #4 (phantom sub-executor naming) was also fixed — renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx. Previous Bug #7 (commented-out hintstore) was also fixed — hintstore module is now active.

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` and `aot` features)

The Rv64ImConfig struct has field rv64i (line 103), but 5 locations still reference config.rv32i:

extensions/riscv/circuit/src/lib.rs:299 — GPU builder
extensions/riscv/circuit/src/mul/tests.rs:289
extensions/riscv/circuit/src/mulh/tests.rs:530
extensions/riscv/circuit/src/jalr/tests.rs:586
extensions/riscv/circuit/src/mul_w/tests.rs:419

Fix this →

2. AOT code uses `RV32_REGISTER_NUM_LIMBS` (4) instead of `RV64_REGISTER_NUM_LIMBS` (8) (Dead trait impls)

The AOT trait implementations in load_sign_extend/aot.rs are written for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (lines 21, 41), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:25. This means the AOT impls are for a type that is never instantiated. The same pattern may exist in other AOT files.

3. AOT tests read only 4 bytes from 8-byte registers

Multiple AOT test helpers use state.memory.read::<u8, 4>(RV32_REGISTER_AS, offset):

mul/tests.rs:316
mulh/tests.rs:601
jalr/tests.rs:618
mul_w/tests.rs:446

RV64 registers are 8 bytes wide. Reading only 4 bytes would silently truncate values, giving incorrect test results for any value that uses the upper 32 bits. Should be read::<u8, 8> for full-width ops (mul, mulh, jalr). For mul_w, reading 4 bytes might be intentional since *W ops produce 32-bit results sign-extended to 64 bits, but even then the sign-extended upper bytes should be verified.

4. CUDA extension not ported (Compilation error under `cuda` feature)

extensions/riscv/circuit/src/extension/cuda.rs still references old Rv32* types (Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc.) which no longer exist. All CUDA chip files (*/cuda.rs) also use RV32_REGISTER_NUM_LIMBS and RV32_CELL_BITS for core AIR parameterization — many of these need to be updated to RV64_* for non-W chips.

This is likely intentional for this development branch (marked "DON'T MERGE").

5. `a_reg = a / 4` hardcodes 4-limb register stride in AOT code

load_sign_extend/aot.rs:115 uses a / 4 and b / 4 to convert byte offsets to register indices. With RV64's 8-byte registers, this should be a / 8 and b / 8. The gpr_to_rv32_register function in common/mod.rs:25 also hardcodes a 4-limb/2-limb-per-xmm scheme. The entire AOT layer appears to assume 32-bit registers.

Code Quality Issues

6. `RV32_REGISTER_AS` / `RV32_MEMORY_AS` in non-AOT code

While these constants happen to have the same numeric values as RV64_REGISTER_AS / RV64_MEMORY_AS (both 1 and 2), the following locations use the RV32 names in RV64 code, which is misleading:

divrem/execution.rs:117 — if d.as_canonical_u32() != RV32_REGISTER_AS
auipc/execution.rs:102 — if d as u32 != RV32_REGISTER_AS
branch_eq/execution.rs:124 — if d.as_canonical_u32() != RV32_REGISTER_AS
loadstore/aot.rs:133,162
shift_w/execution.rs:250,252,255
base_alu_w/execution.rs:251,253,256
mul_w/execution.rs:207,209,211
divrem_w/execution.rs:116,292,294,296

The *_W execution files' AOT code uses RV32_REGISTER_AS — functionally correct but inconsistent with the rest of the codebase that uses RV64_REGISTER_AS.

7. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ still uses hint_buffer_u32! and 4-byte word operations (RV32), while the outer uses hint_buffer_u64! and 8-byte dwords (RV64). The workspace correctly points to the outer one, but the inner directory is a confusing leftover.

8. ELF `memory_image` stores data in 4-byte chunks

The ELF loader uses BTreeMap<u32, u32> for the memory image and iterates in 4-byte steps (line 199 of elf.rs). This works correctly since RV64 8-byte values span two consecutive 4-byte entries, and elf_memory_image_to_openvm_memory_image properly converts to byte-level entries. However, a comment noting this design choice would help future readers.

Architecture Observations (Positive)

Decoder crate is well-structured. RV64 shift amounts (6-bit shamt via funct6) and W-suffix instruction validation (shamt < 32) are correctly implemented.
Phantom sub-executors properly updated for RV64 — read_rv64_register, 8-byte hint stream alignment, u64::to_le_bytes() for hint lengths.
Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling.
Execution code for non-AOT paths (base_alu, shift, less_than, branch_eq, branch_lt) correctly uses vm_read::<u8, 8> and RV64_REGISTER_AS.
*_W adapter/core pattern is well-designed — adapters handle full 8-byte register reads/writes while cores operate on the lower 4-byte word with proper sign extension.
Test coverage for new RV64 transpiler tests is good — hand-written assembly programs (rv64im-stress.S, rv64im-intrin.S) cover arithmetic, shifts, branches, loads/stores, and M-extension operations.

Summary

Priority	Issue	Impact
High	#1: `config.rv32i` → `config.rv64i`	Blocks compilation under `cuda`/`aot` features
High	#2-5: AOT layer still assumes 32-bit registers	Entire AOT code path is broken for RV64
Medium	#4: CUDA extension not ported	Blocks GPU proving (likely tracked separately)
Low	#6: `RV32_*_AS` naming in execution code	Cosmetic, functionally correct
Low	#7: Inner `guest/guest/` directory	Cleanup needed, no runtime impact

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The main execution, circuit, and transpiler changes are well-structured. The primary risk is the AOT layer, which appears to still assume 4-limb (32-bit) registers throughout.

github-actions · 2026-04-20T19:35:27Z

Claude finished @shuklaayush's task in 4m 47s —— View job

Code Review: RV64 Support PR (Re-review after sync)

Gather context — check new commits since last review
Review previously reported bugs — verify status
Review new decoder crate
Review toolchain changes (platform, openvm, transpiler, IO)
Review RV64 circuit/extension changes
Review test changes and CI workflows
Post final review

Overview

Re-reviewing after the latest sync. The PR is at 313 files changed, 15544 insertions, 6823 deletions across 15 commits. The new commit chore(ci): comment out aot workflows acknowledges the AOT layer is not yet ported.

Previously fixed issues:

Bug feat: add preprocessing support #2 from first review (hint stream length encoding) — Fixed: now uses u64::to_le_bytes() and 8-byte alignment (extension/mod.rs:866-876)
Bug test: added an integration test, fib_triples #4 from first review (phantom sub-executor naming) — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Bug chore: adding (pk, vk) to triples_fib integration test #7 from first review (commented-out hintstore) — Fixed: hintstore module is now active

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

Rv64ImConfig has field rv64i (line 103 of lib.rs), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	299
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	419

All are behind #[cfg(feature = "aot")] or #[cfg(feature = "cuda")], so they don't break default builds, but will fail with those features enabled.

Fix this →

2. AOT `load_sign_extend` impl is dead code — wrong const generic

load_sign_extend/aot.rs:20-21 implements AotExecutor for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. The AOT impls (lines 20 and 41) will never be dispatched for any RV64 LoadSignExtendExecutor.

Same pattern applies to the metered impl at line 40-41.

3. AOT register stride hardcodes `/4` instead of `/8`

The entire AOT x86 code generation layer maps RISC-V register addresses to x86 XMM registers using a 4-limb stride (a / 4). With RV64's 8-limb registers, these produce wrong XMM indices:

load_sign_extend/aot.rs:115-116 — a / 4, b / 4
loadstore/aot.rs:150-151 — a / 4, b / 4
branch_lt/aot.rs:40-41 — a / 4, b / 4
auipc/execution.rs:106 — a / 4
branch_eq/execution.rs:131-132 — a / 4, b / 4
jal_lui/execution.rs:114 — a / 4
divrem_w/execution.rs:121-123 — a / 4, b / 4, c / 4
All *_W execution files (base_alu_w, shift_w, mul_w) use same pattern

The helper gpr_to_rv32_register (common/mod.rs:25-31) and xmm_to_gpr (common/mod.rs:61-85) also assume 2 registers per XMM (the RV32 4-limb layout). For RV64, a register like x5 (byte offset 40 = 5×8) would map to 40/4 = 10 → xmm5, instead of the correct 40/8 = 5 → xmm2.

This is a systemic issue — the AOT layer needs a fundamental redesign for 64-bit register values since XMM elements are 32-bit (pinsrd/pextrd). The CI workflow commenting (chore(ci): comment out aot workflows) suggests this is a known deferral.

4. AOT metered code uses `RV32_REGISTER_AS` in RV64 `*_W` chips

The metered path in *_W execution files calls update_adapter_heights_asm(config, RV32_REGISTER_AS):

divrem_w/execution.rs:292-296
base_alu_w/execution.rs:251-256
shift_w/execution.rs:250-255
mul_w/execution.rs:207-211

While RV32_REGISTER_AS == RV64_REGISTER_AS == 1 (functionally identical), update_adapter_heights_asm in common/mod.rs:107 also prints error messages containing "RV32_REGISTER_AS", which is misleading for RV64.

5. CUDA extension references old `Rv32*` types

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all of which were renamed to Rv64* in this PR. Every GPU chip construction (lines 39-113) uses the wrong types. Also missing: new *_W chip registrations for GPU backend.

This is clearly a known deferral for this development branch.

Code Quality Issues

6. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ still uses:

cfg(target_os = "zkvm") guard (old) vs the outer's cfg(openvm_intrinsics) (new)
hint_buffer_u32! (RV32) vs the outer's hint_buffer_u64! (RV64)
RV32_ALU_OPCODE vs the outer's RV64_ALU_OPCODE

The workspace points to the outer directory. The inner one is a confusing leftover and should be removed.

7. `RV32_CELL_BITS` in `auipc/execution.rs:100`

The AUIPC AOT code computes rd = pc.wrapping_add((c as u32) << RV32_CELL_BITS). Since RV32_CELL_BITS == RV64_CELL_BITS == 8, this is functionally correct but the naming is misleading in an RV64 codebase.

Architecture Observations (Positive)

Decoder crate is well-structured with correct RV64 shift amount handling (6-bit shamt via funct6 for 64-bit shifts, 5-bit for W-suffix with shamt < 32 validation)
Phantom sub-executors properly updated: read_rv64_register, 8-byte hint stream alignment, u64::to_le_bytes() for hint lengths
Toolchain migration is thorough: serializer/deserializer use WORD_SIZE=8, platform memory layout uses 8-byte alignment, read_u64() and read_vec_by_len() correctly handle dword boundaries
*_W adapter/core pattern is well-designed: adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension of 32-bit results back to 64-bit registers
Non-AOT execution paths (base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore) correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>
Test coverage for new W-chips is thorough with proper sign extension edge cases and negative tests for constraint violations

Summary

Priority	Issue	Impact	Status
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot` features	Open
High	#2-3: AOT layer assumes 32-bit registers	Entire AOT code path broken for RV64	Known deferral (CI disabled)
Medium	#5: CUDA extension not ported	GPU proving broken	Known deferral
Low	#4: `RV32_REGISTER_AS` in `*_W` metered paths	Functionally correct, misleading naming	Open
Low	#6: Inner `guest/guest/` directory	Confusing leftover, no runtime impact	Open

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent. The *_W chip design with split-width adapters is clean. The remaining issues are confined to the AOT/CUDA layers, which are acknowledged as not yet ported.

Recommended before merge:

Fix config.rv32i → config.rv64i in the 5 locations (trivial)
Remove the inner guest/guest/ directory
Track AOT and CUDA port as separate work items

github-actions · 2026-04-20T20:20:50Z

Claude finished @shuklaayush's task in 5m 46s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new/changed files in detail
Review CI workflow changes
Post final review

Overview

Re-reviewing after the latest commit fix(ci): update references from rv32 to riscv (407978a). The PR is at 316 files changed, 15556 insertions, 6834 deletions across 16 commits.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: now uses u64::to_le_bytes() and 8-byte alignment (extension/mod.rs:866-874)
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
CI workflow references — Fixed in latest commit: rv32im → riscv in workflow path filters

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

Rv64ImConfig has field rv64i (line 103 of lib.rs), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	299
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	419

All are behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")], so they don't break default builds but will fail with those features.

Fix this →

2. AOT `load_sign_extend` impl is dead code — wrong const generic

load_sign_extend/aot.rs:20-21 implements traits for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. Same for metered impl at line 40-41. These impls will never be dispatched for any RV64 type.

3. AOT layer assumes 32-bit registers throughout

The AOT x86 code generation layer is fundamentally broken for RV64:

Register stride: a / 4 hardcoded in load_sign_extend/aot.rs:115, loadstore/aot.rs:150, branch_lt/aot.rs:40, and all *_W execution files. With RV64's 8-byte registers, these produce wrong XMM register indices.
XMM helpers: gpr_to_rv32_register and xmm_to_gpr in common/mod.rs:25-85 assume 2 RISC-V registers per XMM (32-bit × 2 = 64-bit XMM), but RV64 registers are 64 bits each — need 1:1 mapping or use different SSE/AVX storage.
*_W execution AOT impls: base_alu_w/execution.rs, shift_w/execution.rs, mul_w/execution.rs, divrem_w/execution.rs — all reference RV32_REGISTER_AS, RV32_IMM_AS, RV32_REGISTER_NUM_LIMBS in their AOT trait implementations.
update_adapter_heights_asm in common/mod.rs:107-117 — error messages say "RV32_REGISTER_AS" and "RV32_MEMORY_AS".

The CI disabling (chore(ci): comment out aot workflows) acknowledges this. The entire AOT layer needs a redesign for 64-bit values since the x86 XMM register mapping is fundamentally 32-bit.

4. CUDA extension references old `Rv32*` types

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. All GPU chip files (*/cuda.rs) also use RV32_REGISTER_NUM_LIMBS and RV32_CELL_BITS. Also missing: new *_W chip registrations for GPU backend.

This is clearly a known deferral for this development branch.

Code Quality Issues

5. Serializer `Vec::with_capacity` over-allocates

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 u64 words (64 bytes). This was the same pattern with Vec<u32> in RV32 (4× over-allocation → now 8× over-allocation). Functionally correct but wastes memory.

Fix: Vec::with_capacity(core::mem::size_of_val(value) / WORD_SIZE) or just Vec::new().

6. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use WORD_SIZE constant for consistency, as done elsewhere in the codebase.

7. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code:

Uses hint_buffer_u32! (RV32) vs the outer's hint_buffer_u64! (RV64)
Uses cfg(target_os = "zkvm") (old) vs cfg(openvm_intrinsics) (new)
Uses RV32_ALU_OPCODE vs RV64_ALU_OPCODE

The workspace points to the outer directory. The inner one should be removed.

8. `cargo doc` disabled in CI

lints.yml comments out the cargo doc step, suggesting docs may not build cleanly. This should be tracked and re-enabled.

Architecture Observations (Positive)

Decoder crate is well-structured. RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32) are correctly implemented. ITypeShamt properly masks to 6 bits (0x3f).
ELF loader correctly validates Class::ELF64, reads instructions in 4-byte chunks (RISC-V instructions are always 32-bit), and constrains the address space to u32.
Adapter/core pattern is well maintained. The *_W adapters (Rv64BaseAluWAdapter, Rv64MultWAdapter) correctly read full 8-byte registers, expose lower 4 bytes to cores, and handle sign extension via XOR-based sign bit extraction.
Address space encoding is correct: rs2_as constrained to boolean maps directly to RV64_IMM_AS=0 / RV64_REGISTER_AS=1.
Sign extension is correctly implemented across all W-suffix chips using the (result as i32 as i64 as u64) chain.
Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling.
Phantom sub-executors properly use read_rv64_register(), 8-byte hint stream alignment, and u64::to_le_bytes().
Test coverage for new W-chips is thorough with sign extension edge cases, negative tests for constraint violations, and comprehensive boundary value testing.
Non-AOT, non-CUDA execution paths (base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore) correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.

Summary

Priority	Issue	Impact	Status
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#2-3: AOT layer assumes 32-bit registers	Entire AOT code path broken	Known deferral (CI disabled)
Medium	#4: CUDA extension not ported	GPU proving broken	Known deferral
Low	#5: Serializer over-allocates capacity	Wasted memory	Minor
Low	#6: Hardcoded `8` vs `WORD_SIZE`	Style inconsistency	Minor
Low	#7: Inner `guest/guest/` directory	Confusing leftover	Cleanup
Low	#8: `cargo doc` disabled	Docs may not build	Track

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent. The *_W chip design with split-width adapters is clean. The decoder crate properly handles all RV64 instruction encoding differences.

Recommended before merge:

Fix config.rv32i → config.rv64i in the 5 locations (trivial, blocks feature-gated compilation)
Remove the inner guest/guest/ directory (confusing leftover)
Track AOT and CUDA port as separate work items with issues

github-actions · 2026-04-20T21:21:23Z

Claude finished @shuklaayush's task in 4m 41s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new/changed files in detail
Review circuit and extension changes
Review toolchain changes
Post final review

Overview

Re-reviewing after the latest commit fix(v2.1-rv64): fix lints and clippy (2304665). The PR is at 316 files changed, 15518 insertions, 6757 deletions across 17 commits.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: now uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment (extension/mod.rs:866-873)
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed in latest commit: changed from_u32(1 << ...) to from_u64(1u64 << ...) (adapters/mod.rs:298)
Algebra extension register reads — Fixed: read_rv32_register → read_rv64_register (algebra/circuit/src/extension/modular.rs)
Fibonacci examples — Fixed: reveal_u32 → reveal_u64

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

Rv64ImConfig has field rv64i (line 101), but the GPU builder still references config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297

Additionally, test files behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")]:

File	Line
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

Fix this →

2. AOT impls reference non-existent `Rv32*` executor types (Compilation error under `aot` feature)

Several AotExecutor and AotMeteredExecutor impls implement traits for types that were renamed to Rv64* but the AOT blocks still use the old names:

File	Line	Has `Rv32...`	Should be `Rv64...`
`jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

These are dead code — the impls won't be found for the actual Rv64* types. They'll fail to compile under aot feature since the Rv32* structs don't exist.

Fix this →

3. AOT `load_sign_extend` impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21 implements traits for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. Same for metered impl at line 41. These impls will never be dispatched for any RV64 type.

4. AOT register stride hardcodes `/4` instead of `/8`

The AOT x86 code generation layer maps RISC-V register addresses to XMM registers using a 4-limb stride. With RV64's 8-byte registers, all register index calculations produce wrong results:

jal_lui/execution.rs:109 — a / 4
auipc/execution.rs:106 — a / 4
jalr/execution.rs:114 — a / 4, b / 4
load_sign_extend/aot.rs:115-116 — a / 4, b / 4
loadstore/aot.rs:150-151 — a / 4, b / 4
branch_lt/aot.rs:40-41 — a / 4, b / 4
All *_W execution files (base_alu_w, shift_w, mul_w, divrem_w)

The helper gpr_to_rv32_register (common/mod.rs:25-31) and xmm_to_gpr (common/mod.rs:61-85) also assume the RV32 2-registers-per-XMM layout. The entire AOT layer needs a fundamental redesign for 64-bit register values. The CI disabling (chore(ci): comment out aot workflows) acknowledges this.

5. CUDA extension references old `Rv32*` types

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. All GPU chip files (*/cuda.rs) also use RV32_REGISTER_NUM_LIMBS and RV32_CELL_BITS. Also missing: new *_W chip registrations. This is clearly a known deferral.

6. `*_W` AOT impls use `RV32_REGISTER_NUM_LIMBS` for generic param

The AOT AotExecutor and AotMeteredExecutor impls for the W-suffix chips use RV32_REGISTER_NUM_LIMBS (4) in their generic parameter:

File	Lines
`base_alu_w/execution.rs`	166, 234
`shift_w/execution.rs`	(same pattern)
`mul_w/execution.rs`	(same pattern)
`divrem_w/execution.rs`	108, 273

These implement BaseAluExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> but the actual types use RV64_REGISTER_NUM_LIMBS. These impls will never be dispatched.

Code Quality Issues

7. Serializer `Vec::with_capacity` over-allocates

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). This is an 8x over-allocation. Functionally correct but wasteful.

Fix: Vec::with_capacity(core::mem::size_of_val(value) / 8) or just Vec::new().

8. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use a named constant for consistency with the rest of the codebase.

9. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code:

Uses hint_buffer_u32! (RV32) vs the outer's hint_buffer_u64! (RV64)
Uses cfg(target_os = "zkvm") (old) vs cfg(openvm_intrinsics) (new)

The workspace points to the outer directory. The inner one should be removed.

10. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker — should be cleaned up or properly named before merge.

11. 202 remaining `RV32_*` constant references across 50 files

The extensions/riscv/circuit/src/ directory still has 202 occurrences of RV32_REGISTER_AS, RV32_MEMORY_AS, RV32_REGISTER_NUM_LIMBS, RV32_CELL_BITS, and RV32_IMM_AS spread across 50 files. Most are in AOT (*/aot.rs), CUDA (*/cuda.rs), and test code behind feature gates. While RV32_REGISTER_AS == RV64_REGISTER_AS == 1 and RV32_CELL_BITS == RV64_CELL_BITS == 8 (functionally identical), RV32_REGISTER_NUM_LIMBS (4) ≠ RV64_REGISTER_NUM_LIMBS (8) — so any reference to RV32_REGISTER_NUM_LIMBS in non-feature-gated code is a potential logic bug.

12. `auipc/tests.rs:419` uses `RV32_REGISTER_NUM_LIMBS` (value bug)

let a = gen_pointer(&mut rng, RV32_REGISTER_NUM_LIMBS);

This generates pointers with stride 4 instead of 8, which produces incorrect register addresses for RV64 in the CUDA test path.

Architecture Observations (Positive)

Decoder crate is well-structured. RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32) are correctly implemented. ITypeShamt properly handles both 5-bit and 6-bit shift amounts.
ELF loader correctly validates Class::ELF64, uses try_into() for u64→u32 conversions with proper error handling, and constrains addresses to u32.
Non-AOT execution paths are solid — base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore, jalr, jal_lui, auipc all correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).to_le_bytes(), and constrain upper bytes through XOR-based sign bit extraction.
imm_to_bytes and imm_to_u64 (adapters/mod.rs:70,86) correctly handle 12-bit signed immediate sign extension to 64 bits via the 24-bit encoding.
abstract_compose (adapters/mod.rs:292) was correctly fixed to use from_u64(1u64 << ...) to avoid overflow for RV64 register widths.
Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling.
Test coverage for the new chips is thorough — comprehensive sign extension edge cases, negative tests for constraint violations, boundary value testing.

Summary

Priority	Issue	Impact	Status
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#2: AOT impls reference `Rv32*Executor` types (8 impls across 4 files)	Dead code + compilation failure under `aot`	Trivial rename
High	#3-4,6: AOT layer assumes 32-bit registers	Entire AOT code path broken	Known deferral (CI disabled)
Medium	#5: CUDA extension not ported	GPU proving broken	Known deferral
Low	#7: Serializer 8x over-allocation	Wasted memory	Minor
Low	#8: Hardcoded `8` vs named constant	Style	Minor
Low	#9: Inner `guest/guest/` directory	Confusing leftover	Cleanup
Low	#10: `TEMP[jpw]` marker	Cleanup	Minor
Low	#11-12: `RV32_*` naming in 50 files	Misleading, few are value bugs	Tracked

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent. The latest commit successfully fixes the abstract_compose overflow, updates algebra extension register reads, and applies proper clippy/lint fixes.

Recommended before merge:

Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix Rv32*Executor → Rv64*Executor in AOT impls across 4 files (trivial, 8 occurrences)
Remove the inner guest/guest/ directory
Remove TEMP[jpw] marker / rename tmp_convert_to_u8s
Track AOT and CUDA port as separate work items

github-actions · 2026-04-20T22:14:00Z

Claude finished @shuklaayush's task in 5m 27s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new/changed files in detail
Review circuit and extension changes
Review toolchain and decoder changes
Post final review

Overview

Re-reviewing after the latest commit fix(v2.1-rv64): fix default range checker sizes (e12e402). The PR is at 318 files changed, 15533 insertions, 6872 deletions across 20 commits.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: now uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment (extension/mod.rs:866-873)
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: changed from_u32(1 << ...) to from_u64(1u64 << ...) (adapters/mod.rs:298)
CI workflow references — Fixed: rv32im → riscv in workflow path filters
Range checker sizes — Fixed in latest commit: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS for carry bound formula

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

All are behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")], so they don't break default builds but will fail with those features enabled.

Fix this →

2. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot` feature)

Several AotExecutor and AotMeteredExecutor impls are written for Rv32* struct names, but the actual struct types were renamed to Rv64*. These are dead code — the impls apply to types that don't exist:

File	Lines	Has `Rv32...`	Should be `Rv64...`
`jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor<A>`	`Rv64JalLuiExecutor<A>`
`auipc/execution.rs`	84, 167	`Rv32AuipcExecutor<A>`	`Rv64AuipcExecutor<A>`
`jalr/execution.rs`	97, 185	`Rv32JalrExecutor<A>`	`Rv64JalrExecutor<A>`
`hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

3. AOT `load_sign_extend` impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21 implements traits for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. Same for metered impl at line 40-41. These impls will never be dispatched for any RV64 type. Unlike the *_W chips where RV32_REGISTER_NUM_LIMBS == RV64_WORD_NUM_LIMBS == 4, here the actual executor uses 8 limbs.

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot`)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but only imports MulWOpcode from openvm_riscv_transpiler (line 14). MulOpcode is not re-exported through crate::common::*. This will fail to compile under the aot feature.

5. CUDA extension references old `Rv32*` types (Compilation error under `cuda`)

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. Also missing: new *_W chip registrations for GPU backend. This is clearly a known deferral (CI disabled).

Code Quality Issues

6. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). Functionally correct but 8× over-allocation.

7. Hardcoded `8` instead of named constant in `io/mod.rs`

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use a named constant (e.g., WORD_SIZE) for consistency with the rest of the codebase.

8. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code (hint_buffer_u32!, RV32_ALU_OPCODE, old cfg(target_os = "zkvm") guard). The workspace points to the outer directory. The inner one should be removed.

9. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker — should be cleaned up or properly named before merge.

10. `RV32_` constant naming in `_W` and execution AOT code

The *_W execution files (base_alu_w, shift_w, mul_w, divrem_w) use RV32_REGISTER_AS and RV32_IMM_AS in their AOT trait implementations. While these constants are numerically equal to their RV64_* counterparts (1 and 0 respectively), it's misleading. Similarly, base_alu_w/execution.rs:166,234 uses RV32_REGISTER_NUM_LIMBS (4) where RV64_WORD_NUM_LIMBS (also 4) would be the semantically correct constant.

Architecture Observations (Positive)

Decoder crate is well-structured. RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32) are correctly implemented. ITypeShamt properly handles both 5-bit and 6-bit shift amounts.
Range checker sizes (latest commit e12e402) are now correctly computed: [1 << RV64_CELL_BITS, 2 * RV64_REGISTER_NUM_LIMBS * (1 << RV64_CELL_BITS)] = [256, 4096]. The carry bound formula 2N × 2^B for N=8, B=8 is a standard conservative upper bound for N-limb multiplication intermediate values.
ELF loader correctly validates Class::ELF64, reads instructions in 4-byte chunks (RISC-V instructions are always 32-bit even for RV64), and constrains the address space to u32.
Non-AOT execution paths are solid — base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore, jalr, jal_lui, auipc all correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).to_le_bytes(), and constrain upper bytes through XOR-based sign bit extraction.
Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling. imm_to_u64 and imm_to_bytes properly handle 12-bit signed immediate sign-extension to 64 bits via the 24-bit encoding.
Phantom sub-executors properly use read_rv64_register(), 8-byte hint stream alignment, and u64::to_le_bytes().
Test coverage for new chips is thorough — comprehensive sign extension edge cases, negative tests for constraint violations, and boundary value testing. New transpiler tests with hand-written RV64 assembly (rv64im-stress.S, rv64im-intrin.S) cover all RV64IM instruction categories.

Summary

Priority	Issue	Impact	Status
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#2: AOT impls reference `Rv32*Executor` types (8 impls across 4 files)	Dead code + compilation failure under `aot`	Trivial rename
High	#3: `load_sign_extend/aot.rs` wrong const generic	Dead AOT impl	Part of AOT port
Medium	#4: Missing `MulOpcode` import in `mul_w` AOT	Compilation failure under `aot`	Trivial fix
Medium	#5: CUDA extension not ported	GPU proving broken	Known deferral
Low	#6: Serializer 8× over-allocation	Wasted memory	Minor
Low	#7: Hardcoded `8` vs named constant	Style	Minor
Low	#8: Inner `guest/guest/` directory	Confusing leftover	Cleanup
Low	#9: `TEMP[jpw]` marker	Cleanup	Minor
Low	#10: `RV32_*` naming in AOT paths	Misleading, functionally correct	Tracked

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent. The latest commit correctly fixes the range checker sizes for RV64 multiplication. The *_W chip design with split-width adapters is clean.

Recommended before merge:

Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix Rv32*Executor → Rv64*Executor in AOT impls across 4 files (trivial, 8 occurrences)
Remove the inner guest/guest/ directory
Remove TEMP[jpw] marker / rename tmp_convert_to_u8s
Track AOT and CUDA port as separate work items

github-actions · 2026-04-20T22:28:02Z

Claude finished @shuklaayush's task in 5m 30s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new/changed files in detail
Review circuit and extension changes
Review toolchain and decoder changes
Post final review

Overview

Re-reviewing after the latest commit feat(v2.1-rv64): update guest toolchain target to riscv64im-unknown-none-elf (2d0588c). The PR is at 319 files changed, 15544 insertions, 6882 deletions across 20 commits.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: uses from_u64(1u64 << ...) (adapters/mod.rs:298)
CI workflow references — Fixed: rv32im → riscv in workflow path filters
Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS

New in latest commit (2d0588c):

Guest toolchain target updated from riscv32im-risc0-zkvm-elf → riscv64im-unknown-none-elf
build-std narrowed from alloc,core,proc_macro,panic_abort,std to alloc,core,panic_abort (std removed — consistent with no_std guest programs)
--cfg openvm_intrinsics added to rustc flags, replacing the old cfg(target_os = "zkvm") guard
cfg(not(any(target_arch = "riscv64"))) fixed to cfg(not(target_arch = "riscv64")) in guest Cargo.toml
std tests properly #[ignore]d with TODO(rv64-std) comments

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

All behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")] — won't break default builds but will fail with those features.

Fix this →

2. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot` feature)

The actual struct types were renamed to Rv64*, but AOT impl blocks still name Rv32*:

File	Lines	Has	Should be
`jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor<A>`	`Rv64JalLuiExecutor<A>`
`auipc/execution.rs`	84, 167	`Rv32AuipcExecutor<A>`	`Rv64AuipcExecutor<A>`
`jalr/execution.rs`	97, 185	`Rv32JalrExecutor<A>`	`Rv64JalrExecutor<A>`
`hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

These are dead code — the impls implement traits for types that don't exist. Will fail to compile under aot feature.

Fix this →

3. AOT `load_sign_extend` impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21 implements traits for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. Same for metered impl at line 41. These impls will never be dispatched.

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot`)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but only imports MulWOpcode from openvm_riscv_transpiler (line 14). MulOpcode is not imported anywhere in the file or re-exported through crate::common::*. This will fail to compile under the aot feature.

5. AOT layer assumes 32-bit registers throughout (systemic)

The AOT x86 code generation layer is fundamentally broken for RV64:

Register stride: a / 4 hardcoded in jal_lui/execution.rs:109, auipc/execution.rs:106, jalr/execution.rs:114, load_sign_extend/aot.rs:115, loadstore/aot.rs:150, branch_lt/aot.rs:40, and all *_W execution files. With RV64's 8-byte registers, these produce wrong XMM register indices.
XMM helpers: gpr_to_rv32_register and xmm_to_gpr in common/mod.rs:25-85 assume 2 RISC-V registers per XMM (32-bit × 2 = 64-bit XMM), but RV64 registers are 64 bits each.
*_W AOT impls use RV32_REGISTER_NUM_LIMBS (4) in const generics where the actual executor types use RV64_REGISTER_NUM_LIMBS (8) — making these impls dead code.

The CI disabling (chore(ci): comment out aot workflows) acknowledges this. The entire AOT layer needs a redesign for 64-bit register values.

6. CUDA extension references old `Rv32*` types (Compilation error under `cuda`)

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. Also missing: new *_W chip registrations for GPU backend. Clearly a known deferral.

New Observations on Latest Commit (`2d0588c`)

7. `riscv64im-unknown-none-elf` target requires nightly `-Z build-std`

The target riscv64im-unknown-none-elf is not a built-in Rust target — it requires -Z build-std (nightly-only). This is correctly configured (lines 268-273 of build/src/lib.rs) and the nightly toolchain is already required for guest compilation. The ISA string riscv64im correctly encodes base integer (I) + multiplication/division (M).

Note: The build-std removing std and proc_macro means guest programs can only use no_std. The #[ignore] annotations on test_rv64im_std and test_hashmap with TODO(rv64-std) comments correctly acknowledge this limitation.

8. `openvm_intrinsics` cfg flag is well-integrated

The migration from cfg(target_os = "zkvm") to cfg(openvm_intrinsics) is clean:

Set in encode_rust_flags (build/src/lib.rs:310-311)
check-cfg lint configured in both openvm/Cargo.toml and guest/Cargo.toml
Consistently used across openvm/src/lib.rs, io/mod.rs, io/read.rs, memory.rs
No stale cfg(target_os = "zkvm") references remain in the outer guest or toolchain crates

Code Quality Issues

9. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code:

Uses cfg(target_os = "zkvm") (old) vs the outer's cfg(openvm_intrinsics) (new)
Uses hint_buffer_u32! / RV32_ALU_OPCODE (RV32) vs the outer's hint_buffer_u64! / RV64_ALU_OPCODE (RV64)
Lacks the check-cfg lint configuration for openvm_intrinsics

The workspace points to the outer directory. The inner one should be removed to avoid confusion.

10. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. An 8× over-allocation. Functionally correct.

11. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

WORD_SIZE is defined as 8 in platform/src/lib.rs:23. Should use the constant for consistency.

12. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker.

Architecture Observations (Positive)

Toolchain migration is thorough: The openvm_intrinsics cfg flag is a cleaner approach than target_os = "zkvm", decoupling intrinsic availability from the target triple. The riscv64im-unknown-none-elf target correctly uses -Z build-std without std.
Entry point (lib.rs:122-137) correctly uses ld sp, 0(sp) (load doubleword) and STACK_TOP: u64. The fault handler uses sd x0, 1(x0) (store doubleword to misaligned address).
Host hint stream (host.rs) properly uses u64::to_le_bytes() for length encoding, consistent with guest-side read_u64().
ELF loader correctly validates Class::ELF64, uses try_into() for all u64→u32 address conversions with error handling, and keeps 4-byte instruction chunks (RISC-V instructions are always 32-bit even in RV64).
Decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32).
Non-AOT execution paths are solid — base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore, jalr, jal_lui, auipc all correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).to_le_bytes().
Test coverage is thorough — hand-written RV64 assembly (rv64im-stress.S, rv64im-intrin.S), comprehensive sign extension edge cases, negative tests for constraint violations.
build-std change correctly removes std and proc_macro, acknowledging no_std-only guest programs with properly #[ignore]d std tests and clear TODO comments.

Summary

Priority	Issue	Impact	Status
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#2: AOT impls reference `Rv32*Executor` types (8 impls across 4 files)	Dead code + compilation failure under `aot`	Trivial rename
High	#4: Missing `MulOpcode` import in `mul_w` AOT	Compilation failure under `aot`	Trivial fix
High	#3,5: AOT layer assumes 32-bit registers (systemic)	Entire AOT code path broken	Known deferral (CI disabled)
Medium	#6: CUDA extension not ported	GPU proving broken	Known deferral
Low	#9: Inner `guest/guest/` directory	Confusing stale RV32 leftover	Cleanup
Low	#10: Serializer 8× over-allocation	Wasted memory	Minor
Low	#11: Hardcoded `8` vs `WORD_SIZE`	Style inconsistency	Minor
Low	#12: `TEMP[jpw]` marker	Cleanup	Minor

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The latest commit cleanly migrates the guest toolchain to riscv64im-unknown-none-elf with proper no_std build configuration and the new openvm_intrinsics cfg flag. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent.

Recommended before merge:

Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix Rv32*Executor → Rv64*Executor in AOT impls across 4 files (trivial, 8 occurrences)
Add missing MulOpcode import in mul_w/execution.rs
Remove the inner guest/guest/ directory
Remove TEMP[jpw] marker / rename tmp_convert_to_u8s
Track AOT and CUDA port as separate work items

github-actions · 2026-04-21T15:36:08Z

Claude finished @876pol's task in 6m 1s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new SHA2 changes
Review circuit and extension changes (deferral, algebra, bigint)
Review toolchain and decoder changes
Post final review

Overview

Re-reviewing after the latest commit feat: rv64 sha2 (#2721) (42b8678). The PR is at 329 files changed, 15747 insertions, 7024 deletions across 20 commits.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf

NEW: Critical Bugs Found

1. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension's AIR columns, trace records, and range checks all use RV32_REGISTER_NUM_LIMBS (4) for register value arrays, but RV64 registers are 8 bytes wide. This means the memory bridge reads/writes only 4 bytes from each register, which doesn't match the actual 8-byte register width.

AIR columns (output/air.rs:59-60, call/air.rs:230-231):

pub rd_val: [T; RV32_REGISTER_NUM_LIMBS],  // 4 elements, should be 8
pub rs_val: [T; RV32_REGISTER_NUM_LIMBS],

Trace records (output/trace.rs:72-73, call/trace.rs:234-235):

pub rd_val: [u8; RV32_REGISTER_NUM_LIMBS],  // 4 bytes, should be 8
pub rs_val: [u8; RV32_REGISTER_NUM_LIMBS],

Memory bridge reads (output/air.rs:260-276) read local.rd_val (4 elements) from the register address space, but an RV64 register occupies 8 bytes. The AIR constraint system is inconsistent with the actual register width.

Address range checks (output/air.rs:237-244, output/trace.rs:337-346, call/air.rs:306-313, call/trace.rs:177-180, 407-412) use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32 bits) for the address bound, but RV64 heap pointers can be up to 64 bits. bytes_to_f(&local.rd_val) composes only 4 bytes into a pointer, truncating any address above 2^32.

Impact: The memory bridge interactions won't match what the system memory chip expects for 8-byte register reads. The AIR constraints are unsound — they allow reading only half a register.

Files affected: output/air.rs, output/trace.rs, call/air.rs, call/trace.rs (all under extensions/deferral/circuit/src/)

2. Deferral transpiler uses wrong register stride (Address bug)

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register addresses are computed as register_index × limbs_per_register. With RV64, each register occupies 8 bytes, so the stride should be RV64_REGISTER_NUM_LIMBS (8). Using 4 causes register addresses to overlap (e.g., register x2 at offset 8 would be addressed as offset 16, but with stride 4 it's addressed as offset 8 which overlaps with x1).

3. Algebra extension uses 4-byte register writes and wrong transpiler stride

Circuit (algebra/circuit/src/modular_chip/is_eq.rs:726):

let mut write_data = [0u8; RV32_REGISTER_NUM_LIMBS];  // 4 bytes, should be 8

Writes only 4 bytes to an 8-byte register, leaving upper bytes unwritten.

Transpiler (algebra/transpiler/src/lib.rs:102-103, 126, 204-205):

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rd),   // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),

Same wrong register address stride as the deferral transpiler.

Tests (algebra/circuit/src/fp2_chip/tests.rs:191-209, modular_chip/tests.rs:215-216):
Use RV32_REGISTER_NUM_LIMBS for pointer generation and register writes — all produce incorrect 4-byte-stride addresses.

4. Bigint transpiler uses wrong register stride

extensions/bigint/transpiler/src/lib.rs:146-147:

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),  // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs2),

Previously Reported Bugs (Still Open)

5. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

Fix this →

6. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot`)

8 impls across 4 files use old Rv32* struct names:

File	Lines	Has	Should be
`jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

7. AOT `load_sign_extend` impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21, 40-41 implements for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs) but the actual alias uses RV64_REGISTER_NUM_LIMBS (8).

8. AOT layer assumes 32-bit registers throughout (systemic)

Register stride /4, XMM helpers, *_W AOT impls all use 32-bit assumptions. CI is disabled (chore(ci): comment out aot workflows) — known deferral.

9. CUDA extension references old `Rv32*` types

extension/cuda.rs:11-18 imports non-existent Rv32* types. Known deferral.

New SHA2 Observations

The SHA2 RV64 migration is well-done overall. Constants correctly updated to RV64_*:

RV64_REGISTER_AS, RV64_MEMORY_AS, RV64_REGISTER_NUM_LIMBS, RV64_CELL_BITS all correct
Pointer handling properly constrains upper 4 bytes (limbs 4-7) to zero for 32-bit addressing
SHA2_READ_SIZE == SHA2_WRITE_SIZE == 8 correctly handles 8-byte memory operations

Minor semantic issue in sha2_chips/main_chip/air.rs:345,350,355: The write loop uses STATE_READS and SHA2_READ_SIZE instead of STATE_WRITES and SHA2_WRITE_SIZE. Currently harmless because both pairs are equal (STATE_BYTES / 8), but semantically incorrect and would break if read/write sizes ever diverged.

Code Quality Issues

10. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

11. Serializer `Vec::with_capacity` 8× over-allocation

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes but with_capacity takes element count. Should be / 8.

12. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare the same package name. Inner directory uses stale RV32 code (hint_buffer_u32!, old cfg guards).

13. Deferral config field naming

extensions/deferral/circuit/src/extension/mod.rs:228-232 — Config fields named rv32i, rv32m but types are Rv64I, Rv64M, Rv64Io. GPU builder at extension/cuda.rs:136,141,146 references config.rv32i, config.rv32m — functionally correct but misleading.

Architecture Observations (Positive)

SHA2 RV64 port is clean — pointer constraints properly zero-extend upper limbs, transpiler correctly uses Rv64Sha2Opcode, and execution reads 8-byte registers.
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation.
Non-AOT execution paths in the riscv extension are solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed with proper sign extension via (result as i32 as i64 as u64).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, guest openvm_intrinsics cfg is clean.

Summary

Priority	Issue	Impact
Critical	#1: Deferral AIR/trace uses 4-byte register arrays	Soundness: memory bridge mismatches, AIR constraints wrong
Critical	#2: Deferral transpiler wrong register stride	Wrong register addresses in transpiled instructions
Critical	#3: Algebra circuit 4-byte register writes + wrong transpiler stride	Partial register writes + wrong addresses
Critical	#4: Bigint transpiler wrong register stride	Wrong register addresses
High	#5: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`
High	#6: AOT `Rv32Executor` → `Rv64Executor` (4 files)	Dead code + compilation failure
Medium	#7-9: AOT/CUDA systemic issues	Known deferral (CI disabled)
Low	#10-13: Code quality	Cleanup items

The core riscv extension (non-AOT, non-CUDA) is solid, but the deferral, algebra, and bigint extensions have critical RV32_REGISTER_NUM_LIMBS → RV64_REGISTER_NUM_LIMBS migration gaps that affect both AIR soundness and transpiler correctness.

Recommended before merge:

Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, range checks, and transpiler (affects ~50 locations across 6 files)
Fix algebra extension — update is_eq.rs register write size, transpiler stride, and test constants
Fix bigint transpiler — update register stride
Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix Rv32*Executor → Rv64*Executor in AOT impls (trivial)
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT and CUDA port as separate work items

github-actions · 2026-04-22T16:20:50Z

Claude finished @876pol's task in 6m 31s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new riscv-adapters changes (latest commit)
Review SHA2 and toolchain changes
Review deferral, algebra, bigint extension changes
Post final review

Overview

Re-reviewing after the latest commit feat: rv64 riscv-adapters (#2722) (6761341). The PR is at 21 commits, 100 changed files, +15988/−7102 lines.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf

New in latest commit (6761341):

rv32-adapters renamed to riscv-adapters with full RV64 port
New helpers.rs with pad_reg_val, compose_ptr, tracing_read_reg_ptr utilities
All adapters (vec_heap, vec_heap_branch, eq_mod, vec_to_flat) properly use RV64_REGISTER_NUM_LIMBS and RV64_WORD_NUM_LIMBS
Updated test utilities with rv64_write_heap_default, generate_rv64_is_type_immediate

New: riscv-adapters Changes (Latest Commit) — Well Done

The riscv-adapters port is solid. Key design decisions are correct:

pad_reg_val (helpers.rs:12-21): Takes [Var; RV64_WORD_NUM_LIMBS] (4 bytes), returns [Expr; RV64_REGISTER_NUM_LIMBS] (8 bytes) with zero-padded upper bytes — enforces 32-bit pointer constraint at proving time via the memory bus interaction.
compose_ptr (helpers.rs:28-36): Correctly composes 4 materialized bytes into a field element for use as a memory address.
tracing_read_reg_ptr (helpers.rs:42-55): Reads full 8-byte register, debug-asserts upper 4 bytes are zero, returns lower 4 bytes as u32 — correct.
Range checks in all adapters correctly use RV64_CELL_BITS * RV64_WORD_NUM_LIMBS (32 bits) for pointer bound constraints.
Rv64IsEqualModAdapterCols (eq_mod.rs:64): Correctly uses MemoryWriteAuxCols<T, RV64_REGISTER_NUM_LIMBS> for the full 8-byte register write, while rs_val is [T; RV64_WORD_NUM_LIMBS] (4 limbs) — this is the right split.
Test utilities properly use RV64_REGISTER_NUM_LIMBS for write_heap_default and write_heap_pointer_default calls.

Zero RV32_* references in the entire riscv-adapters directory.

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

Fix this →

2. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot`)

8 impl blocks across 4 files use Rv32* struct names that were renamed to Rv64*:

File	Lines	Has	Should be
`jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

3. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension's AIR columns use RV32_REGISTER_NUM_LIMBS (4) for register value arrays, but DEFAULT_BLOCK_SIZE = 8 — meaning the memory bus expects 8-byte register interactions.

AIR columns (output/air.rs:59-60, call/air.rs:230-231):

pub rd_val: [T; RV32_REGISTER_NUM_LIMBS],  // 4 elements, should be 8
pub rs_val: [T; RV32_REGISTER_NUM_LIMBS],

Trace records (output/trace.rs:72-73, call/trace.rs:234-235):

pub rd_val: [u8; RV32_REGISTER_NUM_LIMBS],  // 4 bytes, should be 8
pub rs_val: [u8; RV32_REGISTER_NUM_LIMBS],

Memory bridge reads (output/air.rs:260-276) read local.rd_val (4 elements) from the register address space. The memory system expects 8-byte register reads (DEFAULT_BLOCK_SIZE = 8). This makes the AIR constraints unsound.

Address range checks (output/air.rs:237-244, call/air.rs:306-313) use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32 bits) — this happens to be correct since pointers are 32-bit, but should use RV64_* constants for clarity.

Impact: Memory bus interaction width mismatch will cause proof generation failure or unsound constraints.

4. Deferral transpiler uses wrong register stride

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 would be at address 20 instead of the correct 40 for RV64.

5. Algebra circuit + transpiler: wrong register width and stride

Circuit (algebra/circuit/src/modular_chip/is_eq.rs:726):

let mut write_data = [0u8; RV32_REGISTER_NUM_LIMBS];  // 4 bytes, should be 8

Writes only 4 bytes to an 8-byte register.

Transpiler (algebra/transpiler/src/lib.rs:102-103, 126, 204-205):

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rd),   // stride 4, should be 8

Modular chip (algebra/circuit/src/modular_chip/mod.rs:42, 54, 69): Core AIR types parameterized with RV32_REGISTER_NUM_LIMBS (4) instead of RV64_REGISTER_NUM_LIMBS (8).

6. Bigint transpiler uses wrong register stride

extensions/bigint/transpiler/src/lib.rs:146-147:

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),  // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs2),

7. AOT layer assumes 32-bit registers throughout (systemic)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers, and RV32_REGISTER_NUM_LIMBS const generics. This is a known deferral — CI is disabled via chore(ci): comment out aot workflows.

8. CUDA extension references old `Rv32*` types

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, etc. — all renamed. Also SHA2 CUDA files (sha2_main.cu:56-74, columns.cuh:27-29) still use RV32_REGISTER_NUM_LIMBS. Known deferral.

Code Quality Issues

9. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

10. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ uses stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard). Should be removed.

11. Serializer `Vec::with_capacity` over-allocates

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes element count. Over-allocates by 8×.

12. Hardcoded `8` in `io/mod.rs` instead of named constant

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use WORD_SIZE constant.

13. Keccak256 extension not yet ported

keccak256/circuit/src/keccakf_op/columns.rs:22 and xorin/columns.rs:25-29 use RV32_REGISTER_NUM_LIMBS (4) for pointer limb arrays. Not part of this PR but will need porting before keccak256 works with RV64.

Architecture Observations (Positive)

riscv-adapters port is clean and well-structured. The pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers provide a clean abstraction for the 4-byte materialized pointer + 8-byte register value pattern.
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32).
Non-AOT execution paths in the riscv extension are solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension correctly.
Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling.
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, guest openvm_intrinsics cfg is clean.
ELF loader correctly validates Class::ELF64 and constrains addresses to u32.

Summary

Priority	Issue	Impact	Status
Critical	#3: Deferral AIR uses 4-byte register arrays	Memory bus mismatch, unsound constraints	Open
Critical	#4: Deferral transpiler wrong register stride	Wrong register addresses	Open
Critical	#5: Algebra circuit 4-byte writes + wrong transpiler stride	Partial register writes + wrong addresses	Open
Critical	#6: Bigint transpiler wrong register stride	Wrong register addresses	Open
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#2: AOT `Rv32Executor` → `Rv64Executor` (4 files)	Dead code + compilation failure under `aot`	Trivial rename
Medium	#7-8: AOT/CUDA systemic issues	Known deferral (CI disabled)	Tracked
Low	#9-12: Code quality	Cleanup items	Open

The core riscv extension (non-AOT, non-CUDA) and the new riscv-adapters are solid. The adapter port in the latest commit is well-designed with correct constant usage and a clean helper API.

The deferral, algebra, and bigint extensions still have critical RV32_REGISTER_NUM_LIMBS → RV64_REGISTER_NUM_LIMBS migration gaps that affect both AIR soundness and transpiler correctness.

Recommended before merge:

Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, range checks, and transpiler
Fix algebra extension — update is_eq.rs register write size, modular_chip/mod.rs type parameters, transpiler stride, and test constants
Fix bigint transpiler — update register stride
Fix config.rv32i → config.rv64i in 5 locations
Fix Rv32*Executor → Rv64*Executor in AOT impls
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT, CUDA, and keccak256 port as separate work items

github-actions · 2026-04-22T17:59:04Z

Claude finished @876pol's task in 6m 6s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new bigint changes (latest commit)
Review riscv-adapters changes
Review deferral and algebra extension changes in detail
Post final review

Overview

Re-reviewing after the latest commits feat: rv64 bigint (#2732) (4c64459) and feat: rv64 riscv-adapters (#2722) (6761341). The PR is at 22 commits, 100 changed files, +16518/−7535 lines.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf

New in latest commits:

rv32-adapters renamed to riscv-adapters with full RV64 port — clean migration with new pad_reg_val, compose_ptr, tracing_read_reg_ptr helpers
Bigint extension updated to use RV64 types and adapters — transpiler correctly uses RV64_REGISTER_NUM_LIMBS

NEW: riscv-adapters Changes — Well Done

The riscv-adapters port is solid. Zero RV32_* references remain. Key design decisions are correct:

pad_reg_val (helpers.rs:12): Pads 4 materialized bytes to 8-byte register value with zero upper bytes — correct for 32-bit pointer constraint
compose_ptr (helpers.rs:27): Correctly composes 4 bytes into a 32-bit field element pointer
tracing_read_reg_ptr (helpers.rs:43): Reads full 8-byte register, debug-asserts upper 4 bytes are zero — correct
All adapters use RV64_REGISTER_NUM_LIMBS for register reads/writes via the memory bridge
Range checks correctly use RV64_CELL_BITS * RV64_WORD_NUM_LIMBS (32 bits) for pointer bounds

NEW: Bigint Extension — Mostly Good, AOT Dead Code

The bigint transpiler is correctly updated — uses RV64_REGISTER_NUM_LIMBS for register stride and Rv64*256Opcode types (bigint/transpiler/src/lib.rs:4,146-147).

The bigint circuit core types are correctly updated — Rv64BaseAlu256Executor, Rv64LessThan256Executor, etc. all use RV64_CELL_BITS and Rv64VecHeapAdapter* (bigint/circuit/src/lib.rs:108-179).

Bigint Bug: AOT impls reference non-existent `Rv32*256Executor` types

12 AOT trait impls across 6 files implement for Rv32* struct names, but the actual types were renamed to Rv64*:

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

These are dead code — the impls implement traits for types that don't exist. Will fail to compile under aot feature.

Fix this →

Bigint CUDA/Test: `RV32_CELL_BITS` and `Rv32*256Opcode` references

bigint/circuit/src/cuda/mod.rs:15: Imports RV32_CELL_BITS (should be RV64_CELL_BITS; same value but wrong name — used in 20+ locations in the file)
bigint/circuit/src/tests.rs:655,716,782,847,905,971: Uses Rv32*256Opcode::CLASS_OFFSET — should be Rv64*256Opcode
bigint/circuit/src/tests.rs:629,690,753,819,879,945: Uses RV32_CELL_BITS — should be RV64_CELL_BITS
bigint/circuit/src/tests.rs:922,988: Uses Rv32VecHeapBranchAdapterExecutor — should be Rv64VecHeapBranchAdapterExecutor
bigint/circuit/src/extension/cuda.rs:55-101: References Rv32*256Air types — should be Rv64*256Air

Bugs (Still Open from Previous Reviews)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

Rv64ImConfig has field rv64i (line 101 of lib.rs), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

Fix this →

2. RISCV AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot`)

8 impl blocks across 4 files:

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

3. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension's AIR columns, trace records, and memory bridge interactions all use RV32_REGISTER_NUM_LIMBS (4) for register value arrays. With DEFAULT_BLOCK_SIZE = 8, the memory bus expects 8-byte register interactions. The memory_bridge.read() call passes local.rd_val (4 elements), creating a 4-byte read interaction that doesn't match the system's 8-byte register width.

AIR columns (deferral/circuit/src/output/air.rs:59-60, call/air.rs:230-231):

pub rd_val: [T; RV32_REGISTER_NUM_LIMBS],  // 4 elements, should be 8
pub rs_val: [T; RV32_REGISTER_NUM_LIMBS],

Trace records (output/trace.rs:72-73, call/trace.rs:234-235):

pub rd_val: [u8; RV32_REGISTER_NUM_LIMBS],  // 4 bytes, should be 8
pub rs_val: [u8; RV32_REGISTER_NUM_LIMBS],

Memory bridge reads (output/air.rs:260-276): Pass local.rd_val (4 elements) to memory_bridge.read(), creating a 4-byte memory interaction.

Address range checks (output/air.rs:237-244): Use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32 bits) and local.rd_val[RV32_REGISTER_NUM_LIMBS - 1] — functionally correct for 32-bit pointer bounds but semantically wrong.

bytes_to_f (utils.rs:64-68): Uses RV32_CELL_BITS for composition shift — functionally correct (same value) but should reference RV64_CELL_BITS.

Impact: The memory bridge interactions create 4-byte reads for register values, but the system memory chip expects 8-byte register reads (matching DEFAULT_BLOCK_SIZE = 8). This will cause a memory bus argument mismatch during proof generation.

Note: The pointer values are correctly 32-bit (F_NUM_BYTES = 4), so the pointer arithmetic is fine. The issue is that the register read width must be 8 bytes for RV64, with only the lower 4 bytes used as the pointer value and upper 4 bytes constrained to zero.

4. Deferral transpiler uses wrong register stride

deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 would be addressed at offset 20 instead of the correct 40 for RV64.

5. Algebra circuit + transpiler: wrong register width and stride

Execution code (algebra/circuit/src/execution.rs:395,408):

u32::from_le_bytes(exec_state.vm_read(RV32_REGISTER_AS, addr as u32))

Reads 4 bytes from an 8-byte register — truncates register value. Since registers hold pointers (which are 32-bit), this produces correct values if upper bytes are zero, but the read width doesn't match the register width.

IsEq execution (algebra/circuit/src/modular_chip/is_eq.rs:726):

let mut write_data = [0u8; RV32_REGISTER_NUM_LIMBS];  // 4 bytes, should be 8

Writes only 4 bytes to an 8-byte register.

Transpiler (algebra/transpiler/src/lib.rs:7,102-103,126,204-205):

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rd),   // stride 4, should be 8

6. Deferral config/builder naming

deferral/circuit/src/extension/mod.rs:224-241: Config struct is Rv32DeferralConfig with fields rv32i: Rv64I, rv32m: Rv64M. Builder is Rv32DeferralCpuBuilder. These should be renamed to Rv64Deferral* or at least generic names.

7. AOT layer assumes 32-bit registers throughout (systemic — known deferral)

The AOT x86 code generation layer is fundamentally broken for RV64: register stride /4, XMM helpers, *_W AOT impls all use 32-bit assumptions. CI is disabled (chore(ci): comment out aot workflows).

8. CUDA extension references old `Rv32*` types (known deferral)

Both riscv/circuit/src/extension/cuda.rs and bigint/circuit/src/extension/cuda.rs reference old Rv32* types.

Code Quality Issues

9. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

10. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ uses stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard). Should be removed.

11. Serializer `Vec::with_capacity` over-allocates

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements — 8× over-allocation.

Architecture Observations (Positive)

riscv-adapters port is clean and well-structured. The pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers provide a clean abstraction for the 4-byte pointer + 8-byte register value pattern.
Bigint transpiler correctly uses RV64_REGISTER_NUM_LIMBS for register stride.
Bigint circuit core types correctly use RV64_CELL_BITS and Rv64VecHeapAdapter*.
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation.
Non-AOT execution paths in the riscv extension are solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed with proper sign extension via (result as i32 as i64 as u64).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, guest openvm_intrinsics cfg is clean.

Summary

Priority	Issue	Impact	Status
Critical	#3: Deferral AIR uses 4-byte register arrays	Memory bus mismatch, unsound constraints	Open
Critical	#4: Deferral transpiler wrong register stride	Wrong register addresses	Open
Critical	#5: Algebra circuit 4-byte writes + wrong transpiler stride	Partial register writes + wrong addresses	Open
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#2: RISCV AOT `Rv32Executor` → `Rv64Executor` (4 files)	Dead code + compilation failure	Trivial rename
High	NEW: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files)	Dead code + compilation failure	Trivial rename
Medium	#6: Deferral config naming `Rv32Deferral*`	Misleading naming	Cosmetic
Medium	#7-8: AOT/CUDA systemic issues	Known deferral (CI disabled)	Tracked
Low	#9-11: Code quality	Cleanup items	Open

The core riscv extension (non-AOT, non-CUDA) and the new riscv-adapters are solid. The bigint transpiler and core circuit types are correctly updated.

The deferral and algebra extensions still have critical RV32_REGISTER_NUM_LIMBS → RV64_REGISTER_NUM_LIMBS migration gaps that affect both AIR soundness and transpiler correctness. These are the same issues from the previous review — they have not been addressed yet.

Recommended before merge:

Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, memory bridge interactions, and transpiler stride (~20 locations across 5 files)
Fix algebra extension — update is_eq.rs register write size, execution.rs register read size, transpiler stride (~15 locations across 3 files)
Fix config.rv32i → config.rv64i in riscv circuit (5 locations)
Fix Rv32*Executor → Rv64*Executor in riscv AOT impls (4 files, 8 occurrences)
Fix Rv32*256Executor → Rv64*256Executor in bigint AOT impls (6 files, 12 occurrences)
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT, CUDA, and keccak256 port as separate work items

github-actions · 2026-04-22T17:59:25Z

Claude finished @876pol's task in 7m 12s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes in detail (riscv-adapters, bigint, reduce memory columns)
Review deferral, algebra extension changes
Post final review

Overview

Re-reviewing after the latest commits. The PR is at 23 commits, 100 changed files, +16066/−7176 lines. Three new commits since the last review:

6761341 feat: rv64 riscv-adapters (#2722)
4c64459 feat: rv64 bigint (#2732)
7b24bc6 fix: rv64 reduce memory address columns (#2733)

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf
Bigint transpiler register stride — Fixed: now correctly uses RV64_REGISTER_NUM_LIMBS (confirmed zero RV32_* references in bigint transpiler)

New Commit Assessment

`6761341` riscv-adapters — ✅ Well Done

The riscv-adapters port is solid. Zero RV32_* references in non-CUDA Rust code. Key design decisions are correct:

pad_reg_val (helpers.rs): Takes [Var; RV64_WORD_NUM_LIMBS] (4 bytes), returns [Expr; RV64_REGISTER_NUM_LIMBS] (8 bytes) with zero-padded upper bytes — correct.
compose_ptr and tracing_read_reg_ptr: Clean abstractions for the 4-byte pointer + 8-byte register pattern.
All adapters (vec_heap, vec_heap_branch, eq_mod, vec_to_flat) properly use RV64_REGISTER_NUM_LIMBS and RV64_WORD_NUM_LIMBS.
Test utilities properly migrated.

`4c64459` rv64 bigint — ⚠️ Contains AOT/CUDA bugs (see below)

The non-AOT, non-CUDA parts are well-structured:

Common utilities in common.rs properly use RV64_* constants
INT256 operations correctly defined with u64 limbs
Transpiler correctly uses RV64_REGISTER_NUM_LIMBS throughout
Type aliases properly use Rv64VecHeapAdapterAir/Executor

But: AOT trait impls and CUDA code are broken — see bugs #7 and #8 below.

`7b24bc6` reduce memory address columns — ✅ Sound optimization

The reduction from RV64_REGISTER_NUM_LIMBS (8) to RV64_WORD_NUM_LIMBS (4) for pointer columns is sound:

Memory bus enforcement: All files construct full 8-limb arrays with explicit zero-padding for memory bridge interactions (e.g., loadstore.rs:163-169)
Trace generation safety: rs1_val changed from u64 to u32 with an assertion that upper 32 bits are zero (loadstore.rs:386-390)
Removed zero-constraint loops are safe — the constraint is now enforced structurally at the memory bus level rather than via explicit AIR assertions
Hintstore and SHA2 follow the same pattern correctly
Deleted tests (test_hint_buffer_rem_words_upper_limbs_zero, test_hint_buffer_mem_ptr_upper_limbs_zero) are no longer necessary since constraints are implicit

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

Fix this →

2. AOT impls reference non-existent `Rv32*Executor` types in riscv circuit (Compilation error under `aot`)

8 impl blocks across 4 files use Rv32* struct names that were renamed to Rv64*:

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

3. AOT `load_sign_extend` impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21, 40-41 implements for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs) but the actual alias uses RV64_REGISTER_NUM_LIMBS (8). Dead code that will fail to compile under aot.

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot`)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported.

5. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

AIR columns and trace records in output/air.rs:59-60 and call/air.rs:230-231 use RV32_REGISTER_NUM_LIMBS (4) for register value arrays, but DEFAULT_BLOCK_SIZE = 8. The memory bridge reads/writes only 4 bytes from each register, creating an AIR constraint mismatch.

6. Deferral transpiler uses wrong register stride

deferral/transpiler/src/lib.rs:85-86, 95-96 computes register addresses as RV32_REGISTER_NUM_LIMBS * dec_insn.rd (stride 4 instead of 8). Register x5 would be at address 20 instead of the correct 40.

7. NEW: Bigint circuit AOT impls reference non-existent `Rv32*` types (Compilation error under `aot`)

12 impl blocks across 6 files implement traits for Rv32*256Executor types that don't exist (renamed to Rv64*256Executor):

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

Fix this →

8. NEW: Bigint CUDA tests reference non-existent types (Compilation error under `cuda`)

bigint/circuit/src/tests.rs (behind #[cfg(feature = "cuda")]):

Lines 655, 716, 782, 847, 905, 971: Reference Rv32*256Opcode::CLASS_OFFSET — should be Rv64*256Opcode::CLASS_OFFSET
Lines 922, 988: Reference Rv32VecHeapBranchAdapterExecutor — should be Rv64VecHeapBranchAdapterExecutor

Also, bigint/circuit/src/cuda/mod.rs imports from the old openvm_rv32_adapters package which no longer exists, and uses Rv32VecHeapAdapterCols/Rv32VecHeapAdapterRecord types.

9. Algebra circuit + transpiler: wrong register width and stride

algebra/circuit/src/modular_chip/is_eq.rs:726: [0u8; RV32_REGISTER_NUM_LIMBS] — writes only 4 bytes to an 8-byte register
algebra/transpiler/src/lib.rs:102-103, 126, 204-205: RV32_REGISTER_NUM_LIMBS * dec_insn.rd — wrong stride

10. CUDA extension references old `Rv32*` types (systemic)

riscv/circuit/src/extension/cuda.rs, bigint/circuit/src/extension/cuda.rs, deferral/circuit/src/extension/cuda.rs, and CUDA .cuh files all reference Rv32* types. This is a known deferral — CI is disabled for CUDA.

11. AOT layer assumes 32-bit registers throughout (systemic)

Register stride /4, XMM helpers, *_W AOT impls all use 32-bit assumptions. Known deferral — CI is disabled for AOT.

Code Quality Issues

12. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

13. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes but with_capacity takes element count. Should be / 8.

14. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard). Should be removed.

15. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs:60-61`

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use the WORD_SIZE constant for consistency.

Architecture Observations (Positive)

riscv-adapters port is clean and well-structured. The pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers provide a clean abstraction for the 4-byte pointer + 8-byte register pattern.
Memory address column optimization (commit 7b24bc6) is sound — zero-padding is enforced structurally at the memory bus level, not via explicit AIR assertions.
Bigint transpiler is properly ported — all register strides use RV64_REGISTER_NUM_LIMBS, opcode types use Rv64*.
Non-AOT, non-CUDA execution paths across all extensions are solid — consistent RV64_* constant usage.
*_W adapter/core pattern remains well-designed with proper sign extension.
Decoder crate correctly handles RV64 shift amounts and W-suffix validation.
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment.

Summary

Priority	Issue	Impact	Status
Critical	#5: Deferral AIR uses 4-byte register arrays	Soundness: memory bus mismatch	Open
Critical	#6: Deferral transpiler wrong register stride	Wrong register addresses	Open
Critical	#9: Algebra circuit 4-byte writes + wrong transpiler stride	Partial register writes + wrong addresses	Open
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#2: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files)	Dead code + compilation failure under `aot`	Trivial rename
High	#7: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files)	Dead code + compilation failure under `aot`	Trivial rename
High	#8: Bigint CUDA tests reference `Rv32*` types	Compilation failure under `cuda`	Trivial rename
Medium	#3-4: load_sign_extend AOT + missing MulOpcode import	Compilation failure under `aot`	Part of AOT port
Medium	#10-11: CUDA + AOT systemic issues	Known deferral (CI disabled)	Tracked
Low	#12: `TEMP[jpw]` marker	Cleanup	Minor
Low	#13: Serializer 8× over-allocation	Wasted memory	Minor
Low	#14: Inner `guest/guest/` directory	Confusing leftover	Cleanup
Low	#15: Hardcoded `8` vs `WORD_SIZE`	Style	Minor

The core RV64 architecture (non-AOT, non-CUDA) is solid. The latest riscv-adapters commit is well-structured, the bigint transpiler is correctly ported, and the memory column optimization is sound. The main remaining risks are:

Deferral and algebra extensions still have critical RV32_REGISTER_NUM_LIMBS → RV64_REGISTER_NUM_LIMBS migration gaps affecting AIR soundness and transpiler correctness
AOT/CUDA paths are systematically broken (acknowledged via CI disable)

Recommended before merge:

Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, and transpiler (~20 locations)
Fix algebra extension — update is_eq.rs register write size + transpiler stride
Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files) and bigint AOT (6 files) impls
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT, CUDA, and keccak256 port as separate work items

github-actions · 2026-04-22T18:01:24Z

Claude finished @Tuanlinh12312's task in 6m 55s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes (keccak256, bigint, riscv-adapters, reduce memory)
Review deferral, algebra extension changes
Review toolchain and decoder changes
Post final review

Overview

Re-reviewing after the latest commits: feat: rv64 keccak256 (#2724), fix: rv64 reduce memory address columns (#2733), feat: rv64 bigint (#2732), and feat: rv64 riscv-adapters (#2722). The PR is at 24 commits, 100 changed files, +16688/−7702 lines.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: module is active
abstract_compose overflow — Fixed: from_u64(1u64 << ...)
Range checker sizes — Fixed: RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
Guest toolchain target — Fixed: riscv64im-unknown-none-elf
AOT executor naming (jal_lui, auipc, jalr, hintstore) — Fixed: all now Rv64*Executor
Bigint transpiler stride — Fixed: now uses RV64_REGISTER_NUM_LIMBS

Bugs (Still Open)

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

All behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")] — blocks compilation with those features.

Fix this →

2. Deferral extension: register arrays and transpiler still use RV32 (Soundness bug)

The deferral extension has not been ported to RV64 — every register-related constant is still RV32_REGISTER_NUM_LIMBS (4 bytes instead of 8):

Register arrays (should be RV64_REGISTER_NUM_LIMBS = 8):

output/air.rs:59-60 — rd_val: [T; RV32_REGISTER_NUM_LIMBS], rs_val: [T; RV32_REGISTER_NUM_LIMBS]
output/trace.rs:72-73 — same pattern
call/air.rs:230-231 — same pattern
call/trace.rs:234-235 — same pattern

Range checks (wrong array index, wrong bit width):

output/air.rs:237-251 — accesses [RV32_REGISTER_NUM_LIMBS - 1] = index 3 (should be 7), multiplies by 4 (should be 8)
output/trace.rs:337-346 — same
call/air.rs:306-345 — same
call/trace.rs:177-180, 407-412 — same

Transpiler stride (should be ×8 not ×4):

transpiler/src/lib.rs:85-86, 95-96 — RV32_REGISTER_NUM_LIMBS * dec_insn.rd → register x5 maps to address 20 instead of correct 40

Impact: Memory bus interaction width mismatch with 8-byte registers. AIR constraints unsound. Transpiler generates wrong register addresses.

Fix this →

3. Algebra extension: wrong register write size, type parameters, and transpiler stride (Soundness bug)

Write size (modular_chip/is_eq.rs:726):

let mut write_data = [0u8; RV32_REGISTER_NUM_LIMBS];  // 4 bytes, should be 8

Writes only 4 bytes to an 8-byte register — partial register write.

Type parameters (modular_chip/mod.rs:42, 54, 69):
ModularIsEqualCoreAir parameterized with RV32_REGISTER_NUM_LIMBS (4) as WRITE_LIMBS — should be RV64_REGISTER_NUM_LIMBS (8). Creates structural mismatch between circuit expectations and executor.

Transpiler stride (transpiler/src/lib.rs:102-103, 126, 204-205):

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rd),   // stride 4, should be 8

Register x1 maps to address 4 instead of correct 8.

Fix this →

4. Bigint AOT impls reference non-existent `Rv32*` executor types (Compilation error under `aot`)

12 AotExecutor/AotMeteredExecutor impl blocks across 6 files use old Rv32* struct names:

File	Lines	Has	Should be
`base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`
`less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`

Fix this →

5. Bigint test files reference non-existent `Rv32VecHeapBranchAdapterExecutor`

extensions/bigint/circuit/src/tests.rs:922, 988 — Uses Rv32VecHeapBranchAdapterExecutor but the type was renamed to Rv64VecHeapBranchAdapterExecutor in the riscv-adapters crate. These are CUDA tests and will fail to compile under cuda feature.

6. Riscv circuit AOT impls use wrong const generic `RV32_REGISTER_NUM_LIMBS` (Dead code)

AOT AotExecutor/AotMeteredExecutor impls for non-W base chips implement for { RV32_REGISTER_NUM_LIMBS } (4 limbs) but the actual executor types use RV64_REGISTER_NUM_LIMBS (8 limbs) — making these impls dead code:

File	Lines
`base_alu/execution.rs`	177, 245
`less_than/execution.rs`	128, 247
`divrem/execution.rs`	108, 273
`load_sign_extend/aot.rs`	21, 41
`base_alu_w/execution.rs`	166, 234

7. CUDA extension references old `Rv32*` types (known deferral)

extension/cuda.rs imports Rv32BaseAluAir, Rv32LoadStoreAir, etc. — all renamed to Rv64*. Bigint CUDA files (cuda/mod.rs, cuda/src/bigint.cu) extensively reference Rv32VecHeapAdapter* types. Keccak256 CUDA files use RV32_REGISTER_NUM_LIMBS for register arrays and width calculations. All CUDA code is broken for RV64 — acknowledged via chore(ci): comment out aot workflows.

Code Quality Issues

8. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s...

9. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes but with_capacity takes element count. Should be / 8.

10. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". Inner directory uses stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard). Should be removed.

Architecture Observations (Positive)

Keccak256 RV64 port (latest commit) is solid — Rust code correctly uses RV64_WORD_NUM_LIMBS for pointer columns, expand_to_rv64_register() for zero-padded 8-byte bus interactions, and rv64_register_to_u32() for pointer extraction. Well-designed.
Bigint RV64 port (latest commit) is correct in non-AOT paths — all execution code properly uses RV64_REGISTER_NUM_LIMBS, RV64_REGISTER_AS, RV64_MEMORY_AS. Transpiler correctly uses RV64_REGISTER_NUM_LIMBS for stride. Only AOT impls have stale naming.
riscv-adapters port is clean and well-structured — pad_reg_val, compose_ptr, tracing_read_reg_ptr helpers correctly handle the 4-byte materialized pointer + 8-byte register pattern. Zero RV32_* references in Rust code.
Reduce-memory-address-columns changes are correct — optimizes column usage by materializing only significant limbs (RV64_WORD_NUM_LIMBS = 4 for pointers) while zero-padding to full register width in bus interactions. Applied cleanly to hintstore, loadstore, and SHA2 chips.
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32).
Non-AOT, non-CUDA execution paths across the riscv extension remain solid — consistent RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed with proper sign extension via (result as i32 as i64 as u64).

Summary

Priority	Issue	Impact	Status
Critical	#2: Deferral extension fully unported	Unsound AIR constraints, wrong transpiler addresses	Open
Critical	#3: Algebra extension wrong register width + stride	Partial register writes, wrong addresses	Open
High	#1: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#4: Bigint AOT `Rv32` → `Rv64` (12 impls, 6 files)	Dead code + compilation failure under `aot`	Trivial rename
High	#5: Bigint tests `Rv32VecHeapBranch` → `Rv64`	Compilation failure under `cuda`	Trivial rename
High	#6: Riscv AOT impls wrong const generic	Dead code under `aot`	Part of AOT port
Medium	#7: CUDA extension not ported (all extensions)	GPU proving broken	Known deferral
Low	#8: `TEMP[jpw]` marker	Cleanup	Minor
Low	#9: Serializer 8× over-allocation	Wasted memory	Minor
Low	#10: Inner `guest/guest/` directory	Confusing leftover	Cleanup

The core riscv extension, riscv-adapters, bigint (non-AOT), and keccak256 (Rust) are solid. The latest commits demonstrate a correct and consistent RV64 migration pattern with proper pointer materialization (4-byte columns) and zero-padded register bus interactions (8 bytes).

The deferral and algebra extensions are the primary remaining risks — they have critical RV32_REGISTER_NUM_LIMBS usage that produces wrong register array sizes, wrong transpiler addresses, and unsound AIR constraints.

Recommended before merge:

Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS across 4 circuit files and transpiler (~30 occurrences)
Fix algebra extension — update is_eq.rs write size, mod.rs type parameters, and transpiler stride
Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix bigint AOT Rv32* → Rv64* in 6 files (trivial rename)
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT, CUDA, and keccak256 CUDA port as separate work items

github-actions · 2026-04-22T18:02:40Z

Claude finished @Tuanlinh12312's task in 7m 8s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes (riscv-adapters, bigint, keccak256, reduce memory)
Review deferral, algebra extension changes
Review toolchain and decoder changes
Post final review

Overview

Re-reviewing after the latest 5 commits: feat: rv64 algebra (#2727), feat: rv64 keccak256 (#2724), fix: rv64 reduce memory address columns (#2733), feat: rv64 bigint (#2732), feat: rv64 riscv-adapters (#2722). The PR is at 25 commits, 366 files changed, +17031/−8160 lines.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf

New commits assessment:

riscv-adapters (feat: rv64 riscv-adapters #2722) — Solid. Zero RV32_* references. Clean pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers.
bigint (feat: rv64 bigint #2732) — Execution and circuit code properly uses RV64_* constants. AOT/CUDA stubs still reference old types.
reduce memory address columns (fix: rv64 reduce memory address columns #2733) — Good optimization: stores only RV64_WORD_NUM_LIMBS (4) pointer bytes, pads to 8 in constraints.
keccak256 (feat: rv64 keccak256 #2724) — Properly migrated with rv64_register_to_u32 helper and correct pointer validation.
algebra (feat: rv64 algebra #2727) — Clean migration with explicit debug_assert!(val <= u32::MAX as u64) pointer validation.

Bugs (Still Open)

1. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension's AIR columns, trace records, and memory bridge interactions all use RV32_REGISTER_NUM_LIMBS (4) for register value arrays, but RV64 registers are 8 bytes wide. The memory bridge reads only 4 bytes from each register, creating an unsound constraint system.

AIR columns (output/air.rs:59-60, call/air.rs:230-231):

pub rd_val: [T; RV32_REGISTER_NUM_LIMBS],  // 4 elements, should be 8
pub rs_val: [T; RV32_REGISTER_NUM_LIMBS],

Trace records (output/trace.rs:72-73, call/trace.rs:234-235):

pub rd_val: [u8; RV32_REGISTER_NUM_LIMBS],  // 4 bytes, should be 8
pub rs_val: [u8; RV32_REGISTER_NUM_LIMBS],

Memory bridge reads (output/air.rs:260-276) — passes local.rd_val (4 elements) to memory_bridge.read(). In the RV64 system, register reads should be 8 bytes. The memory interaction size doesn't match what the system memory chip expects.

Address range checks (output/air.rs:237-244, call/air.rs:306-313) use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32 bits) for the address bound. For the upper-limb check, local.rd_val[RV32_REGISTER_NUM_LIMBS - 1] accesses index 3 of a 4-element array — but with 8-byte registers, the upper bound check should access index 7.

Impact: Memory bus interaction width mismatch → unsound AIR constraints.

Files affected: extensions/deferral/circuit/src/output/air.rs, output/trace.rs, call/air.rs, call/trace.rs

Fix this →

2. Deferral transpiler uses wrong register stride (Address bug)

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 at offset 20 instead of the correct 40 for RV64.

Fix this →

3. Deferral config field naming mismatch

extensions/deferral/circuit/src/extension/mod.rs:228-230:

pub rv32i: Rv64I,   // Field named "rv32i" but type is Rv64I
pub rv32m: Rv64M,   // Field named "rv32m" but type is Rv64M

This is cosmetic (functionally works) but will cause confusion and creates a landmine for anyone adding feature-gated code that checks field names.

4. `config.rv32i` should be `config.rv64i` in riscv extension (5 locations)

Rv64ImConfig has field rv64i, but 5 locations still reference config.rv32i:

File	Line	Feature gate
`extensions/riscv/circuit/src/lib.rs`	297	`cuda`
`extensions/riscv/circuit/src/mul/tests.rs`	289	`cuda`
`extensions/riscv/circuit/src/mulh/tests.rs`	530	`aot`
`extensions/riscv/circuit/src/jalr/tests.rs`	586	`aot`
`extensions/riscv/circuit/src/mul_w/tests.rs`	416	`aot`

Fix this →

5. AOT impls reference non-existent `Rv32*Executor` types (riscv extension)

8 impl blocks across 4 files implement traits for struct names that no longer exist:

File	Lines	Has	Should be
`jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Dead code under aot feature — will fail to compile when AOT is enabled.

Fix this →

6. AOT impls in riscv extension use wrong const generics — dead code

Non-W chips (base_alu, less_than, divrem) define Rv64*Executor with RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4):

File	Lines
`base_alu/execution.rs`	177, 245
`less_than/execution.rs`	128, 247
`divrem/execution.rs`	108, 273
`load_sign_extend/aot.rs`	20, 41

These impls implement traits for types that never get instantiated in RV64.

7. Bigint AOT impls reference non-existent `Rv32*` types

Same pattern as riscv extension — actual types renamed to Rv64* but AOT impls still use Rv32*:

File	Lines	Has	Should be
`base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`
`branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`

8. Bigint CUDA test code references old types

extensions/bigint/circuit/src/tests.rs — behind #[cfg(feature = "cuda")]:

Lines 655, 716, 782, 847, 905, 971: Reference Rv32*Opcode::CLASS_OFFSET (renamed to Rv64*)
Lines 922, 988: Reference Rv32VecHeapBranchAdapterExecutor (renamed to Rv64*)

9. CUDA extensions not ported (systemic — known deferral)

All CUDA code across riscv, keccak256, sha2, bigint, and deferral extensions still references old Rv32* types and RV32_REGISTER_NUM_LIMBS. CI workflows are disabled for these. Tracked as separate work items.

10. AOT layer assumes 32-bit registers throughout (systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers, and RV32_REGISTER_NUM_LIMBS const generics. CI workflows are disabled. Needs fundamental redesign for 64-bit register values.

Code Quality Issues

11. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

12. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. Should be / 8 or use Vec::new().

13. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

WORD_SIZE is defined as 8 in platform/src/lib.rs:23. Should use the constant.

14. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code:

Uses hint_buffer_u32! (RV32) vs the outer's hint_buffer_u64! (RV64)
Uses cfg(target_os = "zkvm") (old) vs cfg(openvm_intrinsics) (new)
Uses RV32_ALU_OPCODE vs RV64_ALU_OPCODE

15. Bigint pointer truncation lacks explicit safety assertions

In extensions/bigint/circuit/src/base_alu.rs (and other bigint execution files), pointer values are extracted as:

u64::from_le_bytes(rs1_ptr) as u32  // Silent truncation

Unlike the algebra extension which uses debug_assert!(val <= u32::MAX as u64), the bigint code silently truncates. While the VM invariants likely guarantee this is safe, adding explicit debug assertions would be consistent with the algebra extension's approach.

Architecture Observations (Positive)

riscv-adapters port is exemplary — zero RV32_* references, clean helper API (pad_reg_val, compose_ptr, tracing_read_reg_ptr), proper 4-byte pointer storage with 8-byte padding in constraints.
Algebra extension (feat: rv64 algebra #2727) is clean — all RV32_REGISTER_NUM_LIMBS properly migrated to RV64_REGISTER_NUM_LIMBS, explicit pointer validation with debug_assert!.
Keccak256 extension (feat: rv64 keccak256 #2724) is well-migrated — rv64_register_to_u32 helper correctly reads 8-byte registers and validates upper bytes are zero.
Reduce memory columns (fix: rv64 reduce memory address columns #2733) is a good optimization — stores only the necessary 4 pointer bytes in columns, pads to 8 in AIR constraints, saving trace width.
Non-AOT execution paths in the riscv extension are solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension correctly.
Decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag is cleanly integrated.

Summary

Priority	Issue	Impact	Status
Critical	#1: Deferral AIR/trace uses 4-byte register arrays	Memory bus mismatch, unsound constraints	Open
Critical	#2: Deferral transpiler wrong register stride	Wrong register addresses	Open
High	#4: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#5: AOT `Rv32Executor` → `Rv64Executor` (riscv, 4 files)	Dead code + compilation failure under `aot`	Trivial rename
High	#6: AOT impls wrong const generics (riscv, 4 files)	Dead code under `aot`	Part of AOT port
High	#7: Bigint AOT `Rv32` → `Rv64` (6 files)	Dead code + compilation failure under `aot`	Trivial rename
Medium	#3: Deferral config `rv32i`/`rv32m` field naming	Misleading, functionally correct	Cosmetic fix
Medium	#8: Bigint CUDA test old types	Compilation failure under `cuda`	Part of CUDA port
Medium	#9-10: AOT/CUDA systemic	Known deferral (CI disabled)	Tracked
Low	#11: `TEMP[jpw]` marker	Cleanup	Minor
Low	#12: Serializer 8× over-allocation	Wasted memory	Minor
Low	#13: Hardcoded `8` vs `WORD_SIZE`	Style	Minor
Low	#14: Duplicate `guest/guest/` directory	Confusing leftover	Cleanup
Low	#15: Bigint silent pointer truncation	Missing safety assertion	Consistency

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The latest commits (algebra, keccak256, riscv-adapters, reduce memory columns) are well-structured with proper RV64 constant usage. The adapter pattern in riscv-adapters is particularly clean.

Recommended before merge:

Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, range checks, and transpiler (~30 locations across 5 files)
Fix config.rv32i → config.rv64i in 5 riscv locations (trivial)
Fix Rv32*Executor → Rv64*Executor in AOT impls across riscv (4 files) and bigint (6 files) — trivial renames
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT, CUDA, and deferral CUDA port as separate work items

github-actions · 2026-04-22T18:07:05Z

Claude finished @Tuanlinh12312's task in 6m 0s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes (ECC, algebra, keccak256, bigint, riscv-adapters, reduce memory)
Review deferral and algebra extension changes in detail
Review toolchain and decoder changes
Post final review

Overview

Reviewing after the latest commits including feat: rv64 ECC (#2728), feat: rv64 algebra (#2727), feat: rv64 keccak256 (#2724). The PR is at 26 commits, 380 files changed, +17245/−8348 lines.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf
Algebra extension — Fixed: transpiler, execution, and circuit code all correctly use RV64_REGISTER_NUM_LIMBS, RV64_REGISTER_AS, and u64::from_le_bytes(). Zero remaining RV32_REGISTER_NUM_LIMBS or RV32_CELL_BITS references.
ECC extension — Clean: no RV32_ references found
Keccak256 non-CUDA code — Clean: correctly uses RV64_WORD_NUM_LIMBS

Bugs (Still Open)

1. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug — Critical)

The deferral extension's AIR columns, trace records, and memory bridge interactions all use RV32_REGISTER_NUM_LIMBS (4) for register value arrays. Since MemoryReadAuxCols<T> infers its block size from the data array passed to memory_bridge.read(), passing a 4-element array creates a 4-byte memory read — but RV64 registers are 8 bytes wide. This creates a memory bus interaction mismatch.

AIR columns — arrays sized at 4 instead of 8:

File	Line	Field
`deferral/circuit/src/output/air.rs`	59	`rd_val: [T; RV32_REGISTER_NUM_LIMBS]`
`deferral/circuit/src/output/air.rs`	60	`rs_val: [T; RV32_REGISTER_NUM_LIMBS]`
`deferral/circuit/src/call/air.rs`	230	`rd_val: [T; RV32_REGISTER_NUM_LIMBS]`
`deferral/circuit/src/call/air.rs`	231	`rs_val: [T; RV32_REGISTER_NUM_LIMBS]`

Trace records — arrays sized at 4 instead of 8:

File	Line	Field
`deferral/circuit/src/output/trace.rs`	72	`rd_val: [u8; RV32_REGISTER_NUM_LIMBS]`
`deferral/circuit/src/output/trace.rs`	73	`rs_val: [u8; RV32_REGISTER_NUM_LIMBS]`
`deferral/circuit/src/call/trace.rs`	234	`rd_val: [u8; RV32_REGISTER_NUM_LIMBS]`
`deferral/circuit/src/call/trace.rs`	235	`rs_val: [u8; RV32_REGISTER_NUM_LIMBS]`

Memory bridge reads — output/air.rs:260-276 and call/air.rs:356-376 pass local.rd_val (4 elements) to memory_bridge.read(), creating 4-byte memory interactions for 8-byte registers.

Address range checks — output/air.rs:237-244, output/trace.rs:337-346, call/air.rs:306-313, call/trace.rs:177-180,407-412 use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32) and index [RV32_REGISTER_NUM_LIMBS - 1] (= limb[3] instead of limb[7]).

Note: The pointer values are correctly 32-bit. The approach used in riscv-adapters (pad_reg_val — read 8 bytes, use lower 4 as pointer, constrain upper 4 to zero) should be applied here. See extensions/riscv-adapters/src/helpers.rs for the correct pattern.

Fix this →

2. Deferral transpiler uses wrong register stride (Critical)

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 would be addressed at offset 20 instead of the correct 40 for RV64.

Fix this →

3. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File	Line
`riscv/circuit/src/lib.rs`	297
`riscv/circuit/src/mul/tests.rs`	289
`riscv/circuit/src/mulh/tests.rs`	530
`riscv/circuit/src/jalr/tests.rs`	586
`riscv/circuit/src/mul_w/tests.rs`	416

Fix this →

4. **RISCV AOT impls reference non-existent `Rv32*Executor` types** (Compilation error under `aot`)

8 impl blocks across 4 files use Rv32* struct names that were renamed to Rv64*:

File	Lines	Has	Should be
`jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

5. **Bigint AOT impls reference non-existent `Rv32*256Executor` types** (Compilation error under `aot`)

12 impl blocks across 6 files:

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

Fix this →

6. **Bigint CUDA references old `Rv32*Air` types** (Compilation error under `cuda`)

extensions/bigint/circuit/src/extension/cuda.rs:55-101 references Rv32BaseAlu256Air, Rv32LessThan256Air, Rv32BranchEqual256Air, Rv32BranchLessThan256Air, Rv32Multiplication256Air, Rv32Shift256Air — all renamed to Rv64*.

7. **RISCV CUDA extension references old `Rv32*` types** (Compilation error under `cuda`)

extensions/riscv/circuit/src/extension/cuda.rs:12-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. Also missing: new *_W chip registrations for GPU backend.

8. AOT layer assumes 32-bit registers throughout (Systemic — known deferral)

The AOT x86 code generation layer is fundamentally broken for RV64: register stride a / 4, XMM helpers (gpr_to_rv32_register, xmm_to_gpr), and *_W AOT impls all use 32-bit assumptions. The CI disabling (chore(ci): comment out aot workflows) acknowledges this.

Code Quality Issues

9. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker — should be cleaned up or properly named.

10. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard, RV32_ALU_OPCODE). Should be removed.

11. Serializer `Vec::with_capacity` over-allocates 8x

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). Functionally correct but 8x memory waste.

12. Deferral config naming (`Rv32Deferral` → should be `Rv64Deferral`)

extensions/deferral/circuit/src/extension/mod.rs:224-250:

Rv32DeferralConfig struct with fields rv32i: Rv64I, rv32m: Rv64M
Rv32DeferralCpuBuilder, Rv32DeferralGpuBuilder

These names are misleading — they contain RV64 types but carry the RV32 prefix.

Architecture Observations (Positive)

Algebra extension is fully and correctly ported — all register operations use RV64_REGISTER_NUM_LIMBS, RV64_REGISTER_AS, and u64::from_le_bytes(). The is_eq.rs write data correctly uses 8-byte arrays. Zero remaining RV32_REGISTER_NUM_LIMBS references.
ECC extension is clean — no RV32_ references found.
riscv-adapters port is solid. The pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers in helpers.rs provide a clean abstraction for the 4-byte pointer + 8-byte register pattern. Zero RV32_* references.
Keccak256 non-CUDA code is correctly updated — uses RV64_WORD_NUM_LIMBS throughout.
Bigint transpiler is correctly ported — uses RV64_REGISTER_NUM_LIMBS for register stride.
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32).
Non-AOT, non-CUDA execution paths across riscv, algebra, ECC, keccak256, sha2, and bigint extensions are solid — consistent RV64_* constant usage.
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag replaces old target_os = "zkvm", ELF64 validation, riscv64im-unknown-none-elf target.
Memory address column optimization (commit 7b24bc6) is sound — zero-padding enforced structurally at the memory bus level.

Summary

Priority	Issue	Impact	Status
Critical	#1: Deferral AIR uses 4-byte register arrays (8 array defs, ~20 index/range refs)	Memory bus mismatch, unsound constraints	Open
Critical	#2: Deferral transpiler wrong register stride (4 locations)	Wrong register addresses	Open
High	#3: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#4: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files, 8 impls)	Dead code + compilation failure under `aot`	Trivial rename
High	#5: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files, 12 impls)	Dead code + compilation failure under `aot`	Trivial rename
High	#6-7: Bigint + Riscv CUDA extensions reference old `Rv32*` types	Compilation failure under `cuda`	Known deferral
Medium	#8: AOT layer assumes 32-bit registers (systemic)	Entire AOT path broken	Known deferral (CI disabled)
Low	#9: `TEMP[jpw]` marker	Cleanup	Minor
Low	#10: Inner `guest/guest/` directory	Confusing stale RV32 leftover	Cleanup
Low	#11: Serializer 8x over-allocation	Wasted memory	Minor
Low	#12: Deferral config naming `Rv32Deferral*`	Misleading	Cosmetic

The core RV64 architecture (non-AOT, non-CUDA paths) is solid across all ported extensions. The algebra, ECC, keccak256, riscv-adapters, and toolchain changes are well-structured with correct constant usage. The *_W chip design with split-width adapters is clean.

The deferral extension remains the primary critical risk — it still has RV32_REGISTER_NUM_LIMBS usage that causes AIR soundness issues and wrong transpiler register addresses.

Recommended before merge:

Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, memory bridge interactions, range checks, and transpiler stride (~30 locations across 5 files). Follow the riscv-adapters/helpers.rs pattern for pointer handling.
Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files) and bigint AOT (6 files) impls (trivial renames)
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT, CUDA, and deferral config naming as separate work items

github-actions · 2026-04-22T18:08:42Z

Claude finished @Tuanlinh12312's task in 3m 10s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes (ECC, pairing, algebra, keccak256, bigint, riscv-adapters, reduce memory)
Review deferral and algebra extension changes in detail
Review toolchain and decoder changes
Post final review

Overview

Re-reviewing after the latest commits feat: rv64 pairing (#2734) (b7308ae), feat: rv64 ECC (#2728) (ccdfcb3), and feat: rv64 algebra (#2727) (2bb3249). The PR is at 27 commits, 382 files changed, +17266/−8372 lines.

Previously fixed issues (confirmed):

Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
Commented-out hintstore — Fixed: hintstore module is now active
abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf
Algebra extension (is_eq, modular_chip, transpiler) — Fixed in feat: rv64 algebra (#2727): now correctly uses RV64_REGISTER_NUM_LIMBS throughout

NEW: ECC, Pairing, Algebra Changes

Algebra Extension — ✅ Well Done

The feat: rv64 algebra (#2727) commit is solid:

modular_chip/is_eq.rs:731: Correctly uses RV64_REGISTER_NUM_LIMBS for write data
modular_chip/mod.rs:42,54,69: Correctly parameterized with RV64_REGISTER_NUM_LIMBS and RV64_CELL_BITS
transpiler/src/lib.rs:102-103,126,204-205: Correctly uses RV64_REGISTER_NUM_LIMBS for register stride
Zero RV32_REGISTER_NUM_LIMBS references remain in non-CUDA algebra code

ECC Extension — ✅ Well Done

The feat: rv64 ECC (#2728) commit is clean:

Transpiler correctly uses RV64_REGISTER_NUM_LIMBS for register stride
Circuit types correctly use Rv64VecHeapAdapterRecord
Zero RV32_REGISTER_NUM_LIMBS references in non-CUDA ECC code

Pairing Extension — ⚠️ Contains Bugs (see #1 below)

Bugs (Still Open)

1. NEW: Pairing transpiler uses wrong register stride (Wrong register addresses)

extensions/pairing/transpiler/src/lib.rs:3,47-48:

use openvm_instructions::{instruction::Instruction, riscv::RV32_REGISTER_NUM_LIMBS, ...};
// ...
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),  // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs2),

Register x5 would be addressed at offset 20 instead of the correct 40 for RV64. The ECC and algebra transpilers were correctly updated — this one was missed.

Fix this →

2. **NEW: Pairing CUDA builder references `Rv32*` types** (Compilation error under `cuda`)

extensions/pairing/circuit/src/cuda.rs has 7+ references to old types:

Line 4: Imports Rv32ModularBuilder (should be Rv64ModularBuilder)
Line 16: Uses Rv32PairingConfig (should be Rv64PairingConfig)
Line 19: Defines Rv32PairingGpuBuilder (should be Rv64PairingGpuBuilder)
Lines 24, 30, 43: More Rv32* references

The lib.rs:21 re-export pub use cuda::Rv32PairingGpuBuilder as Rv64PairingBuilder papers over the issue but will break when the CUDA file is fixed.

3. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension has not been ported to RV64. All register-related operations use RV32_REGISTER_NUM_LIMBS (4) but RV64 registers are 8 bytes wide (DEFAULT_BLOCK_SIZE = 8):

AIR columns — create 4-element arrays for register values:

output/air.rs:59-60: rd_val: [T; RV32_REGISTER_NUM_LIMBS], rs_val: [T; RV32_REGISTER_NUM_LIMBS]
call/air.rs:230-231: Same pattern

Memory bridge reads — pass 4-element arrays to memory_bridge.read():

output/air.rs:260-276: memory_bridge.read(..., local.rd_val, ...) — creates a 4-byte memory read interaction
The system memory chip expects 8-byte register reads (matching DEFAULT_BLOCK_SIZE = 8)

Execution code — reads only 4 bytes from registers:

output/execution.rs:157: u32::from_le_bytes(exec_state.vm_read(RV32_REGISTER_AS, ...)) — infers 4-byte block size from u32
call/execution.rs:173-174: Same pattern

Trace records — 4-byte register value arrays:

output/trace.rs:72-73, call/trace.rs:234-235: [u8; RV32_REGISTER_NUM_LIMBS]

Range checks — access wrong index for upper limb:

output/air.rs:243: local.rd_val[RV32_REGISTER_NUM_LIMBS - 1] = index 3 (should be 7 for RV64)
call/air.rs:312-313: Same pattern

Impact: Memory bus interaction width mismatch: the deferral AIR creates 4-byte register read interactions, but the system memory chip expects 8-byte reads. This will cause proof generation failure or unsound constraints.

Fix this →

4. Deferral transpiler uses wrong register stride (Wrong register addresses)

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 would be addressed at offset 20 instead of the correct 40.

5. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot`)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File	Line
`extensions/riscv/circuit/src/lib.rs`	297
`extensions/riscv/circuit/src/mul/tests.rs`	289
`extensions/riscv/circuit/src/mulh/tests.rs`	530
`extensions/riscv/circuit/src/jalr/tests.rs`	586
`extensions/riscv/circuit/src/mul_w/tests.rs`	416

Fix this →

6. RISCV AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot`)

8 impl blocks across 4 files implement traits for Rv32* struct names that were renamed to Rv64*:

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

7. Bigint AOT impls reference non-existent `Rv32*256Executor` types (Compilation error under `aot`)

12 impl blocks across 6 files:

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`

Fix this →

8. AOT layer assumes 32-bit registers throughout (systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers, and RV32_REGISTER_NUM_LIMBS const generics. CI is disabled via chore(ci): comment out aot workflows.

9. CUDA extensions not ported (systemic — known deferral)

riscv/circuit/src/extension/cuda.rs, bigint/circuit/src/extension/cuda.rs, deferral/circuit/src/extension/cuda.rs, pairing/circuit/src/cuda.rs all reference old Rv32* types.

Code Quality Issues

10. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

11. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ uses stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard, RV32_ALU_OPCODE). Should be removed.

12. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes element count. Should be / 8.

13. Deferral config/builder naming

Rv32DeferralConfig, Rv32DeferralCpuBuilder, Rv32DeferralGpuBuilder in deferral/circuit/src/extension/mod.rs:224-250 — these wrap Rv64I, Rv64M, Rv64Io types but carry the Rv32 prefix.

Architecture Observations (Positive)

Algebra, ECC, and pairing extension ports (latest 3 commits) are well-structured. The algebra extension correctly uses RV64_REGISTER_NUM_LIMBS throughout its circuit, transpiler, and execution code. ECC transpiler is clean.
riscv-adapters port is clean with good helper abstractions: pad_reg_val, compose_ptr, tracing_read_reg_ptr correctly handle the 4-byte materialized pointer + 8-byte register pattern. Zero RV32_* references.
Memory address column optimization (commit 7b24bc6) is sound — materializes only significant limbs (RV64_WORD_NUM_LIMBS = 4 for pointers) while zero-padding to full register width in bus interactions.
Keccak256 and SHA2 Rust code is clean — no remaining RV32_* references in non-CUDA files.
Non-AOT, non-CUDA execution paths across the riscv extension remain solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.
*_W adapter/core pattern is well-designed with proper sign extension via (result as i32 as i64 as u64).
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, guest openvm_intrinsics cfg is clean.

Summary

Priority	Issue	Impact	Status
Critical	#3: Deferral AIR uses 4-byte register arrays	Soundness: memory bus mismatch	Open
Critical	#4: Deferral transpiler wrong register stride	Wrong register addresses	Open
High	#1: Pairing transpiler wrong register stride	Wrong register addresses	New
High	#5: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Open
High	#6: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files)	Dead code + compilation failure under `aot`	Open
High	#7: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files)	Dead code + compilation failure under `aot`	Open
Medium	#2: Pairing CUDA builder `Rv32*` types	Compilation failure under `cuda`	New
Medium	#8-9: AOT/CUDA systemic issues	Known deferral (CI disabled)	Tracked
Low	#10: `TEMP[jpw]` marker	Cleanup	Open
Low	#11: Inner `guest/guest/` directory	Confusing stale RV32 leftover	Open
Low	#12: Serializer 8× over-allocation	Wasted memory	Open
Low	#13: Deferral config naming `Rv32Deferral*`	Misleading naming	Open

The core RV64 architecture (non-AOT, non-CUDA) is solid across the riscv, keccak256, sha2, algebra, ECC, and bigint extensions. The latest commits (algebra, ECC, pairing) are well-structured and demonstrate correct RV64_* constant usage.

The deferral extension remains the primary risk — it has critical RV32_REGISTER_NUM_LIMBS usage that produces wrong register array sizes, wrong execution reads, wrong transpiler addresses, and unsound AIR constraints. The pairing transpiler also has a wrong register stride.

Recommended before merge:

Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS across circuit AIR, trace, execution, and transpiler (~40 occurrences)
Fix pairing transpiler — change RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS (3 occurrences)
Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files) and bigint AOT (6 files) impls
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT, CUDA, and deferral rename as separate work items

github-actions · 2026-04-22T18:13:24Z

Claude finished @Tuanlinh12312's task in 6m 41s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes (deferral, pairing)
Review deferral extension in detail
Review circuit and extension changes
Post final review

Overview

Re-reviewing after the latest commits feat: rv64 deferral (#2735) (01ac821) and feat: rv64 pairing (#2734) (b7308ae). The PR is at 28 commits, 100 changed files, +17393/−8303 lines.

Previously critical bugs now FIXED:

Deferral AIR register arrays — Fixed: now uses RV64_WORD_NUM_LIMBS (4) for materialized pointer columns and expand_to_rv64_register() for 8-byte memory bus interactions. Correctly follows the riscv-adapters pattern.
Deferral transpiler stride — Fixed: now uses RV64_REGISTER_NUM_LIMBS (8) for register address computation (transpiler/src/lib.rs:85-86, 95-96).
Deferral config naming — Fixed: renamed to Rv64DeferralConfig, Rv64DeferralCpuBuilder with fields rv64i, rv64m.
Algebra extension — Previously confirmed fixed in feat: rv64 algebra (#2727).

Bugs (Still Open)

1. Pairing transpiler uses `RV32_REGISTER_NUM_LIMBS` (stride 4 instead of 8) (Critical — wrong register addresses)

extensions/pairing/transpiler/src/lib.rs:47-48:

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),  // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs2),

Import on line 3 also references RV32_REGISTER_NUM_LIMBS instead of RV64_REGISTER_NUM_LIMBS.

Register x5 would be addressed at offset 20 instead of the correct 40 for RV64. This will cause the phantom instruction to read from wrong register addresses.

Fix this →

2. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File	Line	Feature gate
`riscv/circuit/src/lib.rs`	297	`cuda`
`riscv/circuit/src/mul/tests.rs`	289	`cuda`
`riscv/circuit/src/mulh/tests.rs`	530	`aot`
`riscv/circuit/src/jalr/tests.rs`	586	`aot`
`riscv/circuit/src/mul_w/tests.rs`	416	`aot`

Fix this →

3. **RISCV AOT impls reference non-existent `Rv32*Executor` types** (Compilation error under `aot`)

8 impl blocks across 4 files implement traits for struct names that were renamed to Rv64*:

File	Lines	Has	Should be
`jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`jalr/execution.rs`	97, 185	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`hintstore/execution.rs`	111, 157	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

4. RISCV AOT impls for non-W chips use wrong const generic — dead code

Non-W chips define Rv64*Executor with RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — making them dead code that never gets dispatched:

File	Lines
`base_alu/execution.rs`	177, 245
`less_than/execution.rs`	128, 247
`divrem/execution.rs`	108, 273
`load_sign_extend/aot.rs`	20, 41

Note: The *_W chips' AOT impls also use RV32_REGISTER_NUM_LIMBS (4) but this happens to equal RV64_WORD_NUM_LIMBS (4) — functionally correct but semantically misleading. Should use RV64_WORD_NUM_LIMBS instead.

5. **Bigint AOT impls reference non-existent `Rv32*256Executor` types** (Compilation error under `aot`)

12 impl blocks across 6 files:

File	Lines	Has	Should be
`base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

Fix this →

6. CUDA extensions not ported (Known deferral — CI disabled)

All CUDA code across riscv, bigint, deferral, pairing, keccak256, and sha2 extensions still references old Rv32* types. Specific examples:

riscv/circuit/src/extension/cuda.rs — imports Rv32BaseAluAir, Rv32LoadStoreAir, etc.
bigint/circuit/src/extension/cuda.rs — references Rv32*256Air types
deferral/circuit/src/output/cuda.rs and call/cuda.rs — uses RV32_CELL_BITS instead of RV64_CELL_BITS
pairing/circuit/src/cuda.rs — uses Rv32PairingGpuBuilder, Rv32PairingConfig, Rv32ModularBuilder
pairing/circuit/src/lib.rs:21 — pub use cuda::Rv32PairingGpuBuilder as Rv64PairingBuilder (aliasing wrong name)

This is acknowledged via chore(ci): comment out aot workflows.

7. AOT layer assumes 32-bit registers throughout (Known deferral — CI disabled)

Register stride a / 4, XMM helpers (gpr_to_rv32_register, xmm_to_gpr), and AOT const generics all assume 32-bit register layout. Needs fundamental redesign for 64-bit register values.

Code Quality Issues

8. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

9. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. Should be / 8.

10. Duplicate `openvm-riscv-guest` directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard, RV32_ALU_OPCODE). Should be removed.

11. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use the WORD_SIZE constant defined in platform/src/lib.rs:23.

Architecture Observations (Positive)

Deferral extension port (latest commit 01ac821) is well-done. Follows the established pattern: RV64_WORD_NUM_LIMBS for materialized pointer columns, expand_to_rv64_register() for 8-byte padding in memory bus interactions. Transpiler correctly uses RV64_REGISTER_NUM_LIMBS for register stride.
Pairing extension port (b7308ae) is clean in non-CUDA code. Config correctly uses Rv64PairingConfig with proper field names.
riscv-adapters remain exemplary — zero RV32_* references, clean pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers.
Non-AOT, non-CUDA execution paths across all extensions (riscv, algebra, ECC, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage.
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension correctly.
Decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag cleanly replaces old target_os = "zkvm".

Summary

Priority	Issue	Impact	Status
Critical	#1: Pairing transpiler wrong register stride	Wrong phantom register addresses	NEW — trivial fix
High	#2: `config.rv32i` → `config.rv64i` (5 locations)	Compilation failure under `cuda`/`aot`	Trivial fix
High	#3: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files)	Dead code + compilation failure under `aot`	Trivial rename
High	#4: Riscv non-W AOT impls wrong const generic (4 files)	Dead code under `aot`	Part of AOT port
High	#5: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files)	Dead code + compilation failure under `aot`	Trivial rename
Medium	#6: CUDA extensions not ported (all extensions)	GPU proving broken	Known deferral (CI disabled)
Medium	#7: AOT layer assumes 32-bit registers (systemic)	Entire AOT path broken	Known deferral (CI disabled)
Low	#8: `TEMP[jpw]` marker	Cleanup	Minor
Low	#9: Serializer 8× over-allocation	Wasted memory	Minor
Low	#10: Inner `guest/guest/` directory	Confusing stale RV32 leftover	Cleanup
Low	#11: Hardcoded `8` vs `WORD_SIZE`	Style inconsistency	Minor

Progress Since Last Review

The deferral extension — previously the most critical open issue — has been fully and correctly ported to RV64. The new commit follows the established riscv-adapters pattern and eliminates all non-CUDA RV32_* references.

Recommended Before Merge

Fix pairing transpiler — change RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS on lines 3, 47, 48 (trivial, 1 file)
Fix config.rv32i → config.rv64i in 5 locations (trivial)
Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files) and bigint AOT (6 files) — trivial renames
Remove inner guest/guest/ directory and TEMP[jpw] marker
Track AOT, CUDA port as separate work items

github-actions · 2026-04-22T18:56:34Z

Claude encountered an error —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes (guest lib structs/fns update)
Review deferral, pairing, algebra, bigint extensions
Review riscv circuit and toolchain changes
Post final review

Working on it…

github-actions · 2026-04-22T19:23:31Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

github-actions · 2026-04-22T20:16:52Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

github-actions · 2026-04-22T20:23:04Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

Resolves INT-6198, INT-6199, INT-6200. Migrates the `openvm`, `openvm-platform` crates from RV32 to RV64. **Scope note:** This PR only covers the guest-side compilation path (crate definitions, assembly, types, cfg gating). The actual execution logic for the I/O instructions (how the interpreter handles hint/reveal/print operations at runtime) is handled separately in the interpreter PR. Compilation and interpreting with the new openvm toolchain have been tested with guest programs from this repo: https://github.com/axiom-crypto/rv64-minimal-guest-test. All `#[cfg(target_os = "zkvm")]` / `#[cfg(not(target_os = "zkvm"))]` attributes replaced with `#[cfg(openvm_intrinsics)]` / `#[cfg(not(openvm_intrinsics))]`. This decouples guest-code gating from the risc0 custom target OS and instead uses a custom cfg flag injected via rustflags. = "riscv64"` Conditional dependency sections switched from `cfg(target_os = "zkvm")` to `cfg(target_arch = "riscv64")`, and the guest dependency changed from `openvm-rv32im-guest` to `openvm-rv64im-guest`. Added `check-cfg` for `openvm_intrinsics` to suppress warnings. - `WORD_SIZE`: 4 → 8 (`platform/src/lib.rs`) - `STACK_TOP`, `TEXT_START`, `is_guest_memory`: `u32` → `u64` (`platform/src/memory.rs`) - `STACK_TOP` static in entry point: `u32` → `u64` (`openvm/src/lib.rs`) - `sw x0, 1(x0)` → `sd x0, 1(x0)` (fault instruction, `openvm/src/lib.rs`) - `lw sp, 0(sp)` → `ld sp, 0(sp)` (entry point stack pointer load, `openvm/src/lib.rs`) Replaced with glibc implementations compiled for `riscv64 -march=rv64im` (source: [glibc 2.42](https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/string)). Uses 64-bit `ld`/`sd` operations throughout. All hint/reveal/read functions updated to operate on 8-byte units: - `read_u32()` → `read_u64()` - Hint stream reads use `u64` length prefixes (8 bytes instead of 4) - Serde serializer/deserializer operate on `u64` chunks - `hint_buffer_u32!` equivalent updated for 64-bit **Note:** `reveal_bytes32` has **not** been changed — it still reveals 32 bytes (256 bits), but just reads them in 8-byte chunks instead of 4-byte chunks. Removed the PAL ABI stubs (`sys_halt`, `sys_read`, `sys_write`, `sys_rand`, etc.) that were required for `std` compatibility with the risc0 `riscv32im-risc0-zkvm-elf` target. No longer needed since rv64 uses a standard target (`riscv64im-unknown-none-elf`) without a custom OS ABI.

…64im (#2446) Transpiler PRs should be reviewed in this order: #2446, #2447, #2448. Summary of this PR: - Some files related to the decoding of RV32 are copied from https://github.com/GregAC/rrs/tree/main/rrs-lib/src to `crates/toolchain/decoder`. - Everything in `extensions/rv32im/transpiler` has been copied to `extensions/rv64im/transpiler`.

Transpiler PRs should be reviewed in this order: #2446, #2447, #2448. Summary of this PR: - Support for decoding of RV64.

Resolves INT-6044. Transpiler PRs should be reviewed in this order: #2446, #2447, #2448. Summary of this PR: - Code that was hardcoded for RV32 has been changed to RV64. - `rrs-lib` was removed as a dependency, and all crates now use `crates/toolchain/decoder`. - Some tests have been added to test the RV64 transpiler, but these tests are still messy -- this will be fixed in the ticket involving the integration tests for the full toolchain and interpreter flow.

…fix compilation errors (#2457) Resolves INT-6318. Summary: - The `extensions/rv32im` has been moved to `extensions/riscv`. - The old `extensions/rv32im/transpiler` module has been entirely deleted and replaced with the transpiler module created in #2448. - Many objects prefixed with `Rv32` are renamed to start with `Rv64`. A list of these objects can be found in https://docs.google.com/document/d/1mZTHk7n2LY-N1WMrjzn_BZmSYtP4PGnflH97Xjrs9EY/edit?tab=t.0.

Towards INT-6319. Summary: - `base_alu`, `less_than`, and `shift` submodules of `riscv` have been ported from RV32 to RV64 - All types/structs renamed from `Rv32*` to `Rv64*` (e.g., `Rv32BaseAluAdapterAir` → `Rv64BaseAluAdapterAir`) - Operand width changed from `u32`/4 bytes to `u64`/8 bytes throughout (registers, ALU ops, compose/decompose helpers) - `Rv64BaseAluAdapter` (shared by all three submodules) updated for RV64 - New constraints added to enforce bytes 3–7 all equal the sign byte for correct 64-bit sign extension (previously only byte 3 was constrained for 32-bit) - Added tests for the above constraint in `base_alu`, `less_than`, `shift`. - Changed constant `DEFAULT_U8_BLOCK_SIZE` from 4 to 8 (matching RV64 register width) - Commented out unported code in `extension/mod.rs` and `lib.rs` (load/store, branch, jal/jalr, auipc, mul/div — all marked `TEMP: disabled until ported to RV64`) with manual empty-enum trait impls to keep the crate compiling - Updated `test_utils.rs` to use RV64 conventions (`generate_rv64_is_type_immediate`, `rv64_rand_write_register_or_imm`) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…mul_w, divrem_w (#2483) Towards INT-6340, INT-6341, INT-6342, INT-6343. This PR copies the RV32 `base_alu`, `shift`, `mul`, and `divrem` modules into new `base_alu_w`, `shift_w`, `mul_w`, and `divrem_w` modules. These copies will serve as the starting point for implementing the RV64 32-bit word instruction variants. --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Ayush Shukla <shuklaayush@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Resolves INT-7523. Should be merged after #2728. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves INT-7524. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- also fixes lint

- also update elfs

) Switches guest dependencies that need atomic CAS (`once_cell::race` in the algebra/ecc macros, `spin::Mutex` in `guest-libs/keccak256`) to `portable-atomic` so they compile on stock `riscv64im-unknown-none-elf`, fixing CI failures across the algebra/ecc/pairing/ff_derive/k256/p256/keccak256 guest tests. The stock target's spec declares `atomic-cas: false`, so rustc cfg-gates `AtomicUsize::compare_exchange` out at the language level; `-C passes=lower-atomic` can't help since it acts on LLVM IR after rustc has already rejected the call. rc.1 avoided this on `riscv32im-risc0-zkvm-elf`, which claims CAS support in its spec; we keep guest programs on the stock upstream triple and patch the dependency instead. Switches `once_cell` (and `spin::Mutex` in `guest-libs/keccak256`) to their `portable-atomic` backend, with `portable-atomic` configured to use `critical-section` as its CAS fallback. `critical-section` is a minimal no_std mutual-exclusion interface: exactly one crate in the tree registers an `Impl` via `critical_section::set_impl!`, and every consumer calls `critical_section::with(...)`; without an impl the call panics at runtime. Off-the-shelf impls target interrupt-driven hardware and disable interrupts via CSR reads/writes, which the openvm VM can't transpile (they map to `unimp` and blow up at runtime). `openvm-platform` now registers a no-op impl, sound because the openvm guest is single-threaded, so CAS reduces to a plain load+store with zero CSR emission. A `SAFETY:` comment flags the single-thread invariant. Guest vs host gating now follows one rule across both code and manifests: - `.rs` files: `cfg(openvm_intrinsics)` (set by the guest build rustflags) for openvm-specific code paths, `cfg(not(feature = "std"))` for std-availability (`no_std`), `cfg(target_os = "none")` for runtime-specific attributes (`no_main`, `openvm::entry!` registration). The `no_std` / `no_main` split matches their semantics: std availability is a package decision, entry-point registration is a runtime/target property. - `Cargo.toml` manifests: `cfg(not(target_os = "none"))` for host-only dep gates (Cargo target predicates only accept built-in cfgs, not `openvm_intrinsics`). The `portable-atomic` CAS polyfill keeps the narrower `cfg(all(target_arch = "riscv64", target_os = "none"))` since the `critical-section` backend is arch-specific. - `--check-cfg=cfg(openvm_intrinsics)` added to guest-build rustflags in `crates/toolchain/build/src/lib.rs` so user guest programs never see the unexpected-cfgs warning. - `[workspace.lints.rust]` at the workspace root declares `openvm_intrinsics`; openvm-* crates inherit via `[lints] workspace = true`. Standalone sub-workspaces (tests/programs, book examples) can't inherit, so they declare it explicitly. - `resolver = "2"` on all standalone `[workspace]` sub-workspaces (tests/programs, book examples, cli/tests). Bare `[workspace]` defaulted to v1, which unified proc-macro features into guest builds and dragged `rand_core/std` and `getrandom/std` into the rv64 target. - Cargo.toml host-only dep gates swapped from the stale-on-rv64 `cfg(not(target_os = "zkvm"))` to `cfg(not(target_os = "none"))`. Three cascading rv64-port bugs, only exercised by the `final_exp_hint` tests: - `extensions/pairing/transpiler/src/lib.rs` used `RV32_REGISTER_NUM_LIMBS` (= 4) to compute the phantom's `a`/`b` operand addresses. On rv64 registers live at 8-byte boundaries in `RV64_REGISTER_AS`, so the phantom read zero. Changed to `RV64_REGISTER_NUM_LIMBS`. - `guest-libs/pairing/src/{bn254,bls12_381}/pairing.rs` constructed `(P.as_ptr() as u32, P.len() as u32)` as a fat pointer. The host phantom reads at offsets `0` and `RV64_REGISTER_NUM_LIMBS` (= 8), but the `(u32, u32)` tuple laid fields at offsets 0 and 4. Changed to `(u64, u64)`. - The adjacent `hint_buffer_chunked(ptr, (N * 12 * 2) / 4)` used the rv32 word-size divisor. Changed to `/ openvm_riscv_guest::HINT_WORD_BYTES`. `hint_store_u32!` to `hint_store_u64!` in `algebra/moduli-macros`. New `openvm_riscv_guest::HINT_WORD_BYTES = 8` replaces `/ 4` and `/ 8` magic numbers. Host-side `SqrtHintSubEx` padding bumped from `.take(4)` to `.take(RV64_REGISTER_NUM_LIMBS)` to match the 8-byte guest read. - `rv32`/`riscv32` to `rv64`/`riscv64` across SDK: `Sdk::riscv32()` and `AppConfig::riscv32()` become `riscv64()`, struct fields `rv32i`/`rv32m` become `rv64i`/`rv64m`, `openvm_riscv32.toml` renamed to `openvm_riscv64.toml` with matching TOML keys. Historical `v1.4/` and `v1.5/` snapshots untouched. - `Rv32Modular*`, `Rv32Weierstrass*`, `Rv32Pairing*` imports in failing guest-lib integration tests renamed to `Rv64*`. - `crates/sdk/guest/fib/src/main.rs`: `reveal_u32` to `reveal_u64`, `fibonacci` return type and locals widened from `u32` to `u64`. All 7 book examples were designed as host-run programs with `openvm/std` force-enabled, which dragged std into rv64 guest builds. Per-example: removed `features = ["std"]` from the `openvm` dep, moved `std` behind an example-local `std = ["openvm/std", ...]` feature, added `default-features = false` to transitive std-leakers (`serde`, `num-bigint`, `alloy-primitives`, `openvm-ruint`, `tiny-keccak`, `hex-literal`), and added the `no_main`/`no_std`/`openvm::entry!` guest-program scaffolding. The `i256` example additionally gets `bytes` with the `extra-platforms` feature and `portable-atomic` with `critical-section` on the riscv64-none target, since `alloy-primitives` transitively pulls `bytes` which uses `compare_exchange` directly. The test was passing `["std"]` to enable ff's `BatchInvert` trait via a `#[cfg(feature = "std")]` block in the guest program. On rv64 stock, activating `feature = "std"` at guest level chains through `openvm/std` and tries to link std, which riscv64im-unknown-none-elf doesn't have. Dropped `["std"]` from the test harness call; the std-only inner block is compiled out (the two `BatchInverter::invert_with_*_scratch` paths still run). Generated init file renamed from `openvm_init_batch_inversion_std.rs` to `openvm_init_batch_inversion.rs` to match the new build call. - Deleted `extensions/rv32im/` (empty pre-rv64 leftover) and an untracked stale `extensions/riscv/guest/guest/` copy with the old rv32 hint-store API. - Removed dead duplicate target-cfg block in `extensions/pairing/circuit/Cargo.toml`. - Fixed stale `cfg(target_os = "zkvm")` gate in `examples/keccak/Cargo.toml`. - Removed duplicate `critical_section::set_impl!` in `crates/toolchain/platform/src/heap/embedded.rs`; the canonical no-op impl now lives in `crates/toolchain/platform/src/critical_section.rs`.

Resolves INT-7566. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves INT-7563.

…, mulh, divrem, jal_lui, jalr, auipc (#2741) Resolves INT-7550. --------- Co-authored-by: Tuanlinh12312 <136139181+Tuanlinh12312@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves INT-7551.

Resolves INT-7553. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves INT-7552. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves INT-7564, INT-7528.

Resolves INT-7554. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves INT-7530.

github-actions · 2026-04-27T14:04:48Z

Claude finished @shuklaayush's task in 7m 29s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new CUDA port commits (feat: rv64 cuda system airs #2737, feat: rv64 cuda base_alu, less_than, shift, branch_eq, branch_lt, mul, mulh, divrem, jal_lui, jalr, auipc #2741, feat: rv64 cuda base_alu_w, shift_w, mul_w, divrem_w #2743, feat: rv64 cuda load_sign_extend #2744, feat: rv64 cuda loadstore #2745, feat: rv64 cuda bigint #2746, feat: rv64 cuda hintstore #2748, feat: rv64 cuda sha2 #2749)
Review cleanup and guest program changes (fix(v2.1-rv64): update guest lib structs/fns to use rv64 prefix #2736, chore(v2.1-rv64): replace zkvm os gating with openvm intrinsics #2738, feat: rv64 extensions cleanup #2739, fix(v2.1-rv64): make all guests no_std and use portable atomics #2740, feat: rv64 more cleanup #2752)
Post final review

Overview

Re-reviewing after 16 new commits since the last review (which covered up to feat: rv64 pairing (#2734)). The PR is now at 43 commits, 100 changed files, +20882/−10689 lines. The new commits include:

CUDA ports (feat: rv64 cuda system airs #2737, feat: rv64 cuda base_alu, less_than, shift, branch_eq, branch_lt, mul, mulh, divrem, jal_lui, jalr, auipc #2741, feat: rv64 cuda base_alu_w, shift_w, mul_w, divrem_w #2743, feat: rv64 cuda load_sign_extend #2744, feat: rv64 cuda loadstore #2745, feat: rv64 cuda bigint #2746, feat: rv64 cuda hintstore #2748, feat: rv64 cuda sha2 #2749) — All CUDA extensions fully ported to RV64
Guest updates (fix(v2.1-rv64): update guest lib structs/fns to use rv64 prefix #2736, chore(v2.1-rv64): replace zkvm os gating with openvm intrinsics #2738, fix(v2.1-rv64): make all guests no_std and use portable atomics #2740) — Guest lib structs use rv64 prefix, cfg(target_os = "zkvm") replaced with cfg(openvm_intrinsics), all guests made no_std
Extensions cleanup (feat: rv64 extensions cleanup #2739, feat: rv64 more cleanup #2752) — Centralized helper functions, removed duplicate code
Misc (chore(v2.1-rv64): replace zkvm os gating with openvm intrinsics #2738) — hashmap guest updated, portable atomics added

Previously reported bugs now FIXED (6 of 10):

✅ config.rv32i → config.rv64i in 5 locations — Fixed in all files
✅ CUDA extension references old Rv32* types — Fixed in all CUDA extension files (riscv, bigint, deferral, pairing, keccak256, sha2)
✅ Deferral extension uses 4-byte register arrays — Fixed in feat: rv64 deferral (#2735)
✅ Deferral transpiler wrong register stride — Fixed
✅ Pairing transpiler wrong register stride — Fixed
✅ Duplicate openvm-riscv-guest directory — Fixed (removed)

CUDA Port Assessment — Well Done

The CUDA port across 8 commits is comprehensive and correct:

extensions/riscv/circuit/src/extension/cuda.rs — All 22+ AIR and ChipGpu types correctly reference Rv64* (Rv64BaseAluAir, Rv64ShiftAir, Rv64LoadStoreAir, etc.). Includes new W-suffix chips (Rv64BaseAluWChipGpu, Rv64ShiftWChipGpu, Rv64MulWChipGpu, Rv64DivRemWChipGpu).
crates/circuits/primitives/cuda/include/primitives/constants.h — Correctly defines both RV32 and RV64 constants: RV64_REGISTER_NUM_LIMBS = 8, RV64_WORD_NUM_LIMBS = 4, RV64_CELL_BITS = 8, RV64_IMM_AS = 0. DEFAULT_BLOCK_SIZE = 8 is correct.
CUDA .cu files — All properly use RV64_REGISTER_NUM_LIMBS and RV64_WORD_NUM_LIMBS. Zero RV32_* constant references in any .cu or .cuh file.
Bigint, deferral, pairing CUDA — All extension CUDA files correctly reference Rv64* types.

Bugs (Still Open)

1. **AOT impls in riscv circuit reference non-existent `Rv32*Executor` types** (Compilation error under `aot` feature)

8 impl blocks across 4 files implement traits for struct names that were renamed to Rv64*. These are dead code — the impls target types that don't exist. Will fail to compile when aot feature is enabled.

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	98, 186	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	112, 158	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

2. AOT impls for non-W riscv chips use wrong const generic — dead code (under `aot` feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — these impls will never be dispatched:

File	Lines
`base_alu/execution.rs`	177, 245
`less_than/execution.rs`	128, 247
`divrem/execution.rs`	108, 273
`load_sign_extend/aot.rs`	21, 41

Note: The *_W chips' AOT impls also use RV32_REGISTER_NUM_LIMBS (4), but this happens to equal RV64_WORD_NUM_LIMBS (4) — functionally correct but semantically misleading.

3. **Bigint AOT impls reference non-existent `Rv32*256Executor` types** (Compilation error under `aot` feature)

12 impl blocks across 6 files implement traits for Rv32*256Executor types that don't exist (renamed to Rv64*256Executor):

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

Fix this →

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot` feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported. Line 14 imports MulWOpcode from openvm_riscv_transpiler, and crate::common::* (line 19) doesn't re-export MulOpcode. This will fail to compile under the aot feature.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports, or add it to the main import block.

5. AOT layer assumes 32-bit registers throughout (Systemic — known issue)

The entire AOT x86 code generation layer is fundamentally broken for RV64:

Register stride a / 4 hardcoded (e.g., branch_eq/execution.rs:131-132, load_sign_extend/aot.rs:115)
XMM helpers gpr_to_rv32_register and xmm_to_gpr in common/mod.rs:25-85 assume 2 RISC-V registers per XMM
Error messages in update_adapter_heights_asm (common/mod.rs:108-116) say "RV32_REGISTER_AS" and "RV32_MEMORY_AS"

The CI disabling (chore(ci): comment out aot workflows) acknowledges this. The AOT layer needs a fundamental redesign for 64-bit register values since XMM elements are 32-bit.

Code Quality Issues

6. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:354-357:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {
    data.map(|x| x.as_canonical_u32() as u8)
}

Leftover temp function with debug marker — should be cleaned up or properly named before merge.

7. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). Functionally correct but 8× memory waste.

Fix: Vec::with_capacity(core::mem::size_of_val(value) / 8) or just use Vec::new().

Architecture Observations (Positive)

CUDA port is comprehensive and clean. All 8 CUDA commits correctly migrate to RV64 types and constants. The new W-suffix GPU chips are properly registered. Zero Rv32* type references remain in any CUDA-related code (.rs, .cu, .cuh).
Guest program migration is thorough. cfg(openvm_intrinsics) cleanly replaces cfg(target_os = "zkvm") across 119 uses. All benchmark guests properly use no_std. Guest lib functions use rv64 prefix.
feat: rv64 more cleanup (#2752) centralizes tracing_read_reg_ptr in adapters/mod.rs:235-251 with proper pointer_max_bits parameter and debug assertions, eliminating duplicate helpers from riscv-adapters/src/helpers.rs.
Non-AOT execution paths across all extensions (riscv, algebra, ECC, keccak256, sha2, bigint, deferral, pairing) are solid — consistent RV64_* constant usage.
*_W adapter/core pattern remains well-designed with proper sign extension via (result as i32 as i64 as u64).
Constants header (primitives/constants.h) correctly defines both RV32 and RV64 constants for multi-ISA CUDA support, with DEFAULT_BLOCK_SIZE = 8.

Summary

Priority	Issue	Impact	Status
High	#1: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files, 8 impls)	Compilation failure under `aot`	Trivial rename
High	#2: Riscv non-W AOT impls wrong const generic (4 files)	Dead code under `aot`	Part of AOT port
High	#3: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files, 12 impls)	Compilation failure under `aot`	Trivial rename
High	#4: Missing `MulOpcode` import in `mul_w` AOT	Compilation failure under `aot`	Trivial fix
Medium	#5: AOT layer assumes 32-bit registers (systemic)	Entire AOT path broken	Known deferral (CI disabled)
Low	#6: `TEMP[jpw]` marker	Cleanup	Minor
Low	#7: Serializer 8× over-allocation	Wasted memory	Minor

Progress Since Last Review

Major improvements — 6 of 10 previously reported bugs have been fixed:

✅ config.rv32i → config.rv64i (5 locations) — Fixed
✅ CUDA extension old Rv32* types (all extensions) — Fully ported across 8 commits
✅ Deferral extension soundness issues — Fixed
✅ Pairing transpiler wrong stride — Fixed
✅ Duplicate guest/guest/ directory — Removed
✅ Guest lib cfg(target_os = "zkvm") → cfg(openvm_intrinsics) — Fixed

The core RV64 architecture (non-AOT paths) is solid and complete. The CUDA port is comprehensive. The only remaining issues are in the AOT layer, which is acknowledged as deferred with CI disabled.

Recommended Before Merge

Fix Rv32*Executor → Rv64*Executor in riscv AOT impls (4 files, 8 occurrences) — trivial renames
Fix Rv32*256Executor → Rv64*256Executor in bigint AOT impls (6 files, 12 occurrences) — trivial renames
Add missing MulOpcode import in mul_w/execution.rs — 1 line
Remove TEMP[jpw] marker / rename tmp_convert_to_u8s in adapters/mod.rs:354
Track systemic AOT redesign as a separate work item

github-actions · 2026-04-27T14:05:51Z

Claude finished @shuklaayush's task in 6m 22s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new CUDA port commits
Review cleanup and guest program changes
Review remaining code quality issues
Post final review

Overview

Re-reviewing after 15 new commits since the last successful review. The PR is at 44 commits, 100 changed files, +20882/−10689 lines. Major new work:

Full CUDA port (feat: rv64 cuda system airs #2737, feat: rv64 cuda base_alu, less_than, shift, branch_eq, branch_lt, mul, mulh, divrem, jal_lui, jalr, auipc #2741, feat: rv64 cuda base_alu_w, shift_w, mul_w, divrem_w #2743, feat: rv64 cuda load_sign_extend #2744, feat: rv64 cuda loadstore #2745, feat: rv64 cuda bigint #2746, feat: rv64 cuda hintstore #2748, feat: rv64 cuda sha2 #2749) — previously flagged as a known deferral, now complete
Extensions cleanup (feat: rv64 extensions cleanup #2739, feat: rv64 more cleanup #2752) — misc cleanup across all extensions
Guest programs updated (fix(v2.1-rv64): update guest lib structs/fns to use rv64 prefix #2736, chore(v2.1-rv64): replace zkvm os gating with openvm intrinsics #2738, fix(v2.1-rv64): make all guests no_std and use portable atomics #2740) — all guests now no_std with openvm_intrinsics cfg, portable atomics

Previously critical bugs now FIXED:

✅ config.rv32i → config.rv64i — all 5 locations fixed
✅ Deferral AIR register arrays — now uses RV64_WORD_NUM_LIMBS + expand_to_rv64_register() for 8-byte bus interactions
✅ Deferral transpiler wrong register stride — now uses RV64_REGISTER_NUM_LIMBS
✅ Pairing transpiler wrong register stride — fixed
✅ CUDA extension Rv32* type references — all ported to Rv64*
✅ Duplicate guest/guest/ directory — removed
✅ Deferral config naming (Rv32Deferral* → Rv64Deferral*) — fixed

Bugs (Still Open)

1. **AOT impls reference non-existent `Rv32*Executor` types** (Compilation error under `aot` feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*:

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	98, 186	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	112, 158	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

These are dead code — the impls apply to types that don't exist. Will fail to compile when aot feature is enabled.

Fix this →

2. AOT impls for non-W chips use wrong const generic `RV32_REGISTER_NUM_LIMBS` (Dead code under `aot`)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — making these impls dead code that never gets dispatched:

File	Lines	Const generic
`base_alu/execution.rs`	177, 245	`{ RV32_REGISTER_NUM_LIMBS }` → should be `{ RV64_REGISTER_NUM_LIMBS }`
`less_than/execution.rs`	128, 247	same
`divrem/execution.rs`	108, 273	same
`load_sign_extend/aot.rs`	21, 41	same

Additionally, base_alu_w/execution.rs lines 166, 234 use RV32_REGISTER_NUM_LIMBS (4) — for W-suffix chips, this happens to equal RV64_WORD_NUM_LIMBS (4), so it's functionally correct but should use RV64_WORD_NUM_LIMBS for semantic clarity.

3. **Bigint AOT impls reference non-existent `Rv32*256Executor` types** (Compilation error under `aot`)

12 impl blocks across 6 files:

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`

Fix this →

4. AOT `common/mod.rs` helper functions assume 32-bit register layout (Systemic AOT issue)

extensions/riscv/circuit/src/common/mod.rs:16 imports RV32_MEMORY_AS, RV32_REGISTER_AS and the helper functions (gpr_to_rv32_register, xmm_to_gpr, update_adapter_heights_asm) are all designed for the 32-bit register-to-XMM mapping (2 registers per XMM via pinsrd/pextrd). For RV64, registers are 64 bits each and would need 1:1 XMM mapping or use pinsrq/pextrq.

The error messages at lines 107-117 also say "RV32_REGISTER_AS" and "RV32_MEMORY_AS".

Similarly, AOT code throughout the loadstore/aot.rs, branch_lt/aot.rs, branch_eq/execution.rs, and all *_W execution files use RV32_REGISTER_AS and RV32_IMM_AS — functionally correct (same numeric values as RV64_* equivalents) but semantically misleading.

This is a known deferral — AOT CI is disabled via TODO(aot): re-enable comments in .github/workflows/extension-tests.yml.

Code Quality Issues

5. `TEMP[jpw]` marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:354:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker — should be cleaned up or properly named.

6. Serializer `Vec::with_capacity` over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). Functionally correct but 8× memory waste.

Fix: Vec::with_capacity(core::mem::size_of_val(value) / core::mem::size_of::<u64>()) or Vec::new().

7. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

WORD_SIZE is defined as 8 in platform/src/lib.rs. Should use the constant for consistency with the rest of the codebase.

8. `Rv32ModularArithmeticOpcode` enum not renamed

extensions/algebra/transpiler/src/lib.rs:21 defines pub enum Rv32ModularArithmeticOpcode — this is referenced in ~30 locations across algebra/circuit/src/ (preflight.rs, execution.rs, modular.rs, hybrid.rs, muldiv.rs, addsub.rs). Functionally correct since it's just an enum name, but inconsistent with the rest of the Rv32 → Rv64 rename.

9. Test error message strings still say "Rv32IM"

4 test files reference "failed to create Rv32IM executor" in their expect() strings:

mul/tests.rs:290, mulh/tests.rs:531, jalr/tests.rs:591, mul_w/tests.rs:417

Cosmetic only — these are behind aot/cuda feature gates.

CUDA Port Assessment (New)

The CUDA port across 8 commits is well done:

✅ extensions/riscv/circuit/src/extension/cuda.rs — correctly uses all Rv64* types, no Rv32* imports
✅ All non-W chip CUDA files (base_alu/cuda.rs, shift/cuda.rs, less_than/cuda.rs, mul/cuda.rs, mulh/cuda.rs, divrem/cuda.rs) — correctly use RV64_REGISTER_NUM_LIMBS and RV64_CELL_BITS
✅ All W-suffix chip CUDA files (base_alu_w/cuda.rs, shift_w/cuda.rs, mul_w/cuda.rs, divrem_w/cuda.rs) — correctly use RV64_WORD_NUM_LIMBS and RV64_CELL_BITS
✅ bigint/circuit/src/extension/cuda.rs — correctly uses Rv64*256Air types
✅ crates/circuits/primitives/cuda/include/primitives/constants.h — correctly defines both RV32 and RV64 constants with RV64_REGISTER_NUM_LIMBS = 8, RV64_WORD_NUM_LIMBS = 4, RV64_CELL_BITS = 8
✅ SHA2, hintstore, loadstore, load_sign_extend CUDA files all correctly ported
✅ Keccak256 and SHA2 extensions — zero remaining RV32_* references in Rust code

Architecture Observations (Positive)

Non-AOT, non-CUDA paths across ALL extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage throughout.
CUDA port follows the correct pattern for all chip types: non-W chips use RV64_REGISTER_NUM_LIMBS (8), W-suffix chips use RV64_WORD_NUM_LIMBS (4).
Guest program migration to no_std with openvm_intrinsics cfg and portable atomics is clean and thorough.
*_W adapter/core pattern remains well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension correctly.
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag cleanly replaces old target_os = "zkvm".

Summary

Priority	Issue	Impact	Status
High	#1: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files, 8 impls)	Dead code + compilation failure under `aot`	Trivial rename
High	#2: Riscv AOT impls wrong const generic (4 files, 8 impls)	Dead code under `aot`	Part of AOT port
High	#3: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files, 12 impls)	Dead code + compilation failure under `aot`	Trivial rename
Medium	#4: AOT `common/mod.rs` helpers assume 32-bit layout	AOT broken for RV64	Known deferral (CI disabled)
Low	#5: `TEMP[jpw]` marker	Cleanup	Minor
Low	#6: Serializer 8× over-allocation	Wasted memory	Minor
Low	#7: Hardcoded `8` vs `WORD_SIZE`	Style inconsistency	Minor
Low	#8: `Rv32ModularArithmeticOpcode` not renamed	Cosmetic	Minor
Low	#9: Test error message strings say "Rv32IM"	Cosmetic	Minor

Progress Since Previous Reviews

Major progress — 6 of the 10 previously tracked bugs are fixed, including all critical/high-severity ones:

✅ All critical soundness bugs (deferral AIR, deferral transpiler, pairing transpiler) — fixed
✅ config.rv32i compilation issue — fixed
✅ CUDA extension port — complete (previously flagged as "known deferral")
✅ Duplicate guest/guest/ directory — removed
✅ Deferral config naming — fixed

The remaining open items are all confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable).

Recommended Before Merge

Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors even though AOT CI is disabled
Remove TEMP[jpw] marker / rename tmp_convert_to_u8s
Track AOT port as a separate work item (the systemic common/mod.rs helpers + const generic issues need a coordinated fix)

# fix(v2.1-rv64): convert more rv32 references to rv64 Follow-up cleanup that gets the lint job and the `sha2_bench` bench-prove run green on the rv64 branch, plus a few non-AOT polish items called out in review. ## Lint fixes - `extensions/riscv/circuit/src/auipc/tests.rs` — gate `RV64_REGISTER_NUM_LIMBS` behind `#[cfg(feature = "cuda")]`; it's only used inside the CUDA test, so the CPU lint job hit `unused_imports`. - `crates/continuations/src/tests/e2e.rs` — rename stale `Rv32DeferralBuilder` / `Rv32DeferralConfig` → `Rv64DeferralBuilder` / `Rv64DeferralConfig` (3 sites). Fixes the CUDA lint job's `E0432`. - `crates/sdk/src/tests.rs` — `SdkVmConfig::riscv32()` → `::riscv64()` plus doc-comment. - `benchmarks/prove/src/lib.rs` — drop unused `clap::command` import. - `extensions/riscv/circuit/src/{mul,mul_w,jalr,mulh}/tests.rs` — error string `"failed to create Rv32IM executor"` → `"Rv64IM executor"`. ## Rename - `Rv32ModularArithmeticOpcode` → `Rv64ModularArithmeticOpcode` (81 references across `extensions/algebra/{transpiler,circuit}`). Pure rename, no semantic change. ## Cleanup - Remove `tmp_convert_to_u8s` and its `// TEMP[jpw]` marker from `extensions/riscv/circuit/src/adapters/mod.rs` — `pub fn` with zero in-tree callers. - `crates/toolchain/openvm/src/io/mod.rs` — replace hardcoded `8` in `read_vec_by_len` with `openvm_platform::WORD_SIZE`. ## Bug fix - `crates/toolchain/openvm/src/serde/serializer.rs` — `to_vec` was calling `Vec::<u64>::with_capacity(size_of_val(value))`, conflating bytes with u64 elements and over-allocating 8×. Switched to `size_of_val(value).div_ceil(size_of::<u64>())`. Heap waste only, no correctness impact. ## Benchmark ELF rebuilds Rebuilt 16 benchmark guest ELFs as 64-bit RISC-V (the committed binaries were still 32-bit, which made `sha2_bench` fail with `Error: Not a 64-bit ELF` after the transpiler tightened its check): base64_json, bincode, bubblesort, ecrecover, fibonacci, fibonacci_iterative, fibonacci_recursive, keccak256, keccak256_iter, kitchen-sink, pairing, quicksort, rkyv, sha2_bench, sha256, sha256_iter. Not rebuilt — `regex`, `revm_transfer`, `revm_snailtracer` — their dependency closures (`regex-automata`, `tracing-core 0.1.x`) use atomic CAS methods that the `riscv64im-unknown-none-elf` target doesn't expose. The existing `portable-atomic` polyfill in `openvm-platform` works for crates that opt in via a feature flag (`bytes`, `spin`), but neither failing crate exposes one. Tracking as a separate item. ## Verification - `cargo +nightly fmt --all -- --check` clean. - `cargo clippy --workspace --all-targets --tests --features "<CI feature set>" -- -D warnings` clean. - `cargo check --workspace --all-targets --tests` clean.

github-actions · 2026-04-27T15:53:19Z

Claude finished @shuklaayush's task in 4m 42s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes in detail
Review remaining AOT issues
Post final review

Overview

Re-reviewing after the latest commit fix(v2.1-rv64): convert more rv32 references to rv64 (e9af982). The PR is at 45 commits, 100 changed files, +20982/−10779 lines. This commit addresses many of the previously reported issues.

Previously reported bugs now FIXED in this commit:

✅ TEMP[jpw] marker / tmp_convert_to_u8s — Removed from adapters/mod.rs
✅ Serializer 8× over-allocation — Fixed: now uses size_of_val(value).div_ceil(size_of::<u64>())
✅ Hardcoded 8 in io/mod.rs — Fixed: now uses WORD_SIZE constant
✅ Rv32ModularArithmeticOpcode enum — Renamed to Rv64ModularArithmeticOpcode across all algebra files
✅ Test error message strings "Rv32IM" — Fixed to "Rv64IM" in 4 test files
✅ SDK/continuations Rv32Deferral* references — Fixed to Rv64Deferral*
✅ SdkVmConfig::riscv32() — Fixed to riscv64() in SDK tests
✅ commit_to_stdin_fields byte padding — Fixed from 4-byte to 8-byte flat_map

Previously confirmed fixed (earlier commits):

✅ config.rv32i → config.rv64i — all locations
✅ CUDA extensions fully ported to Rv64* types
✅ Deferral extension AIR/transpiler
✅ Pairing transpiler register stride
✅ Algebra transpiler/circuit
✅ Duplicate guest/guest/ directory — removed
✅ Hint stream length encoding
✅ Range checker sizes
✅ Guest toolchain target riscv64im-unknown-none-elf
✅ abstract_compose overflow fix

Bugs (Still Open — All AOT-Only)

All remaining bugs are confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable in .github/workflows/extension-tests.yml).

1. **AOT impls reference non-existent `Rv32*Executor` types** (Compilation error under `aot` feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*:

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	98, 186	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	112, 158	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

2. **Bigint AOT impls reference non-existent `Rv32*256Executor` types** (Compilation error under `aot` feature)

12 impl blocks across 6 files:

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

Fix this →

3. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot` feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported. Neither the direct imports (line 14 imports MulWOpcode only) nor crate::common::* (line 19) provide MulOpcode. Will fail to compile under aot feature.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports block.

4. AOT impls for non-W chips use wrong const generic — dead code (under `aot` feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4):

File	Lines
`base_alu/execution.rs`	177, 245
`less_than/execution.rs`	128, 247
`divrem/execution.rs`	108, 273
`load_sign_extend/aot.rs`	21, 41

These impls implement traits for types that are never instantiated with RV32_REGISTER_NUM_LIMBS.

5. AOT layer assumes 32-bit register layout throughout (Systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers (gpr_to_rv32_register, xmm_to_gpr in common/mod.rs:25-85), and RV32_REGISTER_AS/RV32_IMM_AS constants. Error messages in update_adapter_heights_asm also say "RV32_REGISTER_AS". The AOT layer needs a fundamental redesign for 64-bit register values since XMM elements are 32-bit (pinsrd/pextrd).

CI is properly disabled via TODO(aot): re-enable in .github/workflows/extension-tests.yml.

Architecture Observations (Positive)

Latest commit (e9af982) is a thorough cleanup that addresses many previously reported issues — serializer over-allocation, hardcoded constants, opcode enum naming, test strings, SDK/continuations deferral references. Good work.
Non-AOT code paths are clean. Zero RV32_REGISTER_NUM_LIMBS references remain outside of AOT impl blocks and the adapters/mod.rs import line (needed for the RV32_CELL_BITS == RV64_CELL_BITS assertion). All extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) use correct RV64_* constants.
CUDA port is complete. All chip types correctly reference Rv64* AIR types and use RV64_REGISTER_NUM_LIMBS/RV64_WORD_NUM_LIMBS consistently.
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE, io/mod.rs uses WORD_SIZE, guest openvm_intrinsics cfg flag is cleanly integrated, CLI init templates use u64/reveal_u64.
Test improvements in loadstore/tests.rs — the CUDA test now uses pointer_max_bits: 20 with properly sized address spaces, and gen_pointer is replaced with explicit address alignment logic that respects tester.address_bits().

Summary

Priority	Issue	Impact	Status
High	#1: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files, 8 impls)	Compilation failure under `aot`	Trivial rename
High	#2: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files, 12 impls)	Compilation failure under `aot`	Trivial rename
Medium	#3: Missing `MulOpcode` import in `mul_w` AOT	Compilation failure under `aot`	1-line fix
Medium	#4: Non-W AOT impls wrong const generic (4 files)	Dead code under `aot`	Part of AOT port
Medium	#5: AOT layer assumes 32-bit register layout (systemic)	Entire AOT path broken	Known deferral (CI disabled)

Progress Since Previous Reviews

Excellent progress — all previously reported critical and high-severity bugs in the core execution paths have been fixed. The latest commit cleans up many of the remaining code quality issues. The only open items are confined to the AOT layer, which is explicitly marked as not yet ported with CI disabled.

Recommended Before Merge

Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors if someone enables the aot feature
Add missing MulOpcode import in mul_w/execution.rs — 1 line
Track systemic AOT port as a separate work item

Resolves INT-6048. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-27T19:10:37Z

Claude finished @shuklaayush's task in 6m 13s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes (riscv-test-vectors migration)
Review remaining AOT issues
Review code quality and architecture
Post final review

Overview

Re-reviewing after the latest commit feat: migrate riscv-test-vectors from rv32 to rv64 (#2758) (8ab3d7e). The PR is at 46 commits, 662 files changed, +21000/−10797 lines.

Previously reported bugs now FIXED (confirmed):

✅ Hint stream length encoding — uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
✅ Phantom sub-executor naming — renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
✅ Commented-out hintstore — module is now active
✅ abstract_compose overflow — uses from_u64(1u64 << ...)
✅ Range checker sizes — properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
✅ Guest toolchain target — updated to riscv64im-unknown-none-elf
✅ config.rv32i → config.rv64i — all 5 locations fixed
✅ CUDA extensions — fully ported to Rv64* types across all extensions
✅ Deferral extension AIR/transpiler — correctly uses RV64_WORD_NUM_LIMBS + expand_to_rv64_register()
✅ Pairing transpiler register stride — fixed
✅ Algebra extension — all register operations use RV64_REGISTER_NUM_LIMBS
✅ Duplicate guest/guest/ directory — removed
✅ Deferral config naming — renamed to Rv64DeferralConfig/Rv64DeferralCpuBuilder
✅ TEMP[jpw] marker — removed
✅ Serializer over-allocation — now uses size_of_val(value).div_ceil(size_of::<u64>())
✅ Hardcoded 8 in io/mod.rs — now uses WORD_SIZE constant
✅ Rv32ModularArithmeticOpcode — renamed to Rv64ModularArithmeticOpcode
✅ Test error message strings — updated to "Rv64IM"
✅ SDK/continuations deferral references — updated to Rv64Deferral*

New Commit Assessment: riscv-test-vectors migration (#2758) — ✅ Clean

The migration from rv32im-test-vectors/ to riscv-test-vectors/ is correct:

Makefile correctly uses XLEN=64, includes rv64ui/rv64um Makefrag, compiles with -march=rv64g -mabi=lp64
riscv_test.h properly defines RVTEST_RV64U (not RV32U), uses .dword (8-byte) for tohost/fromhost
Linker script unchanged — uses generic riscv output arch
CI workflow correctly references riscv-test-vectors/ directory and test_rv64im_riscv_vector_* test names
Test code properly uses rv64ui-p-* skip list and riscv-test-vectors/tests path
Shipped binary rv64ui-p-simple is confirmed 64-bit ELF

Bugs (Still Open — All AOT-Only)

All remaining bugs are confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable in .github/workflows/extension-tests.yml).

1. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot` feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*:

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	98, 186	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	112, 158	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

2. Bigint AOT impls reference non-existent `Rv32*256Executor` types (Compilation error under `aot` feature)

12 impl blocks across 6 files:

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

Fix this →

3. AOT impls for non-W riscv chips use wrong const generic — dead code (under `aot` feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — making these impls dead code that never gets dispatched:

File	Lines
`base_alu/execution.rs`	177, 245
`less_than/execution.rs`	128, 247
`divrem/execution.rs`	108, 273
`load_sign_extend/aot.rs`	21, 41

Note: The *_W chips' AOT impls also use RV32_REGISTER_NUM_LIMBS (4), which happens to equal RV64_WORD_NUM_LIMBS (4) — functionally correct but semantically misleading.

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot` feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but only imports MulWOpcode (line 14). MulOpcode is not available via crate::common::* either.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports.

5. AOT layer assumes 32-bit register layout throughout (Systemic — known deferral)

The AOT x86 code generation layer is fundamentally designed for 32-bit registers:

Register stride a / 4 hardcoded across all AOT files
XMM helpers (gpr_to_rv32_register, xmm_to_gpr in common/mod.rs) assume 2 RISC-V registers per 64-bit XMM
Error messages say "RV32_REGISTER_AS" and "RV32_MEMORY_AS"

CI is disabled via TODO(aot): re-enable in .github/workflows/extension-tests.yml. The AOT layer needs a fundamental redesign for 64-bit register values.

Code Quality Issues

6. Transpiler tests use 32-bit ELFs that will be rejected by the RV64 ELF loader

crates/toolchain/tests/tests/transpiler_tests.rs has 6 non-#[ignore] tests that load rv32im-* ELF files (confirmed 32-bit via file command), but the ELF loader at elf.rs:81 rejects any non-ELF64 input with "Not a 64-bit ELF":

Test function	ELF file	Feature gate
`test_decode_elf`	`rv32im-empty-program-elf`	None
`test_generate_program`	`rv32im-fib-from-as`, `rv32im-intrin-from-as`	None
`test_rv32im_runtime`	`rv32im-exp-from-as`, `rv32im-fib-from-as`	None
`test_intrinsic_runtime`	`rv32im-intrin-from-as`	None
`test_terminate_prove`	`rv32im-terminate-from-as`	None
`test_rv32im_aot_pure_runtime`	`rv32im-exp-from-as`	`aot`

These tests will fail at runtime if executed. They're not run in CI (the riscv workflow only runs the rv64im_riscv_vector_* tests), but they'll fail for anyone running cargo nextest run -p openvm-toolchain-tests. They should either be updated with equivalent rv64 ELFs, #[ignore]d with a TODO(rv64) comment, or removed.

7. `Rv32ModularFp2Int256Config` struct not renamed

crates/toolchain/tests/tests/transpiler_tests.rs:137 defines pub struct Rv32ModularFp2Int256Config with Rv64I, Rv64M, Rv64Io fields. Cosmetic but inconsistent with the rest of the migration.

8. CUDA library name still says `rv32im`

extensions/riscv/circuit/build.rs:19 — .library_name("tracegen_gpu_rv32im"). Functionally correct (it's just an internal linker name) but inconsistent with the rv32im → riscv rename.

Architecture Observations (Positive)

Non-AOT code paths across ALL extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage. Zero non-AOT RV32_REGISTER_NUM_LIMBS references outside of the constants re-export.
CUDA port is comprehensive and correct. All chip types reference Rv64* AIR types. The primitives/constants.h header correctly defines RV64_REGISTER_NUM_LIMBS = 8, RV64_WORD_NUM_LIMBS = 4, DEFAULT_BLOCK_SIZE = 8.
riscv-test-vectors migration is clean — Makefile targets rv64g, test header uses RVTEST_RV64U, shipped binary is confirmed 64-bit ELF, CI workflow correctly references new paths and test names.
Guest program migration is thorough — cfg(openvm_intrinsics) cleanly replaces cfg(target_os = "zkvm"), all benchmarks use no_std, guest lib functions use rv64 prefix.
ELF loader correctly validates Class::ELF64, uses try_into() for u64→u32 conversions with error handling, and constrains addresses to u32.
Decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag cleanly replaces target_os = "zkvm".

Summary

Priority	Issue	Impact	Status
High	#1: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files, 8 impls)	Compilation failure under `aot`	Trivial rename
High	#2: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files, 12 impls)	Compilation failure under `aot`	Trivial rename
Medium	#3: Non-W AOT impls wrong const generic (4 files)	Dead code under `aot`	Part of AOT port
Medium	#4: Missing `MulOpcode` import in `mul_w` AOT	Compilation failure under `aot`	1-line fix
Medium	#5: AOT layer assumes 32-bit registers (systemic)	Entire AOT path broken	Known deferral (CI disabled)
Low	#6: Transpiler tests use 32-bit ELFs	Tests broken if run locally	Should `#[ignore]` or remove
Low	#7: `Rv32ModularFp2Int256Config` not renamed	Cosmetic	Minor
Low	#8: CUDA library name `tracegen_gpu_rv32im`	Cosmetic	Minor

Progress Since Previous Reviews

Excellent progress — all previously reported critical and high-severity bugs in the core execution/circuit/CUDA paths have been fixed. 19 of the originally reported issues are now resolved. The only remaining open items are confined to the AOT layer, which is explicitly marked as not yet ported with CI disabled.

Recommended Before Merge

Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors if someone enables the aot feature
Add missing MulOpcode import in mul_w/execution.rs — 1 line
#[ignore] or remove the 6 transpiler tests that load 32-bit ELFs
Track systemic AOT redesign as a separate work item

Towards INT-6339. AUIPC was zero-extending `pc + (imm << 12)` into the upper 32 bits of rd, but the RV64 spec requires sign extension, causing riscv test `rv64ui-p-auipc` to fail. Add `is_sign_extend` column to constrain the MSB of the 32-bit result and fill the upper 4 bytes accordingly. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ayush Shukla <ayush@axiom.xyz>

github-actions · 2026-04-27T20:59:06Z

Claude finished @shuklaayush's task in 4m 57s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes in detail (AUIPC sign-extend fix, riscv-test-vectors migration)
Review remaining AOT issues
Review code quality and architecture
Post final review

Overview

Re-reviewing after the latest commit fix: sign-extend AUIPC result (#2759) (a50c479). The PR is at 47 commits, 100 changed files, +21105/−10807 lines.

Previously reported bugs now FIXED (confirmed):

✅ Hint stream length encoding — uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
✅ Phantom sub-executor naming — renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
✅ Commented-out hintstore — module is now active
✅ abstract_compose overflow — uses from_u64(1u64 << ...)
✅ Range checker sizes — properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
✅ Guest toolchain target — updated to riscv64im-unknown-none-elf
✅ config.rv32i → config.rv64i — all locations fixed
✅ CUDA extensions — fully ported to Rv64* types across all extensions
✅ Deferral extension AIR/transpiler — correctly uses RV64_WORD_NUM_LIMBS + expand_to_rv64_register()
✅ Pairing transpiler register stride — fixed
✅ Algebra extension — all register operations use RV64_REGISTER_NUM_LIMBS
✅ Duplicate guest/guest/ directory — removed
✅ Deferral config naming — renamed to Rv64DeferralConfig/Rv64DeferralCpuBuilder
✅ TEMP[jpw] marker — removed
✅ Serializer over-allocation — now uses size_of_val(value).div_ceil(size_of::<u64>())
✅ Hardcoded 8 in io/mod.rs — now uses WORD_SIZE constant
✅ Rv32ModularArithmeticOpcode — renamed to Rv64ModularArithmeticOpcode
✅ Test error message strings — updated to "Rv64IM"
✅ SDK/continuations deferral references — updated to Rv64Deferral*

New Commit Assessment: AUIPC sign-extend (#2759) — ✅ Sound

The fix correctly sign-extends the AUIPC result from 32 bits to 64 bits. In RV64, AUIPC computes rd = pc + (imm << 12) as a 32-bit operation, and the result must be sign-extended to 64 bits (upper 32 bits = 0xFF×4 if bit 31 is set, 0x00×4 otherwise).

AIR constraint approach is elegant and sound:

New is_sign_extend boolean column (line 37) tracks the sign bit of rd_data[3]
Range check (rd_data[2], 2*rd_data[3] - is_sign_extend * 2^8) forces is_sign_extend = msb(rd_data[3]):
- If is_sign_extend = 0: 2*rd_data[3] must be in [0, 256), so rd_data[3] ∈ [0, 128) — top bit 0 ✓
- If is_sign_extend = 1: 2*rd_data[3] - 256 must be in [0, 256), so rd_data[3] ∈ [128, 256) — top bit 1 ✓
Upper register bytes written as is_sign_extend * 0xFF (line 184-191), correctly producing either 0x00 or 0xFF

CUDA consistency verified — auipc.cu mirrors the exact same range check pattern (lines 44-49) and writes is_sign_extend to the trace (line 54).

Test coverage is thorough — new sign_extend_flag_negative_tests verify that pranking is_sign_extend to the wrong value fails verification for both positive and negative results. Sanity test updated to expect [210, 107, 113, 186, 255, 255, 255, 255] (sign-extended).

JalLui already had the same pattern applied (confirmed in jal_lui/core.rs:80,97-108,131-138) — both LUI and JAL correctly sign-extend their results. New JalLui negative tests also added.

JALR correctly uses zero-extension (expand_to_rv64_register) since rd = pc + 4 where pc < 2^30, meaning bit 31 is never set.

Bugs (Still Open — All AOT-Only)

All remaining bugs are confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable in .github/workflows/extension-tests.yml).

1. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot` feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*. Dead code that will fail to compile under aot:

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	98, 186	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	112, 158	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

2. Bigint AOT impls reference non-existent `Rv32*256Executor` types (Compilation error under `aot` feature)

12 impl blocks across 6 files:

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

Fix this →

3. AOT impls for non-W riscv chips use wrong const generic — dead code (under `aot` feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — dead code that never gets dispatched:

File	Lines
`base_alu/execution.rs`	177, 245
`less_than/execution.rs`	128, 247
`divrem/execution.rs`	108, 273
`load_sign_extend/aot.rs`	21, 41

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot` feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported anywhere in the file. Neither the direct imports (line 14 imports only MulWOpcode) nor crate::common::* (line 19) provide MulOpcode.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports block.

5. AOT layer assumes 32-bit register layout throughout (Systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers (gpr_to_rv32_register, xmm_to_gpr in common/mod.rs:25-85), and RV32_REGISTER_AS/RV32_IMM_AS constants. CI is disabled via TODO(aot): re-enable.

Code Quality Issues

6. Transpiler tests use 32-bit ELFs that will be rejected by the RV64 ELF loader

crates/toolchain/tests/tests/transpiler_tests.rs has 5 non-feature-gated tests that load rv32im-* ELF files (32-bit), but the ELF loader at elf.rs:81 rejects any non-ELF64 input with "Not a 64-bit ELF":

Test	ELF
`test_decode_elf`	`rv32im-empty-program-elf`
`test_generate_program`	`rv32im-fib-from-as`, `rv32im-intrin-from-as`
`test_rv32im_runtime`	`rv32im-exp-from-as`, `rv32im-fib-from-as`
`test_intrinsic_runtime`	`rv32im-intrin-from-as`
`test_terminate_prove`	`rv32im-terminate-from-as`

These won't fail in CI (which only runs test_rv64im_riscv_vector_*), but will fail for anyone running cargo nextest run -p openvm-toolchain-tests. They should either be updated with RV64 equivalents, #[ignore]d with a TODO(rv64) comment, or removed.

7. CUDA library name still says `rv32im`

extensions/riscv/circuit/build.rs:19 — .library_name("tracegen_gpu_rv32im"). Functionally correct (internal linker name) but inconsistent with the rv32im → riscv rename.

Architecture Observations (Positive)

AUIPC sign-extend fix is well-designed. The range-check-based approach to constrain is_sign_extend = msb(rd_data[3]) is efficient (no extra columns beyond the boolean flag) and sound. Both Rust and CUDA trace generation are consistent.
JalLui sign-extend uses the same pattern with an additional rd[3] * (4 * is_jal + is_lui) range check that bounds the JAL result to pc < 2^30 — clean and correct.
Non-AOT code paths across ALL extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage. Zero non-AOT RV32_REGISTER_NUM_LIMBS references outside of the adapters/mod.rs assertion.
CUDA port is comprehensive and correct. All chip types reference Rv64* AIR types with proper RV64_REGISTER_NUM_LIMBS/RV64_WORD_NUM_LIMBS usage. The AUIPC CUDA trace generation correctly mirrors the Rust sign-extend logic.
*_W adapter/core pattern remains well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg cleanly replaces target_os = "zkvm", ELF64 validation, riscv64im-unknown-none-elf target.

Summary

Priority	Issue	Impact	Status
High	#1: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files, 8 impls)	Compilation failure under `aot`	Trivial rename
High	#2: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files, 12 impls)	Compilation failure under `aot`	Trivial rename
Medium	#3: Non-W AOT impls wrong const generic (4 files)	Dead code under `aot`	Part of AOT port
Medium	#4: Missing `MulOpcode` import in `mul_w` AOT	Compilation failure under `aot`	1-line fix
Medium	#5: AOT layer assumes 32-bit registers (systemic)	Entire AOT path broken	Known deferral (CI disabled)
Low	#6: Transpiler tests use 32-bit ELFs	Tests broken if run locally	Should `#[ignore]` or update
Low	#7: CUDA library name `tracegen_gpu_rv32im`	Cosmetic	Minor

Progress Since Previous Reviews

All previously reported critical and high-severity bugs in the core execution/circuit/CUDA paths have been fixed (19 issues resolved). The latest AUIPC sign-extend fix is sound and well-tested. The only remaining open items are confined to the AOT layer, which is acknowledged as not yet ported with CI disabled.

Recommended Before Merge

Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors if someone enables the aot feature
Add missing MulOpcode import in mul_w/execution.rs — 1 line
#[ignore] or remove the 5 transpiler tests that load 32-bit ELFs — prevents confusion for local developers
Track systemic AOT port as a separate work item

github-actions · 2026-04-27T21:18:45Z

Claude finished @shuklaayush's task in 4m 27s —— View job

Code Review: RV64 Support PR (Re-review after latest sync)

Gather context — check new commits since last review
Review status of previously reported bugs
Review new changes (custom target.json with atomic CAS)
Review remaining AOT issues
Review code quality and architecture
Post final review

Overview

Re-reviewing after the latest commit feat(v2.1-rv64): use a custom target.json with atomic cas (#2760) (a6bc526). The PR is at 48 commits, 100 changed files, +21070/−10797 lines.

Previously reported bugs now FIXED (confirmed):

✅ Hint stream length encoding — uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
✅ Phantom sub-executor naming — renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
✅ Commented-out hintstore — module is now active
✅ abstract_compose overflow — uses from_u64(1u64 << ...)
✅ Range checker sizes — properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
✅ Guest toolchain target — updated to custom riscv64im-openvm-none-elf
✅ config.rv32i → config.rv64i — all locations fixed
✅ CUDA extensions — fully ported to Rv64* types across all extensions
✅ Deferral extension AIR/transpiler — correctly uses RV64_WORD_NUM_LIMBS + expand_to_rv64_register()
✅ Pairing transpiler register stride — fixed
✅ Algebra extension — all register operations use RV64_REGISTER_NUM_LIMBS
✅ Duplicate guest/guest/ directory — removed
✅ Deferral config naming — renamed to Rv64DeferralConfig/Rv64DeferralCpuBuilder
✅ TEMP[jpw] marker — removed
✅ Serializer over-allocation — fixed with size_of_val(value).div_ceil(size_of::<u64>())
✅ Hardcoded 8 in io/mod.rs — now uses WORD_SIZE constant
✅ Rv32ModularArithmeticOpcode — renamed to Rv64ModularArithmeticOpcode
✅ Test error message strings — updated to "Rv64IM"
✅ SDK/continuations deferral references — updated to Rv64Deferral*
✅ AUIPC sign-extend — correctly sign-extends 32-bit result to 64-bit
✅ riscv-test-vectors migrated from rv32 to rv64

New Commit Assessment: Custom Target JSON with Atomic CAS (#2760) — ✅ Well Done

This commit replaces the stock riscv64im-unknown-none-elf target with a custom riscv64im-openvm-none-elf that declares "atomic-cas": true and "max-atomic-width": 64. Since the guest is single-threaded and the -C passes=lower-atomic LLVM pass converts all atomic ops to non-atomic equivalents at compile time, this is sound.

Benefits are significant:

Eliminates the portable-atomic dependency and its critical-section feature from the guest dependency graph
Removes per-crate bytes/extra-platforms and spin/portable_atomic workarounds from 3 benchmark guests + 1 example
Simplifies once_cell usage in algebra/ecc guests (no longer needs portable-atomic feature)
critical-section is now properly optional, only pulled when heap-embedded-alloc is active
The no-op critical_section impl is co-located with embedded-alloc in heap/embedded.rs instead of being a standalone module — cleaner ownership

Target JSON spec is correct:

"arch": "riscv64" + "features": "+m" — matches rv64im
"singlethread": true — enables LLVM single-thread optimizations
"panic-strategy": "abort" — matches the -C panic=abort rustc flag
"os": "none" — bare metal target
"llvm-abiname": "lp64" — correct for rv64im without F/D extensions
"crt-objects-fallback": "false" — correct for no_std with -Z build-std

Build integration is correct:

RUST_TARGET_PATH set to CARGO_MANIFEST_DIR (where the JSON lives)
-Z unstable-options added to rustc flags (required for custom JSON targets)
RUSTC_TARGET constant updated from riscv64im-unknown-none-elf to riscv64im-openvm-none-elf
Docs updated to reflect new target path

One minor observation: The cfg guard in platform/Cargo.toml:16 uses cfg(all(target_arch = "riscv64", target_os = "none")) which matches the custom target (arch=riscv64, os=none). This is correct — the cfg attributes don't depend on the target triple name, only on the resolved target properties.

Bugs (Still Open — All AOT-Only)

All remaining bugs are confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable in .github/workflows/extension-tests.yml).

1. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot` feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*. Dead code that will fail to compile under aot:

File	Lines	Has	Should be
`riscv/circuit/src/jal_lui/execution.rs`	93, 184	`Rv32JalLuiExecutor`	`Rv64JalLuiExecutor`
`riscv/circuit/src/auipc/execution.rs`	84, 167	`Rv32AuipcExecutor`	`Rv64AuipcExecutor`
`riscv/circuit/src/jalr/execution.rs`	98, 186	`Rv32JalrExecutor`	`Rv64JalrExecutor`
`riscv/circuit/src/hintstore/execution.rs`	112, 158	`Rv32HintStoreExecutor`	`Rv64HintStoreExecutor`

Fix this →

2. Bigint AOT impls reference non-existent `Rv32*256Executor` types (Compilation error under `aot` feature)

12 impl blocks across 6 files:

File	Lines	Has	Should be
`bigint/circuit/src/base_alu.rs`	88, 132	`Rv32BaseAlu256Executor`	`Rv64BaseAlu256Executor`
`bigint/circuit/src/branch_eq.rs`	81, 124	`Rv32BranchEqual256Executor`	`Rv64BranchEqual256Executor`
`bigint/circuit/src/branch_lt.rs`	86, 129	`Rv32BranchLessThan256Executor`	`Rv64BranchLessThan256Executor`
`bigint/circuit/src/less_than.rs`	84, 127	`Rv32LessThan256Executor`	`Rv64LessThan256Executor`
`bigint/circuit/src/mult.rs`	72, 115	`Rv32Multiplication256Executor`	`Rv64Multiplication256Executor`
`bigint/circuit/src/shift.rs`	85, 128	`Rv32Shift256Executor`	`Rv64Shift256Executor`

Fix this →

3. AOT impls for non-W riscv chips use wrong const generic — dead code (under `aot` feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — dead code that never gets dispatched:

File	Lines
`base_alu/execution.rs`	177, 245
`less_than/execution.rs`	128, 247
`divrem/execution.rs`	108, 273
`load_sign_extend/aot.rs`	21, 41

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot` feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported anywhere in the file. Will fail to compile under aot feature.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports block.

5. AOT layer assumes 32-bit register layout throughout (Systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers (gpr_to_rv32_register, xmm_to_gpr), and RV32_REGISTER_AS/RV32_IMM_AS constants. CI is disabled via TODO(aot): re-enable. Needs fundamental redesign for 64-bit register values.

Code Quality Issues

6. Transpiler tests use 32-bit ELFs that will be rejected by the RV64 ELF loader

crates/toolchain/tests/tests/transpiler_tests.rs has 5 non-feature-gated tests that load rv32im-* ELF files (32-bit), but the ELF loader rejects non-ELF64 with "Not a 64-bit ELF":

Test	ELF
`test_decode_elf`	`rv32im-empty-program-elf`
`test_generate_program`	`rv32im-fib-from-as`, `rv32im-intrin-from-as`
`test_rv32im_runtime`	`rv32im-exp-from-as`, `rv32im-fib-from-as`
`test_intrinsic_runtime`	`rv32im-intrin-from-as`
`test_terminate_prove`	`rv32im-terminate-from-as`

These won't fail in CI (which only runs test_rv64im_riscv_vector_*), but will fail for local developers running cargo nextest run -p openvm-toolchain-tests. They should be #[ignore]d with a TODO(rv64) comment or updated with RV64 equivalents.

7. CUDA library name still says `rv32im`

extensions/riscv/circuit/build.rs:19 — .library_name("tracegen_gpu_rv32im"). Functionally correct (internal linker name) but inconsistent with the rv32im → riscv rename.

Architecture Observations (Positive)

Custom target JSON approach is elegant — declaring "atomic-cas": true at the target level combined with lower-atomic at the LLVM pass level eliminates the need for portable-atomic across the entire dependency graph. This is the cleanest solution for single-threaded no-std targets needing CAS-dependent crates.
Non-AOT code paths across ALL extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage. Zero non-AOT RV32_REGISTER_NUM_LIMBS references outside of the adapters re-export.
CUDA port is comprehensive and correct. All chip types reference Rv64* AIR types with proper RV64_REGISTER_NUM_LIMBS/RV64_WORD_NUM_LIMBS usage. Zero Rv32* references in CUDA code.
Critical section impl correctly co-located with embedded-alloc in heap/embedded.rs — only compiled when the heap feature is active, eliminating unnecessary compilation of critical-section for non-heap guest programs.
*_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).
Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).
Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag cleanly replaces target_os = "zkvm", riscv64im-openvm-none-elf custom target with proper atomic lowering.
AUIPC/JalLui sign-extend is sound — range-check-based constraint on is_sign_extend boolean, with matching CUDA trace generation.

Summary

Priority	Issue	Impact	Status
High	#1: Riscv AOT `Rv32Executor` → `Rv64Executor` (4 files, 8 impls)	Compilation failure under `aot`	Trivial rename
High	#2: Bigint AOT `Rv32256Executor` → `Rv64256Executor` (6 files, 12 impls)	Compilation failure under `aot`	Trivial rename
Medium	#3: Non-W AOT impls wrong const generic (4 files)	Dead code under `aot`	Part of AOT port
Medium	#4: Missing `MulOpcode` import in `mul_w` AOT	Compilation failure under `aot`	1-line fix
Medium	#5: AOT layer assumes 32-bit registers (systemic)	Entire AOT path broken	Known deferral (CI disabled)
Low	#6: Transpiler tests use 32-bit ELFs	Broken if run locally	Should `#[ignore]` or update
Low	#7: CUDA library name `tracegen_gpu_rv32im`	Cosmetic	Minor

Progress Since Previous Reviews

All previously reported critical and high-severity bugs in the core execution/circuit/CUDA paths have been fixed (19+ issues resolved across the review cycle). The latest commit cleanly eliminates portable-atomic complexity via a well-designed custom target JSON. The only remaining open items are confined to the AOT layer, which is explicitly acknowledged as not yet ported with CI disabled.

Recommended Before Merge

Fix Rv32*Executor → Rv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors if someone enables the aot feature
Add missing MulOpcode import in mul_w/execution.rs — 1 line
#[ignore] or remove the 5 transpiler tests that load 32-bit ELFs — prevents confusion for local developers
Track systemic AOT port as a separate work item

github-actions · 2026-04-27T21:24:12Z

group	app.proof_time_ms	app.cycles	leaf.proof_time_ms
fibonacci	1,874	4,000,051	532
keccak	13,734	14,365,133	2,259
sha2_bench	9,356	11,167,961	1,246
regex	1,580	4,090,656	375
ecrecover	648	112,210	293
pairing	762	592,827	282
kitchen_sink	2,084	1,979,971	431

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: a6bc526

Benchmark Workflow

shuklaayush force-pushed the develop-v2.1.0-rv64 branch from bc66c89 to 7672360 Compare April 20, 2026 18:55

branch-rebase-bot Bot force-pushed the develop-v2.0.0-rc.1 branch from dada909 to e4aab20 Compare April 24, 2026 19:54

876pol and others added 8 commits April 27, 2026 16:02

feat: add io.rs to toolchain (#2444)

268b8e3

feat: update rrs-lib to rv64 (#2447)

04120e8

Transpiler PRs should be reviewed in this order: #2446, #2447, #2448. Summary of this PR: - Support for decoding of RV64.

Tuanlinh12312 and others added 17 commits April 27, 2026 16:03

feat: rv64 pairing (#2734)

97988ac

Resolves INT-7523. Should be merged after #2728. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: rv64 deferral (#2735)

54d99e5

Resolves INT-7524. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(v2.1-rv64): update guest lib structs/fns to use rv64 prefix (#2736)

85fabbf

- also fixes lint

chore: bump rustls-webpki

348686b

chore(v2.1-rv64): replace zkvm os gating with openvm intrinsics (#2738)

4b7c212

- also update elfs

chore(v2.1-rv64): update hashmap guest to be no_std and use hashbrown

ad9a752

feat: rv64 extensions cleanup (#2739)

883f867

Resolves INT-7566. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: rv64 cuda system airs (#2737)

82232d4

Resolves INT-7563.

feat: rv64 cuda base_alu_w, shift_w, mul_w, divrem_w (#2743)

fa61451

Resolves INT-7551.

feat: rv64 cuda load_sign_extend (#2744)

9eda39c

Resolves INT-7553. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: rv64 cuda loadstore (#2745)

560f5fd

Resolves INT-7552. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: rv64 cuda bigint (#2746)

04e9aeb

Resolves INT-7564, INT-7528.

feat: rv64 cuda hintstore (#2748)

f29cc7e

Resolves INT-7554. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat: rv64 cuda sha2 (#2749)

507dc96

Resolves INT-7530.

feat: rv64 more cleanup (#2752)

56e4f86

shuklaayush force-pushed the develop-v2.1.0-rv64 branch from 3bd85a1 to 56e4f86 Compare April 27, 2026 14:04

chore: cargo fmt

ad2b98c

feat: migrate riscv-test-vectors from rv32 to rv64 (#2758)

8ab3d7e

Resolves INT-6048. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(v2.1-rv64): use a custom target.json with atomic cas (#2760)

a6bc526

Conversation

shuklaayush commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: RV64 Support PR

Overview

Bugs

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

2. Hint stream length encoding mismatch (Semantic bug)

3. CUDA extension not ported (Compilation error under cuda feature)

Code Quality Issues

4. Phantom sub-executor naming inconsistency

5. AoT code uses RV32_REGISTER_AS / RV32_MEMORY_AS

6. Duplicate openvm-riscv-guest Cargo.toml

7. Commented-out hintstore module

Architecture Observations

Summary

Uh oh!

github-actions Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: RV64 Support PR (Re-review after sync)

Overview

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda and aot features)

2. AOT code uses RV32_REGISTER_NUM_LIMBS (4) instead of RV64_REGISTER_NUM_LIMBS (8) (Dead trait impls)

3. AOT tests read only 4 bytes from 8-byte registers

4. CUDA extension not ported (Compilation error under cuda feature)

5. a_reg = a / 4 hardcodes 4-limb register stride in AOT code

Code Quality Issues

6. RV32_REGISTER_AS / RV32_MEMORY_AS in non-AOT code

7. Duplicate openvm-riscv-guest directory

8. ELF memory_image stores data in 4-byte chunks

Architecture Observations (Positive)

Summary

Uh oh!

github-actions Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: RV64 Support PR (Re-review after sync)

Overview

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

2. AOT load_sign_extend impl is dead code — wrong const generic

3. AOT register stride hardcodes /4 instead of /8

4. AOT metered code uses RV32_REGISTER_AS in RV64 *_W chips

5. CUDA extension references old Rv32* types

Code Quality Issues

6. Duplicate openvm-riscv-guest directory

7. RV32_CELL_BITS in auipc/execution.rs:100

Architecture Observations (Positive)

Summary

Uh oh!

github-actions Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: RV64 Support PR (Re-review after latest sync)

Overview

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

2. AOT load_sign_extend impl is dead code — wrong const generic

3. AOT layer assumes 32-bit registers throughout

4. CUDA extension references old Rv32* types

Code Quality Issues

5. Serializer Vec::with_capacity over-allocates

6. Hardcoded 8 instead of WORD_SIZE in io/mod.rs

7. Duplicate openvm-riscv-guest directory

8. cargo doc disabled in CI

Architecture Observations (Positive)

Summary

Uh oh!

github-actions Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: RV64 Support PR (Re-review after latest sync)

Overview

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

2. AOT impls reference non-existent Rv32* executor types (Compilation error under aot feature)

3. AOT load_sign_extend impl uses wrong const generic — dead code

4. AOT register stride hardcodes /4 instead of /8

5. CUDA extension references old Rv32* types

6. *_W AOT impls use RV32_REGISTER_NUM_LIMBS for generic param

github-actions Bot commented Apr 20, 2026 •

edited

Loading

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

3. CUDA extension not ported (Compilation error under `cuda` feature)

5. AoT code uses `RV32_REGISTER_AS` / `RV32_MEMORY_AS`

6. Duplicate `openvm-riscv-guest` Cargo.toml

github-actions Bot commented Apr 20, 2026 •

edited

Loading

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` and `aot` features)

2. AOT code uses `RV32_REGISTER_NUM_LIMBS` (4) instead of `RV64_REGISTER_NUM_LIMBS` (8) (Dead trait impls)

4. CUDA extension not ported (Compilation error under `cuda` feature)

5. `a_reg = a / 4` hardcodes 4-limb register stride in AOT code

6. `RV32_REGISTER_AS` / `RV32_MEMORY_AS` in non-AOT code

7. Duplicate `openvm-riscv-guest` directory

8. ELF `memory_image` stores data in 4-byte chunks

github-actions Bot commented Apr 20, 2026 •

edited

Loading

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

2. AOT `load_sign_extend` impl is dead code — wrong const generic

3. AOT register stride hardcodes `/4` instead of `/8`

4. AOT metered code uses `RV32_REGISTER_AS` in RV64 `*_W` chips

5. CUDA extension references old `Rv32*` types

6. Duplicate `openvm-riscv-guest` directory

7. `RV32_CELL_BITS` in `auipc/execution.rs:100`

github-actions Bot commented Apr 20, 2026 •

edited

Loading

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

2. AOT `load_sign_extend` impl is dead code — wrong const generic

4. CUDA extension references old `Rv32*` types

5. Serializer `Vec::with_capacity` over-allocates

6. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

7. Duplicate `openvm-riscv-guest` directory

8. `cargo doc` disabled in CI

github-actions Bot commented Apr 20, 2026 •

edited

Loading

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda` feature)

2. AOT impls reference non-existent `Rv32*` executor types (Compilation error under `aot` feature)

3. AOT `load_sign_extend` impl uses wrong const generic — dead code

4. AOT register stride hardcodes `/4` instead of `/8`

5. CUDA extension references old `Rv32*` types

6. `*_W` AOT impls use `RV32_REGISTER_NUM_LIMBS` for generic param

7. Serializer `Vec::with_capacity` over-allocates

8. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

9. Duplicate `openvm-riscv-guest` directory

10. `TEMP[jpw]` marker in production code

11. 202 remaining `RV32_*` constant references across 50 files

12. `auipc/tests.rs:419` uses `RV32_REGISTER_NUM_LIMBS` (value bug)

github-actions Bot commented Apr 20, 2026 •

edited

Loading

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

2. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot` feature)

3. AOT `load_sign_extend` impl uses wrong const generic — dead code

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot`)

5. CUDA extension references old `Rv32*` types (Compilation error under `cuda`)

6. Serializer `Vec::with_capacity` over-allocates 8×

7. Hardcoded `8` instead of named constant in `io/mod.rs`

8. Duplicate `openvm-riscv-guest` directory

9. `TEMP[jpw]` marker in production code

10. `RV32_` constant naming in `_W` and execution AOT code

github-actions Bot commented Apr 20, 2026 •

edited

Loading

1. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

2. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot` feature)

3. AOT `load_sign_extend` impl uses wrong const generic — dead code

4. Missing `MulOpcode` import in `mul_w/execution.rs` (Compilation error under `aot`)

6. CUDA extension references old `Rv32*` types (Compilation error under `cuda`)

New Observations on Latest Commit (`2d0588c`)

7. `riscv64im-unknown-none-elf` target requires nightly `-Z build-std`

8. `openvm_intrinsics` cfg flag is well-integrated

9. Duplicate `openvm-riscv-guest` directory

10. Serializer `Vec::with_capacity` over-allocates 8×

11. Hardcoded `8` instead of `WORD_SIZE` in `io/mod.rs`

12. `TEMP[jpw]` marker in production code

github-actions Bot commented Apr 21, 2026 •

edited

Loading

5. `config.rv32i` should be `config.rv64i` (Compilation error under `cuda`/`aot` features)

6. AOT impls reference non-existent `Rv32*Executor` types (Compilation error under `aot`)