Skip to content

test(DON'T MERGE): develop-v2.1.0 rv64#2719

Draft
shuklaayush wants to merge 48 commits intodevelop-v2.0.0-rc.1from
develop-v2.1.0-rv64
Draft

test(DON'T MERGE): develop-v2.1.0 rv64#2719
shuklaayush wants to merge 48 commits intodevelop-v2.0.0-rc.1from
develop-v2.1.0-rv64

Conversation

@shuklaayush
Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Claude finished @shuklaayush's task in 5m 13s —— View job


Code Review: RV64 Support PR

  • Gather context and understand PR scope
  • Review new decoder crate
  • Review toolchain changes (platform, openvm, transpiler)
  • Review RV64 circuit/extension changes
  • Review test changes
  • Review Cargo.toml / dependency changes
  • Post final review

Overview

This PR adds RV64IM support to OpenVM across 302 files. The major changes are:

  1. New openvm-decoder crate — RISC-V instruction decoder forked from rrs-lib, extended with RV64 instructions (LD/SD/LWU, *W-suffix ops, M-extension W ops)
  2. Rename rv32imriscv — The extension crate moves from extensions/rv32im/ to extensions/riscv/
  3. 64-bit register width — Registers go from 4 limbs (32-bit) to 8 limbs (64-bit), adapters and chips updated accordingly
  4. New *_W chipsbase_alu_w, shift_w, mul_w, divrem_w for RV64 word-width (32-bit) operations with sign-extension
  5. Toolchain updates — Platform, openvm, serializer, transpiler all updated for 64-bit words
  6. New transpiler tests — RV64 ELF decode and transpile tests with hand-written assembly programs

Bugs

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

extensions/riscv/circuit/src/lib.rs:299 — The Rv64ImGpuBuilder::create_chip_complex references config.rv32i, but the field was renamed to rv64i on line 103. This will fail to compile when the cuda feature is enabled.

Same issue in test files:

  • extensions/riscv/circuit/src/mul/tests.rs:289
  • extensions/riscv/circuit/src/mulh/tests.rs:530
  • extensions/riscv/circuit/src/jalr/tests.rs:586
  • extensions/riscv/circuit/src/mul_w/tests.rs:419

Fix this →

2. Hint stream length encoding mismatch (Semantic bug)

extensions/riscv/circuit/src/extension/mod.rs:988-996Rv32HintInputSubEx writes the hint length as 4 bytes (u32::to_le_bytes()) and pads data to 4-byte alignment. But on the guest side (crates/toolchain/openvm/src/io/mod.rs:39-46), read_u64() consumes 8 bytes from the hint stream. The host emulation in host.rs:47 correctly writes 8 bytes (u64::to_le_bytes()).

In zkVM execution, read_u64() would read the 4-byte length + the first 4 bytes of actual data as a single u64, corrupting both the length and the data stream.

Fix: Change line 989 to (hint.len() as u64).to_le_bytes() and line 995 to .div_ceil(8) * 8 for 8-byte alignment.

Fix this →

3. CUDA extension not ported (Compilation error under cuda feature)

extensions/riscv/circuit/src/extension/cuda.rs still references all Rv32* types (Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, etc.) which no longer exist in this crate. The entire file will fail to compile when cuda feature is enabled. It also doesn't register the new *_W chips.

This is likely intentional for this development branch (given it's marked "DON'T MERGE"), but should be tracked.


Code Quality Issues

4. Phantom sub-executor naming inconsistency

Rv32HintInputSubEx, Rv32HintRandomSubEx, Rv32PrintStrSubEx — These are used by the RV64 extension but still carry the Rv32 prefix. They should be renamed to Rv64* or a generic name (e.g., HintInputSubEx) since they now operate on RV64 register widths (note: Rv32HintRandomSubEx calls read_rv64_register).

5. AoT code uses RV32_REGISTER_AS / RV32_MEMORY_AS

extensions/riscv/circuit/src/common/mod.rs:16,107-116 — The AoT assembly generation code references RV32_REGISTER_AS and RV32_MEMORY_AS. These happen to have the same numeric values as their RV64_* counterparts (both are 1 and 2), so this is functionally correct but misleading.

6. Duplicate openvm-riscv-guest Cargo.toml

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The workspace correctly points to the outer one, but the inner directory (guest/guest/) appears to be a leftover from the RV32 → RV64 migration. The inner guest/guest/ uses the old cfg(not(target_os = "zkvm")) guard while the outer uses the correct cfg(not(any(target_arch = "riscv64"))). Consider removing the inner directory to avoid confusion.

7. Commented-out hintstore module

extensions/riscv/circuit/src/lib.rs:29 — The hintstore module is commented out (// mod hintstore;), and the Rv64Io extension has empty trait implementations. This means hint store operations (writing multiple words from the hint stream) won't work in the zkVM. This is presumably known, but it blocks end-to-end guest program execution since read_vec() depends on hint_buffer_chunkedhint_buffer_u64!HINT_FUNCT3 opcode which needs hintstore.


Architecture Observations

  • The adapter/core pattern is well maintained through the migration. New *_W adapters (Rv64BaseAluWAdapter, Rv64MultWAdapter) correctly expose only the lower 32-bit limbs to core AIRs while handling full-width register reads/writes with sign extension.

  • The imm_to_u64 and imm_to_bytes functions in adapters/mod.rs properly handle 12-bit signed immediate sign-extension to 64 bits via the 24-bit encoding.

  • The new decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6 instead of RV32's 5-bit funct7), and correctly validates shamt < 32 for *W shift instructions.

  • The from_load and from_s_type transpiler utilities correctly encode load/store instructions with the 16-bit immediate format needed by the loadstore adapter.

  • Test coverage for the new chips is thorough — each *_W module has extensive tests covering edge cases (sign extension, overflow, boundary values).


Summary

The core RV64 architecture changes are solid — adapters, chips, and transpiler are well-structured. The main actionable items are:

  1. Fix config.rv32iconfig.rv64i in 5 locations (blocks compilation of tests and CUDA builds)
  2. Fix hint stream length encoding to use u64 instead of u32 (blocks correct zkVM execution of IO)
  3. Track CUDA port as a separate work item

@shuklaayush shuklaayush force-pushed the develop-v2.1.0-rv64 branch from bc66c89 to 7672360 Compare April 20, 2026 18:55
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Claude finished @shuklaayush's task in 4m 48s —— View job


Code Review: RV64 Support PR (Re-review after sync)

  • Gather context — check new commits since last review
  • Review fixes for previously reported bugs
  • Review decoder crate
  • Review toolchain changes (platform, openvm, transpiler, IO)
  • Review RV64 circuit/extension changes
  • Post final review

Overview

Re-reviewing after the latest fix: preserve rc.1 evolution + apply rv64-specific transformations commit. The PR is at 304 files changed, 15505 insertions, 6686 deletions across 14 commits.

Previous Bug #2 (hint stream length encoding) was fixed — now correctly uses u64::to_le_bytes() and 8-byte alignment. Previous Bug #4 (phantom sub-executor naming) was also fixed — renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx. Previous Bug #7 (commented-out hintstore) was also fixed — hintstore module is now active.


Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda and aot features)

The Rv64ImConfig struct has field rv64i (line 103), but 5 locations still reference config.rv32i:

  • extensions/riscv/circuit/src/lib.rs:299 — GPU builder
  • extensions/riscv/circuit/src/mul/tests.rs:289
  • extensions/riscv/circuit/src/mulh/tests.rs:530
  • extensions/riscv/circuit/src/jalr/tests.rs:586
  • extensions/riscv/circuit/src/mul_w/tests.rs:419

Fix this →

2. AOT code uses RV32_REGISTER_NUM_LIMBS (4) instead of RV64_REGISTER_NUM_LIMBS (8) (Dead trait impls)

The AOT trait implementations in load_sign_extend/aot.rs are written for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (lines 21, 41), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:25. This means the AOT impls are for a type that is never instantiated. The same pattern may exist in other AOT files.

3. AOT tests read only 4 bytes from 8-byte registers

Multiple AOT test helpers use state.memory.read::<u8, 4>(RV32_REGISTER_AS, offset):

  • mul/tests.rs:316
  • mulh/tests.rs:601
  • jalr/tests.rs:618
  • mul_w/tests.rs:446

RV64 registers are 8 bytes wide. Reading only 4 bytes would silently truncate values, giving incorrect test results for any value that uses the upper 32 bits. Should be read::<u8, 8> for full-width ops (mul, mulh, jalr). For mul_w, reading 4 bytes might be intentional since *W ops produce 32-bit results sign-extended to 64 bits, but even then the sign-extended upper bytes should be verified.

4. CUDA extension not ported (Compilation error under cuda feature)

extensions/riscv/circuit/src/extension/cuda.rs still references old Rv32* types (Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc.) which no longer exist. All CUDA chip files (*/cuda.rs) also use RV32_REGISTER_NUM_LIMBS and RV32_CELL_BITS for core AIR parameterization — many of these need to be updated to RV64_* for non-W chips.

This is likely intentional for this development branch (marked "DON'T MERGE").

5. a_reg = a / 4 hardcodes 4-limb register stride in AOT code

load_sign_extend/aot.rs:115 uses a / 4 and b / 4 to convert byte offsets to register indices. With RV64's 8-byte registers, this should be a / 8 and b / 8. The gpr_to_rv32_register function in common/mod.rs:25 also hardcodes a 4-limb/2-limb-per-xmm scheme. The entire AOT layer appears to assume 32-bit registers.


Code Quality Issues

6. RV32_REGISTER_AS / RV32_MEMORY_AS in non-AOT code

While these constants happen to have the same numeric values as RV64_REGISTER_AS / RV64_MEMORY_AS (both 1 and 2), the following locations use the RV32 names in RV64 code, which is misleading:

  • divrem/execution.rs:117if d.as_canonical_u32() != RV32_REGISTER_AS
  • auipc/execution.rs:102if d as u32 != RV32_REGISTER_AS
  • branch_eq/execution.rs:124if d.as_canonical_u32() != RV32_REGISTER_AS
  • loadstore/aot.rs:133,162
  • shift_w/execution.rs:250,252,255
  • base_alu_w/execution.rs:251,253,256
  • mul_w/execution.rs:207,209,211
  • divrem_w/execution.rs:116,292,294,296

The *_W execution files' AOT code uses RV32_REGISTER_AS — functionally correct but inconsistent with the rest of the codebase that uses RV64_REGISTER_AS.

7. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ still uses hint_buffer_u32! and 4-byte word operations (RV32), while the outer uses hint_buffer_u64! and 8-byte dwords (RV64). The workspace correctly points to the outer one, but the inner directory is a confusing leftover.

8. ELF memory_image stores data in 4-byte chunks

The ELF loader uses BTreeMap<u32, u32> for the memory image and iterates in 4-byte steps (line 199 of elf.rs). This works correctly since RV64 8-byte values span two consecutive 4-byte entries, and elf_memory_image_to_openvm_memory_image properly converts to byte-level entries. However, a comment noting this design choice would help future readers.


Architecture Observations (Positive)

  • Decoder crate is well-structured. RV64 shift amounts (6-bit shamt via funct6) and W-suffix instruction validation (shamt < 32) are correctly implemented.

  • Phantom sub-executors properly updated for RV64 — read_rv64_register, 8-byte hint stream alignment, u64::to_le_bytes() for hint lengths.

  • Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling.

  • Execution code for non-AOT paths (base_alu, shift, less_than, branch_eq, branch_lt) correctly uses vm_read::<u8, 8> and RV64_REGISTER_AS.

  • *_W adapter/core pattern is well-designed — adapters handle full 8-byte register reads/writes while cores operate on the lower 4-byte word with proper sign extension.

  • Test coverage for new RV64 transpiler tests is good — hand-written assembly programs (rv64im-stress.S, rv64im-intrin.S) cover arithmetic, shifts, branches, loads/stores, and M-extension operations.


Summary

Priority Issue Impact
High #1: config.rv32iconfig.rv64i Blocks compilation under cuda/aot features
High #2-5: AOT layer still assumes 32-bit registers Entire AOT code path is broken for RV64
Medium #4: CUDA extension not ported Blocks GPU proving (likely tracked separately)
Low #6: RV32_*_AS naming in execution code Cosmetic, functionally correct
Low #7: Inner guest/guest/ directory Cleanup needed, no runtime impact

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The main execution, circuit, and transpiler changes are well-structured. The primary risk is the AOT layer, which appears to still assume 4-limb (32-bit) registers throughout.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Claude finished @shuklaayush's task in 4m 47s —— View job


Code Review: RV64 Support PR (Re-review after sync)

  • Gather context — check new commits since last review
  • Review previously reported bugs — verify status
  • Review new decoder crate
  • Review toolchain changes (platform, openvm, transpiler, IO)
  • Review RV64 circuit/extension changes
  • Review test changes and CI workflows
  • Post final review

Overview

Re-reviewing after the latest sync. The PR is at 313 files changed, 15544 insertions, 6823 deletions across 15 commits. The new commit chore(ci): comment out aot workflows acknowledges the AOT layer is not yet ported.

Previously fixed issues:


Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

Rv64ImConfig has field rv64i (line 103 of lib.rs), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 299
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 419

All are behind #[cfg(feature = "aot")] or #[cfg(feature = "cuda")], so they don't break default builds, but will fail with those features enabled.

Fix this →

2. AOT load_sign_extend impl is dead code — wrong const generic

load_sign_extend/aot.rs:20-21 implements AotExecutor for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. The AOT impls (lines 20 and 41) will never be dispatched for any RV64 LoadSignExtendExecutor.

Same pattern applies to the metered impl at line 40-41.

3. AOT register stride hardcodes /4 instead of /8

The entire AOT x86 code generation layer maps RISC-V register addresses to x86 XMM registers using a 4-limb stride (a / 4). With RV64's 8-limb registers, these produce wrong XMM indices:

  • load_sign_extend/aot.rs:115-116a / 4, b / 4
  • loadstore/aot.rs:150-151a / 4, b / 4
  • branch_lt/aot.rs:40-41a / 4, b / 4
  • auipc/execution.rs:106a / 4
  • branch_eq/execution.rs:131-132a / 4, b / 4
  • jal_lui/execution.rs:114a / 4
  • divrem_w/execution.rs:121-123a / 4, b / 4, c / 4
  • All *_W execution files (base_alu_w, shift_w, mul_w) use same pattern

The helper gpr_to_rv32_register (common/mod.rs:25-31) and xmm_to_gpr (common/mod.rs:61-85) also assume 2 registers per XMM (the RV32 4-limb layout). For RV64, a register like x5 (byte offset 40 = 5×8) would map to 40/4 = 10 → xmm5, instead of the correct 40/8 = 5 → xmm2.

This is a systemic issue — the AOT layer needs a fundamental redesign for 64-bit register values since XMM elements are 32-bit (pinsrd/pextrd). The CI workflow commenting (chore(ci): comment out aot workflows) suggests this is a known deferral.

4. AOT metered code uses RV32_REGISTER_AS in RV64 *_W chips

The metered path in *_W execution files calls update_adapter_heights_asm(config, RV32_REGISTER_AS):

  • divrem_w/execution.rs:292-296
  • base_alu_w/execution.rs:251-256
  • shift_w/execution.rs:250-255
  • mul_w/execution.rs:207-211

While RV32_REGISTER_AS == RV64_REGISTER_AS == 1 (functionally identical), update_adapter_heights_asm in common/mod.rs:107 also prints error messages containing "RV32_REGISTER_AS", which is misleading for RV64.

5. CUDA extension references old Rv32* types

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all of which were renamed to Rv64* in this PR. Every GPU chip construction (lines 39-113) uses the wrong types. Also missing: new *_W chip registrations for GPU backend.

This is clearly a known deferral for this development branch.


Code Quality Issues

6. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ still uses:

  • cfg(target_os = "zkvm") guard (old) vs the outer's cfg(openvm_intrinsics) (new)
  • hint_buffer_u32! (RV32) vs the outer's hint_buffer_u64! (RV64)
  • RV32_ALU_OPCODE vs the outer's RV64_ALU_OPCODE

The workspace points to the outer directory. The inner one is a confusing leftover and should be removed.

7. RV32_CELL_BITS in auipc/execution.rs:100

The AUIPC AOT code computes rd = pc.wrapping_add((c as u32) << RV32_CELL_BITS). Since RV32_CELL_BITS == RV64_CELL_BITS == 8, this is functionally correct but the naming is misleading in an RV64 codebase.


Architecture Observations (Positive)

  • Decoder crate is well-structured with correct RV64 shift amount handling (6-bit shamt via funct6 for 64-bit shifts, 5-bit for W-suffix with shamt < 32 validation)
  • Phantom sub-executors properly updated: read_rv64_register, 8-byte hint stream alignment, u64::to_le_bytes() for hint lengths
  • Toolchain migration is thorough: serializer/deserializer use WORD_SIZE=8, platform memory layout uses 8-byte alignment, read_u64() and read_vec_by_len() correctly handle dword boundaries
  • *_W adapter/core pattern is well-designed: adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension of 32-bit results back to 64-bit registers
  • Non-AOT execution paths (base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore) correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>
  • Test coverage for new W-chips is thorough with proper sign extension edge cases and negative tests for constraint violations

Summary

Priority Issue Impact Status
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot features Open
High #2-3: AOT layer assumes 32-bit registers Entire AOT code path broken for RV64 Known deferral (CI disabled)
Medium #5: CUDA extension not ported GPU proving broken Known deferral
Low #4: RV32_REGISTER_AS in *_W metered paths Functionally correct, misleading naming Open
Low #6: Inner guest/guest/ directory Confusing leftover, no runtime impact Open

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent. The *_W chip design with split-width adapters is clean. The remaining issues are confined to the AOT/CUDA layers, which are acknowledged as not yet ported.

Recommended before merge:

  1. Fix config.rv32iconfig.rv64i in the 5 locations (trivial)
  2. Remove the inner guest/guest/ directory
  3. Track AOT and CUDA port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Claude finished @shuklaayush's task in 5m 46s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new/changed files in detail
  • Review CI workflow changes
  • Post final review

Overview

Re-reviewing after the latest commit fix(ci): update references from rv32 to riscv (407978a). The PR is at 316 files changed, 15556 insertions, 6834 deletions across 16 commits.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: now uses u64::to_le_bytes() and 8-byte alignment (extension/mod.rs:866-874)
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • CI workflow references — Fixed in latest commit: rv32imriscv in workflow path filters

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

Rv64ImConfig has field rv64i (line 103 of lib.rs), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 299
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 419

All are behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")], so they don't break default builds but will fail with those features.

Fix this →

2. AOT load_sign_extend impl is dead code — wrong const generic

load_sign_extend/aot.rs:20-21 implements traits for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. Same for metered impl at line 40-41. These impls will never be dispatched for any RV64 type.

3. AOT layer assumes 32-bit registers throughout

The AOT x86 code generation layer is fundamentally broken for RV64:

  • Register stride: a / 4 hardcoded in load_sign_extend/aot.rs:115, loadstore/aot.rs:150, branch_lt/aot.rs:40, and all *_W execution files. With RV64's 8-byte registers, these produce wrong XMM register indices.
  • XMM helpers: gpr_to_rv32_register and xmm_to_gpr in common/mod.rs:25-85 assume 2 RISC-V registers per XMM (32-bit × 2 = 64-bit XMM), but RV64 registers are 64 bits each — need 1:1 mapping or use different SSE/AVX storage.
  • *_W execution AOT impls: base_alu_w/execution.rs, shift_w/execution.rs, mul_w/execution.rs, divrem_w/execution.rs — all reference RV32_REGISTER_AS, RV32_IMM_AS, RV32_REGISTER_NUM_LIMBS in their AOT trait implementations.
  • update_adapter_heights_asm in common/mod.rs:107-117 — error messages say "RV32_REGISTER_AS" and "RV32_MEMORY_AS".

The CI disabling (chore(ci): comment out aot workflows) acknowledges this. The entire AOT layer needs a redesign for 64-bit values since the x86 XMM register mapping is fundamentally 32-bit.

4. CUDA extension references old Rv32* types

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. All GPU chip files (*/cuda.rs) also use RV32_REGISTER_NUM_LIMBS and RV32_CELL_BITS. Also missing: new *_W chip registrations for GPU backend.

This is clearly a known deferral for this development branch.


Code Quality Issues

5. Serializer Vec::with_capacity over-allocates

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 u64 words (64 bytes). This was the same pattern with Vec<u32> in RV32 (4× over-allocation → now 8× over-allocation). Functionally correct but wastes memory.

Fix: Vec::with_capacity(core::mem::size_of_val(value) / WORD_SIZE) or just Vec::new().

6. Hardcoded 8 instead of WORD_SIZE in io/mod.rs

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use WORD_SIZE constant for consistency, as done elsewhere in the codebase.

7. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code:

  • Uses hint_buffer_u32! (RV32) vs the outer's hint_buffer_u64! (RV64)
  • Uses cfg(target_os = "zkvm") (old) vs cfg(openvm_intrinsics) (new)
  • Uses RV32_ALU_OPCODE vs RV64_ALU_OPCODE

The workspace points to the outer directory. The inner one should be removed.

8. cargo doc disabled in CI

lints.yml comments out the cargo doc step, suggesting docs may not build cleanly. This should be tracked and re-enabled.


Architecture Observations (Positive)

  • Decoder crate is well-structured. RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32) are correctly implemented. ITypeShamt properly masks to 6 bits (0x3f).

  • ELF loader correctly validates Class::ELF64, reads instructions in 4-byte chunks (RISC-V instructions are always 32-bit), and constrains the address space to u32.

  • Adapter/core pattern is well maintained. The *_W adapters (Rv64BaseAluWAdapter, Rv64MultWAdapter) correctly read full 8-byte registers, expose lower 4 bytes to cores, and handle sign extension via XOR-based sign bit extraction.

  • Address space encoding is correct: rs2_as constrained to boolean maps directly to RV64_IMM_AS=0 / RV64_REGISTER_AS=1.

  • Sign extension is correctly implemented across all W-suffix chips using the (result as i32 as i64 as u64) chain.

  • Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling.

  • Phantom sub-executors properly use read_rv64_register(), 8-byte hint stream alignment, and u64::to_le_bytes().

  • Test coverage for new W-chips is thorough with sign extension edge cases, negative tests for constraint violations, and comprehensive boundary value testing.

  • Non-AOT, non-CUDA execution paths (base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore) correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.


Summary

Priority Issue Impact Status
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #2-3: AOT layer assumes 32-bit registers Entire AOT code path broken Known deferral (CI disabled)
Medium #4: CUDA extension not ported GPU proving broken Known deferral
Low #5: Serializer over-allocates capacity Wasted memory Minor
Low #6: Hardcoded 8 vs WORD_SIZE Style inconsistency Minor
Low #7: Inner guest/guest/ directory Confusing leftover Cleanup
Low #8: cargo doc disabled Docs may not build Track

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent. The *_W chip design with split-width adapters is clean. The decoder crate properly handles all RV64 instruction encoding differences.

Recommended before merge:

  1. Fix config.rv32iconfig.rv64i in the 5 locations (trivial, blocks feature-gated compilation)
  2. Remove the inner guest/guest/ directory (confusing leftover)
  3. Track AOT and CUDA port as separate work items with issues

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Claude finished @shuklaayush's task in 4m 41s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new/changed files in detail
  • Review circuit and extension changes
  • Review toolchain changes
  • Post final review

Overview

Re-reviewing after the latest commit fix(v2.1-rv64): fix lints and clippy (2304665). The PR is at 316 files changed, 15518 insertions, 6757 deletions across 17 commits.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: now uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment (extension/mod.rs:866-873)
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed in latest commit: changed from_u32(1 << ...) to from_u64(1u64 << ...) (adapters/mod.rs:298)
  • Algebra extension register reads — Fixed: read_rv32_registerread_rv64_register (algebra/circuit/src/extension/modular.rs)
  • Fibonacci examples — Fixed: reveal_u32reveal_u64

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

Rv64ImConfig has field rv64i (line 101), but the GPU builder still references config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297

Additionally, test files behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")]:

File Line
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

Fix this →

2. AOT impls reference non-existent Rv32* executor types (Compilation error under aot feature)

Several AotExecutor and AotMeteredExecutor impls implement traits for types that were renamed to Rv64* but the AOT blocks still use the old names:

File Line Has Rv32... Should be Rv64...
jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

These are dead code — the impls won't be found for the actual Rv64* types. They'll fail to compile under aot feature since the Rv32* structs don't exist.

Fix this →

3. AOT load_sign_extend impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21 implements traits for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. Same for metered impl at line 41. These impls will never be dispatched for any RV64 type.

4. AOT register stride hardcodes /4 instead of /8

The AOT x86 code generation layer maps RISC-V register addresses to XMM registers using a 4-limb stride. With RV64's 8-byte registers, all register index calculations produce wrong results:

  • jal_lui/execution.rs:109a / 4
  • auipc/execution.rs:106a / 4
  • jalr/execution.rs:114a / 4, b / 4
  • load_sign_extend/aot.rs:115-116a / 4, b / 4
  • loadstore/aot.rs:150-151a / 4, b / 4
  • branch_lt/aot.rs:40-41a / 4, b / 4
  • All *_W execution files (base_alu_w, shift_w, mul_w, divrem_w)

The helper gpr_to_rv32_register (common/mod.rs:25-31) and xmm_to_gpr (common/mod.rs:61-85) also assume the RV32 2-registers-per-XMM layout. The entire AOT layer needs a fundamental redesign for 64-bit register values. The CI disabling (chore(ci): comment out aot workflows) acknowledges this.

5. CUDA extension references old Rv32* types

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. All GPU chip files (*/cuda.rs) also use RV32_REGISTER_NUM_LIMBS and RV32_CELL_BITS. Also missing: new *_W chip registrations. This is clearly a known deferral.

6. *_W AOT impls use RV32_REGISTER_NUM_LIMBS for generic param

The AOT AotExecutor and AotMeteredExecutor impls for the W-suffix chips use RV32_REGISTER_NUM_LIMBS (4) in their generic parameter:

File Lines
base_alu_w/execution.rs 166, 234
shift_w/execution.rs (same pattern)
mul_w/execution.rs (same pattern)
divrem_w/execution.rs 108, 273

These implement BaseAluExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> but the actual types use RV64_REGISTER_NUM_LIMBS. These impls will never be dispatched.


Code Quality Issues

7. Serializer Vec::with_capacity over-allocates

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). This is an 8x over-allocation. Functionally correct but wasteful.

Fix: Vec::with_capacity(core::mem::size_of_val(value) / 8) or just Vec::new().

8. Hardcoded 8 instead of WORD_SIZE in io/mod.rs

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use a named constant for consistency with the rest of the codebase.

9. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code:

  • Uses hint_buffer_u32! (RV32) vs the outer's hint_buffer_u64! (RV64)
  • Uses cfg(target_os = "zkvm") (old) vs cfg(openvm_intrinsics) (new)

The workspace points to the outer directory. The inner one should be removed.

10. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker — should be cleaned up or properly named before merge.

11. 202 remaining RV32_* constant references across 50 files

The extensions/riscv/circuit/src/ directory still has 202 occurrences of RV32_REGISTER_AS, RV32_MEMORY_AS, RV32_REGISTER_NUM_LIMBS, RV32_CELL_BITS, and RV32_IMM_AS spread across 50 files. Most are in AOT (*/aot.rs), CUDA (*/cuda.rs), and test code behind feature gates. While RV32_REGISTER_AS == RV64_REGISTER_AS == 1 and RV32_CELL_BITS == RV64_CELL_BITS == 8 (functionally identical), RV32_REGISTER_NUM_LIMBS (4) ≠ RV64_REGISTER_NUM_LIMBS (8) — so any reference to RV32_REGISTER_NUM_LIMBS in non-feature-gated code is a potential logic bug.

12. auipc/tests.rs:419 uses RV32_REGISTER_NUM_LIMBS (value bug)

let a = gen_pointer(&mut rng, RV32_REGISTER_NUM_LIMBS);

This generates pointers with stride 4 instead of 8, which produces incorrect register addresses for RV64 in the CUDA test path.


Architecture Observations (Positive)

  • Decoder crate is well-structured. RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32) are correctly implemented. ITypeShamt properly handles both 5-bit and 6-bit shift amounts.

  • ELF loader correctly validates Class::ELF64, uses try_into() for u64→u32 conversions with proper error handling, and constrains addresses to u32.

  • Non-AOT execution paths are solid — base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore, jalr, jal_lui, auipc all correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).to_le_bytes(), and constrain upper bytes through XOR-based sign bit extraction.

  • imm_to_bytes and imm_to_u64 (adapters/mod.rs:70,86) correctly handle 12-bit signed immediate sign extension to 64 bits via the 24-bit encoding.

  • abstract_compose (adapters/mod.rs:292) was correctly fixed to use from_u64(1u64 << ...) to avoid overflow for RV64 register widths.

  • Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling.

  • Test coverage for the new chips is thorough — comprehensive sign extension edge cases, negative tests for constraint violations, boundary value testing.


Summary

Priority Issue Impact Status
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #2: AOT impls reference Rv32*Executor types (8 impls across 4 files) Dead code + compilation failure under aot Trivial rename
High #3-4,6: AOT layer assumes 32-bit registers Entire AOT code path broken Known deferral (CI disabled)
Medium #5: CUDA extension not ported GPU proving broken Known deferral
Low #7: Serializer 8x over-allocation Wasted memory Minor
Low #8: Hardcoded 8 vs named constant Style Minor
Low #9: Inner guest/guest/ directory Confusing leftover Cleanup
Low #10: TEMP[jpw] marker Cleanup Minor
Low #11-12: RV32_* naming in 50 files Misleading, few are value bugs Tracked

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent. The latest commit successfully fixes the abstract_compose overflow, updates algebra extension register reads, and applies proper clippy/lint fixes.

Recommended before merge:

  1. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  2. Fix Rv32*ExecutorRv64*Executor in AOT impls across 4 files (trivial, 8 occurrences)
  3. Remove the inner guest/guest/ directory
  4. Remove TEMP[jpw] marker / rename tmp_convert_to_u8s
  5. Track AOT and CUDA port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Claude finished @shuklaayush's task in 5m 27s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new/changed files in detail
  • Review circuit and extension changes
  • Review toolchain and decoder changes
  • Post final review

Overview

Re-reviewing after the latest commit fix(v2.1-rv64): fix default range checker sizes (e12e402). The PR is at 318 files changed, 15533 insertions, 6872 deletions across 20 commits.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: now uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment (extension/mod.rs:866-873)
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: changed from_u32(1 << ...) to from_u64(1u64 << ...) (adapters/mod.rs:298)
  • CI workflow references — Fixed: rv32imriscv in workflow path filters
  • Range checker sizes — Fixed in latest commit: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS for carry bound formula

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda/aot features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

All are behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")], so they don't break default builds but will fail with those features enabled.

Fix this →

2. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot feature)

Several AotExecutor and AotMeteredExecutor impls are written for Rv32* struct names, but the actual struct types were renamed to Rv64*. These are dead code — the impls apply to types that don't exist:

File Lines Has Rv32... Should be Rv64...
jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor<A> Rv64JalLuiExecutor<A>
auipc/execution.rs 84, 167 Rv32AuipcExecutor<A> Rv64AuipcExecutor<A>
jalr/execution.rs 97, 185 Rv32JalrExecutor<A> Rv64JalrExecutor<A>
hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

3. AOT load_sign_extend impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21 implements traits for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. Same for metered impl at line 40-41. These impls will never be dispatched for any RV64 type. Unlike the *_W chips where RV32_REGISTER_NUM_LIMBS == RV64_WORD_NUM_LIMBS == 4, here the actual executor uses 8 limbs.

4. Missing MulOpcode import in mul_w/execution.rs (Compilation error under aot)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but only imports MulWOpcode from openvm_riscv_transpiler (line 14). MulOpcode is not re-exported through crate::common::*. This will fail to compile under the aot feature.

5. CUDA extension references old Rv32* types (Compilation error under cuda)

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. Also missing: new *_W chip registrations for GPU backend. This is clearly a known deferral (CI disabled).


Code Quality Issues

6. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). Functionally correct but 8× over-allocation.

7. Hardcoded 8 instead of named constant in io/mod.rs

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use a named constant (e.g., WORD_SIZE) for consistency with the rest of the codebase.

8. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code (hint_buffer_u32!, RV32_ALU_OPCODE, old cfg(target_os = "zkvm") guard). The workspace points to the outer directory. The inner one should be removed.

9. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker — should be cleaned up or properly named before merge.

10. RV32_* constant naming in *_W and execution AOT code

The *_W execution files (base_alu_w, shift_w, mul_w, divrem_w) use RV32_REGISTER_AS and RV32_IMM_AS in their AOT trait implementations. While these constants are numerically equal to their RV64_* counterparts (1 and 0 respectively), it's misleading. Similarly, base_alu_w/execution.rs:166,234 uses RV32_REGISTER_NUM_LIMBS (4) where RV64_WORD_NUM_LIMBS (also 4) would be the semantically correct constant.


Architecture Observations (Positive)

  • Decoder crate is well-structured. RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32) are correctly implemented. ITypeShamt properly handles both 5-bit and 6-bit shift amounts.

  • Range checker sizes (latest commit e12e402) are now correctly computed: [1 << RV64_CELL_BITS, 2 * RV64_REGISTER_NUM_LIMBS * (1 << RV64_CELL_BITS)] = [256, 4096]. The carry bound formula 2N × 2^B for N=8, B=8 is a standard conservative upper bound for N-limb multiplication intermediate values.

  • ELF loader correctly validates Class::ELF64, reads instructions in 4-byte chunks (RISC-V instructions are always 32-bit even for RV64), and constrains the address space to u32.

  • Non-AOT execution paths are solid — base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore, jalr, jal_lui, auipc all correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).to_le_bytes(), and constrain upper bytes through XOR-based sign bit extraction.

  • Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling. imm_to_u64 and imm_to_bytes properly handle 12-bit signed immediate sign-extension to 64 bits via the 24-bit encoding.

  • Phantom sub-executors properly use read_rv64_register(), 8-byte hint stream alignment, and u64::to_le_bytes().

  • Test coverage for new chips is thorough — comprehensive sign extension edge cases, negative tests for constraint violations, and boundary value testing. New transpiler tests with hand-written RV64 assembly (rv64im-stress.S, rv64im-intrin.S) cover all RV64IM instruction categories.


Summary

Priority Issue Impact Status
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #2: AOT impls reference Rv32*Executor types (8 impls across 4 files) Dead code + compilation failure under aot Trivial rename
High #3: load_sign_extend/aot.rs wrong const generic Dead AOT impl Part of AOT port
Medium #4: Missing MulOpcode import in mul_w AOT Compilation failure under aot Trivial fix
Medium #5: CUDA extension not ported GPU proving broken Known deferral
Low #6: Serializer 8× over-allocation Wasted memory Minor
Low #7: Hardcoded 8 vs named constant Style Minor
Low #8: Inner guest/guest/ directory Confusing leftover Cleanup
Low #9: TEMP[jpw] marker Cleanup Minor
Low #10: RV32_* naming in AOT paths Misleading, functionally correct Tracked

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent. The latest commit correctly fixes the range checker sizes for RV64 multiplication. The *_W chip design with split-width adapters is clean.

Recommended before merge:

  1. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  2. Fix Rv32*ExecutorRv64*Executor in AOT impls across 4 files (trivial, 8 occurrences)
  3. Remove the inner guest/guest/ directory
  4. Remove TEMP[jpw] marker / rename tmp_convert_to_u8s
  5. Track AOT and CUDA port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Claude finished @shuklaayush's task in 5m 30s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new/changed files in detail
  • Review circuit and extension changes
  • Review toolchain and decoder changes
  • Post final review

Overview

Re-reviewing after the latest commit feat(v2.1-rv64): update guest toolchain target to riscv64im-unknown-none-elf (2d0588c). The PR is at 319 files changed, 15544 insertions, 6882 deletions across 20 commits.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: uses from_u64(1u64 << ...) (adapters/mod.rs:298)
  • CI workflow references — Fixed: rv32imriscv in workflow path filters
  • Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS

New in latest commit (2d0588c):

  • Guest toolchain target updated from riscv32im-risc0-zkvm-elfriscv64im-unknown-none-elf
  • build-std narrowed from alloc,core,proc_macro,panic_abort,std to alloc,core,panic_abort (std removed — consistent with no_std guest programs)
  • --cfg openvm_intrinsics added to rustc flags, replacing the old cfg(target_os = "zkvm") guard
  • cfg(not(any(target_arch = "riscv64"))) fixed to cfg(not(target_arch = "riscv64")) in guest Cargo.toml
  • std tests properly #[ignore]d with TODO(rv64-std) comments

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda/aot features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

All behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")] — won't break default builds but will fail with those features.

Fix this →

2. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot feature)

The actual struct types were renamed to Rv64*, but AOT impl blocks still name Rv32*:

File Lines Has Should be
jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor<A> Rv64JalLuiExecutor<A>
auipc/execution.rs 84, 167 Rv32AuipcExecutor<A> Rv64AuipcExecutor<A>
jalr/execution.rs 97, 185 Rv32JalrExecutor<A> Rv64JalrExecutor<A>
hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

These are dead code — the impls implement traits for types that don't exist. Will fail to compile under aot feature.

Fix this →

3. AOT load_sign_extend impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21 implements traits for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs), but the actual type alias Rv64LoadSignExtendExecutor uses RV64_REGISTER_NUM_LIMBS (8 limbs) — see load_sign_extend/mod.rs:24-25. Same for metered impl at line 41. These impls will never be dispatched.

4. Missing MulOpcode import in mul_w/execution.rs (Compilation error under aot)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but only imports MulWOpcode from openvm_riscv_transpiler (line 14). MulOpcode is not imported anywhere in the file or re-exported through crate::common::*. This will fail to compile under the aot feature.

5. AOT layer assumes 32-bit registers throughout (systemic)

The AOT x86 code generation layer is fundamentally broken for RV64:

  • Register stride: a / 4 hardcoded in jal_lui/execution.rs:109, auipc/execution.rs:106, jalr/execution.rs:114, load_sign_extend/aot.rs:115, loadstore/aot.rs:150, branch_lt/aot.rs:40, and all *_W execution files. With RV64's 8-byte registers, these produce wrong XMM register indices.
  • XMM helpers: gpr_to_rv32_register and xmm_to_gpr in common/mod.rs:25-85 assume 2 RISC-V registers per XMM (32-bit × 2 = 64-bit XMM), but RV64 registers are 64 bits each.
  • *_W AOT impls use RV32_REGISTER_NUM_LIMBS (4) in const generics where the actual executor types use RV64_REGISTER_NUM_LIMBS (8) — making these impls dead code.

The CI disabling (chore(ci): comment out aot workflows) acknowledges this. The entire AOT layer needs a redesign for 64-bit register values.

6. CUDA extension references old Rv32* types (Compilation error under cuda)

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. Also missing: new *_W chip registrations for GPU backend. Clearly a known deferral.


New Observations on Latest Commit (2d0588c)

7. riscv64im-unknown-none-elf target requires nightly -Z build-std

The target riscv64im-unknown-none-elf is not a built-in Rust target — it requires -Z build-std (nightly-only). This is correctly configured (lines 268-273 of build/src/lib.rs) and the nightly toolchain is already required for guest compilation. The ISA string riscv64im correctly encodes base integer (I) + multiplication/division (M).

Note: The build-std removing std and proc_macro means guest programs can only use no_std. The #[ignore] annotations on test_rv64im_std and test_hashmap with TODO(rv64-std) comments correctly acknowledge this limitation.

8. openvm_intrinsics cfg flag is well-integrated

The migration from cfg(target_os = "zkvm") to cfg(openvm_intrinsics) is clean:

  • Set in encode_rust_flags (build/src/lib.rs:310-311)
  • check-cfg lint configured in both openvm/Cargo.toml and guest/Cargo.toml
  • Consistently used across openvm/src/lib.rs, io/mod.rs, io/read.rs, memory.rs
  • No stale cfg(target_os = "zkvm") references remain in the outer guest or toolchain crates

Code Quality Issues

9. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code:

  • Uses cfg(target_os = "zkvm") (old) vs the outer's cfg(openvm_intrinsics) (new)
  • Uses hint_buffer_u32! / RV32_ALU_OPCODE (RV32) vs the outer's hint_buffer_u64! / RV64_ALU_OPCODE (RV64)
  • Lacks the check-cfg lint configuration for openvm_intrinsics

The workspace points to the outer directory. The inner one should be removed to avoid confusion.

10. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. An 8× over-allocation. Functionally correct.

11. Hardcoded 8 instead of WORD_SIZE in io/mod.rs

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

WORD_SIZE is defined as 8 in platform/src/lib.rs:23. Should use the constant for consistency.

12. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker.


Architecture Observations (Positive)

  • Toolchain migration is thorough: The openvm_intrinsics cfg flag is a cleaner approach than target_os = "zkvm", decoupling intrinsic availability from the target triple. The riscv64im-unknown-none-elf target correctly uses -Z build-std without std.

  • Entry point (lib.rs:122-137) correctly uses ld sp, 0(sp) (load doubleword) and STACK_TOP: u64. The fault handler uses sd x0, 1(x0) (store doubleword to misaligned address).

  • Host hint stream (host.rs) properly uses u64::to_le_bytes() for length encoding, consistent with guest-side read_u64().

  • ELF loader correctly validates Class::ELF64, uses try_into() for all u64u32 address conversions with error handling, and keeps 4-byte instruction chunks (RISC-V instructions are always 32-bit even in RV64).

  • Decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32).

  • Non-AOT execution paths are solid — base_alu, shift, less_than, branch_eq, branch_lt, loadstore, hintstore, jalr, jal_lui, auipc all correctly use RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).to_le_bytes().

  • Test coverage is thorough — hand-written RV64 assembly (rv64im-stress.S, rv64im-intrin.S), comprehensive sign extension edge cases, negative tests for constraint violations.

  • build-std change correctly removes std and proc_macro, acknowledging no_std-only guest programs with properly #[ignore]d std tests and clear TODO comments.


Summary

Priority Issue Impact Status
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #2: AOT impls reference Rv32*Executor types (8 impls across 4 files) Dead code + compilation failure under aot Trivial rename
High #4: Missing MulOpcode import in mul_w AOT Compilation failure under aot Trivial fix
High #3,5: AOT layer assumes 32-bit registers (systemic) Entire AOT code path broken Known deferral (CI disabled)
Medium #6: CUDA extension not ported GPU proving broken Known deferral
Low #9: Inner guest/guest/ directory Confusing stale RV32 leftover Cleanup
Low #10: Serializer 8× over-allocation Wasted memory Minor
Low #11: Hardcoded 8 vs WORD_SIZE Style inconsistency Minor
Low #12: TEMP[jpw] marker Cleanup Minor

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The latest commit cleanly migrates the guest toolchain to riscv64im-unknown-none-elf with proper no_std build configuration and the new openvm_intrinsics cfg flag. The execution, circuit, transpiler, and toolchain changes are well-structured and consistent.

Recommended before merge:

  1. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  2. Fix Rv32*ExecutorRv64*Executor in AOT impls across 4 files (trivial, 8 occurrences)
  3. Add missing MulOpcode import in mul_w/execution.rs
  4. Remove the inner guest/guest/ directory
  5. Remove TEMP[jpw] marker / rename tmp_convert_to_u8s
  6. Track AOT and CUDA port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 21, 2026

Claude finished @876pol's task in 6m 1s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new SHA2 changes
  • Review circuit and extension changes (deferral, algebra, bigint)
  • Review toolchain and decoder changes
  • Post final review

Overview

Re-reviewing after the latest commit feat: rv64 sha2 (#2721) (42b8678). The PR is at 329 files changed, 15747 insertions, 7024 deletions across 20 commits.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
  • Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf

NEW: Critical Bugs Found

1. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension's AIR columns, trace records, and range checks all use RV32_REGISTER_NUM_LIMBS (4) for register value arrays, but RV64 registers are 8 bytes wide. This means the memory bridge reads/writes only 4 bytes from each register, which doesn't match the actual 8-byte register width.

AIR columns (output/air.rs:59-60, call/air.rs:230-231):

pub rd_val: [T; RV32_REGISTER_NUM_LIMBS],  // 4 elements, should be 8
pub rs_val: [T; RV32_REGISTER_NUM_LIMBS],

Trace records (output/trace.rs:72-73, call/trace.rs:234-235):

pub rd_val: [u8; RV32_REGISTER_NUM_LIMBS],  // 4 bytes, should be 8
pub rs_val: [u8; RV32_REGISTER_NUM_LIMBS],

Memory bridge reads (output/air.rs:260-276) read local.rd_val (4 elements) from the register address space, but an RV64 register occupies 8 bytes. The AIR constraint system is inconsistent with the actual register width.

Address range checks (output/air.rs:237-244, output/trace.rs:337-346, call/air.rs:306-313, call/trace.rs:177-180, 407-412) use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32 bits) for the address bound, but RV64 heap pointers can be up to 64 bits. bytes_to_f(&local.rd_val) composes only 4 bytes into a pointer, truncating any address above 2^32.

Impact: The memory bridge interactions won't match what the system memory chip expects for 8-byte register reads. The AIR constraints are unsound — they allow reading only half a register.

Files affected: output/air.rs, output/trace.rs, call/air.rs, call/trace.rs (all under extensions/deferral/circuit/src/)

2. Deferral transpiler uses wrong register stride (Address bug)

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register addresses are computed as register_index × limbs_per_register. With RV64, each register occupies 8 bytes, so the stride should be RV64_REGISTER_NUM_LIMBS (8). Using 4 causes register addresses to overlap (e.g., register x2 at offset 8 would be addressed as offset 16, but with stride 4 it's addressed as offset 8 which overlaps with x1).

3. Algebra extension uses 4-byte register writes and wrong transpiler stride

Circuit (algebra/circuit/src/modular_chip/is_eq.rs:726):

let mut write_data = [0u8; RV32_REGISTER_NUM_LIMBS];  // 4 bytes, should be 8

Writes only 4 bytes to an 8-byte register, leaving upper bytes unwritten.

Transpiler (algebra/transpiler/src/lib.rs:102-103, 126, 204-205):

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rd),   // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),

Same wrong register address stride as the deferral transpiler.

Tests (algebra/circuit/src/fp2_chip/tests.rs:191-209, modular_chip/tests.rs:215-216):
Use RV32_REGISTER_NUM_LIMBS for pointer generation and register writes — all produce incorrect 4-byte-stride addresses.

4. Bigint transpiler uses wrong register stride

extensions/bigint/transpiler/src/lib.rs:146-147:

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),  // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs2),

Previously Reported Bugs (Still Open)

5. config.rv32i should be config.rv64i (Compilation error under cuda/aot features)

5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

Fix this →

6. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot)

8 impls across 4 files use old Rv32* struct names:

File Lines Has Should be
jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

7. AOT load_sign_extend impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21, 40-41 implements for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs) but the actual alias uses RV64_REGISTER_NUM_LIMBS (8).

8. AOT layer assumes 32-bit registers throughout (systemic)

Register stride /4, XMM helpers, *_W AOT impls all use 32-bit assumptions. CI is disabled (chore(ci): comment out aot workflows) — known deferral.

9. CUDA extension references old Rv32* types

extension/cuda.rs:11-18 imports non-existent Rv32* types. Known deferral.


New SHA2 Observations

The SHA2 RV64 migration is well-done overall. Constants correctly updated to RV64_*:

  • RV64_REGISTER_AS, RV64_MEMORY_AS, RV64_REGISTER_NUM_LIMBS, RV64_CELL_BITS all correct
  • Pointer handling properly constrains upper 4 bytes (limbs 4-7) to zero for 32-bit addressing
  • SHA2_READ_SIZE == SHA2_WRITE_SIZE == 8 correctly handles 8-byte memory operations

Minor semantic issue in sha2_chips/main_chip/air.rs:345,350,355: The write loop uses STATE_READS and SHA2_READ_SIZE instead of STATE_WRITES and SHA2_WRITE_SIZE. Currently harmless because both pairs are equal (STATE_BYTES / 8), but semantically incorrect and would break if read/write sizes ever diverged.


Code Quality Issues

10. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

11. Serializer Vec::with_capacity 8× over-allocation

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes but with_capacity takes element count. Should be / 8.

12. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare the same package name. Inner directory uses stale RV32 code (hint_buffer_u32!, old cfg guards).

13. Deferral config field naming

extensions/deferral/circuit/src/extension/mod.rs:228-232 — Config fields named rv32i, rv32m but types are Rv64I, Rv64M, Rv64Io. GPU builder at extension/cuda.rs:136,141,146 references config.rv32i, config.rv32m — functionally correct but misleading.


Architecture Observations (Positive)

  • SHA2 RV64 port is clean — pointer constraints properly zero-extend upper limbs, transpiler correctly uses Rv64Sha2Opcode, and execution reads 8-byte registers.
  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation.
  • Non-AOT execution paths in the riscv extension are solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.
  • *_W adapter/core pattern is well-designed with proper sign extension via (result as i32 as i64 as u64).
  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, guest openvm_intrinsics cfg is clean.

Summary

Priority Issue Impact
Critical #1: Deferral AIR/trace uses 4-byte register arrays Soundness: memory bridge mismatches, AIR constraints wrong
Critical #2: Deferral transpiler wrong register stride Wrong register addresses in transpiled instructions
Critical #3: Algebra circuit 4-byte register writes + wrong transpiler stride Partial register writes + wrong addresses
Critical #4: Bigint transpiler wrong register stride Wrong register addresses
High #5: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot
High #6: AOT Rv32*ExecutorRv64*Executor (4 files) Dead code + compilation failure
Medium #7-9: AOT/CUDA systemic issues Known deferral (CI disabled)
Low #10-13: Code quality Cleanup items

The core riscv extension (non-AOT, non-CUDA) is solid, but the deferral, algebra, and bigint extensions have critical RV32_REGISTER_NUM_LIMBSRV64_REGISTER_NUM_LIMBS migration gaps that affect both AIR soundness and transpiler correctness.

Recommended before merge:

  1. Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, range checks, and transpiler (affects ~50 locations across 6 files)
  2. Fix algebra extension — update is_eq.rs register write size, transpiler stride, and test constants
  3. Fix bigint transpiler — update register stride
  4. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  5. Fix Rv32*ExecutorRv64*Executor in AOT impls (trivial)
  6. Remove inner guest/guest/ directory and TEMP[jpw] marker
  7. Track AOT and CUDA port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude finished @876pol's task in 6m 31s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new riscv-adapters changes (latest commit)
  • Review SHA2 and toolchain changes
  • Review deferral, algebra, bigint extension changes
  • Post final review

Overview

Re-reviewing after the latest commit feat: rv64 riscv-adapters (#2722) (6761341). The PR is at 21 commits, 100 changed files, +15988/−7102 lines.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
  • Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf

New in latest commit (6761341):

  • rv32-adapters renamed to riscv-adapters with full RV64 port
  • New helpers.rs with pad_reg_val, compose_ptr, tracing_read_reg_ptr utilities
  • All adapters (vec_heap, vec_heap_branch, eq_mod, vec_to_flat) properly use RV64_REGISTER_NUM_LIMBS and RV64_WORD_NUM_LIMBS
  • Updated test utilities with rv64_write_heap_default, generate_rv64_is_type_immediate

New: riscv-adapters Changes (Latest Commit) — Well Done

The riscv-adapters port is solid. Key design decisions are correct:

  • pad_reg_val (helpers.rs:12-21): Takes [Var; RV64_WORD_NUM_LIMBS] (4 bytes), returns [Expr; RV64_REGISTER_NUM_LIMBS] (8 bytes) with zero-padded upper bytes — enforces 32-bit pointer constraint at proving time via the memory bus interaction.

  • compose_ptr (helpers.rs:28-36): Correctly composes 4 materialized bytes into a field element for use as a memory address.

  • tracing_read_reg_ptr (helpers.rs:42-55): Reads full 8-byte register, debug-asserts upper 4 bytes are zero, returns lower 4 bytes as u32 — correct.

  • Range checks in all adapters correctly use RV64_CELL_BITS * RV64_WORD_NUM_LIMBS (32 bits) for pointer bound constraints.

  • Rv64IsEqualModAdapterCols (eq_mod.rs:64): Correctly uses MemoryWriteAuxCols<T, RV64_REGISTER_NUM_LIMBS> for the full 8-byte register write, while rs_val is [T; RV64_WORD_NUM_LIMBS] (4 limbs) — this is the right split.

  • Test utilities properly use RV64_REGISTER_NUM_LIMBS for write_heap_default and write_heap_pointer_default calls.

Zero RV32_* references in the entire riscv-adapters directory.


Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda/aot features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

Fix this →

2. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot)

8 impl blocks across 4 files use Rv32* struct names that were renamed to Rv64*:

File Lines Has Should be
jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

3. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension's AIR columns use RV32_REGISTER_NUM_LIMBS (4) for register value arrays, but DEFAULT_BLOCK_SIZE = 8 — meaning the memory bus expects 8-byte register interactions.

AIR columns (output/air.rs:59-60, call/air.rs:230-231):

pub rd_val: [T; RV32_REGISTER_NUM_LIMBS],  // 4 elements, should be 8
pub rs_val: [T; RV32_REGISTER_NUM_LIMBS],

Trace records (output/trace.rs:72-73, call/trace.rs:234-235):

pub rd_val: [u8; RV32_REGISTER_NUM_LIMBS],  // 4 bytes, should be 8
pub rs_val: [u8; RV32_REGISTER_NUM_LIMBS],

Memory bridge reads (output/air.rs:260-276) read local.rd_val (4 elements) from the register address space. The memory system expects 8-byte register reads (DEFAULT_BLOCK_SIZE = 8). This makes the AIR constraints unsound.

Address range checks (output/air.rs:237-244, call/air.rs:306-313) use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32 bits) — this happens to be correct since pointers are 32-bit, but should use RV64_* constants for clarity.

Impact: Memory bus interaction width mismatch will cause proof generation failure or unsound constraints.

4. Deferral transpiler uses wrong register stride

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 would be at address 20 instead of the correct 40 for RV64.

5. Algebra circuit + transpiler: wrong register width and stride

Circuit (algebra/circuit/src/modular_chip/is_eq.rs:726):

let mut write_data = [0u8; RV32_REGISTER_NUM_LIMBS];  // 4 bytes, should be 8

Writes only 4 bytes to an 8-byte register.

Transpiler (algebra/transpiler/src/lib.rs:102-103, 126, 204-205):

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rd),   // stride 4, should be 8

Modular chip (algebra/circuit/src/modular_chip/mod.rs:42, 54, 69): Core AIR types parameterized with RV32_REGISTER_NUM_LIMBS (4) instead of RV64_REGISTER_NUM_LIMBS (8).

6. Bigint transpiler uses wrong register stride

extensions/bigint/transpiler/src/lib.rs:146-147:

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),  // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs2),

7. AOT layer assumes 32-bit registers throughout (systemic)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers, and RV32_REGISTER_NUM_LIMBS const generics. This is a known deferral — CI is disabled via chore(ci): comment out aot workflows.

8. CUDA extension references old Rv32* types

extension/cuda.rs:11-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, etc. — all renamed. Also SHA2 CUDA files (sha2_main.cu:56-74, columns.cuh:27-29) still use RV32_REGISTER_NUM_LIMBS. Known deferral.


Code Quality Issues

9. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

10. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ uses stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard). Should be removed.

11. Serializer Vec::with_capacity over-allocates

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes element count. Over-allocates by 8×.

12. Hardcoded 8 in io/mod.rs instead of named constant

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use WORD_SIZE constant.

13. Keccak256 extension not yet ported

keccak256/circuit/src/keccakf_op/columns.rs:22 and xorin/columns.rs:25-29 use RV32_REGISTER_NUM_LIMBS (4) for pointer limb arrays. Not part of this PR but will need porting before keccak256 works with RV64.


Architecture Observations (Positive)

  • riscv-adapters port is clean and well-structured. The pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers provide a clean abstraction for the 4-byte materialized pointer + 8-byte register value pattern.

  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32).

  • Non-AOT execution paths in the riscv extension are solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension correctly.

  • Transpiler utility functions (from_r_type, from_i_type, etc.) correctly use RV64_REGISTER_NUM_LIMBS for register address scaling.

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, guest openvm_intrinsics cfg is clean.

  • ELF loader correctly validates Class::ELF64 and constrains addresses to u32.


Summary

Priority Issue Impact Status
Critical #3: Deferral AIR uses 4-byte register arrays Memory bus mismatch, unsound constraints Open
Critical #4: Deferral transpiler wrong register stride Wrong register addresses Open
Critical #5: Algebra circuit 4-byte writes + wrong transpiler stride Partial register writes + wrong addresses Open
Critical #6: Bigint transpiler wrong register stride Wrong register addresses Open
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #2: AOT Rv32*ExecutorRv64*Executor (4 files) Dead code + compilation failure under aot Trivial rename
Medium #7-8: AOT/CUDA systemic issues Known deferral (CI disabled) Tracked
Low #9-12: Code quality Cleanup items Open

The core riscv extension (non-AOT, non-CUDA) and the new riscv-adapters are solid. The adapter port in the latest commit is well-designed with correct constant usage and a clean helper API.

The deferral, algebra, and bigint extensions still have critical RV32_REGISTER_NUM_LIMBSRV64_REGISTER_NUM_LIMBS migration gaps that affect both AIR soundness and transpiler correctness.

Recommended before merge:

  1. Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, range checks, and transpiler
  2. Fix algebra extension — update is_eq.rs register write size, modular_chip/mod.rs type parameters, transpiler stride, and test constants
  3. Fix bigint transpiler — update register stride
  4. Fix config.rv32iconfig.rv64i in 5 locations
  5. Fix Rv32*ExecutorRv64*Executor in AOT impls
  6. Remove inner guest/guest/ directory and TEMP[jpw] marker
  7. Track AOT, CUDA, and keccak256 port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude finished @876pol's task in 6m 6s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new bigint changes (latest commit)
  • Review riscv-adapters changes
  • Review deferral and algebra extension changes in detail
  • Post final review

Overview

Re-reviewing after the latest commits feat: rv64 bigint (#2732) (4c64459) and feat: rv64 riscv-adapters (#2722) (6761341). The PR is at 22 commits, 100 changed files, +16518/−7535 lines.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
  • Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf

New in latest commits:

  • rv32-adapters renamed to riscv-adapters with full RV64 port — clean migration with new pad_reg_val, compose_ptr, tracing_read_reg_ptr helpers
  • Bigint extension updated to use RV64 types and adapters — transpiler correctly uses RV64_REGISTER_NUM_LIMBS

NEW: riscv-adapters Changes — Well Done

The riscv-adapters port is solid. Zero RV32_* references remain. Key design decisions are correct:

  • pad_reg_val (helpers.rs:12): Pads 4 materialized bytes to 8-byte register value with zero upper bytes — correct for 32-bit pointer constraint
  • compose_ptr (helpers.rs:27): Correctly composes 4 bytes into a 32-bit field element pointer
  • tracing_read_reg_ptr (helpers.rs:43): Reads full 8-byte register, debug-asserts upper 4 bytes are zero — correct
  • All adapters use RV64_REGISTER_NUM_LIMBS for register reads/writes via the memory bridge
  • Range checks correctly use RV64_CELL_BITS * RV64_WORD_NUM_LIMBS (32 bits) for pointer bounds

NEW: Bigint Extension — Mostly Good, AOT Dead Code

The bigint transpiler is correctly updated — uses RV64_REGISTER_NUM_LIMBS for register stride and Rv64*256Opcode types (bigint/transpiler/src/lib.rs:4,146-147).

The bigint circuit core types are correctly updated — Rv64BaseAlu256Executor, Rv64LessThan256Executor, etc. all use RV64_CELL_BITS and Rv64VecHeapAdapter* (bigint/circuit/src/lib.rs:108-179).

Bigint Bug: AOT impls reference non-existent Rv32*256Executor types

12 AOT trait impls across 6 files implement for Rv32* struct names, but the actual types were renamed to Rv64*:

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

These are dead code — the impls implement traits for types that don't exist. Will fail to compile under aot feature.

Fix this →

Bigint CUDA/Test: RV32_CELL_BITS and Rv32*256Opcode references

  • bigint/circuit/src/cuda/mod.rs:15: Imports RV32_CELL_BITS (should be RV64_CELL_BITS; same value but wrong name — used in 20+ locations in the file)
  • bigint/circuit/src/tests.rs:655,716,782,847,905,971: Uses Rv32*256Opcode::CLASS_OFFSET — should be Rv64*256Opcode
  • bigint/circuit/src/tests.rs:629,690,753,819,879,945: Uses RV32_CELL_BITS — should be RV64_CELL_BITS
  • bigint/circuit/src/tests.rs:922,988: Uses Rv32VecHeapBranchAdapterExecutor — should be Rv64VecHeapBranchAdapterExecutor
  • bigint/circuit/src/extension/cuda.rs:55-101: References Rv32*256Air types — should be Rv64*256Air

Bugs (Still Open from Previous Reviews)

1. config.rv32i should be config.rv64i (Compilation error under cuda/aot features)

Rv64ImConfig has field rv64i (line 101 of lib.rs), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

Fix this →

2. RISCV AOT impls reference non-existent Rv32*Executor types (Compilation error under aot)

8 impl blocks across 4 files:

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

3. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension's AIR columns, trace records, and memory bridge interactions all use RV32_REGISTER_NUM_LIMBS (4) for register value arrays. With DEFAULT_BLOCK_SIZE = 8, the memory bus expects 8-byte register interactions. The memory_bridge.read() call passes local.rd_val (4 elements), creating a 4-byte read interaction that doesn't match the system's 8-byte register width.

AIR columns (deferral/circuit/src/output/air.rs:59-60, call/air.rs:230-231):

pub rd_val: [T; RV32_REGISTER_NUM_LIMBS],  // 4 elements, should be 8
pub rs_val: [T; RV32_REGISTER_NUM_LIMBS],

Trace records (output/trace.rs:72-73, call/trace.rs:234-235):

pub rd_val: [u8; RV32_REGISTER_NUM_LIMBS],  // 4 bytes, should be 8
pub rs_val: [u8; RV32_REGISTER_NUM_LIMBS],

Memory bridge reads (output/air.rs:260-276): Pass local.rd_val (4 elements) to memory_bridge.read(), creating a 4-byte memory interaction.

Address range checks (output/air.rs:237-244): Use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32 bits) and local.rd_val[RV32_REGISTER_NUM_LIMBS - 1] — functionally correct for 32-bit pointer bounds but semantically wrong.

bytes_to_f (utils.rs:64-68): Uses RV32_CELL_BITS for composition shift — functionally correct (same value) but should reference RV64_CELL_BITS.

Impact: The memory bridge interactions create 4-byte reads for register values, but the system memory chip expects 8-byte register reads (matching DEFAULT_BLOCK_SIZE = 8). This will cause a memory bus argument mismatch during proof generation.

Note: The pointer values are correctly 32-bit (F_NUM_BYTES = 4), so the pointer arithmetic is fine. The issue is that the register read width must be 8 bytes for RV64, with only the lower 4 bytes used as the pointer value and upper 4 bytes constrained to zero.

4. Deferral transpiler uses wrong register stride

deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 would be addressed at offset 20 instead of the correct 40 for RV64.

5. Algebra circuit + transpiler: wrong register width and stride

Execution code (algebra/circuit/src/execution.rs:395,408):

u32::from_le_bytes(exec_state.vm_read(RV32_REGISTER_AS, addr as u32))

Reads 4 bytes from an 8-byte register — truncates register value. Since registers hold pointers (which are 32-bit), this produces correct values if upper bytes are zero, but the read width doesn't match the register width.

IsEq execution (algebra/circuit/src/modular_chip/is_eq.rs:726):

let mut write_data = [0u8; RV32_REGISTER_NUM_LIMBS];  // 4 bytes, should be 8

Writes only 4 bytes to an 8-byte register.

Transpiler (algebra/transpiler/src/lib.rs:7,102-103,126,204-205):

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rd),   // stride 4, should be 8

6. Deferral config/builder naming

deferral/circuit/src/extension/mod.rs:224-241: Config struct is Rv32DeferralConfig with fields rv32i: Rv64I, rv32m: Rv64M. Builder is Rv32DeferralCpuBuilder. These should be renamed to Rv64Deferral* or at least generic names.

7. AOT layer assumes 32-bit registers throughout (systemic — known deferral)

The AOT x86 code generation layer is fundamentally broken for RV64: register stride /4, XMM helpers, *_W AOT impls all use 32-bit assumptions. CI is disabled (chore(ci): comment out aot workflows).

8. CUDA extension references old Rv32* types (known deferral)

Both riscv/circuit/src/extension/cuda.rs and bigint/circuit/src/extension/cuda.rs reference old Rv32* types.


Code Quality Issues

9. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

10. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ uses stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard). Should be removed.

11. Serializer Vec::with_capacity over-allocates

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements — 8× over-allocation.


Architecture Observations (Positive)

  • riscv-adapters port is clean and well-structured. The pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers provide a clean abstraction for the 4-byte pointer + 8-byte register value pattern.
  • Bigint transpiler correctly uses RV64_REGISTER_NUM_LIMBS for register stride.
  • Bigint circuit core types correctly use RV64_CELL_BITS and Rv64VecHeapAdapter*.
  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation.
  • Non-AOT execution paths in the riscv extension are solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.
  • *_W adapter/core pattern is well-designed with proper sign extension via (result as i32 as i64 as u64).
  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, guest openvm_intrinsics cfg is clean.

Summary

Priority Issue Impact Status
Critical #3: Deferral AIR uses 4-byte register arrays Memory bus mismatch, unsound constraints Open
Critical #4: Deferral transpiler wrong register stride Wrong register addresses Open
Critical #5: Algebra circuit 4-byte writes + wrong transpiler stride Partial register writes + wrong addresses Open
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #2: RISCV AOT Rv32*ExecutorRv64*Executor (4 files) Dead code + compilation failure Trivial rename
High NEW: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files) Dead code + compilation failure Trivial rename
Medium #6: Deferral config naming Rv32Deferral* Misleading naming Cosmetic
Medium #7-8: AOT/CUDA systemic issues Known deferral (CI disabled) Tracked
Low #9-11: Code quality Cleanup items Open

The core riscv extension (non-AOT, non-CUDA) and the new riscv-adapters are solid. The bigint transpiler and core circuit types are correctly updated.

The deferral and algebra extensions still have critical RV32_REGISTER_NUM_LIMBSRV64_REGISTER_NUM_LIMBS migration gaps that affect both AIR soundness and transpiler correctness. These are the same issues from the previous review — they have not been addressed yet.

Recommended before merge:

  1. Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, memory bridge interactions, and transpiler stride (~20 locations across 5 files)
  2. Fix algebra extension — update is_eq.rs register write size, execution.rs register read size, transpiler stride (~15 locations across 3 files)
  3. Fix config.rv32iconfig.rv64i in riscv circuit (5 locations)
  4. Fix Rv32*ExecutorRv64*Executor in riscv AOT impls (4 files, 8 occurrences)
  5. Fix Rv32*256ExecutorRv64*256Executor in bigint AOT impls (6 files, 12 occurrences)
  6. Remove inner guest/guest/ directory and TEMP[jpw] marker
  7. Track AOT, CUDA, and keccak256 port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude finished @876pol's task in 7m 12s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes in detail (riscv-adapters, bigint, reduce memory columns)
  • Review deferral, algebra extension changes
  • Post final review

Overview

Re-reviewing after the latest commits. The PR is at 23 commits, 100 changed files, +16066/−7176 lines. Three new commits since the last review:

  1. 6761341 feat: rv64 riscv-adapters (#2722)
  2. 4c64459 feat: rv64 bigint (#2732)
  3. 7b24bc6 fix: rv64 reduce memory address columns (#2733)

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
  • Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf
  • Bigint transpiler register stride — Fixed: now correctly uses RV64_REGISTER_NUM_LIMBS (confirmed zero RV32_* references in bigint transpiler)

New Commit Assessment

6761341 riscv-adapters — ✅ Well Done

The riscv-adapters port is solid. Zero RV32_* references in non-CUDA Rust code. Key design decisions are correct:

  • pad_reg_val (helpers.rs): Takes [Var; RV64_WORD_NUM_LIMBS] (4 bytes), returns [Expr; RV64_REGISTER_NUM_LIMBS] (8 bytes) with zero-padded upper bytes — correct.
  • compose_ptr and tracing_read_reg_ptr: Clean abstractions for the 4-byte pointer + 8-byte register pattern.
  • All adapters (vec_heap, vec_heap_branch, eq_mod, vec_to_flat) properly use RV64_REGISTER_NUM_LIMBS and RV64_WORD_NUM_LIMBS.
  • Test utilities properly migrated.

4c64459 rv64 bigint — ⚠️ Contains AOT/CUDA bugs (see below)

The non-AOT, non-CUDA parts are well-structured:

  • Common utilities in common.rs properly use RV64_* constants
  • INT256 operations correctly defined with u64 limbs
  • Transpiler correctly uses RV64_REGISTER_NUM_LIMBS throughout
  • Type aliases properly use Rv64VecHeapAdapterAir/Executor

But: AOT trait impls and CUDA code are broken — see bugs #7 and #8 below.

7b24bc6 reduce memory address columns — ✅ Sound optimization

The reduction from RV64_REGISTER_NUM_LIMBS (8) to RV64_WORD_NUM_LIMBS (4) for pointer columns is sound:

  • Memory bus enforcement: All files construct full 8-limb arrays with explicit zero-padding for memory bridge interactions (e.g., loadstore.rs:163-169)
  • Trace generation safety: rs1_val changed from u64 to u32 with an assertion that upper 32 bits are zero (loadstore.rs:386-390)
  • Removed zero-constraint loops are safe — the constraint is now enforced structurally at the memory bus level rather than via explicit AIR assertions
  • Hintstore and SHA2 follow the same pattern correctly
  • Deleted tests (test_hint_buffer_rem_words_upper_limbs_zero, test_hint_buffer_mem_ptr_upper_limbs_zero) are no longer necessary since constraints are implicit

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda/aot features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

Fix this →

2. AOT impls reference non-existent Rv32*Executor types in riscv circuit (Compilation error under aot)

8 impl blocks across 4 files use Rv32* struct names that were renamed to Rv64*:

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

3. AOT load_sign_extend impl uses wrong const generic — dead code

load_sign_extend/aot.rs:20-21, 40-41 implements for LoadSignExtendExecutor<A, { RV32_REGISTER_NUM_LIMBS }, LIMB_BITS> (4 limbs) but the actual alias uses RV64_REGISTER_NUM_LIMBS (8). Dead code that will fail to compile under aot.

4. Missing MulOpcode import in mul_w/execution.rs (Compilation error under aot)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported.

5. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

AIR columns and trace records in output/air.rs:59-60 and call/air.rs:230-231 use RV32_REGISTER_NUM_LIMBS (4) for register value arrays, but DEFAULT_BLOCK_SIZE = 8. The memory bridge reads/writes only 4 bytes from each register, creating an AIR constraint mismatch.

6. Deferral transpiler uses wrong register stride

deferral/transpiler/src/lib.rs:85-86, 95-96 computes register addresses as RV32_REGISTER_NUM_LIMBS * dec_insn.rd (stride 4 instead of 8). Register x5 would be at address 20 instead of the correct 40.

7. NEW: Bigint circuit AOT impls reference non-existent Rv32* types (Compilation error under aot)

12 impl blocks across 6 files implement traits for Rv32*256Executor types that don't exist (renamed to Rv64*256Executor):

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

Fix this →

8. NEW: Bigint CUDA tests reference non-existent types (Compilation error under cuda)

bigint/circuit/src/tests.rs (behind #[cfg(feature = "cuda")]):

  • Lines 655, 716, 782, 847, 905, 971: Reference Rv32*256Opcode::CLASS_OFFSET — should be Rv64*256Opcode::CLASS_OFFSET
  • Lines 922, 988: Reference Rv32VecHeapBranchAdapterExecutor — should be Rv64VecHeapBranchAdapterExecutor

Also, bigint/circuit/src/cuda/mod.rs imports from the old openvm_rv32_adapters package which no longer exists, and uses Rv32VecHeapAdapterCols/Rv32VecHeapAdapterRecord types.

9. Algebra circuit + transpiler: wrong register width and stride

  • algebra/circuit/src/modular_chip/is_eq.rs:726: [0u8; RV32_REGISTER_NUM_LIMBS] — writes only 4 bytes to an 8-byte register
  • algebra/transpiler/src/lib.rs:102-103, 126, 204-205: RV32_REGISTER_NUM_LIMBS * dec_insn.rd — wrong stride

10. CUDA extension references old Rv32* types (systemic)

riscv/circuit/src/extension/cuda.rs, bigint/circuit/src/extension/cuda.rs, deferral/circuit/src/extension/cuda.rs, and CUDA .cuh files all reference Rv32* types. This is a known deferral — CI is disabled for CUDA.

11. AOT layer assumes 32-bit registers throughout (systemic)

Register stride /4, XMM helpers, *_W AOT impls all use 32-bit assumptions. Known deferral — CI is disabled for AOT.


Code Quality Issues

12. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:302:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

13. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes but with_capacity takes element count. Should be / 8.

14. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard). Should be removed.

15. Hardcoded 8 instead of WORD_SIZE in io/mod.rs:60-61

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use the WORD_SIZE constant for consistency.


Architecture Observations (Positive)

  • riscv-adapters port is clean and well-structured. The pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers provide a clean abstraction for the 4-byte pointer + 8-byte register pattern.
  • Memory address column optimization (commit 7b24bc6) is sound — zero-padding is enforced structurally at the memory bus level, not via explicit AIR assertions.
  • Bigint transpiler is properly ported — all register strides use RV64_REGISTER_NUM_LIMBS, opcode types use Rv64*.
  • Non-AOT, non-CUDA execution paths across all extensions are solid — consistent RV64_* constant usage.
  • *_W adapter/core pattern remains well-designed with proper sign extension.
  • Decoder crate correctly handles RV64 shift amounts and W-suffix validation.
  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment.

Summary

Priority Issue Impact Status
Critical #5: Deferral AIR uses 4-byte register arrays Soundness: memory bus mismatch Open
Critical #6: Deferral transpiler wrong register stride Wrong register addresses Open
Critical #9: Algebra circuit 4-byte writes + wrong transpiler stride Partial register writes + wrong addresses Open
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #2: Riscv AOT Rv32*ExecutorRv64*Executor (4 files) Dead code + compilation failure under aot Trivial rename
High #7: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files) Dead code + compilation failure under aot Trivial rename
High #8: Bigint CUDA tests reference Rv32* types Compilation failure under cuda Trivial rename
Medium #3-4: load_sign_extend AOT + missing MulOpcode import Compilation failure under aot Part of AOT port
Medium #10-11: CUDA + AOT systemic issues Known deferral (CI disabled) Tracked
Low #12: TEMP[jpw] marker Cleanup Minor
Low #13: Serializer 8× over-allocation Wasted memory Minor
Low #14: Inner guest/guest/ directory Confusing leftover Cleanup
Low #15: Hardcoded 8 vs WORD_SIZE Style Minor

The core RV64 architecture (non-AOT, non-CUDA) is solid. The latest riscv-adapters commit is well-structured, the bigint transpiler is correctly ported, and the memory column optimization is sound. The main remaining risks are:

  1. Deferral and algebra extensions still have critical RV32_REGISTER_NUM_LIMBSRV64_REGISTER_NUM_LIMBS migration gaps affecting AIR soundness and transpiler correctness
  2. AOT/CUDA paths are systematically broken (acknowledged via CI disable)

Recommended before merge:

  1. Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, and transpiler (~20 locations)
  2. Fix algebra extension — update is_eq.rs register write size + transpiler stride
  3. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  4. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files) and bigint AOT (6 files) impls
  5. Remove inner guest/guest/ directory and TEMP[jpw] marker
  6. Track AOT, CUDA, and keccak256 port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude finished @Tuanlinh12312's task in 6m 55s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes (keccak256, bigint, riscv-adapters, reduce memory)
  • Review deferral, algebra extension changes
  • Review toolchain and decoder changes
  • Post final review

Overview

Re-reviewing after the latest commits: feat: rv64 keccak256 (#2724), fix: rv64 reduce memory address columns (#2733), feat: rv64 bigint (#2732), and feat: rv64 riscv-adapters (#2722). The PR is at 24 commits, 100 changed files, +16688/−7702 lines.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: module is active
  • abstract_compose overflow — Fixed: from_u64(1u64 << ...)
  • Range checker sizes — Fixed: RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • Guest toolchain target — Fixed: riscv64im-unknown-none-elf
  • AOT executor naming (jal_lui, auipc, jalr, hintstore) — Fixed: all now Rv64*Executor
  • Bigint transpiler stride — Fixed: now uses RV64_REGISTER_NUM_LIMBS

Bugs (Still Open)

1. config.rv32i should be config.rv64i (Compilation error under cuda feature)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

All behind #[cfg(feature = "cuda")] or #[cfg(feature = "aot")] — blocks compilation with those features.

Fix this →

2. Deferral extension: register arrays and transpiler still use RV32 (Soundness bug)

The deferral extension has not been ported to RV64 — every register-related constant is still RV32_REGISTER_NUM_LIMBS (4 bytes instead of 8):

Register arrays (should be RV64_REGISTER_NUM_LIMBS = 8):

  • output/air.rs:59-60rd_val: [T; RV32_REGISTER_NUM_LIMBS], rs_val: [T; RV32_REGISTER_NUM_LIMBS]
  • output/trace.rs:72-73 — same pattern
  • call/air.rs:230-231 — same pattern
  • call/trace.rs:234-235 — same pattern

Range checks (wrong array index, wrong bit width):

  • output/air.rs:237-251 — accesses [RV32_REGISTER_NUM_LIMBS - 1] = index 3 (should be 7), multiplies by 4 (should be 8)
  • output/trace.rs:337-346 — same
  • call/air.rs:306-345 — same
  • call/trace.rs:177-180, 407-412 — same

Transpiler stride (should be ×8 not ×4):

  • transpiler/src/lib.rs:85-86, 95-96RV32_REGISTER_NUM_LIMBS * dec_insn.rd → register x5 maps to address 20 instead of correct 40

Impact: Memory bus interaction width mismatch with 8-byte registers. AIR constraints unsound. Transpiler generates wrong register addresses.

Fix this →

3. Algebra extension: wrong register write size, type parameters, and transpiler stride (Soundness bug)

Write size (modular_chip/is_eq.rs:726):

let mut write_data = [0u8; RV32_REGISTER_NUM_LIMBS];  // 4 bytes, should be 8

Writes only 4 bytes to an 8-byte register — partial register write.

Type parameters (modular_chip/mod.rs:42, 54, 69):
ModularIsEqualCoreAir parameterized with RV32_REGISTER_NUM_LIMBS (4) as WRITE_LIMBS — should be RV64_REGISTER_NUM_LIMBS (8). Creates structural mismatch between circuit expectations and executor.

Transpiler stride (transpiler/src/lib.rs:102-103, 126, 204-205):

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rd),   // stride 4, should be 8

Register x1 maps to address 4 instead of correct 8.

Fix this →

4. Bigint AOT impls reference non-existent Rv32* executor types (Compilation error under aot)

12 AotExecutor/AotMeteredExecutor impl blocks across 6 files use old Rv32* struct names:

File Lines Has Should be
base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor
less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor

Fix this →

5. Bigint test files reference non-existent Rv32VecHeapBranchAdapterExecutor

extensions/bigint/circuit/src/tests.rs:922, 988 — Uses Rv32VecHeapBranchAdapterExecutor but the type was renamed to Rv64VecHeapBranchAdapterExecutor in the riscv-adapters crate. These are CUDA tests and will fail to compile under cuda feature.

6. Riscv circuit AOT impls use wrong const generic RV32_REGISTER_NUM_LIMBS (Dead code)

AOT AotExecutor/AotMeteredExecutor impls for non-W base chips implement for { RV32_REGISTER_NUM_LIMBS } (4 limbs) but the actual executor types use RV64_REGISTER_NUM_LIMBS (8 limbs) — making these impls dead code:

File Lines
base_alu/execution.rs 177, 245
less_than/execution.rs 128, 247
divrem/execution.rs 108, 273
load_sign_extend/aot.rs 21, 41
base_alu_w/execution.rs 166, 234

7. CUDA extension references old Rv32* types (known deferral)

extension/cuda.rs imports Rv32BaseAluAir, Rv32LoadStoreAir, etc. — all renamed to Rv64*. Bigint CUDA files (cuda/mod.rs, cuda/src/bigint.cu) extensively reference Rv32VecHeapAdapter* types. Keccak256 CUDA files use RV32_REGISTER_NUM_LIMBS for register arrays and width calculations. All CUDA code is broken for RV64 — acknowledged via chore(ci): comment out aot workflows.


Code Quality Issues

8. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s...

9. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes but with_capacity takes element count. Should be / 8.

10. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". Inner directory uses stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard). Should be removed.


Architecture Observations (Positive)

  • Keccak256 RV64 port (latest commit) is solid — Rust code correctly uses RV64_WORD_NUM_LIMBS for pointer columns, expand_to_rv64_register() for zero-padded 8-byte bus interactions, and rv64_register_to_u32() for pointer extraction. Well-designed.

  • Bigint RV64 port (latest commit) is correct in non-AOT paths — all execution code properly uses RV64_REGISTER_NUM_LIMBS, RV64_REGISTER_AS, RV64_MEMORY_AS. Transpiler correctly uses RV64_REGISTER_NUM_LIMBS for stride. Only AOT impls have stale naming.

  • riscv-adapters port is clean and well-structured — pad_reg_val, compose_ptr, tracing_read_reg_ptr helpers correctly handle the 4-byte materialized pointer + 8-byte register pattern. Zero RV32_* references in Rust code.

  • Reduce-memory-address-columns changes are correct — optimizes column usage by materializing only significant limbs (RV64_WORD_NUM_LIMBS = 4 for pointers) while zero-padding to full register width in bus interactions. Applied cleanly to hintstore, loadstore, and SHA2 chips.

  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32).

  • Non-AOT, non-CUDA execution paths across the riscv extension remain solid — consistent RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.

  • *_W adapter/core pattern is well-designed with proper sign extension via (result as i32 as i64 as u64).


Summary

Priority Issue Impact Status
Critical #2: Deferral extension fully unported Unsound AIR constraints, wrong transpiler addresses Open
Critical #3: Algebra extension wrong register width + stride Partial register writes, wrong addresses Open
High #1: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #4: Bigint AOT Rv32*Rv64* (12 impls, 6 files) Dead code + compilation failure under aot Trivial rename
High #5: Bigint tests Rv32VecHeapBranch*Rv64* Compilation failure under cuda Trivial rename
High #6: Riscv AOT impls wrong const generic Dead code under aot Part of AOT port
Medium #7: CUDA extension not ported (all extensions) GPU proving broken Known deferral
Low #8: TEMP[jpw] marker Cleanup Minor
Low #9: Serializer 8× over-allocation Wasted memory Minor
Low #10: Inner guest/guest/ directory Confusing leftover Cleanup

The core riscv extension, riscv-adapters, bigint (non-AOT), and keccak256 (Rust) are solid. The latest commits demonstrate a correct and consistent RV64 migration pattern with proper pointer materialization (4-byte columns) and zero-padded register bus interactions (8 bytes).

The deferral and algebra extensions are the primary remaining risks — they have critical RV32_REGISTER_NUM_LIMBS usage that produces wrong register array sizes, wrong transpiler addresses, and unsound AIR constraints.

Recommended before merge:

  1. Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS across 4 circuit files and transpiler (~30 occurrences)
  2. Fix algebra extension — update is_eq.rs write size, mod.rs type parameters, and transpiler stride
  3. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  4. Fix bigint AOT Rv32*Rv64* in 6 files (trivial rename)
  5. Remove inner guest/guest/ directory and TEMP[jpw] marker
  6. Track AOT, CUDA, and keccak256 CUDA port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude finished @Tuanlinh12312's task in 7m 8s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes (riscv-adapters, bigint, keccak256, reduce memory)
  • Review deferral, algebra extension changes
  • Review toolchain and decoder changes
  • Post final review

Overview

Re-reviewing after the latest 5 commits: feat: rv64 algebra (#2727), feat: rv64 keccak256 (#2724), fix: rv64 reduce memory address columns (#2733), feat: rv64 bigint (#2732), feat: rv64 riscv-adapters (#2722). The PR is at 25 commits, 366 files changed, +17031/−8160 lines.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
  • Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf

New commits assessment:


Bugs (Still Open)

1. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension's AIR columns, trace records, and memory bridge interactions all use RV32_REGISTER_NUM_LIMBS (4) for register value arrays, but RV64 registers are 8 bytes wide. The memory bridge reads only 4 bytes from each register, creating an unsound constraint system.

AIR columns (output/air.rs:59-60, call/air.rs:230-231):

pub rd_val: [T; RV32_REGISTER_NUM_LIMBS],  // 4 elements, should be 8
pub rs_val: [T; RV32_REGISTER_NUM_LIMBS],

Trace records (output/trace.rs:72-73, call/trace.rs:234-235):

pub rd_val: [u8; RV32_REGISTER_NUM_LIMBS],  // 4 bytes, should be 8
pub rs_val: [u8; RV32_REGISTER_NUM_LIMBS],

Memory bridge reads (output/air.rs:260-276) — passes local.rd_val (4 elements) to memory_bridge.read(). In the RV64 system, register reads should be 8 bytes. The memory interaction size doesn't match what the system memory chip expects.

Address range checks (output/air.rs:237-244, call/air.rs:306-313) use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32 bits) for the address bound. For the upper-limb check, local.rd_val[RV32_REGISTER_NUM_LIMBS - 1] accesses index 3 of a 4-element array — but with 8-byte registers, the upper bound check should access index 7.

Impact: Memory bus interaction width mismatch → unsound AIR constraints.

Files affected: extensions/deferral/circuit/src/output/air.rs, output/trace.rs, call/air.rs, call/trace.rs

Fix this →

2. Deferral transpiler uses wrong register stride (Address bug)

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 at offset 20 instead of the correct 40 for RV64.

Fix this →

3. Deferral config field naming mismatch

extensions/deferral/circuit/src/extension/mod.rs:228-230:

pub rv32i: Rv64I,   // Field named "rv32i" but type is Rv64I
pub rv32m: Rv64M,   // Field named "rv32m" but type is Rv64M

This is cosmetic (functionally works) but will cause confusion and creates a landmine for anyone adding feature-gated code that checks field names.

4. config.rv32i should be config.rv64i in riscv extension (5 locations)

Rv64ImConfig has field rv64i, but 5 locations still reference config.rv32i:

File Line Feature gate
extensions/riscv/circuit/src/lib.rs 297 cuda
extensions/riscv/circuit/src/mul/tests.rs 289 cuda
extensions/riscv/circuit/src/mulh/tests.rs 530 aot
extensions/riscv/circuit/src/jalr/tests.rs 586 aot
extensions/riscv/circuit/src/mul_w/tests.rs 416 aot

Fix this →

5. AOT impls reference non-existent Rv32*Executor types (riscv extension)

8 impl blocks across 4 files implement traits for struct names that no longer exist:

File Lines Has Should be
jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

Dead code under aot feature — will fail to compile when AOT is enabled.

Fix this →

6. AOT impls in riscv extension use wrong const generics — dead code

Non-W chips (base_alu, less_than, divrem) define Rv64*Executor with RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4):

File Lines
base_alu/execution.rs 177, 245
less_than/execution.rs 128, 247
divrem/execution.rs 108, 273
load_sign_extend/aot.rs 20, 41

These impls implement traits for types that never get instantiated in RV64.

7. Bigint AOT impls reference non-existent Rv32* types

Same pattern as riscv extension — actual types renamed to Rv64* but AOT impls still use Rv32*:

File Lines Has Should be
base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor
branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor

8. Bigint CUDA test code references old types

extensions/bigint/circuit/src/tests.rs — behind #[cfg(feature = "cuda")]:

  • Lines 655, 716, 782, 847, 905, 971: Reference Rv32*Opcode::CLASS_OFFSET (renamed to Rv64*)
  • Lines 922, 988: Reference Rv32VecHeapBranchAdapterExecutor (renamed to Rv64*)

9. CUDA extensions not ported (systemic — known deferral)

All CUDA code across riscv, keccak256, sha2, bigint, and deferral extensions still references old Rv32* types and RV32_REGISTER_NUM_LIMBS. CI workflows are disabled for these. Tracked as separate work items.

10. AOT layer assumes 32-bit registers throughout (systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers, and RV32_REGISTER_NUM_LIMBS const generics. CI workflows are disabled. Needs fundamental redesign for 64-bit register values.


Code Quality Issues

11. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

12. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. Should be / 8 or use Vec::new().

13. Hardcoded 8 instead of WORD_SIZE in io/mod.rs

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

WORD_SIZE is defined as 8 in platform/src/lib.rs:23. Should use the constant.

14. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code:

  • Uses hint_buffer_u32! (RV32) vs the outer's hint_buffer_u64! (RV64)
  • Uses cfg(target_os = "zkvm") (old) vs cfg(openvm_intrinsics) (new)
  • Uses RV32_ALU_OPCODE vs RV64_ALU_OPCODE

15. Bigint pointer truncation lacks explicit safety assertions

In extensions/bigint/circuit/src/base_alu.rs (and other bigint execution files), pointer values are extracted as:

u64::from_le_bytes(rs1_ptr) as u32  // Silent truncation

Unlike the algebra extension which uses debug_assert!(val <= u32::MAX as u64), the bigint code silently truncates. While the VM invariants likely guarantee this is safe, adding explicit debug assertions would be consistent with the algebra extension's approach.


Architecture Observations (Positive)

  • riscv-adapters port is exemplary — zero RV32_* references, clean helper API (pad_reg_val, compose_ptr, tracing_read_reg_ptr), proper 4-byte pointer storage with 8-byte padding in constraints.

  • Algebra extension (feat: rv64 algebra #2727) is clean — all RV32_REGISTER_NUM_LIMBS properly migrated to RV64_REGISTER_NUM_LIMBS, explicit pointer validation with debug_assert!.

  • Keccak256 extension (feat: rv64 keccak256 #2724) is well-migrated — rv64_register_to_u32 helper correctly reads 8-byte registers and validates upper bytes are zero.

  • Reduce memory columns (fix: rv64 reduce memory address columns #2733) is a good optimization — stores only the necessary 4 pointer bytes in columns, pads to 8 in AIR constraints, saving trace width.

  • Non-AOT execution paths in the riscv extension are solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, and vm_read::<u8, 8>.

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension correctly.

  • Decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag is cleanly integrated.


Summary

Priority Issue Impact Status
Critical #1: Deferral AIR/trace uses 4-byte register arrays Memory bus mismatch, unsound constraints Open
Critical #2: Deferral transpiler wrong register stride Wrong register addresses Open
High #4: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #5: AOT Rv32*ExecutorRv64*Executor (riscv, 4 files) Dead code + compilation failure under aot Trivial rename
High #6: AOT impls wrong const generics (riscv, 4 files) Dead code under aot Part of AOT port
High #7: Bigint AOT Rv32*Rv64* (6 files) Dead code + compilation failure under aot Trivial rename
Medium #3: Deferral config rv32i/rv32m field naming Misleading, functionally correct Cosmetic fix
Medium #8: Bigint CUDA test old types Compilation failure under cuda Part of CUDA port
Medium #9-10: AOT/CUDA systemic Known deferral (CI disabled) Tracked
Low #11: TEMP[jpw] marker Cleanup Minor
Low #12: Serializer 8× over-allocation Wasted memory Minor
Low #13: Hardcoded 8 vs WORD_SIZE Style Minor
Low #14: Duplicate guest/guest/ directory Confusing leftover Cleanup
Low #15: Bigint silent pointer truncation Missing safety assertion Consistency

The core RV64 architecture (non-AOT, non-CUDA paths) is solid. The latest commits (algebra, keccak256, riscv-adapters, reduce memory columns) are well-structured with proper RV64 constant usage. The adapter pattern in riscv-adapters is particularly clean.

Recommended before merge:

  1. Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, range checks, and transpiler (~30 locations across 5 files)
  2. Fix config.rv32iconfig.rv64i in 5 riscv locations (trivial)
  3. Fix Rv32*ExecutorRv64*Executor in AOT impls across riscv (4 files) and bigint (6 files) — trivial renames
  4. Remove inner guest/guest/ directory and TEMP[jpw] marker
  5. Track AOT, CUDA, and deferral CUDA port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude finished @Tuanlinh12312's task in 6m 0s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes (ECC, algebra, keccak256, bigint, riscv-adapters, reduce memory)
  • Review deferral and algebra extension changes in detail
  • Review toolchain and decoder changes
  • Post final review

Overview

Reviewing after the latest commits including feat: rv64 ECC (#2728), feat: rv64 algebra (#2727), feat: rv64 keccak256 (#2724). The PR is at 26 commits, 380 files changed, +17245/−8348 lines.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
  • Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf
  • Algebra extension — Fixed: transpiler, execution, and circuit code all correctly use RV64_REGISTER_NUM_LIMBS, RV64_REGISTER_AS, and u64::from_le_bytes(). Zero remaining RV32_REGISTER_NUM_LIMBS or RV32_CELL_BITS references.
  • ECC extension — Clean: no RV32_ references found
  • Keccak256 non-CUDA code — Clean: correctly uses RV64_WORD_NUM_LIMBS

Bugs (Still Open)

1. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug — Critical)

The deferral extension's AIR columns, trace records, and memory bridge interactions all use RV32_REGISTER_NUM_LIMBS (4) for register value arrays. Since MemoryReadAuxCols<T> infers its block size from the data array passed to memory_bridge.read(), passing a 4-element array creates a 4-byte memory read — but RV64 registers are 8 bytes wide. This creates a memory bus interaction mismatch.

AIR columns — arrays sized at 4 instead of 8:

File Line Field
deferral/circuit/src/output/air.rs 59 rd_val: [T; RV32_REGISTER_NUM_LIMBS]
deferral/circuit/src/output/air.rs 60 rs_val: [T; RV32_REGISTER_NUM_LIMBS]
deferral/circuit/src/call/air.rs 230 rd_val: [T; RV32_REGISTER_NUM_LIMBS]
deferral/circuit/src/call/air.rs 231 rs_val: [T; RV32_REGISTER_NUM_LIMBS]

Trace records — arrays sized at 4 instead of 8:

File Line Field
deferral/circuit/src/output/trace.rs 72 rd_val: [u8; RV32_REGISTER_NUM_LIMBS]
deferral/circuit/src/output/trace.rs 73 rs_val: [u8; RV32_REGISTER_NUM_LIMBS]
deferral/circuit/src/call/trace.rs 234 rd_val: [u8; RV32_REGISTER_NUM_LIMBS]
deferral/circuit/src/call/trace.rs 235 rs_val: [u8; RV32_REGISTER_NUM_LIMBS]

Memory bridge readsoutput/air.rs:260-276 and call/air.rs:356-376 pass local.rd_val (4 elements) to memory_bridge.read(), creating 4-byte memory interactions for 8-byte registers.

Address range checksoutput/air.rs:237-244, output/trace.rs:337-346, call/air.rs:306-313, call/trace.rs:177-180,407-412 use RV32_CELL_BITS * RV32_REGISTER_NUM_LIMBS (= 32) and index [RV32_REGISTER_NUM_LIMBS - 1] (= limb[3] instead of limb[7]).

Note: The pointer values are correctly 32-bit. The approach used in riscv-adapters (pad_reg_val — read 8 bytes, use lower 4 as pointer, constrain upper 4 to zero) should be applied here. See extensions/riscv-adapters/src/helpers.rs for the correct pattern.

Fix this →

2. Deferral transpiler uses wrong register stride (Critical)

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 would be addressed at offset 20 instead of the correct 40 for RV64.

Fix this →

3. config.rv32i should be config.rv64i (Compilation error under cuda/aot features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File Line
riscv/circuit/src/lib.rs 297
riscv/circuit/src/mul/tests.rs 289
riscv/circuit/src/mulh/tests.rs 530
riscv/circuit/src/jalr/tests.rs 586
riscv/circuit/src/mul_w/tests.rs 416

Fix this →

4. RISCV AOT impls reference non-existent Rv32*Executor types (Compilation error under aot)

8 impl blocks across 4 files use Rv32* struct names that were renamed to Rv64*:

File Lines Has Should be
jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

5. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot)

12 impl blocks across 6 files:

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

Fix this →

6. Bigint CUDA references old Rv32*Air types (Compilation error under cuda)

extensions/bigint/circuit/src/extension/cuda.rs:55-101 references Rv32BaseAlu256Air, Rv32LessThan256Air, Rv32BranchEqual256Air, Rv32BranchLessThan256Air, Rv32Multiplication256Air, Rv32Shift256Air — all renamed to Rv64*.

7. RISCV CUDA extension references old Rv32* types (Compilation error under cuda)

extensions/riscv/circuit/src/extension/cuda.rs:12-18 imports Rv32BaseAluAir, Rv32LoadStoreAir, Rv32JalLuiAir, Rv32HintStoreAir, etc. — all renamed to Rv64*. Also missing: new *_W chip registrations for GPU backend.

8. AOT layer assumes 32-bit registers throughout (Systemic — known deferral)

The AOT x86 code generation layer is fundamentally broken for RV64: register stride a / 4, XMM helpers (gpr_to_rv32_register, xmm_to_gpr), and *_W AOT impls all use 32-bit assumptions. The CI disabling (chore(ci): comment out aot workflows) acknowledges this.


Code Quality Issues

9. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker — should be cleaned up or properly named.

10. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard, RV32_ALU_OPCODE). Should be removed.

11. Serializer Vec::with_capacity over-allocates 8x

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). Functionally correct but 8x memory waste.

12. Deferral config naming (Rv32Deferral* → should be Rv64Deferral*)

extensions/deferral/circuit/src/extension/mod.rs:224-250:

  • Rv32DeferralConfig struct with fields rv32i: Rv64I, rv32m: Rv64M
  • Rv32DeferralCpuBuilder, Rv32DeferralGpuBuilder

These names are misleading — they contain RV64 types but carry the RV32 prefix.


Architecture Observations (Positive)

  • Algebra extension is fully and correctly ported — all register operations use RV64_REGISTER_NUM_LIMBS, RV64_REGISTER_AS, and u64::from_le_bytes(). The is_eq.rs write data correctly uses 8-byte arrays. Zero remaining RV32_REGISTER_NUM_LIMBS references.

  • ECC extension is clean — no RV32_ references found.

  • riscv-adapters port is solid. The pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers in helpers.rs provide a clean abstraction for the 4-byte pointer + 8-byte register pattern. Zero RV32_* references.

  • Keccak256 non-CUDA code is correctly updated — uses RV64_WORD_NUM_LIMBS throughout.

  • Bigint transpiler is correctly ported — uses RV64_REGISTER_NUM_LIMBS for register stride.

  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6, mask 0x3f) and W-suffix validation (shamt < 32).

  • Non-AOT, non-CUDA execution paths across riscv, algebra, ECC, keccak256, sha2, and bigint extensions are solid — consistent RV64_* constant usage.

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag replaces old target_os = "zkvm", ELF64 validation, riscv64im-unknown-none-elf target.

  • Memory address column optimization (commit 7b24bc6) is sound — zero-padding enforced structurally at the memory bus level.


Summary

Priority Issue Impact Status
Critical #1: Deferral AIR uses 4-byte register arrays (8 array defs, ~20 index/range refs) Memory bus mismatch, unsound constraints Open
Critical #2: Deferral transpiler wrong register stride (4 locations) Wrong register addresses Open
High #3: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #4: Riscv AOT Rv32*ExecutorRv64*Executor (4 files, 8 impls) Dead code + compilation failure under aot Trivial rename
High #5: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files, 12 impls) Dead code + compilation failure under aot Trivial rename
High #6-7: Bigint + Riscv CUDA extensions reference old Rv32* types Compilation failure under cuda Known deferral
Medium #8: AOT layer assumes 32-bit registers (systemic) Entire AOT path broken Known deferral (CI disabled)
Low #9: TEMP[jpw] marker Cleanup Minor
Low #10: Inner guest/guest/ directory Confusing stale RV32 leftover Cleanup
Low #11: Serializer 8x over-allocation Wasted memory Minor
Low #12: Deferral config naming Rv32Deferral* Misleading Cosmetic

The core RV64 architecture (non-AOT, non-CUDA paths) is solid across all ported extensions. The algebra, ECC, keccak256, riscv-adapters, and toolchain changes are well-structured with correct constant usage. The *_W chip design with split-width adapters is clean.

The deferral extension remains the primary critical risk — it still has RV32_REGISTER_NUM_LIMBS usage that causes AIR soundness issues and wrong transpiler register addresses.

Recommended before merge:

  1. Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS in AIR columns, trace records, memory bridge interactions, range checks, and transpiler stride (~30 locations across 5 files). Follow the riscv-adapters/helpers.rs pattern for pointer handling.
  2. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  3. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files) and bigint AOT (6 files) impls (trivial renames)
  4. Remove inner guest/guest/ directory and TEMP[jpw] marker
  5. Track AOT, CUDA, and deferral config naming as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude finished @Tuanlinh12312's task in 3m 10s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes (ECC, pairing, algebra, keccak256, bigint, riscv-adapters, reduce memory)
  • Review deferral and algebra extension changes in detail
  • Review toolchain and decoder changes
  • Post final review

Overview

Re-reviewing after the latest commits feat: rv64 pairing (#2734) (b7308ae), feat: rv64 ECC (#2728) (ccdfcb3), and feat: rv64 algebra (#2727) (2bb3249). The PR is at 27 commits, 382 files changed, +17266/−8372 lines.

Previously fixed issues (confirmed):

  • Hint stream length encoding — Fixed: uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • Phantom sub-executor naming — Fixed: renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • Commented-out hintstore — Fixed: hintstore module is now active
  • abstract_compose overflow — Fixed: uses from_u64(1u64 << ...)
  • Range checker sizes — Fixed: properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • Guest toolchain target — Fixed: updated to riscv64im-unknown-none-elf
  • Algebra extension (is_eq, modular_chip, transpiler) — Fixed in feat: rv64 algebra (#2727): now correctly uses RV64_REGISTER_NUM_LIMBS throughout

NEW: ECC, Pairing, Algebra Changes

Algebra Extension — ✅ Well Done

The feat: rv64 algebra (#2727) commit is solid:

  • modular_chip/is_eq.rs:731: Correctly uses RV64_REGISTER_NUM_LIMBS for write data
  • modular_chip/mod.rs:42,54,69: Correctly parameterized with RV64_REGISTER_NUM_LIMBS and RV64_CELL_BITS
  • transpiler/src/lib.rs:102-103,126,204-205: Correctly uses RV64_REGISTER_NUM_LIMBS for register stride
  • Zero RV32_REGISTER_NUM_LIMBS references remain in non-CUDA algebra code

ECC Extension — ✅ Well Done

The feat: rv64 ECC (#2728) commit is clean:

  • Transpiler correctly uses RV64_REGISTER_NUM_LIMBS for register stride
  • Circuit types correctly use Rv64VecHeapAdapterRecord
  • Zero RV32_REGISTER_NUM_LIMBS references in non-CUDA ECC code

Pairing Extension — ⚠️ Contains Bugs (see #1 below)


Bugs (Still Open)

1. NEW: Pairing transpiler uses wrong register stride (Wrong register addresses)

extensions/pairing/transpiler/src/lib.rs:3,47-48:

use openvm_instructions::{instruction::Instruction, riscv::RV32_REGISTER_NUM_LIMBS, ...};
// ...
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),  // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs2),

Register x5 would be addressed at offset 20 instead of the correct 40 for RV64. The ECC and algebra transpilers were correctly updated — this one was missed.

Fix this →

2. NEW: Pairing CUDA builder references Rv32* types (Compilation error under cuda)

extensions/pairing/circuit/src/cuda.rs has 7+ references to old types:

  • Line 4: Imports Rv32ModularBuilder (should be Rv64ModularBuilder)
  • Line 16: Uses Rv32PairingConfig (should be Rv64PairingConfig)
  • Line 19: Defines Rv32PairingGpuBuilder (should be Rv64PairingGpuBuilder)
  • Lines 24, 30, 43: More Rv32* references

The lib.rs:21 re-export pub use cuda::Rv32PairingGpuBuilder as Rv64PairingBuilder papers over the issue but will break when the CUDA file is fixed.

3. Deferral extension uses 4-byte register arrays instead of 8-byte (Soundness bug)

The deferral extension has not been ported to RV64. All register-related operations use RV32_REGISTER_NUM_LIMBS (4) but RV64 registers are 8 bytes wide (DEFAULT_BLOCK_SIZE = 8):

AIR columns — create 4-element arrays for register values:

  • output/air.rs:59-60: rd_val: [T; RV32_REGISTER_NUM_LIMBS], rs_val: [T; RV32_REGISTER_NUM_LIMBS]
  • call/air.rs:230-231: Same pattern

Memory bridge reads — pass 4-element arrays to memory_bridge.read():

  • output/air.rs:260-276: memory_bridge.read(..., local.rd_val, ...) — creates a 4-byte memory read interaction
  • The system memory chip expects 8-byte register reads (matching DEFAULT_BLOCK_SIZE = 8)

Execution code — reads only 4 bytes from registers:

  • output/execution.rs:157: u32::from_le_bytes(exec_state.vm_read(RV32_REGISTER_AS, ...)) — infers 4-byte block size from u32
  • call/execution.rs:173-174: Same pattern

Trace records — 4-byte register value arrays:

  • output/trace.rs:72-73, call/trace.rs:234-235: [u8; RV32_REGISTER_NUM_LIMBS]

Range checks — access wrong index for upper limb:

  • output/air.rs:243: local.rd_val[RV32_REGISTER_NUM_LIMBS - 1] = index 3 (should be 7 for RV64)
  • call/air.rs:312-313: Same pattern

Impact: Memory bus interaction width mismatch: the deferral AIR creates 4-byte register read interactions, but the system memory chip expects 8-byte reads. This will cause proof generation failure or unsound constraints.

Fix this →

4. Deferral transpiler uses wrong register stride (Wrong register addresses)

extensions/deferral/transpiler/src/lib.rs:85-86, 95-96:

RV32_REGISTER_NUM_LIMBS * dec_insn.rd,   // stride 4, should be 8
RV32_REGISTER_NUM_LIMBS * dec_insn.rs1,

Register x5 would be addressed at offset 20 instead of the correct 40.

5. config.rv32i should be config.rv64i (Compilation error under cuda/aot)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File Line
extensions/riscv/circuit/src/lib.rs 297
extensions/riscv/circuit/src/mul/tests.rs 289
extensions/riscv/circuit/src/mulh/tests.rs 530
extensions/riscv/circuit/src/jalr/tests.rs 586
extensions/riscv/circuit/src/mul_w/tests.rs 416

Fix this →

6. RISCV AOT impls reference non-existent Rv32*Executor types (Compilation error under aot)

8 impl blocks across 4 files implement traits for Rv32* struct names that were renamed to Rv64*:

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

7. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot)

12 impl blocks across 6 files:

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor

Fix this →

8. AOT layer assumes 32-bit registers throughout (systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers, and RV32_REGISTER_NUM_LIMBS const generics. CI is disabled via chore(ci): comment out aot workflows.

9. CUDA extensions not ported (systemic — known deferral)

riscv/circuit/src/extension/cuda.rs, bigint/circuit/src/extension/cuda.rs, deferral/circuit/src/extension/cuda.rs, pairing/circuit/src/cuda.rs all reference old Rv32* types.


Code Quality Issues

10. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

11. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ uses stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard, RV32_ALU_OPCODE). Should be removed.

12. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes element count. Should be / 8.

13. Deferral config/builder naming

Rv32DeferralConfig, Rv32DeferralCpuBuilder, Rv32DeferralGpuBuilder in deferral/circuit/src/extension/mod.rs:224-250 — these wrap Rv64I, Rv64M, Rv64Io types but carry the Rv32 prefix.


Architecture Observations (Positive)

  • Algebra, ECC, and pairing extension ports (latest 3 commits) are well-structured. The algebra extension correctly uses RV64_REGISTER_NUM_LIMBS throughout its circuit, transpiler, and execution code. ECC transpiler is clean.

  • riscv-adapters port is clean with good helper abstractions: pad_reg_val, compose_ptr, tracing_read_reg_ptr correctly handle the 4-byte materialized pointer + 8-byte register pattern. Zero RV32_* references.

  • Memory address column optimization (commit 7b24bc6) is sound — materializes only significant limbs (RV64_WORD_NUM_LIMBS = 4 for pointers) while zero-padding to full register width in bus interactions.

  • Keccak256 and SHA2 Rust code is clean — no remaining RV32_* references in non-CUDA files.

  • Non-AOT, non-CUDA execution paths across the riscv extension remain solid — consistent use of RV64_REGISTER_AS, RV64_REGISTER_NUM_LIMBS, vm_read::<u8, 8>.

  • *_W adapter/core pattern is well-designed with proper sign extension via (result as i32 as i64 as u64).

  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, guest openvm_intrinsics cfg is clean.


Summary

Priority Issue Impact Status
Critical #3: Deferral AIR uses 4-byte register arrays Soundness: memory bus mismatch Open
Critical #4: Deferral transpiler wrong register stride Wrong register addresses Open
High #1: Pairing transpiler wrong register stride Wrong register addresses New
High #5: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Open
High #6: Riscv AOT Rv32*ExecutorRv64*Executor (4 files) Dead code + compilation failure under aot Open
High #7: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files) Dead code + compilation failure under aot Open
Medium #2: Pairing CUDA builder Rv32* types Compilation failure under cuda New
Medium #8-9: AOT/CUDA systemic issues Known deferral (CI disabled) Tracked
Low #10: TEMP[jpw] marker Cleanup Open
Low #11: Inner guest/guest/ directory Confusing stale RV32 leftover Open
Low #12: Serializer 8× over-allocation Wasted memory Open
Low #13: Deferral config naming Rv32Deferral* Misleading naming Open

The core RV64 architecture (non-AOT, non-CUDA) is solid across the riscv, keccak256, sha2, algebra, ECC, and bigint extensions. The latest commits (algebra, ECC, pairing) are well-structured and demonstrate correct RV64_* constant usage.

The deferral extension remains the primary risk — it has critical RV32_REGISTER_NUM_LIMBS usage that produces wrong register array sizes, wrong execution reads, wrong transpiler addresses, and unsound AIR constraints. The pairing transpiler also has a wrong register stride.

Recommended before merge:

  1. Fix deferral extension — update all RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS across circuit AIR, trace, execution, and transpiler (~40 occurrences)
  2. Fix pairing transpiler — change RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS (3 occurrences)
  3. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  4. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files) and bigint AOT (6 files) impls
  5. Remove inner guest/guest/ directory and TEMP[jpw] marker
  6. Track AOT, CUDA, and deferral rename as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude finished @Tuanlinh12312's task in 6m 41s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes (deferral, pairing)
  • Review deferral extension in detail
  • Review circuit and extension changes
  • Post final review

Overview

Re-reviewing after the latest commits feat: rv64 deferral (#2735) (01ac821) and feat: rv64 pairing (#2734) (b7308ae). The PR is at 28 commits, 100 changed files, +17393/−8303 lines.

Previously critical bugs now FIXED:

  • Deferral AIR register arraysFixed: now uses RV64_WORD_NUM_LIMBS (4) for materialized pointer columns and expand_to_rv64_register() for 8-byte memory bus interactions. Correctly follows the riscv-adapters pattern.
  • Deferral transpiler strideFixed: now uses RV64_REGISTER_NUM_LIMBS (8) for register address computation (transpiler/src/lib.rs:85-86, 95-96).
  • Deferral config namingFixed: renamed to Rv64DeferralConfig, Rv64DeferralCpuBuilder with fields rv64i, rv64m.
  • Algebra extension — Previously confirmed fixed in feat: rv64 algebra (#2727).

Bugs (Still Open)

1. Pairing transpiler uses RV32_REGISTER_NUM_LIMBS (stride 4 instead of 8) (Critical — wrong register addresses)

extensions/pairing/transpiler/src/lib.rs:47-48:

F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs1),  // stride 4, should be 8
F::from_usize(RV32_REGISTER_NUM_LIMBS * dec_insn.rs2),

Import on line 3 also references RV32_REGISTER_NUM_LIMBS instead of RV64_REGISTER_NUM_LIMBS.

Register x5 would be addressed at offset 20 instead of the correct 40 for RV64. This will cause the phantom instruction to read from wrong register addresses.

Fix this →

2. config.rv32i should be config.rv64i (Compilation error under cuda/aot features)

Rv64ImConfig has field rv64i (line 101), but 5 locations still reference config.rv32i:

File Line Feature gate
riscv/circuit/src/lib.rs 297 cuda
riscv/circuit/src/mul/tests.rs 289 cuda
riscv/circuit/src/mulh/tests.rs 530 aot
riscv/circuit/src/jalr/tests.rs 586 aot
riscv/circuit/src/mul_w/tests.rs 416 aot

Fix this →

3. RISCV AOT impls reference non-existent Rv32*Executor types (Compilation error under aot)

8 impl blocks across 4 files implement traits for struct names that were renamed to Rv64*:

File Lines Has Should be
jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
jalr/execution.rs 97, 185 Rv32JalrExecutor Rv64JalrExecutor
hintstore/execution.rs 111, 157 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

4. RISCV AOT impls for non-W chips use wrong const generic — dead code

Non-W chips define Rv64*Executor with RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — making them dead code that never gets dispatched:

File Lines
base_alu/execution.rs 177, 245
less_than/execution.rs 128, 247
divrem/execution.rs 108, 273
load_sign_extend/aot.rs 20, 41

Note: The *_W chips' AOT impls also use RV32_REGISTER_NUM_LIMBS (4) but this happens to equal RV64_WORD_NUM_LIMBS (4) — functionally correct but semantically misleading. Should use RV64_WORD_NUM_LIMBS instead.

5. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot)

12 impl blocks across 6 files:

File Lines Has Should be
base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

Fix this →

6. CUDA extensions not ported (Known deferral — CI disabled)

All CUDA code across riscv, bigint, deferral, pairing, keccak256, and sha2 extensions still references old Rv32* types. Specific examples:

  • riscv/circuit/src/extension/cuda.rs — imports Rv32BaseAluAir, Rv32LoadStoreAir, etc.
  • bigint/circuit/src/extension/cuda.rs — references Rv32*256Air types
  • deferral/circuit/src/output/cuda.rs and call/cuda.rs — uses RV32_CELL_BITS instead of RV64_CELL_BITS
  • pairing/circuit/src/cuda.rs — uses Rv32PairingGpuBuilder, Rv32PairingConfig, Rv32ModularBuilder
  • pairing/circuit/src/lib.rs:21pub use cuda::Rv32PairingGpuBuilder as Rv64PairingBuilder (aliasing wrong name)

This is acknowledged via chore(ci): comment out aot workflows.

7. AOT layer assumes 32-bit registers throughout (Known deferral — CI disabled)

Register stride a / 4, XMM helpers (gpr_to_rv32_register, xmm_to_gpr), and AOT const generics all assume 32-bit register layout. Needs fundamental redesign for 64-bit register values.


Code Quality Issues

8. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:320-321:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

9. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. Should be / 8.

10. Duplicate openvm-riscv-guest directory

Both extensions/riscv/guest/Cargo.toml and extensions/riscv/guest/guest/Cargo.toml declare package.name = "openvm-riscv-guest". The inner guest/guest/ is stale RV32 code (hint_buffer_u32!, old cfg(target_os = "zkvm") guard, RV32_ALU_OPCODE). Should be removed.

11. Hardcoded 8 instead of WORD_SIZE in io/mod.rs

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

Should use the WORD_SIZE constant defined in platform/src/lib.rs:23.


Architecture Observations (Positive)

  • Deferral extension port (latest commit 01ac821) is well-done. Follows the established pattern: RV64_WORD_NUM_LIMBS for materialized pointer columns, expand_to_rv64_register() for 8-byte padding in memory bus interactions. Transpiler correctly uses RV64_REGISTER_NUM_LIMBS for register stride.

  • Pairing extension port (b7308ae) is clean in non-CUDA code. Config correctly uses Rv64PairingConfig with proper field names.

  • riscv-adapters remain exemplary — zero RV32_* references, clean pad_reg_val/compose_ptr/tracing_read_reg_ptr helpers.

  • Non-AOT, non-CUDA execution paths across all extensions (riscv, algebra, ECC, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage.

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension correctly.

  • Decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag cleanly replaces old target_os = "zkvm".


Summary

Priority Issue Impact Status
Critical #1: Pairing transpiler wrong register stride Wrong phantom register addresses NEW — trivial fix
High #2: config.rv32iconfig.rv64i (5 locations) Compilation failure under cuda/aot Trivial fix
High #3: Riscv AOT Rv32*ExecutorRv64*Executor (4 files) Dead code + compilation failure under aot Trivial rename
High #4: Riscv non-W AOT impls wrong const generic (4 files) Dead code under aot Part of AOT port
High #5: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files) Dead code + compilation failure under aot Trivial rename
Medium #6: CUDA extensions not ported (all extensions) GPU proving broken Known deferral (CI disabled)
Medium #7: AOT layer assumes 32-bit registers (systemic) Entire AOT path broken Known deferral (CI disabled)
Low #8: TEMP[jpw] marker Cleanup Minor
Low #9: Serializer 8× over-allocation Wasted memory Minor
Low #10: Inner guest/guest/ directory Confusing stale RV32 leftover Cleanup
Low #11: Hardcoded 8 vs WORD_SIZE Style inconsistency Minor

Progress Since Last Review

The deferral extension — previously the most critical open issue — has been fully and correctly ported to RV64. The new commit follows the established riscv-adapters pattern and eliminates all non-CUDA RV32_* references.

Recommended Before Merge

  1. Fix pairing transpiler — change RV32_REGISTER_NUM_LIMBS to RV64_REGISTER_NUM_LIMBS on lines 3, 47, 48 (trivial, 1 file)
  2. Fix config.rv32iconfig.rv64i in 5 locations (trivial)
  3. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files) and bigint AOT (6 files) — trivial renames
  4. Remove inner guest/guest/ directory and TEMP[jpw] marker
  5. Track AOT, CUDA port as separate work items

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude encountered an error —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes (guest lib structs/fns update)
  • Review deferral, pairing, algebra, bigint extensions
  • Review riscv circuit and toolchain changes
  • Post final review

Working on it…

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@branch-rebase-bot branch-rebase-bot Bot force-pushed the develop-v2.0.0-rc.1 branch from dada909 to e4aab20 Compare April 24, 2026 19:54
876pol and others added 8 commits April 27, 2026 16:02
Resolves INT-6198, INT-6199, INT-6200.

Migrates the `openvm`, `openvm-platform` crates from RV32 to RV64.

**Scope note:** This PR only covers the guest-side compilation path
(crate definitions, assembly, types, cfg gating). The actual execution
logic for the I/O instructions (how the interpreter handles
hint/reveal/print operations at runtime) is handled separately in the
interpreter PR.

Compilation and interpreting with the new openvm toolchain have been
tested with guest programs from this repo:
https://github.com/axiom-crypto/rv64-minimal-guest-test.

All `#[cfg(target_os = "zkvm")]` / `#[cfg(not(target_os = "zkvm"))]`
attributes replaced with `#[cfg(openvm_intrinsics)]` /
`#[cfg(not(openvm_intrinsics))]`. This decouples guest-code gating from
the risc0 custom target OS and instead uses a custom cfg flag injected
via rustflags.

= "riscv64"`

Conditional dependency sections switched from `cfg(target_os = "zkvm")`
to `cfg(target_arch = "riscv64")`, and the guest dependency changed from
`openvm-rv32im-guest` to `openvm-rv64im-guest`. Added `check-cfg` for
`openvm_intrinsics` to suppress warnings.

- `WORD_SIZE`: 4 → 8 (`platform/src/lib.rs`)
- `STACK_TOP`, `TEXT_START`, `is_guest_memory`: `u32` → `u64`
(`platform/src/memory.rs`)
- `STACK_TOP` static in entry point: `u32` → `u64` (`openvm/src/lib.rs`)

- `sw x0, 1(x0)` → `sd x0, 1(x0)` (fault instruction,
`openvm/src/lib.rs`)
- `lw sp, 0(sp)` → `ld sp, 0(sp)` (entry point stack pointer load,
`openvm/src/lib.rs`)

Replaced with glibc implementations compiled for `riscv64 -march=rv64im`
(source: [glibc
2.42](https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/string)).
Uses 64-bit `ld`/`sd` operations throughout.

All hint/reveal/read functions updated to operate on 8-byte units:
- `read_u32()` → `read_u64()`
- Hint stream reads use `u64` length prefixes (8 bytes instead of 4)
- Serde serializer/deserializer operate on `u64` chunks
- `hint_buffer_u32!` equivalent updated for 64-bit

**Note:** `reveal_bytes32` has **not** been changed — it still reveals
32 bytes (256 bits), but just reads them in 8-byte chunks instead of
4-byte chunks.

Removed the PAL ABI stubs (`sys_halt`, `sys_read`, `sys_write`,
`sys_rand`, etc.) that were required for `std` compatibility with the
risc0 `riscv32im-risc0-zkvm-elf` target. No longer needed since rv64
uses a standard target (`riscv64im-unknown-none-elf`) without a custom
OS ABI.
…64im (#2446)

Transpiler PRs should be reviewed in this order: #2446, #2447, #2448.

Summary of this PR:

- Some files related to the decoding of RV32 are copied from
https://github.com/GregAC/rrs/tree/main/rrs-lib/src to
`crates/toolchain/decoder`.
- Everything in `extensions/rv32im/transpiler` has been copied to
`extensions/rv64im/transpiler`.
Transpiler PRs should be reviewed in this order: #2446, #2447, #2448.

Summary of this PR:

- Support for decoding of RV64.
Resolves INT-6044.

Transpiler PRs should be reviewed in this order: #2446, #2447, #2448.

Summary of this PR:

- Code that was hardcoded for RV32 has been changed to RV64.
- `rrs-lib` was removed as a dependency, and all crates now use
`crates/toolchain/decoder`.
- Some tests have been added to test the RV64 transpiler, but these
tests are still messy -- this will be fixed in the ticket involving the
integration tests for the full toolchain and interpreter flow.
…fix compilation errors (#2457)

Resolves INT-6318.

Summary:

- The `extensions/rv32im` has been moved to `extensions/riscv`.
- The old `extensions/rv32im/transpiler` module has been entirely
deleted and replaced with the transpiler module created in
#2448.
- Many objects prefixed with `Rv32` are renamed to start with `Rv64`. A
list of these objects can be found in
https://docs.google.com/document/d/1mZTHk7n2LY-N1WMrjzn_BZmSYtP4PGnflH97Xjrs9EY/edit?tab=t.0.
Towards INT-6319.

Summary:
- `base_alu`, `less_than`, and `shift` submodules of `riscv` have been
ported from RV32 to RV64
- All types/structs renamed from `Rv32*` to `Rv64*` (e.g.,
`Rv32BaseAluAdapterAir` → `Rv64BaseAluAdapterAir`)
- Operand width changed from `u32`/4 bytes to `u64`/8 bytes throughout
(registers, ALU ops, compose/decompose helpers)
- `Rv64BaseAluAdapter` (shared by all three submodules) updated for RV64
- New constraints added to enforce bytes 3–7 all equal the sign byte for
correct 64-bit sign extension (previously only byte 3 was constrained
for 32-bit)
- Added tests for the above constraint in `base_alu`, `less_than`,
`shift`.
- Changed constant `DEFAULT_U8_BLOCK_SIZE` from 4 to 8 (matching RV64
register width)
- Commented out unported code in `extension/mod.rs` and `lib.rs`
(load/store, branch, jal/jalr, auipc, mul/div — all marked `TEMP:
disabled until ported to RV64`) with manual empty-enum trait impls to
keep the crate compiling
- Updated `test_utils.rs` to use RV64 conventions
(`generate_rv64_is_type_immediate`, `rv64_rand_write_register_or_imm`)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…mul_w, divrem_w (#2483)

Towards INT-6340, INT-6341, INT-6342, INT-6343.

This PR copies the RV32 `base_alu`, `shift`, `mul`, and `divrem` modules
into new `base_alu_w`, `shift_w`, `mul_w`, and `divrem_w` modules. These
copies will serve as the starting point for implementing the RV64 32-bit
word instruction variants.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ayush Shukla <shuklaayush@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Tuanlinh12312 and others added 17 commits April 27, 2026 16:03
Resolves INT-7523. Should be merged after #2728.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves INT-7524.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

Switches guest dependencies that need atomic CAS (`once_cell::race` in
the algebra/ecc macros, `spin::Mutex` in `guest-libs/keccak256`) to
`portable-atomic` so they compile on stock `riscv64im-unknown-none-elf`,
fixing CI failures across the
algebra/ecc/pairing/ff_derive/k256/p256/keccak256 guest tests. The stock
target's spec declares `atomic-cas: false`, so rustc cfg-gates
`AtomicUsize::compare_exchange` out at the language level; `-C
passes=lower-atomic` can't help since it acts on LLVM IR after rustc has
already rejected the call. rc.1 avoided this on
`riscv32im-risc0-zkvm-elf`, which claims CAS support in its spec; we
keep guest programs on the stock upstream triple and patch the
dependency instead.

Switches `once_cell` (and `spin::Mutex` in `guest-libs/keccak256`) to
their `portable-atomic` backend, with `portable-atomic` configured to
use `critical-section` as its CAS fallback. `critical-section` is a
minimal no_std mutual-exclusion interface: exactly one crate in the tree
registers an `Impl` via `critical_section::set_impl!`, and every
consumer calls `critical_section::with(...)`; without an impl the call
panics at runtime. Off-the-shelf impls target interrupt-driven hardware
and disable interrupts via CSR reads/writes, which the openvm VM can't
transpile (they map to `unimp` and blow up at runtime).
`openvm-platform` now registers a no-op impl, sound because the openvm
guest is single-threaded, so CAS reduces to a plain load+store with zero
CSR emission. A `SAFETY:` comment flags the single-thread invariant.

Guest vs host gating now follows one rule across both code and
manifests:

- `.rs` files: `cfg(openvm_intrinsics)` (set by the guest build
rustflags) for openvm-specific code paths, `cfg(not(feature = "std"))`
for std-availability (`no_std`), `cfg(target_os = "none")` for
runtime-specific attributes (`no_main`, `openvm::entry!` registration).
The `no_std` / `no_main` split matches their semantics: std availability
is a package decision, entry-point registration is a runtime/target
property.
- `Cargo.toml` manifests: `cfg(not(target_os = "none"))` for host-only
dep gates (Cargo target predicates only accept built-in cfgs, not
`openvm_intrinsics`). The `portable-atomic` CAS polyfill keeps the
narrower `cfg(all(target_arch = "riscv64", target_os = "none"))` since
the `critical-section` backend is arch-specific.
- `--check-cfg=cfg(openvm_intrinsics)` added to guest-build rustflags in
`crates/toolchain/build/src/lib.rs` so user guest programs never see the
unexpected-cfgs warning.
- `[workspace.lints.rust]` at the workspace root declares
`openvm_intrinsics`; openvm-* crates inherit via `[lints] workspace =
true`. Standalone sub-workspaces (tests/programs, book examples) can't
inherit, so they declare it explicitly.

- `resolver = "2"` on all standalone `[workspace]` sub-workspaces
(tests/programs, book examples, cli/tests). Bare `[workspace]` defaulted
to v1, which unified proc-macro features into guest builds and dragged
`rand_core/std` and `getrandom/std` into the rv64 target.
- Cargo.toml host-only dep gates swapped from the stale-on-rv64
`cfg(not(target_os = "zkvm"))` to `cfg(not(target_os = "none"))`.

Three cascading rv64-port bugs, only exercised by the `final_exp_hint`
tests:

- `extensions/pairing/transpiler/src/lib.rs` used
`RV32_REGISTER_NUM_LIMBS` (= 4) to compute the phantom's `a`/`b` operand
addresses. On rv64 registers live at 8-byte boundaries in
`RV64_REGISTER_AS`, so the phantom read zero. Changed to
`RV64_REGISTER_NUM_LIMBS`.
- `guest-libs/pairing/src/{bn254,bls12_381}/pairing.rs` constructed
`(P.as_ptr() as u32, P.len() as u32)` as a fat pointer. The host phantom
reads at offsets `0` and `RV64_REGISTER_NUM_LIMBS` (= 8), but the `(u32,
u32)` tuple laid fields at offsets 0 and 4. Changed to `(u64, u64)`.
- The adjacent `hint_buffer_chunked(ptr, (N * 12 * 2) / 4)` used the
rv32 word-size divisor. Changed to `/
openvm_riscv_guest::HINT_WORD_BYTES`.

`hint_store_u32!` to `hint_store_u64!` in `algebra/moduli-macros`. New
`openvm_riscv_guest::HINT_WORD_BYTES = 8` replaces `/ 4` and `/ 8` magic
numbers. Host-side `SqrtHintSubEx` padding bumped from `.take(4)` to
`.take(RV64_REGISTER_NUM_LIMBS)` to match the 8-byte guest read.

- `rv32`/`riscv32` to `rv64`/`riscv64` across SDK: `Sdk::riscv32()` and
`AppConfig::riscv32()` become `riscv64()`, struct fields `rv32i`/`rv32m`
become `rv64i`/`rv64m`, `openvm_riscv32.toml` renamed to
`openvm_riscv64.toml` with matching TOML keys. Historical `v1.4/` and
`v1.5/` snapshots untouched.
- `Rv32Modular*`, `Rv32Weierstrass*`, `Rv32Pairing*` imports in failing
guest-lib integration tests renamed to `Rv64*`.
- `crates/sdk/guest/fib/src/main.rs`: `reveal_u32` to `reveal_u64`,
`fibonacci` return type and locals widened from `u32` to `u64`.

All 7 book examples were designed as host-run programs with `openvm/std`
force-enabled, which dragged std into rv64 guest builds. Per-example:
removed `features = ["std"]` from the `openvm` dep, moved `std` behind
an example-local `std = ["openvm/std", ...]` feature, added
`default-features = false` to transitive std-leakers (`serde`,
`num-bigint`, `alloy-primitives`, `openvm-ruint`, `tiny-keccak`,
`hex-literal`), and added the `no_main`/`no_std`/`openvm::entry!`
guest-program scaffolding. The `i256` example additionally gets `bytes`
with the `extra-platforms` feature and `portable-atomic` with
`critical-section` on the riscv64-none target, since `alloy-primitives`
transitively pulls `bytes` which uses `compare_exchange` directly.

The test was passing `["std"]` to enable ff's `BatchInvert` trait via a
`#[cfg(feature = "std")]` block in the guest program. On rv64 stock,
activating `feature = "std"` at guest level chains through `openvm/std`
and tries to link std, which riscv64im-unknown-none-elf doesn't have.
Dropped `["std"]` from the test harness call; the std-only inner block
is compiled out (the two `BatchInverter::invert_with_*_scratch` paths
still run). Generated init file renamed from
`openvm_init_batch_inversion_std.rs` to `openvm_init_batch_inversion.rs`
to match the new build call.

- Deleted `extensions/rv32im/` (empty pre-rv64 leftover) and an
untracked stale `extensions/riscv/guest/guest/` copy with the old rv32
hint-store API.
- Removed dead duplicate target-cfg block in
`extensions/pairing/circuit/Cargo.toml`.
- Fixed stale `cfg(target_os = "zkvm")` gate in
`examples/keccak/Cargo.toml`.
- Removed duplicate `critical_section::set_impl!` in
`crates/toolchain/platform/src/heap/embedded.rs`; the canonical no-op
impl now lives in `crates/toolchain/platform/src/critical_section.rs`.
Resolves INT-7566.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, mulh, divrem, jal_lui, jalr, auipc (#2741)

Resolves INT-7550.

---------

Co-authored-by: Tuanlinh12312 <136139181+Tuanlinh12312@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves INT-7553.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves INT-7552.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves INT-7564, INT-7528.
Resolves INT-7554.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shuklaayush shuklaayush force-pushed the develop-v2.1.0-rv64 branch from 3bd85a1 to 56e4f86 Compare April 27, 2026 14:04
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 27, 2026

Claude finished @shuklaayush's task in 7m 29s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)


Overview

Re-reviewing after 16 new commits since the last review (which covered up to feat: rv64 pairing (#2734)). The PR is now at 43 commits, 100 changed files, +20882/−10689 lines. The new commits include:

Previously reported bugs now FIXED (6 of 10):

  • config.rv32iconfig.rv64i in 5 locations — Fixed in all files
  • ✅ CUDA extension references old Rv32* types — Fixed in all CUDA extension files (riscv, bigint, deferral, pairing, keccak256, sha2)
  • ✅ Deferral extension uses 4-byte register arrays — Fixed in feat: rv64 deferral (#2735)
  • ✅ Deferral transpiler wrong register stride — Fixed
  • ✅ Pairing transpiler wrong register stride — Fixed
  • ✅ Duplicate openvm-riscv-guest directory — Fixed (removed)

CUDA Port Assessment — Well Done

The CUDA port across 8 commits is comprehensive and correct:

  • extensions/riscv/circuit/src/extension/cuda.rs — All 22+ AIR and ChipGpu types correctly reference Rv64* (Rv64BaseAluAir, Rv64ShiftAir, Rv64LoadStoreAir, etc.). Includes new W-suffix chips (Rv64BaseAluWChipGpu, Rv64ShiftWChipGpu, Rv64MulWChipGpu, Rv64DivRemWChipGpu).

  • crates/circuits/primitives/cuda/include/primitives/constants.h — Correctly defines both RV32 and RV64 constants: RV64_REGISTER_NUM_LIMBS = 8, RV64_WORD_NUM_LIMBS = 4, RV64_CELL_BITS = 8, RV64_IMM_AS = 0. DEFAULT_BLOCK_SIZE = 8 is correct.

  • CUDA .cu files — All properly use RV64_REGISTER_NUM_LIMBS and RV64_WORD_NUM_LIMBS. Zero RV32_* constant references in any .cu or .cuh file.

  • Bigint, deferral, pairing CUDA — All extension CUDA files correctly reference Rv64* types.


Bugs (Still Open)

1. AOT impls in riscv circuit reference non-existent Rv32*Executor types (Compilation error under aot feature)

8 impl blocks across 4 files implement traits for struct names that were renamed to Rv64*. These are dead code — the impls target types that don't exist. Will fail to compile when aot feature is enabled.

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 98, 186 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 112, 158 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

2. AOT impls for non-W riscv chips use wrong const generic — dead code (under aot feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — these impls will never be dispatched:

File Lines
base_alu/execution.rs 177, 245
less_than/execution.rs 128, 247
divrem/execution.rs 108, 273
load_sign_extend/aot.rs 21, 41

Note: The *_W chips' AOT impls also use RV32_REGISTER_NUM_LIMBS (4), but this happens to equal RV64_WORD_NUM_LIMBS (4) — functionally correct but semantically misleading.

3. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot feature)

12 impl blocks across 6 files implement traits for Rv32*256Executor types that don't exist (renamed to Rv64*256Executor):

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

Fix this →

4. Missing MulOpcode import in mul_w/execution.rs (Compilation error under aot feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported. Line 14 imports MulWOpcode from openvm_riscv_transpiler, and crate::common::* (line 19) doesn't re-export MulOpcode. This will fail to compile under the aot feature.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports, or add it to the main import block.

5. AOT layer assumes 32-bit registers throughout (Systemic — known issue)

The entire AOT x86 code generation layer is fundamentally broken for RV64:

  • Register stride a / 4 hardcoded (e.g., branch_eq/execution.rs:131-132, load_sign_extend/aot.rs:115)
  • XMM helpers gpr_to_rv32_register and xmm_to_gpr in common/mod.rs:25-85 assume 2 RISC-V registers per XMM
  • Error messages in update_adapter_heights_asm (common/mod.rs:108-116) say "RV32_REGISTER_AS" and "RV32_MEMORY_AS"

The CI disabling (chore(ci): comment out aot workflows) acknowledges this. The AOT layer needs a fundamental redesign for 64-bit register values since XMM elements are 32-bit.


Code Quality Issues

6. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:354-357:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {
    data.map(|x| x.as_canonical_u32() as u8)
}

Leftover temp function with debug marker — should be cleaned up or properly named before merge.

7. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). Functionally correct but 8× memory waste.

Fix: Vec::with_capacity(core::mem::size_of_val(value) / 8) or just use Vec::new().


Architecture Observations (Positive)

  • CUDA port is comprehensive and clean. All 8 CUDA commits correctly migrate to RV64 types and constants. The new W-suffix GPU chips are properly registered. Zero Rv32* type references remain in any CUDA-related code (.rs, .cu, .cuh).

  • Guest program migration is thorough. cfg(openvm_intrinsics) cleanly replaces cfg(target_os = "zkvm") across 119 uses. All benchmark guests properly use no_std. Guest lib functions use rv64 prefix.

  • feat: rv64 more cleanup (#2752) centralizes tracing_read_reg_ptr in adapters/mod.rs:235-251 with proper pointer_max_bits parameter and debug assertions, eliminating duplicate helpers from riscv-adapters/src/helpers.rs.

  • Non-AOT execution paths across all extensions (riscv, algebra, ECC, keccak256, sha2, bigint, deferral, pairing) are solid — consistent RV64_* constant usage.

  • *_W adapter/core pattern remains well-designed with proper sign extension via (result as i32 as i64 as u64).

  • Constants header (primitives/constants.h) correctly defines both RV32 and RV64 constants for multi-ISA CUDA support, with DEFAULT_BLOCK_SIZE = 8.


Summary

Priority Issue Impact Status
High #1: Riscv AOT Rv32*ExecutorRv64*Executor (4 files, 8 impls) Compilation failure under aot Trivial rename
High #2: Riscv non-W AOT impls wrong const generic (4 files) Dead code under aot Part of AOT port
High #3: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files, 12 impls) Compilation failure under aot Trivial rename
High #4: Missing MulOpcode import in mul_w AOT Compilation failure under aot Trivial fix
Medium #5: AOT layer assumes 32-bit registers (systemic) Entire AOT path broken Known deferral (CI disabled)
Low #6: TEMP[jpw] marker Cleanup Minor
Low #7: Serializer 8× over-allocation Wasted memory Minor

Progress Since Last Review

Major improvements — 6 of 10 previously reported bugs have been fixed:

  • config.rv32iconfig.rv64i (5 locations) — Fixed
  • ✅ CUDA extension old Rv32* types (all extensions) — Fully ported across 8 commits
  • ✅ Deferral extension soundness issues — Fixed
  • ✅ Pairing transpiler wrong stride — Fixed
  • ✅ Duplicate guest/guest/ directory — Removed
  • ✅ Guest lib cfg(target_os = "zkvm")cfg(openvm_intrinsics) — Fixed

The core RV64 architecture (non-AOT paths) is solid and complete. The CUDA port is comprehensive. The only remaining issues are in the AOT layer, which is acknowledged as deferred with CI disabled.

Recommended Before Merge

  1. Fix Rv32*ExecutorRv64*Executor in riscv AOT impls (4 files, 8 occurrences) — trivial renames
  2. Fix Rv32*256ExecutorRv64*256Executor in bigint AOT impls (6 files, 12 occurrences) — trivial renames
  3. Add missing MulOpcode import in mul_w/execution.rs — 1 line
  4. Remove TEMP[jpw] marker / rename tmp_convert_to_u8s in adapters/mod.rs:354
  5. Track systemic AOT redesign as a separate work item

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 27, 2026

Claude finished @shuklaayush's task in 6m 22s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new CUDA port commits
  • Review cleanup and guest program changes
  • Review remaining code quality issues
  • Post final review

Overview

Re-reviewing after 15 new commits since the last successful review. The PR is at 44 commits, 100 changed files, +20882/−10689 lines. Major new work:

Previously critical bugs now FIXED:

  • config.rv32iconfig.rv64i — all 5 locations fixed
  • ✅ Deferral AIR register arrays — now uses RV64_WORD_NUM_LIMBS + expand_to_rv64_register() for 8-byte bus interactions
  • ✅ Deferral transpiler wrong register stride — now uses RV64_REGISTER_NUM_LIMBS
  • ✅ Pairing transpiler wrong register stride — fixed
  • ✅ CUDA extension Rv32* type references — all ported to Rv64*
  • ✅ Duplicate guest/guest/ directory — removed
  • ✅ Deferral config naming (Rv32Deferral*Rv64Deferral*) — fixed

Bugs (Still Open)

1. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*:

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 98, 186 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 112, 158 Rv32HintStoreExecutor Rv64HintStoreExecutor

These are dead code — the impls apply to types that don't exist. Will fail to compile when aot feature is enabled.

Fix this →

2. AOT impls for non-W chips use wrong const generic RV32_REGISTER_NUM_LIMBS (Dead code under aot)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — making these impls dead code that never gets dispatched:

File Lines Const generic
base_alu/execution.rs 177, 245 { RV32_REGISTER_NUM_LIMBS } → should be { RV64_REGISTER_NUM_LIMBS }
less_than/execution.rs 128, 247 same
divrem/execution.rs 108, 273 same
load_sign_extend/aot.rs 21, 41 same

Additionally, base_alu_w/execution.rs lines 166, 234 use RV32_REGISTER_NUM_LIMBS (4) — for W-suffix chips, this happens to equal RV64_WORD_NUM_LIMBS (4), so it's functionally correct but should use RV64_WORD_NUM_LIMBS for semantic clarity.

3. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot)

12 impl blocks across 6 files:

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor

Fix this →

4. AOT common/mod.rs helper functions assume 32-bit register layout (Systemic AOT issue)

extensions/riscv/circuit/src/common/mod.rs:16 imports RV32_MEMORY_AS, RV32_REGISTER_AS and the helper functions (gpr_to_rv32_register, xmm_to_gpr, update_adapter_heights_asm) are all designed for the 32-bit register-to-XMM mapping (2 registers per XMM via pinsrd/pextrd). For RV64, registers are 64 bits each and would need 1:1 XMM mapping or use pinsrq/pextrq.

The error messages at lines 107-117 also say "RV32_REGISTER_AS" and "RV32_MEMORY_AS".

Similarly, AOT code throughout the loadstore/aot.rs, branch_lt/aot.rs, branch_eq/execution.rs, and all *_W execution files use RV32_REGISTER_AS and RV32_IMM_AS — functionally correct (same numeric values as RV64_* equivalents) but semantically misleading.

This is a known deferral — AOT CI is disabled via TODO(aot): re-enable comments in .github/workflows/extension-tests.yml.


Code Quality Issues

5. TEMP[jpw] marker in production code

extensions/riscv/circuit/src/adapters/mod.rs:354:

// TEMP[jpw]
pub fn tmp_convert_to_u8s<F: PrimeField32, const N: usize>(data: [F; N]) -> [u8; N] {

Leftover temp function with debug marker — should be cleaned up or properly named.

6. Serializer Vec::with_capacity over-allocates 8×

crates/toolchain/openvm/src/serde/serializer.rs:57:

let mut vec: Vec<u64> = Vec::with_capacity(core::mem::size_of_val(value));

size_of_val returns bytes, but with_capacity takes number of u64 elements. A 64-byte value allocates capacity for 64 u64 words (512 bytes) instead of 8 (64 bytes). Functionally correct but 8× memory waste.

Fix: Vec::with_capacity(core::mem::size_of_val(value) / core::mem::size_of::<u64>()) or Vec::new().

7. Hardcoded 8 instead of WORD_SIZE in io/mod.rs

crates/toolchain/openvm/src/io/mod.rs:60-61:

let num_words = len.div_ceil(8);
let capacity = num_words * 8;

WORD_SIZE is defined as 8 in platform/src/lib.rs. Should use the constant for consistency with the rest of the codebase.

8. Rv32ModularArithmeticOpcode enum not renamed

extensions/algebra/transpiler/src/lib.rs:21 defines pub enum Rv32ModularArithmeticOpcode — this is referenced in ~30 locations across algebra/circuit/src/ (preflight.rs, execution.rs, modular.rs, hybrid.rs, muldiv.rs, addsub.rs). Functionally correct since it's just an enum name, but inconsistent with the rest of the Rv32Rv64 rename.

9. Test error message strings still say "Rv32IM"

4 test files reference "failed to create Rv32IM executor" in their expect() strings:

  • mul/tests.rs:290, mulh/tests.rs:531, jalr/tests.rs:591, mul_w/tests.rs:417

Cosmetic only — these are behind aot/cuda feature gates.


CUDA Port Assessment (New)

The CUDA port across 8 commits is well done:

  • extensions/riscv/circuit/src/extension/cuda.rs — correctly uses all Rv64* types, no Rv32* imports
  • ✅ All non-W chip CUDA files (base_alu/cuda.rs, shift/cuda.rs, less_than/cuda.rs, mul/cuda.rs, mulh/cuda.rs, divrem/cuda.rs) — correctly use RV64_REGISTER_NUM_LIMBS and RV64_CELL_BITS
  • ✅ All W-suffix chip CUDA files (base_alu_w/cuda.rs, shift_w/cuda.rs, mul_w/cuda.rs, divrem_w/cuda.rs) — correctly use RV64_WORD_NUM_LIMBS and RV64_CELL_BITS
  • bigint/circuit/src/extension/cuda.rs — correctly uses Rv64*256Air types
  • crates/circuits/primitives/cuda/include/primitives/constants.h — correctly defines both RV32 and RV64 constants with RV64_REGISTER_NUM_LIMBS = 8, RV64_WORD_NUM_LIMBS = 4, RV64_CELL_BITS = 8
  • ✅ SHA2, hintstore, loadstore, load_sign_extend CUDA files all correctly ported
  • ✅ Keccak256 and SHA2 extensions — zero remaining RV32_* references in Rust code

Architecture Observations (Positive)

  • Non-AOT, non-CUDA paths across ALL extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage throughout.

  • CUDA port follows the correct pattern for all chip types: non-W chips use RV64_REGISTER_NUM_LIMBS (8), W-suffix chips use RV64_WORD_NUM_LIMBS (4).

  • Guest program migration to no_std with openvm_intrinsics cfg and portable atomics is clean and thorough.

  • *_W adapter/core pattern remains well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension correctly.

  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag cleanly replaces old target_os = "zkvm".


Summary

Priority Issue Impact Status
High #1: Riscv AOT Rv32*ExecutorRv64*Executor (4 files, 8 impls) Dead code + compilation failure under aot Trivial rename
High #2: Riscv AOT impls wrong const generic (4 files, 8 impls) Dead code under aot Part of AOT port
High #3: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files, 12 impls) Dead code + compilation failure under aot Trivial rename
Medium #4: AOT common/mod.rs helpers assume 32-bit layout AOT broken for RV64 Known deferral (CI disabled)
Low #5: TEMP[jpw] marker Cleanup Minor
Low #6: Serializer 8× over-allocation Wasted memory Minor
Low #7: Hardcoded 8 vs WORD_SIZE Style inconsistency Minor
Low #8: Rv32ModularArithmeticOpcode not renamed Cosmetic Minor
Low #9: Test error message strings say "Rv32IM" Cosmetic Minor

Progress Since Previous Reviews

Major progress — 6 of the 10 previously tracked bugs are fixed, including all critical/high-severity ones:

  • ✅ All critical soundness bugs (deferral AIR, deferral transpiler, pairing transpiler) — fixed
  • config.rv32i compilation issue — fixed
  • ✅ CUDA extension port — complete (previously flagged as "known deferral")
  • ✅ Duplicate guest/guest/ directory — removed
  • ✅ Deferral config naming — fixed

The remaining open items are all confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable).

Recommended Before Merge

  1. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors even though AOT CI is disabled
  2. Remove TEMP[jpw] marker / rename tmp_convert_to_u8s
  3. Track AOT port as a separate work item (the systemic common/mod.rs helpers + const generic issues need a coordinated fix)

# fix(v2.1-rv64): convert more rv32 references to rv64

Follow-up cleanup that gets the lint job and the `sha2_bench`
bench-prove run green on the rv64 branch, plus a few non-AOT polish
items called out in review.

## Lint fixes

- `extensions/riscv/circuit/src/auipc/tests.rs` — gate
`RV64_REGISTER_NUM_LIMBS` behind `#[cfg(feature = "cuda")]`; it's only
used inside the CUDA test, so the CPU lint job hit `unused_imports`.
- `crates/continuations/src/tests/e2e.rs` — rename stale
`Rv32DeferralBuilder` / `Rv32DeferralConfig` → `Rv64DeferralBuilder` /
`Rv64DeferralConfig` (3 sites). Fixes the CUDA lint job's `E0432`.
- `crates/sdk/src/tests.rs` — `SdkVmConfig::riscv32()` → `::riscv64()`
plus doc-comment.
- `benchmarks/prove/src/lib.rs` — drop unused `clap::command` import.
- `extensions/riscv/circuit/src/{mul,mul_w,jalr,mulh}/tests.rs` — error
string `"failed to create Rv32IM executor"` → `"Rv64IM executor"`.

## Rename

- `Rv32ModularArithmeticOpcode` → `Rv64ModularArithmeticOpcode` (81
references across `extensions/algebra/{transpiler,circuit}`). Pure
rename, no semantic change.

## Cleanup

- Remove `tmp_convert_to_u8s` and its `// TEMP[jpw]` marker from
`extensions/riscv/circuit/src/adapters/mod.rs` — `pub fn` with zero
in-tree callers.
- `crates/toolchain/openvm/src/io/mod.rs` — replace hardcoded `8` in
`read_vec_by_len` with `openvm_platform::WORD_SIZE`.

## Bug fix

- `crates/toolchain/openvm/src/serde/serializer.rs` — `to_vec` was
calling `Vec::<u64>::with_capacity(size_of_val(value))`, conflating
bytes with u64 elements and over-allocating 8×. Switched to
`size_of_val(value).div_ceil(size_of::<u64>())`. Heap waste only, no
correctness impact.

## Benchmark ELF rebuilds

Rebuilt 16 benchmark guest ELFs as 64-bit RISC-V (the committed binaries
were still 32-bit, which made `sha2_bench` fail with `Error: Not a
64-bit ELF` after the transpiler tightened its check): base64_json,
bincode, bubblesort, ecrecover, fibonacci, fibonacci_iterative,
fibonacci_recursive, keccak256, keccak256_iter, kitchen-sink, pairing,
quicksort, rkyv, sha2_bench, sha256, sha256_iter.

Not rebuilt — `regex`, `revm_transfer`, `revm_snailtracer` — their
dependency closures (`regex-automata`, `tracing-core 0.1.x`) use atomic
CAS methods that the `riscv64im-unknown-none-elf` target doesn't expose.
The existing `portable-atomic` polyfill in `openvm-platform` works for
crates that opt in via a feature flag (`bytes`, `spin`), but neither
failing crate exposes one. Tracking as a separate item.

## Verification

- `cargo +nightly fmt --all -- --check` clean.
- `cargo clippy --workspace --all-targets --tests --features "<CI
feature set>" -- -D warnings` clean.
- `cargo check --workspace --all-targets --tests` clean.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 27, 2026

Claude finished @shuklaayush's task in 4m 42s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes in detail
  • Review remaining AOT issues
  • Post final review

Overview

Re-reviewing after the latest commit fix(v2.1-rv64): convert more rv32 references to rv64 (e9af982). The PR is at 45 commits, 100 changed files, +20982/−10779 lines. This commit addresses many of the previously reported issues.

Previously reported bugs now FIXED in this commit:

  • TEMP[jpw] marker / tmp_convert_to_u8sRemoved from adapters/mod.rs
  • ✅ Serializer 8× over-allocation — Fixed: now uses size_of_val(value).div_ceil(size_of::<u64>())
  • ✅ Hardcoded 8 in io/mod.rsFixed: now uses WORD_SIZE constant
  • Rv32ModularArithmeticOpcode enum — Renamed to Rv64ModularArithmeticOpcode across all algebra files
  • ✅ Test error message strings "Rv32IM" — Fixed to "Rv64IM" in 4 test files
  • ✅ SDK/continuations Rv32Deferral* references — Fixed to Rv64Deferral*
  • SdkVmConfig::riscv32()Fixed to riscv64() in SDK tests
  • commit_to_stdin_fields byte padding — Fixed from 4-byte to 8-byte flat_map

Previously confirmed fixed (earlier commits):

  • config.rv32iconfig.rv64i — all locations
  • ✅ CUDA extensions fully ported to Rv64* types
  • ✅ Deferral extension AIR/transpiler
  • ✅ Pairing transpiler register stride
  • ✅ Algebra transpiler/circuit
  • ✅ Duplicate guest/guest/ directory — removed
  • ✅ Hint stream length encoding
  • ✅ Range checker sizes
  • ✅ Guest toolchain target riscv64im-unknown-none-elf
  • abstract_compose overflow fix

Bugs (Still Open — All AOT-Only)

All remaining bugs are confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable in .github/workflows/extension-tests.yml).

1. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*:

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 98, 186 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 112, 158 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

2. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot feature)

12 impl blocks across 6 files:

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

Fix this →

3. Missing MulOpcode import in mul_w/execution.rs (Compilation error under aot feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported. Neither the direct imports (line 14 imports MulWOpcode only) nor crate::common::* (line 19) provide MulOpcode. Will fail to compile under aot feature.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports block.

4. AOT impls for non-W chips use wrong const generic — dead code (under aot feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4):

File Lines
base_alu/execution.rs 177, 245
less_than/execution.rs 128, 247
divrem/execution.rs 108, 273
load_sign_extend/aot.rs 21, 41

These impls implement traits for types that are never instantiated with RV32_REGISTER_NUM_LIMBS.

5. AOT layer assumes 32-bit register layout throughout (Systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers (gpr_to_rv32_register, xmm_to_gpr in common/mod.rs:25-85), and RV32_REGISTER_AS/RV32_IMM_AS constants. Error messages in update_adapter_heights_asm also say "RV32_REGISTER_AS". The AOT layer needs a fundamental redesign for 64-bit register values since XMM elements are 32-bit (pinsrd/pextrd).

CI is properly disabled via TODO(aot): re-enable in .github/workflows/extension-tests.yml.


Architecture Observations (Positive)

  • Latest commit (e9af982) is a thorough cleanup that addresses many previously reported issues — serializer over-allocation, hardcoded constants, opcode enum naming, test strings, SDK/continuations deferral references. Good work.

  • Non-AOT code paths are clean. Zero RV32_REGISTER_NUM_LIMBS references remain outside of AOT impl blocks and the adapters/mod.rs import line (needed for the RV32_CELL_BITS == RV64_CELL_BITS assertion). All extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) use correct RV64_* constants.

  • CUDA port is complete. All chip types correctly reference Rv64* AIR types and use RV64_REGISTER_NUM_LIMBS/RV64_WORD_NUM_LIMBS consistently.

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE, io/mod.rs uses WORD_SIZE, guest openvm_intrinsics cfg flag is cleanly integrated, CLI init templates use u64/reveal_u64.

  • Test improvements in loadstore/tests.rs — the CUDA test now uses pointer_max_bits: 20 with properly sized address spaces, and gen_pointer is replaced with explicit address alignment logic that respects tester.address_bits().


Summary

Priority Issue Impact Status
High #1: Riscv AOT Rv32*ExecutorRv64*Executor (4 files, 8 impls) Compilation failure under aot Trivial rename
High #2: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files, 12 impls) Compilation failure under aot Trivial rename
Medium #3: Missing MulOpcode import in mul_w AOT Compilation failure under aot 1-line fix
Medium #4: Non-W AOT impls wrong const generic (4 files) Dead code under aot Part of AOT port
Medium #5: AOT layer assumes 32-bit register layout (systemic) Entire AOT path broken Known deferral (CI disabled)

Progress Since Previous Reviews

Excellent progress — all previously reported critical and high-severity bugs in the core execution paths have been fixed. The latest commit cleans up many of the remaining code quality issues. The only open items are confined to the AOT layer, which is explicitly marked as not yet ported with CI disabled.

Recommended Before Merge

  1. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors if someone enables the aot feature
  2. Add missing MulOpcode import in mul_w/execution.rs — 1 line
  3. Track systemic AOT port as a separate work item

Resolves INT-6048.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 27, 2026

Claude finished @shuklaayush's task in 6m 13s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes (riscv-test-vectors migration)
  • Review remaining AOT issues
  • Review code quality and architecture
  • Post final review

Overview

Re-reviewing after the latest commit feat: migrate riscv-test-vectors from rv32 to rv64 (#2758) (8ab3d7e). The PR is at 46 commits, 662 files changed, +21000/−10797 lines.

Previously reported bugs now FIXED (confirmed):

  • ✅ Hint stream length encoding — uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • ✅ Phantom sub-executor naming — renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • ✅ Commented-out hintstore — module is now active
  • abstract_compose overflow — uses from_u64(1u64 << ...)
  • ✅ Range checker sizes — properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • ✅ Guest toolchain target — updated to riscv64im-unknown-none-elf
  • config.rv32iconfig.rv64i — all 5 locations fixed
  • ✅ CUDA extensions — fully ported to Rv64* types across all extensions
  • ✅ Deferral extension AIR/transpiler — correctly uses RV64_WORD_NUM_LIMBS + expand_to_rv64_register()
  • ✅ Pairing transpiler register stride — fixed
  • ✅ Algebra extension — all register operations use RV64_REGISTER_NUM_LIMBS
  • ✅ Duplicate guest/guest/ directory — removed
  • ✅ Deferral config naming — renamed to Rv64DeferralConfig/Rv64DeferralCpuBuilder
  • TEMP[jpw] marker — removed
  • ✅ Serializer over-allocation — now uses size_of_val(value).div_ceil(size_of::<u64>())
  • ✅ Hardcoded 8 in io/mod.rs — now uses WORD_SIZE constant
  • Rv32ModularArithmeticOpcode — renamed to Rv64ModularArithmeticOpcode
  • ✅ Test error message strings — updated to "Rv64IM"
  • ✅ SDK/continuations deferral references — updated to Rv64Deferral*

New Commit Assessment: riscv-test-vectors migration (#2758) — ✅ Clean

The migration from rv32im-test-vectors/ to riscv-test-vectors/ is correct:

  • Makefile correctly uses XLEN=64, includes rv64ui/rv64um Makefrag, compiles with -march=rv64g -mabi=lp64
  • riscv_test.h properly defines RVTEST_RV64U (not RV32U), uses .dword (8-byte) for tohost/fromhost
  • Linker script unchanged — uses generic riscv output arch
  • CI workflow correctly references riscv-test-vectors/ directory and test_rv64im_riscv_vector_* test names
  • Test code properly uses rv64ui-p-* skip list and riscv-test-vectors/tests path
  • Shipped binary rv64ui-p-simple is confirmed 64-bit ELF

Bugs (Still Open — All AOT-Only)

All remaining bugs are confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable in .github/workflows/extension-tests.yml).

1. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*:

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 98, 186 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 112, 158 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

2. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot feature)

12 impl blocks across 6 files:

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

Fix this →

3. AOT impls for non-W riscv chips use wrong const generic — dead code (under aot feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — making these impls dead code that never gets dispatched:

File Lines
base_alu/execution.rs 177, 245
less_than/execution.rs 128, 247
divrem/execution.rs 108, 273
load_sign_extend/aot.rs 21, 41

Note: The *_W chips' AOT impls also use RV32_REGISTER_NUM_LIMBS (4), which happens to equal RV64_WORD_NUM_LIMBS (4) — functionally correct but semantically misleading.

4. Missing MulOpcode import in mul_w/execution.rs (Compilation error under aot feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but only imports MulWOpcode (line 14). MulOpcode is not available via crate::common::* either.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports.

5. AOT layer assumes 32-bit register layout throughout (Systemic — known deferral)

The AOT x86 code generation layer is fundamentally designed for 32-bit registers:

  • Register stride a / 4 hardcoded across all AOT files
  • XMM helpers (gpr_to_rv32_register, xmm_to_gpr in common/mod.rs) assume 2 RISC-V registers per 64-bit XMM
  • Error messages say "RV32_REGISTER_AS" and "RV32_MEMORY_AS"

CI is disabled via TODO(aot): re-enable in .github/workflows/extension-tests.yml. The AOT layer needs a fundamental redesign for 64-bit register values.


Code Quality Issues

6. Transpiler tests use 32-bit ELFs that will be rejected by the RV64 ELF loader

crates/toolchain/tests/tests/transpiler_tests.rs has 6 non-#[ignore] tests that load rv32im-* ELF files (confirmed 32-bit via file command), but the ELF loader at elf.rs:81 rejects any non-ELF64 input with "Not a 64-bit ELF":

Test function ELF file Feature gate
test_decode_elf rv32im-empty-program-elf None
test_generate_program rv32im-fib-from-as, rv32im-intrin-from-as None
test_rv32im_runtime rv32im-exp-from-as, rv32im-fib-from-as None
test_intrinsic_runtime rv32im-intrin-from-as None
test_terminate_prove rv32im-terminate-from-as None
test_rv32im_aot_pure_runtime rv32im-exp-from-as aot

These tests will fail at runtime if executed. They're not run in CI (the riscv workflow only runs the rv64im_riscv_vector_* tests), but they'll fail for anyone running cargo nextest run -p openvm-toolchain-tests. They should either be updated with equivalent rv64 ELFs, #[ignore]d with a TODO(rv64) comment, or removed.

7. Rv32ModularFp2Int256Config struct not renamed

crates/toolchain/tests/tests/transpiler_tests.rs:137 defines pub struct Rv32ModularFp2Int256Config with Rv64I, Rv64M, Rv64Io fields. Cosmetic but inconsistent with the rest of the migration.

8. CUDA library name still says rv32im

extensions/riscv/circuit/build.rs:19.library_name("tracegen_gpu_rv32im"). Functionally correct (it's just an internal linker name) but inconsistent with the rv32imriscv rename.


Architecture Observations (Positive)

  • Non-AOT code paths across ALL extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage. Zero non-AOT RV32_REGISTER_NUM_LIMBS references outside of the constants re-export.

  • CUDA port is comprehensive and correct. All chip types reference Rv64* AIR types. The primitives/constants.h header correctly defines RV64_REGISTER_NUM_LIMBS = 8, RV64_WORD_NUM_LIMBS = 4, DEFAULT_BLOCK_SIZE = 8.

  • riscv-test-vectors migration is clean — Makefile targets rv64g, test header uses RVTEST_RV64U, shipped binary is confirmed 64-bit ELF, CI workflow correctly references new paths and test names.

  • Guest program migration is thorough — cfg(openvm_intrinsics) cleanly replaces cfg(target_os = "zkvm"), all benchmarks use no_std, guest lib functions use rv64 prefix.

  • ELF loader correctly validates Class::ELF64, uses try_into() for u64→u32 conversions with error handling, and constrains addresses to u32.

  • Decoder crate properly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag cleanly replaces target_os = "zkvm".


Summary

Priority Issue Impact Status
High #1: Riscv AOT Rv32*ExecutorRv64*Executor (4 files, 8 impls) Compilation failure under aot Trivial rename
High #2: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files, 12 impls) Compilation failure under aot Trivial rename
Medium #3: Non-W AOT impls wrong const generic (4 files) Dead code under aot Part of AOT port
Medium #4: Missing MulOpcode import in mul_w AOT Compilation failure under aot 1-line fix
Medium #5: AOT layer assumes 32-bit registers (systemic) Entire AOT path broken Known deferral (CI disabled)
Low #6: Transpiler tests use 32-bit ELFs Tests broken if run locally Should #[ignore] or remove
Low #7: Rv32ModularFp2Int256Config not renamed Cosmetic Minor
Low #8: CUDA library name tracegen_gpu_rv32im Cosmetic Minor

Progress Since Previous Reviews

Excellent progress — all previously reported critical and high-severity bugs in the core execution/circuit/CUDA paths have been fixed. 19 of the originally reported issues are now resolved. The only remaining open items are confined to the AOT layer, which is explicitly marked as not yet ported with CI disabled.

Recommended Before Merge

  1. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors if someone enables the aot feature
  2. Add missing MulOpcode import in mul_w/execution.rs — 1 line
  3. #[ignore] or remove the 6 transpiler tests that load 32-bit ELFs
  4. Track systemic AOT redesign as a separate work item

Towards INT-6339.

AUIPC was zero-extending `pc + (imm << 12)` into the upper 32 bits of
rd, but the RV64 spec requires sign extension, causing riscv test
`rv64ui-p-auipc` to fail. Add `is_sign_extend` column to constrain the
MSB of the 32-bit result and fill the upper 4 bytes accordingly.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ayush Shukla <ayush@axiom.xyz>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 27, 2026

Claude finished @shuklaayush's task in 4m 57s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes in detail (AUIPC sign-extend fix, riscv-test-vectors migration)
  • Review remaining AOT issues
  • Review code quality and architecture
  • Post final review

Overview

Re-reviewing after the latest commit fix: sign-extend AUIPC result (#2759) (a50c479). The PR is at 47 commits, 100 changed files, +21105/−10807 lines.

Previously reported bugs now FIXED (confirmed):

  • ✅ Hint stream length encoding — uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • ✅ Phantom sub-executor naming — renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • ✅ Commented-out hintstore — module is now active
  • abstract_compose overflow — uses from_u64(1u64 << ...)
  • ✅ Range checker sizes — properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • ✅ Guest toolchain target — updated to riscv64im-unknown-none-elf
  • config.rv32iconfig.rv64i — all locations fixed
  • ✅ CUDA extensions — fully ported to Rv64* types across all extensions
  • ✅ Deferral extension AIR/transpiler — correctly uses RV64_WORD_NUM_LIMBS + expand_to_rv64_register()
  • ✅ Pairing transpiler register stride — fixed
  • ✅ Algebra extension — all register operations use RV64_REGISTER_NUM_LIMBS
  • ✅ Duplicate guest/guest/ directory — removed
  • ✅ Deferral config naming — renamed to Rv64DeferralConfig/Rv64DeferralCpuBuilder
  • TEMP[jpw] marker — removed
  • ✅ Serializer over-allocation — now uses size_of_val(value).div_ceil(size_of::<u64>())
  • ✅ Hardcoded 8 in io/mod.rs — now uses WORD_SIZE constant
  • Rv32ModularArithmeticOpcode — renamed to Rv64ModularArithmeticOpcode
  • ✅ Test error message strings — updated to "Rv64IM"
  • ✅ SDK/continuations deferral references — updated to Rv64Deferral*

New Commit Assessment: AUIPC sign-extend (#2759) — ✅ Sound

The fix correctly sign-extends the AUIPC result from 32 bits to 64 bits. In RV64, AUIPC computes rd = pc + (imm << 12) as a 32-bit operation, and the result must be sign-extended to 64 bits (upper 32 bits = 0xFF×4 if bit 31 is set, 0x00×4 otherwise).

AIR constraint approach is elegant and sound:

  • New is_sign_extend boolean column (line 37) tracks the sign bit of rd_data[3]
  • Range check (rd_data[2], 2*rd_data[3] - is_sign_extend * 2^8) forces is_sign_extend = msb(rd_data[3]):
    • If is_sign_extend = 0: 2*rd_data[3] must be in [0, 256), so rd_data[3] ∈ [0, 128) — top bit 0 ✓
    • If is_sign_extend = 1: 2*rd_data[3] - 256 must be in [0, 256), so rd_data[3] ∈ [128, 256) — top bit 1 ✓
  • Upper register bytes written as is_sign_extend * 0xFF (line 184-191), correctly producing either 0x00 or 0xFF

CUDA consistency verified — auipc.cu mirrors the exact same range check pattern (lines 44-49) and writes is_sign_extend to the trace (line 54).

Test coverage is thorough — new sign_extend_flag_negative_tests verify that pranking is_sign_extend to the wrong value fails verification for both positive and negative results. Sanity test updated to expect [210, 107, 113, 186, 255, 255, 255, 255] (sign-extended).

JalLui already had the same pattern applied (confirmed in jal_lui/core.rs:80,97-108,131-138) — both LUI and JAL correctly sign-extend their results. New JalLui negative tests also added.

JALR correctly uses zero-extension (expand_to_rv64_register) since rd = pc + 4 where pc < 2^30, meaning bit 31 is never set.


Bugs (Still Open — All AOT-Only)

All remaining bugs are confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable in .github/workflows/extension-tests.yml).

1. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*. Dead code that will fail to compile under aot:

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 98, 186 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 112, 158 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

2. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot feature)

12 impl blocks across 6 files:

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

Fix this →

3. AOT impls for non-W riscv chips use wrong const generic — dead code (under aot feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — dead code that never gets dispatched:

File Lines
base_alu/execution.rs 177, 245
less_than/execution.rs 128, 247
divrem/execution.rs 108, 273
load_sign_extend/aot.rs 21, 41

4. Missing MulOpcode import in mul_w/execution.rs (Compilation error under aot feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported anywhere in the file. Neither the direct imports (line 14 imports only MulWOpcode) nor crate::common::* (line 19) provide MulOpcode.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports block.

5. AOT layer assumes 32-bit register layout throughout (Systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers (gpr_to_rv32_register, xmm_to_gpr in common/mod.rs:25-85), and RV32_REGISTER_AS/RV32_IMM_AS constants. CI is disabled via TODO(aot): re-enable.


Code Quality Issues

6. Transpiler tests use 32-bit ELFs that will be rejected by the RV64 ELF loader

crates/toolchain/tests/tests/transpiler_tests.rs has 5 non-feature-gated tests that load rv32im-* ELF files (32-bit), but the ELF loader at elf.rs:81 rejects any non-ELF64 input with "Not a 64-bit ELF":

Test ELF
test_decode_elf rv32im-empty-program-elf
test_generate_program rv32im-fib-from-as, rv32im-intrin-from-as
test_rv32im_runtime rv32im-exp-from-as, rv32im-fib-from-as
test_intrinsic_runtime rv32im-intrin-from-as
test_terminate_prove rv32im-terminate-from-as

These won't fail in CI (which only runs test_rv64im_riscv_vector_*), but will fail for anyone running cargo nextest run -p openvm-toolchain-tests. They should either be updated with RV64 equivalents, #[ignore]d with a TODO(rv64) comment, or removed.

7. CUDA library name still says rv32im

extensions/riscv/circuit/build.rs:19.library_name("tracegen_gpu_rv32im"). Functionally correct (internal linker name) but inconsistent with the rv32imriscv rename.


Architecture Observations (Positive)

  • AUIPC sign-extend fix is well-designed. The range-check-based approach to constrain is_sign_extend = msb(rd_data[3]) is efficient (no extra columns beyond the boolean flag) and sound. Both Rust and CUDA trace generation are consistent.

  • JalLui sign-extend uses the same pattern with an additional rd[3] * (4 * is_jal + is_lui) range check that bounds the JAL result to pc < 2^30 — clean and correct.

  • Non-AOT code paths across ALL extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage. Zero non-AOT RV32_REGISTER_NUM_LIMBS references outside of the adapters/mod.rs assertion.

  • CUDA port is comprehensive and correct. All chip types reference Rv64* AIR types with proper RV64_REGISTER_NUM_LIMBS/RV64_WORD_NUM_LIMBS usage. The AUIPC CUDA trace generation correctly mirrors the Rust sign-extend logic.

  • *_W adapter/core pattern remains well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).

  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg cleanly replaces target_os = "zkvm", ELF64 validation, riscv64im-unknown-none-elf target.


Summary

Priority Issue Impact Status
High #1: Riscv AOT Rv32*ExecutorRv64*Executor (4 files, 8 impls) Compilation failure under aot Trivial rename
High #2: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files, 12 impls) Compilation failure under aot Trivial rename
Medium #3: Non-W AOT impls wrong const generic (4 files) Dead code under aot Part of AOT port
Medium #4: Missing MulOpcode import in mul_w AOT Compilation failure under aot 1-line fix
Medium #5: AOT layer assumes 32-bit registers (systemic) Entire AOT path broken Known deferral (CI disabled)
Low #6: Transpiler tests use 32-bit ELFs Tests broken if run locally Should #[ignore] or update
Low #7: CUDA library name tracegen_gpu_rv32im Cosmetic Minor

Progress Since Previous Reviews

All previously reported critical and high-severity bugs in the core execution/circuit/CUDA paths have been fixed (19 issues resolved). The latest AUIPC sign-extend fix is sound and well-tested. The only remaining open items are confined to the AOT layer, which is acknowledged as not yet ported with CI disabled.

Recommended Before Merge

  1. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors if someone enables the aot feature
  2. Add missing MulOpcode import in mul_w/execution.rs — 1 line
  3. #[ignore] or remove the 5 transpiler tests that load 32-bit ELFs — prevents confusion for local developers
  4. Track systemic AOT port as a separate work item

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 27, 2026

Claude finished @shuklaayush's task in 4m 27s —— View job


Code Review: RV64 Support PR (Re-review after latest sync)

  • Gather context — check new commits since last review
  • Review status of previously reported bugs
  • Review new changes (custom target.json with atomic CAS)
  • Review remaining AOT issues
  • Review code quality and architecture
  • Post final review

Overview

Re-reviewing after the latest commit feat(v2.1-rv64): use a custom target.json with atomic cas (#2760) (a6bc526). The PR is at 48 commits, 100 changed files, +21070/−10797 lines.

Previously reported bugs now FIXED (confirmed):

  • ✅ Hint stream length encoding — uses u64::to_le_bytes() and HINT_DWORD_BYTES alignment
  • ✅ Phantom sub-executor naming — renamed to Rv64HintInputSubEx, Rv64HintRandomSubEx, Rv64PrintStrSubEx
  • ✅ Commented-out hintstore — module is now active
  • abstract_compose overflow — uses from_u64(1u64 << ...)
  • ✅ Range checker sizes — properly uses RV64_CELL_BITS and RV64_REGISTER_NUM_LIMBS
  • ✅ Guest toolchain target — updated to custom riscv64im-openvm-none-elf
  • config.rv32iconfig.rv64i — all locations fixed
  • ✅ CUDA extensions — fully ported to Rv64* types across all extensions
  • ✅ Deferral extension AIR/transpiler — correctly uses RV64_WORD_NUM_LIMBS + expand_to_rv64_register()
  • ✅ Pairing transpiler register stride — fixed
  • ✅ Algebra extension — all register operations use RV64_REGISTER_NUM_LIMBS
  • ✅ Duplicate guest/guest/ directory — removed
  • ✅ Deferral config naming — renamed to Rv64DeferralConfig/Rv64DeferralCpuBuilder
  • TEMP[jpw] marker — removed
  • ✅ Serializer over-allocation — fixed with size_of_val(value).div_ceil(size_of::<u64>())
  • ✅ Hardcoded 8 in io/mod.rs — now uses WORD_SIZE constant
  • Rv32ModularArithmeticOpcode — renamed to Rv64ModularArithmeticOpcode
  • ✅ Test error message strings — updated to "Rv64IM"
  • ✅ SDK/continuations deferral references — updated to Rv64Deferral*
  • ✅ AUIPC sign-extend — correctly sign-extends 32-bit result to 64-bit
  • ✅ riscv-test-vectors migrated from rv32 to rv64

New Commit Assessment: Custom Target JSON with Atomic CAS (#2760) — ✅ Well Done

This commit replaces the stock riscv64im-unknown-none-elf target with a custom riscv64im-openvm-none-elf that declares "atomic-cas": true and "max-atomic-width": 64. Since the guest is single-threaded and the -C passes=lower-atomic LLVM pass converts all atomic ops to non-atomic equivalents at compile time, this is sound.

Benefits are significant:

  • Eliminates the portable-atomic dependency and its critical-section feature from the guest dependency graph
  • Removes per-crate bytes/extra-platforms and spin/portable_atomic workarounds from 3 benchmark guests + 1 example
  • Simplifies once_cell usage in algebra/ecc guests (no longer needs portable-atomic feature)
  • critical-section is now properly optional, only pulled when heap-embedded-alloc is active
  • The no-op critical_section impl is co-located with embedded-alloc in heap/embedded.rs instead of being a standalone module — cleaner ownership

Target JSON spec is correct:

  • "arch": "riscv64" + "features": "+m" — matches rv64im
  • "singlethread": true — enables LLVM single-thread optimizations
  • "panic-strategy": "abort" — matches the -C panic=abort rustc flag
  • "os": "none" — bare metal target
  • "llvm-abiname": "lp64" — correct for rv64im without F/D extensions
  • "crt-objects-fallback": "false" — correct for no_std with -Z build-std

Build integration is correct:

  • RUST_TARGET_PATH set to CARGO_MANIFEST_DIR (where the JSON lives)
  • -Z unstable-options added to rustc flags (required for custom JSON targets)
  • RUSTC_TARGET constant updated from riscv64im-unknown-none-elf to riscv64im-openvm-none-elf
  • Docs updated to reflect new target path

One minor observation: The cfg guard in platform/Cargo.toml:16 uses cfg(all(target_arch = "riscv64", target_os = "none")) which matches the custom target (arch=riscv64, os=none). This is correct — the cfg attributes don't depend on the target triple name, only on the resolved target properties.


Bugs (Still Open — All AOT-Only)

All remaining bugs are confined to the AOT layer, which is acknowledged as not yet ported (CI disabled with TODO(aot): re-enable in .github/workflows/extension-tests.yml).

1. AOT impls reference non-existent Rv32*Executor types (Compilation error under aot feature)

8 impl blocks across 4 files implement AotExecutor/AotMeteredExecutor for struct names that were renamed to Rv64*. Dead code that will fail to compile under aot:

File Lines Has Should be
riscv/circuit/src/jal_lui/execution.rs 93, 184 Rv32JalLuiExecutor Rv64JalLuiExecutor
riscv/circuit/src/auipc/execution.rs 84, 167 Rv32AuipcExecutor Rv64AuipcExecutor
riscv/circuit/src/jalr/execution.rs 98, 186 Rv32JalrExecutor Rv64JalrExecutor
riscv/circuit/src/hintstore/execution.rs 112, 158 Rv32HintStoreExecutor Rv64HintStoreExecutor

Fix this →

2. Bigint AOT impls reference non-existent Rv32*256Executor types (Compilation error under aot feature)

12 impl blocks across 6 files:

File Lines Has Should be
bigint/circuit/src/base_alu.rs 88, 132 Rv32BaseAlu256Executor Rv64BaseAlu256Executor
bigint/circuit/src/branch_eq.rs 81, 124 Rv32BranchEqual256Executor Rv64BranchEqual256Executor
bigint/circuit/src/branch_lt.rs 86, 129 Rv32BranchLessThan256Executor Rv64BranchLessThan256Executor
bigint/circuit/src/less_than.rs 84, 127 Rv32LessThan256Executor Rv64LessThan256Executor
bigint/circuit/src/mult.rs 72, 115 Rv32Multiplication256Executor Rv64Multiplication256Executor
bigint/circuit/src/shift.rs 85, 128 Rv32Shift256Executor Rv64Shift256Executor

Fix this →

3. AOT impls for non-W riscv chips use wrong const generic — dead code (under aot feature)

The actual executor types use RV64_REGISTER_NUM_LIMBS (8), but AOT impl blocks use RV32_REGISTER_NUM_LIMBS (4) — dead code that never gets dispatched:

File Lines
base_alu/execution.rs 177, 245
less_than/execution.rs 128, 247
divrem/execution.rs 108, 273
load_sign_extend/aot.rs 21, 41

4. Missing MulOpcode import in mul_w/execution.rs (Compilation error under aot feature)

mul_w/execution.rs:98 uses MulOpcode::MUL.global_opcode() but MulOpcode is not imported anywhere in the file. Will fail to compile under aot feature.

Fix: Add use openvm_riscv_transpiler::MulOpcode; to the #[cfg(feature = "aot")] imports block.

5. AOT layer assumes 32-bit register layout throughout (Systemic — known deferral)

The entire AOT x86 code generation layer uses 4-limb register stride (a / 4), 32-bit XMM helpers (gpr_to_rv32_register, xmm_to_gpr), and RV32_REGISTER_AS/RV32_IMM_AS constants. CI is disabled via TODO(aot): re-enable. Needs fundamental redesign for 64-bit register values.


Code Quality Issues

6. Transpiler tests use 32-bit ELFs that will be rejected by the RV64 ELF loader

crates/toolchain/tests/tests/transpiler_tests.rs has 5 non-feature-gated tests that load rv32im-* ELF files (32-bit), but the ELF loader rejects non-ELF64 with "Not a 64-bit ELF":

Test ELF
test_decode_elf rv32im-empty-program-elf
test_generate_program rv32im-fib-from-as, rv32im-intrin-from-as
test_rv32im_runtime rv32im-exp-from-as, rv32im-fib-from-as
test_intrinsic_runtime rv32im-intrin-from-as
test_terminate_prove rv32im-terminate-from-as

These won't fail in CI (which only runs test_rv64im_riscv_vector_*), but will fail for local developers running cargo nextest run -p openvm-toolchain-tests. They should be #[ignore]d with a TODO(rv64) comment or updated with RV64 equivalents.

7. CUDA library name still says rv32im

extensions/riscv/circuit/build.rs:19.library_name("tracegen_gpu_rv32im"). Functionally correct (internal linker name) but inconsistent with the rv32imriscv rename.


Architecture Observations (Positive)

  • Custom target JSON approach is elegant — declaring "atomic-cas": true at the target level combined with lower-atomic at the LLVM pass level eliminates the need for portable-atomic across the entire dependency graph. This is the cleanest solution for single-threaded no-std targets needing CAS-dependent crates.

  • Non-AOT code paths across ALL extensions (riscv, algebra, ECC, pairing, keccak256, sha2, bigint, deferral) are solid — consistent RV64_* constant usage. Zero non-AOT RV32_REGISTER_NUM_LIMBS references outside of the adapters re-export.

  • CUDA port is comprehensive and correct. All chip types reference Rv64* AIR types with proper RV64_REGISTER_NUM_LIMBS/RV64_WORD_NUM_LIMBS usage. Zero Rv32* references in CUDA code.

  • Critical section impl correctly co-located with embedded-alloc in heap/embedded.rs — only compiled when the heap feature is active, eliminating unnecessary compilation of critical-section for non-heap guest programs.

  • *_W adapter/core pattern is well-designed — adapters read full 8-byte registers, expose lower 4 bytes to cores, handle sign extension via (result as i32 as i64 as u64).

  • Decoder crate correctly handles RV64 shift amounts (6-bit shamt via funct6) and W-suffix validation (shamt < 32).

  • Toolchain migration is thorough — serializer/deserializer use WORD_SIZE=8, platform uses 8-byte alignment, openvm_intrinsics cfg flag cleanly replaces target_os = "zkvm", riscv64im-openvm-none-elf custom target with proper atomic lowering.

  • AUIPC/JalLui sign-extend is sound — range-check-based constraint on is_sign_extend boolean, with matching CUDA trace generation.


Summary

Priority Issue Impact Status
High #1: Riscv AOT Rv32*ExecutorRv64*Executor (4 files, 8 impls) Compilation failure under aot Trivial rename
High #2: Bigint AOT Rv32*256ExecutorRv64*256Executor (6 files, 12 impls) Compilation failure under aot Trivial rename
Medium #3: Non-W AOT impls wrong const generic (4 files) Dead code under aot Part of AOT port
Medium #4: Missing MulOpcode import in mul_w AOT Compilation failure under aot 1-line fix
Medium #5: AOT layer assumes 32-bit registers (systemic) Entire AOT path broken Known deferral (CI disabled)
Low #6: Transpiler tests use 32-bit ELFs Broken if run locally Should #[ignore] or update
Low #7: CUDA library name tracegen_gpu_rv32im Cosmetic Minor

Progress Since Previous Reviews

All previously reported critical and high-severity bugs in the core execution/circuit/CUDA paths have been fixed (19+ issues resolved across the review cycle). The latest commit cleanly eliminates portable-atomic complexity via a well-designed custom target JSON. The only remaining open items are confined to the AOT layer, which is explicitly acknowledged as not yet ported with CI disabled.

Recommended Before Merge

  1. Fix Rv32*ExecutorRv64*Executor in riscv AOT (4 files, 8 impls) and bigint AOT (6 files, 12 impls) — trivial renames that prevent compilation errors if someone enables the aot feature
  2. Add missing MulOpcode import in mul_w/execution.rs — 1 line
  3. #[ignore] or remove the 5 transpiler tests that load 32-bit ELFs — prevents confusion for local developers
  4. Track systemic AOT port as a separate work item

@github-actions
Copy link
Copy Markdown

group app.proof_time_ms app.cycles leaf.proof_time_ms
fibonacci 1,874 4,000,051 532
keccak 13,734 14,365,133 2,259
sha2_bench 9,356 11,167,961 1,246
regex 1,580 4,090,656 375
ecrecover 648 112,210 293
pairing 762 592,827 282
kitchen_sink 2,084 1,979,971 431

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: a6bc526

Benchmark Workflow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants