Skip to content

feat: rv64 cuda hintstore#2748

Open
876pol wants to merge 24 commits intodevelop-v2.1.0-rv64from
rv64-cuda-hintstore
Open

feat: rv64 cuda hintstore#2748
876pol wants to merge 24 commits intodevelop-v2.1.0-rv64from
rv64-cuda-hintstore

Conversation

@876pol
Copy link
Copy Markdown
Contributor

@876pol 876pol commented Apr 23, 2026

Resolves INT-7554.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 23, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 23, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@876pol 876pol requested review from shuklaayush April 23, 2026 21:26
@876pol 876pol marked this pull request as ready for review April 23, 2026 21:26
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 23, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

Copy link
Copy Markdown
Collaborator

@shuklaayush shuklaayush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved with some comments

Comment thread extensions/riscv/circuit/cuda/src/hintstore.cu
Comment thread extensions/riscv/circuit/cuda/src/hintstore.cu
uint32_t *__restrict__ d_range_checker,
uint32_t range_checker_num_bins,
uint32_t *__restrict__ d_bitwise_lookup,
uint32_t bitwise_num_bits,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unrelated but can you do a cleanup in a follow-up pr to remove this argument and use a constant instead? we're passing RV64_CELL_BITS as u32 as bitwise_num_bits from rust and it doesn't make sense to use a runtime value here. also do a sweep for other kernels so we don't pass constants as arguments

Comment thread extensions/riscv/circuit/cuda/src/hintstore.cu
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Claude finished @876pol's task in 1m 24s —— View job


Code Review: rv64 cuda hintstore

  • Gather context and read PR diff
  • Review changed files in detail
  • Post review feedback

This PR ports the CUDA hintstore kernel from RV32 to RV64, updating struct names, constants, and field sizes to match the Rust-side Rv64HintStore* types. I verified the CUDA struct layouts and arithmetic against the Rust AIR constraints and fill_trace_row.

Overall: Looks good. The RV32→RV64 rename is consistent and the key arithmetic matches the Rust side.

Verified correctness

  1. REM_WORDS_NUM_LIMBS = 2 (hintstore.cu:18): Matches Rust-side const REM_WORDS_NUM_LIMBS: usize = 2 at mod.rs:57. The rem_words_limbs array correctly shrinks from 4 (RV32) to 2 limbs since rem_words fits in 10 bits.

  2. Rv64HintStoreCols struct layout: Field types and sizes match the Rust Rv64HintStoreCols<T> at mod.rs:61-79:

    • rem_words_limbs[REM_WORDS_NUM_LIMBS][T; REM_WORDS_NUM_LIMBS]
    • mem_ptr_limbs[RV64_WORD_NUM_LIMBS] (4) → [T; RV64_WORD_NUM_LIMBS]
    • data[RV64_REGISTER_NUM_LIMBS] (8) → [T; RV64_REGISTER_NUM_LIMBS]
    • write_aux templated on RV64_REGISTER_NUM_LIMBS (8)
  3. mem_ptr advancement (hintstore.cu:88): local_idx * RV64_REGISTER_NUM_LIMBS (8) matches Rust filler at mod.rs:605.

  4. Range check arithmetic (hintstore.cu:110-121): All shift factors match the Rust AIR (mod.rs:248-252) and CPU filler (mod.rs:565-601):

    • msl_rshift = (4-1) * 8 = 24
    • msl_lshift = 4 * 8 - pointer_max_bits
    • rem_words_msl_lshift = 2 * 8 - 10 = 6
    • num_words >> (8 * 1) & 0xFF correctly extracts rem_words_limbs[1]
  5. static_assert (hintstore.cu:13-16): Good upgrade from the CUDA_DEBUG-gated runtime assert to a compile-time static_assert. This is exactly what the reviewer suggested.

  6. Data byte range checks (hintstore.cu:154-157): Loop over RV64_REGISTER_NUM_LIMBS (8) limbs in pairs, matching the Rust AIR at mod.rs:258-262.

Minor notes

  • Host-side asserts (hintstore.cu:219-220): I agree with shuklaayush's comment — height being a power of two and >= rows_used are invariants guaranteed by the Rust caller (next_power_of_two_or_zero at cuda.rs:64). They're cheap host-side checks though, and consistent with other kernels, so not a blocker.

  • bitwise_num_bits parameter (hintstore.cu:215): Per shuklaayush's comment, this is always RV64_CELL_BITS (8) and could be a constant instead of a runtime argument. Agreed that's worth a follow-up cleanup PR across all kernels.

No correctness issues found.

@876pol 876pol force-pushed the rv64-cuda-easy-instructions branch from ff9047b to e24c55e Compare April 24, 2026 20:37
@876pol 876pol force-pushed the rv64-cuda-easy-instructions branch from e24c55e to ff9047b Compare April 24, 2026 20:43
Base automatically changed from rv64-cuda-easy-instructions to develop-v2.1.0-rv64 April 24, 2026 20:53
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Claude finished @876pol's task in 2m 3s —— View job


Code Review: rv64 cuda hintstore

  • Gather context and read PR diff
  • Review changed files in detail
  • Post review feedback

This PR ports the CUDA hintstore kernel from RV32 to RV64 and fixes several stale references across the riscv circuit crate. I verified the CUDA struct layouts, arithmetic constants, and range-check logic against the Rust-side AIR constraints and fill_trace_row.

Overall: Looks good. The RV32→RV64 rename is consistent, constants are correct, and the key arithmetic matches the Rust side.

Verified correctness

  1. REM_WORDS_NUM_LIMBS = 2 (hintstore.cu:18) — Matches Rust const REM_WORDS_NUM_LIMBS: usize = 2 at mod.rs:57. The rem_words_limbs array correctly shrinks from 4 (RV32) to 2 since rem_words fits in 10 bits (MAX_HINT_BUFFER_DWORDS_BITS).

  2. Rv64HintStoreCols struct layout — Field types and sizes match the Rust Rv64HintStoreCols<T> at mod.rs:61-85:

    • rem_words_limbs[2][T; REM_WORDS_NUM_LIMBS]
    • mem_ptr_limbs[4][T; RV64_WORD_NUM_LIMBS]
    • data[8][T; RV64_REGISTER_NUM_LIMBS]
    • write_aux templated on RV64_REGISTER_NUM_LIMBS (8) ✓
  3. mem_ptr advancement (hintstore.cu:88) — local_idx * RV64_REGISTER_NUM_LIMBS (8) matches Rust filler at mod.rs:593,646. ✓

  4. Range check arithmetic (hintstore.cu:110-121) — All shift factors match the Rust AIR (mod.rs:223-241) and CPU filler (mod.rs:547-589):

    • msl_rshift = (4-1) * 8 = 24
    • msl_lshift = 4 * 8 - pointer_max_bits
    • rem_words_msl_lshift = 2 * 8 - 10 = 6
    • num_words >> (8 * 1) & 0xFF correctly extracts rem_words_limbs[1]
  5. static_assert (hintstore.cu:13-16) — Good upgrade from CUDA_DEBUG-gated runtime assert to a compile-time static_assert. Clean improvement.

  6. Data byte range checks (hintstore.cu:155) — Loop over RV64_REGISTER_NUM_LIMBS (8) limbs in pairs, matching the Rust AIR at mod.rs:246-249. ✓

  7. rem_words_limbs encoding — CUDA uses reinterpret_cast<uint8_t*>(&rem_words) taking the low 2 bytes of a uint32_t. Rust filler casts to u16 then to_le_bytes(). Both yield the same 2 little-endian bytes on CUDA's little-endian architecture. ✓

  8. cuda.rs changesRV32_CELL_BITSRV64_CELL_BITS in BitwiseOperationLookupChipGPU type and the tracegen call. Both are 8 but this keeps types semantically correct. ✓

  9. Test changes (tests.rs) — All type references updated (Rv32Rv64), HINT_STOREWHINT_STORED, Rv32HintStoreLayoutRv64HintStoreLayout. Consistent with the rename. ✓

Style nit

  • #[cfg(all(feature = "cuda"))] and #[cfg(all(test))] in base_alu_w, divrem_w, mul_w, shift_wcfg(all(X)) is semantically identical to cfg(X) when there's only one predicate. The rest of the codebase uses the simpler #[cfg(feature = "cuda")] and #[cfg(test)] forms. This is purely cosmetic and doesn't affect compilation, but it's inconsistent with the convention in sibling modules (branch_lt, branch_eq, shift, mul, etc.). Consider reverting to the simpler form. Fix this →

Notes on existing review comments

  • Host-side asserts (hintstore.cu:219-220): Agree with shuklaayush that these are guaranteed by the Rust caller (next_power_of_two_or_zero at cuda.rs:64). They're cheap host-side checks though, and other kernels have them, so this is a stylistic call.

  • bitwise_num_bits parameter (hintstore.cu:215): Agree this is always RV64_CELL_BITS (8) and should be a constant. Good candidate for a follow-up cleanup PR.

No correctness issues found.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Claude finished @876pol's task in 1m 30s —— View job


Code Review: rv64 cuda hintstore

  • Gather context and read PR diff
  • Review changed files in detail
  • Post review feedback

This PR ports the CUDA hintstore kernel from RV32 to RV64, updating struct names, constants, and field sizes to match the Rust-side Rv64HintStore* types. I cross-referenced the CUDA struct layouts, arithmetic, and range-check logic against the Rust AIR constraints and fill_trace_row.

Overall: Looks correct. The RV32→RV64 rename is consistent and the key arithmetic matches the Rust side. No correctness issues found.

Verified correctness

  1. REM_WORDS_NUM_LIMBS = 2 (hintstore.cu:18): Matches Rust const REM_WORDS_NUM_LIMBS: usize = 2 at mod.rs:57. Correct because rem_words fits in 2^MAX_HINT_BUFFER_DWORDS_BITS = 2^10 which requires only 2 bytes.

  2. Rv64HintStoreCols struct layout (hintstore.cu:20-41): Field-by-field match with Rust Rv64HintStoreCols<T> at mod.rs:61-84:

    • rem_words_limbs[REM_WORDS_NUM_LIMBS] (2) ← was 4 in RV32 ✓
    • mem_ptr_limbs[RV64_WORD_NUM_LIMBS] (4) ✓
    • data[RV64_REGISTER_NUM_LIMBS] (8) ← was 4 in RV32 ✓
    • write_aux templated on RV64_REGISTER_NUM_LIMBS (8) ✓
  3. mem_ptr advancement (hintstore.cu:88): local_idx * RV64_REGISTER_NUM_LIMBS (8 bytes per dword) matches the Rust filler at mod.rs:646 (mem_ptr -= RV64_REGISTER_NUM_LIMBS as u32).

  4. Overflow guard (hintstore.cu:104): MAX_HINT_BUFFER_DWORDS_BITS + 3 < pointer_max_bits is correct — 8 * MAX_HINT_BUFFER_DWORDS needs 10 + 3 = 13 bits (was +2 for RV32's 4-byte words). ✓

  5. Range check arithmetic (hintstore.cu:110-121): All shift factors match the Rust AIR (mod.rs:236-241) and CPU filler (mod.rs:553-558):

    • msl_rshift = (4-1) * 8 = 24
    • msl_lshift = 4 * 8 - pointer_max_bits
    • rem_words_msl_lshift = 2 * 8 - 10 = 6
    • num_words >> (8 * 1) & 0xFF correctly extracts rem_words_limbs[1]
  6. static_assert (hintstore.cu:12-16): Good upgrade from CUDA_DEBUG-gated runtime assert to compile-time static_assert. This addresses shuklaayush's suggestion.

  7. rem_words_limbs writing (hintstore.cu:89,94): CUDA writes the first 2 bytes of a little-endian uint32_t via reinterpret_cast<uint8_t *>, equivalent to the Rust filler's (rem_words as u16).to_le_bytes() at mod.rs:653. ✓

  8. Data byte range checks (hintstore.cu:155-157): Loop over RV64_REGISTER_NUM_LIMBS (8) limbs in pairs, matching the Rust AIR at mod.rs:246-249. ✓

  9. Test updates (tests.rs): All type references correctly updated to RV64 variants. HINT_STOREWHINT_STORED reflects the opcode rename. ✓

  10. Disabled test modules (load_sign_extend/mod.rs, loadstore/mod.rs): Using #[cfg(all(test, any()))] with a TODO comment is a reasonable way to disable tests pending the RV64 port.

Minor notes (non-blocking)

  • Host-side asserts (hintstore.cu:219-220): Per shuklaayush's comment, height being a power of two and >= rows_used are invariants guaranteed by next_power_of_two_or_zero on the Rust side (cuda.rs:64). They're cheap host-side assertions and consistent with other kernels, so fine to keep for defensive coding.

  • bitwise_num_bits parameter (hintstore.cu:215): Agree with shuklaayush that this is always RV64_CELL_BITS and could be a constant. Good candidate for a follow-up cleanup PR across all kernels.

  • cuda.rs comment sync: The comment at constants.h:94 ("Must match MAX_HINT_BUFFER_DWORDS_BITS in openvm_riscv_guest::lib.rs") is correctly updated. The corresponding Rust-side comment at guest/src/lib.rs:30 ("Must be synced with MAX_HINT_BUFFER_DWORDS_BITS constant for cuda") is also updated. ✓

No correctness issues found. LGTM.


@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Claude finished @876pol's task in 1m 46s —— View job


Code Review: rv64 cuda hintstore

  • Gather context and read PR diff
  • Review changed files in detail
  • Post review feedback

This PR ports the CUDA hintstore kernel from RV32 to RV64, updating struct names, constants, and field sizes to match the Rust-side Rv64HintStore* types. I cross-referenced the CUDA struct layouts, arithmetic, and range-check logic against the Rust AIR constraints and fill_trace_row.

Overall: Looks correct. No issues found.

Verified correctness

  1. REM_WORDS_NUM_LIMBS = 2 (hintstore.cu:18): Matches Rust const REM_WORDS_NUM_LIMBS: usize = 2 at mod.rs:57. Correct because rem_words fits in 2^MAX_HINT_BUFFER_DWORDS_BITS = 2^10, requiring only 2 bytes.

  2. Rv64HintStoreCols struct layout (hintstore.cu:20-41): Field-by-field match with Rust Rv64HintStoreCols<T> at mod.rs:61-85:

    • rem_words_limbs[REM_WORDS_NUM_LIMBS] (2) ✓
    • mem_ptr_limbs[RV64_WORD_NUM_LIMBS] (4) ✓
    • data[RV64_REGISTER_NUM_LIMBS] (8) ✓
    • write_aux templated on RV64_REGISTER_NUM_LIMBS (8) ✓
  3. Record structs: Rv64HintStoreRecordHeader and Rv64HintStoreVars (hintstore.cu:44-63) match Rust-side Rv64HintStoreRecordHeader (mod.rs:306-319) and Rv64HintStoreVar (mod.rs:324-327) field-by-field. ✓

  4. mem_ptr advancement (hintstore.cu:88): local_idx * RV64_REGISTER_NUM_LIMBS (8 bytes per dword) matches the Rust filler at mod.rs:646. ✓

  5. Overflow guard (hintstore.cu:104): MAX_HINT_BUFFER_DWORDS_BITS + 3 < pointer_max_bits is correct — 8 * MAX_HINT_BUFFER_DWORDS needs 10 + 3 = 13 bits (was +2 for RV32's 4-byte words). ✓

  6. Range check arithmetic (hintstore.cu:110-121): All shift factors match the Rust AIR (mod.rs:234-243) and CPU filler (mod.rs:553-558):

    • msl_rshift = (4-1) * 8 = 24
    • msl_lshift = 4 * 8 - pointer_max_bits
    • rem_words_msl_lshift = 2 * 8 - 10 = 6
    • num_words >> (8 * 1) & 0xFF correctly extracts rem_words_limbs[1]
  7. static_assert (hintstore.cu:12-16): Good upgrade from CUDA_DEBUG-gated runtime assert to compile-time static_assert, addressing reviewer feedback. ✓

  8. rem_words_limbs encoding (hintstore.cu:89,94): CUDA writes the first 2 bytes of a little-endian uint32_t via reinterpret_cast<uint8_t *>. Rust filler casts to u16 then to_le_bytes() (mod.rs:653). Both yield the same 2 little-endian bytes. ✓

  9. Data byte range checks (hintstore.cu:155): Loop over RV64_REGISTER_NUM_LIMBS (8) limbs in pairs, matching the Rust AIR at mod.rs:246-249. ✓

  10. cuda.rs changes: RV32_CELL_BITSRV64_CELL_BITS in BitwiseOperationLookupChipGPU type and the tracegen call. Both constants are 8, but this keeps types semantically correct for the RV64 context. ✓

  11. Test updates (tests.rs): All type references correctly updated to RV64 variants. HINT_STOREWHINT_STORED reflects the opcode rename. ✓

  12. constants.h rename: MAX_HINT_BUFFER_WORDS_BITSMAX_HINT_BUFFER_DWORDS_BITS and MAX_HINT_BUFFER_WORDSMAX_HINT_BUFFER_DWORDS. Comment updated to match. ✓

  13. Guest lib comment (lib.rs:29): Updated to reference MAX_HINT_BUFFER_DWORDS_BITS. ✓

Notes on existing review comments

  • Host-side asserts (hintstore.cu:219-220): Agree with shuklaayush that height being a power of two and >= rows_used are guaranteed by next_power_of_two_or_zero on the Rust side (cuda.rs:64). They're cheap host-side checks and consistent with other kernels, so fine either way.

  • bitwise_num_bits parameter (hintstore.cu:215): Agree this is always RV64_CELL_BITS (8) and could be a constant. Good candidate for a follow-up cleanup PR.

LGTM — no correctness issues found.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Claude finished @876pol's task in 1m 31s —— View job


Code Review: rv64 cuda hintstore

  • Gather context and read PR diff
  • Review changed files in detail
  • Post review feedback

This PR ports the CUDA hintstore kernel from RV32 to RV64, updating struct names, constants, and field sizes to match the Rust-side Rv64HintStore* types. I cross-referenced the CUDA struct layouts, arithmetic, and range-check logic against the Rust AIR constraints (mod.rs:216-249) and fill_trace_row (mod.rs:553-658).

Overall: Looks correct. No issues found.

Verified correctness

  1. REM_WORDS_NUM_LIMBS = 2 (hintstore.cu:18): Matches Rust const REM_WORDS_NUM_LIMBS: usize = 2 at mod.rs:57. Correct because rem_words fits in 2^MAX_HINT_BUFFER_DWORDS_BITS = 2^10, requiring only 2 bytes.

  2. Rv64HintStoreCols struct layout (hintstore.cu:20-41): Field-by-field match with Rust Rv64HintStoreCols<T> at mod.rs:61-85:

    • rem_words_limbs[2] ← was 4 in RV32 ✓
    • mem_ptr_limbs[RV64_WORD_NUM_LIMBS] (4) ✓
    • data[RV64_REGISTER_NUM_LIMBS] (8) ← was 4 in RV32 ✓
    • write_aux templated on RV64_REGISTER_NUM_LIMBS (8) ✓
  3. Record structs (hintstore.cu:44-63): Rv64HintStoreRecordHeader and Rv64HintStoreVars match the Rust-side Rv64HintStoreRecordHeader (mod.rs:306-319) and Rv64HintStoreVar (mod.rs:322-327) in field order and types. Field name difference (write_aux vs data_write_aux) is irrelevant since both are #[repr(C)] and layout is determined by order. ✓

  4. mem_ptr advancement (hintstore.cu:88): local_idx * RV64_REGISTER_NUM_LIMBS (8 bytes per dword) matches the Rust filler at mod.rs:646 (mem_ptr -= RV64_REGISTER_NUM_LIMBS as u32). ✓

  5. Overflow guard (hintstore.cu:104): MAX_HINT_BUFFER_DWORDS_BITS + 3 < pointer_max_bits is correct — 8 * MAX_HINT_BUFFER_DWORDS needs 10 + 3 = 13 bits (was +2 for RV32's 4-byte words). ✓

  6. Range check arithmetic (hintstore.cu:110-121): All shift factors match the Rust AIR (mod.rs:234-242) and CPU filler (mod.rs:553-558):

    • msl_rshift = (4-1) * 8 = 24
    • msl_lshift = 4 * 8 - pointer_max_bits
    • rem_words_msl_lshift = 2 * 8 - 10 = 6
    • num_words >> (8 * 1) & 0xFF correctly extracts rem_words_limbs[1]
  7. static_assert (hintstore.cu:12-16): Good upgrade from CUDA_DEBUG-gated runtime assert to compile-time static_assert, addressing shuklaayush's suggestion. ✓

  8. rem_words_limbs encoding (hintstore.cu:89,94): CUDA writes the first 2 bytes of a little-endian uint32_t via reinterpret_cast<uint8_t *>. Rust filler casts to u16 then to_le_bytes() (mod.rs:653). Both yield the same 2 little-endian bytes on CUDA's little-endian architecture. ✓

  9. Data byte range checks (hintstore.cu:155): Loop over RV64_REGISTER_NUM_LIMBS (8) limbs in pairs, matching the Rust AIR at mod.rs:246-249. ✓

  10. cuda.rs changes: RV32_CELL_BITSRV64_CELL_BITS in BitwiseOperationLookupChipGPU type and the tracegen call. Both constants equal 8, but this keeps types semantically correct for the RV64 context. ✓

  11. Test updates (tests.rs): All type references correctly updated. HINT_STOREWHINT_STORED reflects the opcode rename. Tests are now re-enabled (#[cfg(test)] replaces #[cfg(all(test, any()))]). ✓

  12. constants.h (constants.h:93-97): MAX_HINT_BUFFER_WORDS_BITSMAX_HINT_BUFFER_DWORDS_BITS with matching comment update. ✓

  13. cfg(all(...)) cleanup: The previous review noted cfg(all(feature = "cuda")) / cfg(all(test)) in sibling modules — I confirmed this has already been cleaned up; no instances remain.

Notes on existing review comments

  • Host-side asserts (hintstore.cu:219-220): Per shuklaayush's comment, height being a power of two and ≥ rows_used are invariants guaranteed by next_power_of_two_or_zero on the Rust side (cuda.rs:64). They're cheap host-side checks and consistent with other kernels, so fine to keep.

  • bitwise_num_bits parameter (hintstore.cu:215): Agree this is always RV64_CELL_BITS (8) and could be a constant. Good candidate for a follow-up cleanup PR across all kernels.

LGTM — no correctness issues found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants