test(DON'T MERGE): develop-v2.0.0-rc.1#2592
test(DON'T MERGE): develop-v2.0.0-rc.1#2592shuklaayush wants to merge 106 commits intodevelop-v2.0.0-betafrom
Conversation
e9d2afb to
deff2f2
Compare
a6c7835 to
8d51084
Compare
8d51084 to
743b94e
Compare
|
Claude finished @shuklaayush's task in 3m 55s —— View job Code Review for PR #2592This is a large PR (100 files, +31590/-16895) covering several major changes. Below is a detailed review organized by area. Overview of Changes
SHA2-air Crate (replaces SHA256-air)Verdict: Excellent work. The generalization from SHA-256 to the full SHA-2 family is complete and correct.
No critical issues found in this area. ColsRef Derive MacroVerdict: Functional but has several issues worth addressing. Issues found:
Memory System RefactoringVerdict: Clean and complete refactoring.
One item to verify ( // Constrain that an 'initial' row has all timestamp zero.
// Formula: direction * (direction + 1) selects direction == 1
let mut when_initial =
builder.when(local.expand_direction * (local.expand_direction + AB::F::ONE));The comment says this selects VM Architecture & CUDA ChangesVerdict: All correct and consistent.
CI & Benchmark Changes
SummaryThis is a well-executed PR with solid architectural changes. The main areas for improvement are in the ColsRef derive macro (safety, testing, deduplication) and a misleading comment in the persistent memory constraint logic. The SHA2-air generalization and memory system refactoring are particularly well done. |
|
Claude finished @shuklaayush's task in 6m 42s —— View job Code Review for PR #2592This is a large PR (100 files, +31590/-16895) covering several major changes. Below is a detailed review organized by area. Overview
SHA2-air Crate (replaces SHA256-air)Verdict: Excellent work. No correctness or soundness issues found.
No issues found in this area. ColsRef Derive MacroVerdict: Functional but has several issues worth addressing. Bug: Copy-paste error in panic message
panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^^^ Should be "get_const_cols_ref_fields"Safety: Unsafe pointer casting without alignment verification
High duplication between const/mut paths
|
3b2b382 to
4911c15
Compare
4911c15 to
cf270ee
Compare
|
Claude finished @jpw-axiom's task in 3m 2s —— View job Code Review for PR #2592 (latest sync)This is a follow-up review after new commits were pushed. The PR now includes 46 commits across 100 files (+31496/-16874). I've reviewed all changes against the New Commits Since Last Review
New Change:
|
|
Claude finished @shuklaayush's task in 3s —— View job Code Review for PR #2592This is a review after the latest sync (48 commits, 100 files, +31589/-16894). This review covers all changes against Overview
New Commits Since Last Review
All new commits are correct. The SHA2-air CrateVerdict: Excellent. No correctness or soundness issues found.
ColsRef Derive MacroVerdict: Functional but has issues to address. Bug: Copy-paste error in panic message
panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^^^ Should be "get_const_cols_ref_fields"Safety concern: Unsafe pointer cast without alignment verification
let #slice_var: &[#elem_type] = unsafe { &*(#slice_var as *const [T] as *const [#elem_type]) };This relies on runtime High code duplication
Test has no assertions
println!("{}, {}", test.arr, test.sum); // no assertionsAt minimum, assert that Memory System RefactoringVerdict: Clean and complete.
VM Architecture & CUDA ChangesVerdict: All correct and consistent.
CI & Benchmark Changes
SummaryThis is a well-executed PR with solid architectural changes. The SHA2-air generalization, memory system refactoring, and VM config simplification are all sound. The new commits since the last review are all correct and well-targeted. Items to address:
|
|
Claude finished @shuklaayush's task in 3m 30s —— View job Code Review for PR #2592This is a review after the latest sync (48 commits, 100 files, +31609/-17094). All changes reviewed against Overview
SHA2-air CrateVerdict: Excellent. No correctness or soundness issues found.
ColsRef Derive MacroVerdict: Functional but has several issues. 1. Bug: Copy-paste error in panic message
panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^^^ Should be "get_const_cols_ref_fields"2. Safety: Unsafe pointer cast without alignment verification
3. High code duplication
4. Test gaps
Memory System RefactoringVerdict: Clean and complete.
VM Architecture & CUDA ChangesVerdict: All correct and consistent.
CI & Benchmark Changes
SummaryThis is a well-executed PR with solid architectural changes. The SHA2-air generalization, memory system refactoring, and VM config simplification are all sound and well-designed. Items to address:
|
71d3584 to
f9268e6
Compare
|
Claude finished @shuklaayush's task in 2m 34s —— View job Code Review for PR #2592This is a review after the latest sync (48 commits, 100 files, +31714/-16972). All changes reviewed against Overview
New Commit: Keccakf CUDA Two-Phase Optimization (#2599)Verdict: Excellent optimization, no correctness issues found. The redesign from single-kernel (thread-per-permutation) to two-phase (compute then write) is well-motivated and correctly implemented:
Verified correct:
The ~1.9% scratch overhead (4800 bytes/perm vs 252864 bytes/perm for trace) is negligible. Outstanding Issues (Carried Forward)These were identified in prior reviews and remain unaddressed: 1. Bug: Copy-paste error in panic message
panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^^^ Should be "get_const_cols_ref_fields"2. ColsRef derive macro: high code duplication
3. ColsRef derive macro: test gaps
4. Minor typo
SummaryThe new keccakf CUDA two-phase optimization is a well-designed and correctly implemented performance improvement. The architectural separation between compute (phase 1) and trace-write (phase 2) cleanly solves the memory coalescing bottleneck while adding minimal scratch overhead. The overall PR remains well-executed. The only actionable items are the minor copy-paste bug in the ColsRef panic message, code duplication in the derive macro, and test coverage gaps.
|
271868d to
7a23ceb
Compare
|
Claude finished @shuklaayush's task in 2m 1s —— View job Code Review for PR #2592This is a review after the latest sync (48 commits, 100 files, +31714/-16983). All changes reviewed against Overview
SHA2-air CrateVerdict: Excellent. No correctness or soundness issues found. FIPS 180-4 constants verified, constraint degree managed at ≤3, SOUNDNESS.md is thorough. ColsRef Derive MacroVerdict: Functional but has several issues. 1. Bug: Copy-paste error in panic message
panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^^^ Should be "get_const_cols_ref_fields"2. Safety: Unsafe pointer cast without alignment verification
let #slice_var: &[#elem_type] = unsafe { &*(#slice_var as *const [T] as *const [#elem_type]) };This relies on runtime 3. High code duplication
4. Test has no assertions
println!("{}, {}", test.arr, test.sum); // no assertionsAt minimum, assert that Memory System RefactoringVerdict: Clean and complete.
VM Architecture & CUDA ChangesVerdict: All correct and consistent.
Items to Address
|
|
Claude finished @shuklaayush's task in 1m 37s —— View job Code Review for PR #2592This is a review after the latest sync (49 commits, 100 files, +31740/-17012). All changes reviewed against Overview
New Commit: Deferral Volatile and MEMORY_OP_SIZE CleanupVerdict: Correct and complete. Commit
Verified: Outstanding Issues (Carried Forward)These were identified in prior reviews and remain unaddressed: 1. Bug: Copy-paste error in panic message
panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^^^ Should be "get_const_cols_ref_fields"2. Safety: Unsafe pointer cast without alignment verification
3. High code duplication
4. ColsRef test has no assertions
5. Minor typo
SummaryThe new deferral cleanup commit is correct and consistent with the broader memory system refactoring. The overall PR remains well-executed with solid architectural changes across SHA2-air, memory system, VM config, and CUDA. The only actionable items are the minor issues in the ColsRef derive macro (copy-paste bug, safety, duplication, test coverage).
|
|
Claude finished @shuklaayush's task in 2m 23s —— View job Code Review for PR #2592This is a review after the latest sync (50 commits, 100 files, +31762/-17012). All changes reviewed against Overview
New Commits Since Last Review
Deferral Test Fixes (
|
|
Claude finished @shuklaayush's task in 1m 19s —— View job Code Review for PR #2592This is a review after the latest sync (51 commits, 100 files, +31766/-17012). All changes reviewed against Overview
New Commit: GPU
|
|
Claude finished @shuklaayush's task in 2m 35s —— View job Code Review for PR #2592This is a review after the latest sync (52 commits, 100 files, +31787/-17010). All changes reviewed against Overview
New Commit: Load Deferral AS Elements as Felts in CUDA Memory Kernels (#2607)Verdict: Correct and necessary fix. Commit Changes verified:
One concern: The hardcoded Outstanding Issues (Carried Forward)These were identified in prior reviews and remain unaddressed: 1. Bug: Copy-paste error in panic message
panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^^^ Should be "get_const_cols_ref_fields"2. Safety: Unsafe pointer cast without alignment verification
3. High code duplication
4. ColsRef test has no assertions
5. Minor typo
Areas Verified Sound
SummaryThis is a well-executed PR with solid architectural changes. The latest commit ( Items to address (priority order):
|
d4719da to
05f4de6
Compare
05f4de6 to
a0ee68f
Compare
|
Claude finished @shuklaayush's task in 2m 15s —— View job Code Review for PR #2592This is a review after the latest sync (53 commits, 324 files, +31788/-17009). All changes reviewed against Overview
Latest Commit:
|
|
Claude finished @jonathanpwang's task in 1m 0s —— View job Code Review for PR #2592This is a review after the latest sync (54 commits, 343 files, +31829/-17012). All changes reviewed against Overview
Latest Commits (since last review)
Benchmark guest programs ( Rebase fixes ( SHA2-air CrateVerdict: Excellent. No correctness or soundness issues found.
ColsRef Derive MacroVerdict: Functional but has several issues worth addressing. 1. Bug: Copy-paste error in panic message
panic!("Unsupported field type (in get_mut_cols_ref_fields): {:?}", f.ty);
// ^^^ Should be "get_const_cols_ref_fields"2. Safety: Unsafe pointer cast without alignment verification
3. High code duplication
4. Test has no assertions
Memory System RefactoringVerdict: Clean and complete. No soundness issues.
VM Architecture & CUDA ChangesVerdict: All correct and consistent.
Note: Hardcoded
|
This comment has been minimized.
This comment has been minimized.
2ea2f43 to
f2c28ec
Compare
deletes the custom sccache action and uses runson
towards int-5817
fixes failing ci
closes INT-6721
…8` (#2685) resolves int-6954
depends on #2685 resolves int-6953
Co-authored-by: Ayush Shukla <ayush@axiom.xyz>
Closes INT-6464 ## CUDA Explicit Stream Migration (OpenVM side) Companion to [stark-backend#317](openvm-org/stark-backend#317). Adds `cudaStream_t stream` to all OpenVM CUDA launchers and FFI wrappers, injects `DeviceContext` through the VM builder and chip construction APIs, removes all PTDS references, and lifts `DeviceCtx` to a first-class associated type on `ProverDevice` — eliminating `MaybeDeviceContext`, `#[cfg]` duplication, and conditional `device_ctx: Option<&DeviceContext>` parameters. **224 files changed, +2902 / -1353** See [cuda-stream-migration-design.md](https://github.com/openvm-org/v2-proof-system/blob/test/stream-migration/cuda-stream-migration-design.md) for full design rationale. --- ## CUDA launchers (~67 `.cu` files + 2 `.cuh` headers) Every `extern "C"` launcher gains `cudaStream_t stream` as final parameter. All kernel launches use `<<<grid, block, 0, stream>>>`. All CUB calls (`DeviceScan`, `DeviceReduce`, `DeviceMergeSort`) pass explicit `stream`. `scan.cuh` and `affine_scan.cuh` had `= cudaStreamPerThread` default parameters removed — `stream` is now required. --- ## FFI wrappers (15 `cuda_abi.rs` / `abi.rs` files) Every `extern "C"` declaration and safe Rust wrapper gains `stream: cudaStream_t`. Covers `crates/vm`, `crates/circuits/primitives`, `crates/circuits/poseidon2-air`, `crates/recursion` (6 files), and all extensions (`rv32im`, `keccak256`, `bigint`, `sha2`, `deferral`). --- ## VM builder — `DeviceContext` injection `VmBuilder::create_chip_complex` gains `device: &E::PD` parameter. `VirtualMachine::new` passes `engine.device()` through. 25 implementations updated (19 CPU impls ignore it, 6 GPU impls extract `DeviceContext`). --- ## Stateful GPU chip constructors GPU chips that allocate persistent `DeviceBuffer`s now store `device_ctx: DeviceContext` and use `_on` allocation variants: - `BitwiseOperationLookupChipGPU` — histogram accumulator - `VariableRangeCheckerChipGPU` — range check accumulator - `RangeTupleCheckerChipGPU` — range tuple accumulator - `Poseidon2ChipGPU` — shared hash records buffer + index counter - `MemoryMerkleTree` — merkle tree device state The `Chip::generate_proving_ctx(&self, records)` trait is **unchanged** — chips access `self.device_ctx` internally. --- ## VM system CUDA modules (`crates/vm/src/system/cuda/`) All GPU system modules use `self.device_ctx.stream.as_raw()`: `poseidon2.rs`, `memory.rs`, `boundary.rs`, `phantom.rs`, `program.rs`, `merkle_tree/mod.rs`. --- ## `DeviceCtx` as associated type on `ProverDevice` (87 files, -630 net lines) The final refactor lifts `DeviceContext` from a `#[cfg(feature = "cuda")]`-gated parameter to a first-class associated type on `ProverDevice` (defined in stark-backend#317): ```rust // In ProverDevice (stark-backend): type DeviceCtx: Clone + Send + Sync; fn device_ctx(&self) -> &Self::DeviceCtx; // GpuDevice: DeviceCtx = DeviceContext // CpuDevice: DeviceCtx = () ``` This eliminates: - `MaybeDeviceContext` trait + all 3 impls + `device_ctx_for_engine()` helper — **deleted entirely** - ~6 pairs of `#[cfg(feature = "cuda")]` / `#[cfg(not)]` duplicated methods in continuations provers (`inner/mod.rs`, `deferral/hook/mod.rs`, `deferral/inner/mod.rs`) — **unified into single versions** - ~30 `#[cfg(feature = "cuda")] device_ctx: Option<&DeviceContext>` conditional parameters — **replaced with unconditional `&DC` generic** - cfg-duplicated `VerifierTraceGen` trait methods in `recursion/src/system/mod.rs` — **unified** Callers now use `engine.device().device_ctx()` directly. The `EngineDeviceCtx<E>` type alias avoids spelling out the full associated type path. --- ## Recursion — `VerifierTraceGen` and module tracegen `VerifierTraceGen` and `InnerTraceGen` traits gain a `DC` generic parameter for the device context type. `generate_proving_ctxs` accepts `&DC` unconditionally. Stream synchronization at proof boundaries via `device_ctx.stream.synchronize()`. --- ## Continuations + guest verifier `InnerTraceGen`, `DeferralHookTraceGen`, `DeferralInnerTraceGen`, `RootTraceGen` — all accept `&DC` and use the engine-owned context. `agg_prove`, `from_pk`, `new` — single versions, no `#[cfg]` duplication. Guest verifier circuit trace generation passes device context through. --- ## Extensions All extension GPU modules pass `stream` to FFI calls: `rv32im` (all ALU/branch/load/store cuda modules), `sha2`, `keccak256`, `bigint`, `deferral`. --- ## Removed - `--default-stream=per-thread` from `scripts/bin/ptx_details.rs` - All `cudaStreamPerThread` imports and usages - All `current_stream_sync()` / `current_stream_id()` usages - `MaybeDeviceContext` trait + `device_ctx_for_engine()` helper - Temporary `DeviceContext` escape hatches (fresh streams created outside engine) - ~400 lines of `#[cfg]` duplication in continuations provers --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves INT-7422.
Resolves INT-7433.
## Summary
Refactors the SHA-2 block hasher CUDA tracegen kernel from a monolithic
1-thread-per-block design into a three-phase pipeline for better memory
coalescing on column-major traces.
| Phase | Kernel | Parallelism | Purpose |
|-------|--------------------------|-----------------|----------------------------------------------------------------|
| 1 | `sha2_first_pass_phase1` | 1 thread/block | Compute SHA-2 rounds,
snapshot `{a..h, w_buf}` to scratch |
| 2 | `sha2_first_pass_phase2` | 1 thread/row | Read snapshot, write
trace columns (coalesced) |
| 3 | `sha2_first_pass_phase3` | 1 thread/block | Cross-row dependencies
(`intermed_4/8/12`, carry) |
Also drops unused parameters (`ptr_max_bits`,
`range_checker_ptr`/`num_bins`, `timestamp_max_bits`) from the block
hasher kernel and its Rust-side wrappers, simplifying
`Sha2BlockHasherChipGpu`.
## Changes
- `extensions/sha2/circuit/cuda/src/sha2_hasher.cu`: split first-pass
tracegen into three phases; tightened `CHECK_KERNEL` error propagation
in `launch_sha2_second_pass_dependencies`.
- `extensions/sha2/circuit/src/cuda/{mod.rs,cuda_abi.rs}`: wire the new
scratch buffer and drop unused wrapper arguments.
- `extensions/sha2/circuit/src/extension/cuda.rs`: update
`Sha2BlockHasherChipGpu::new` call sites.
- `extensions/sha2/circuit/src/sha2_chips/tests.rs`: update tests to
match the new constructor signature.
## Correctness notes
- Phase 1 snapshots state before each row's rounds; Phase 2 restores
that snapshot and replays the rounds to compute intermediate trace
values. The round computations are identical across phases.
- Scratch is sized as `num_blocks * ROWS_PER_BLOCK * (8 + BLOCK_WORDS)`
words, exactly matching what's written/read (no oversize).
- Phase 2 launches `num_blocks * ROWS_PER_BLOCK` threads, so every
thread maps to a valid block.
- Padding rows are still handled by the existing
`sha2_fill_first_dummy_row` + `sha2_fill_invalid_rows` kernels.
- Digest row: Phase 1 snapshots the final state (after all rounds) for
`row_in_block == ROUND_ROWS`; Phase 2 reads it to compute `final_hash =
prev_hash + work_vars`.
## Test plan
- [ ] `cargo nextest run --cargo-profile=fast -p openvm-sha2-circuit`
- [ ] CUDA tracegen tests for both SHA-256 and SHA-512 variants (feature
`cuda`)
- [ ] Verify trace matrices match CPU reference for representative
inputs
resolves int-6985
Towards INT-7429.
## Summary
- Mark `finalize_ptr`, `finalize`, and `native_keccak256` as `unsafe fn`
in both `zkvm_impl.rs` and `host_impl.rs`. These functions write 32
bytes to an output pointer/slice via `copy_nonoverlapping` without
runtime size checks (only `debug_assert!`), so callers must guarantee
the buffer is at least 32 bytes.
- Add `assert!` in the `tiny_keccak::Hasher` trait impl in `lib.rs`
since the trait method must be safe and cannot propagate `unsafe` to
callers.
- Safe public APIs (`keccak256()`, `set_keccak256()`) are unchanged —
they pass correctly-sized buffers internally.
**TODO:** The `openvm-org/tiny-keccak` fork and
`openvm-org/openvm-primitives` also call `Keccak256::finalize` /
`native_keccak256` and will need matching updates (add `unsafe {}`
blocks and/or `assert!` at their trait boundaries).
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Wang <jonathanpwang@users.noreply.github.com>
Towards INT-7422.
Resolves INT-7439, INT-7422.
…ebra+ecc + update sha2 docs (#2710) Resolves INT-7439.
Towards INT-7429. ## Summary - Fix OOB read in `native_xorin` when `len` is not 4-byte aligned: `AlignedBuf::new` copied `adjusted_len` bytes from source buffers, reading 1-3 bytes past the valid region. Replace with `AlignedBuf::uninit` + `copy_nonoverlapping` of only `len` bytes. - Mark `native_xorin` and `native_keccakf` as `unsafe extern "C" fn` since they take raw pointers and require callers to guarantee pointer validity and buffer sizes. Add `unsafe` blocks in callers `xorin` and `keccakf` with safety comments. - Update keccak extension docs code example to reflect the new unsafe signatures. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Towards INT-7478. - Replace `assert_eq!` with `Err(StaticProgramError::InvalidInstruction)` in multiplication executor `pre_compute_impl` for both bigint (mult.rs) and rv32im (mul/execution.rs). Functions returning `Result` should use error returns instead of panics, consistent with other executors in the codebase. - Add `/// # Safety` doc comments to all 12 `unsafe extern "C" fn` in bigint guest externs (externs.rs), documenting pointer validity, buffer size, and alignment preconditions. Follows precedent set by PRs #2706 (keccak) and #2708 (sha2). - Document unused branch opcode variants. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Issue found and initially corrected by [Daniel Heim](https://github.com/dghelm)
Resolves INT-7541. Issue found and initially corrected by [Daniel Heim](https://github.com/dghelm).
dada909 to
e4aab20
Compare
Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights. Commit: e4aab20 |
No description provided.