Use compact GPU tower witnesses #1330

Open
hero78119 wants to merge 13 commits into master from feat/prover_mle_zero_padding

hero78119 (Collaborator) commented Apr 26, 2026

Problem

The GPU prover's MLE/tower paths still assumed buffers padded to the full logical domain, which increased VRAM usage and made memory accounting for compact witnesses inaccurate.

Design Rationale

Keep protocol-facing behavior unchanged while making prover-side GPU witnesses default-aware and compact by occupied rows. Missing logup numerators are represented by one scalar ONE polynomial with tail_default=ONE, avoiding per-chunk ones buffers.
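To make the idea concrete, here is a minimal sketch of a default-aware compact MLE. The names `CompactMle`, `constant`, `get`, and `to_dense` are hypothetical and are not taken from ceno_zkvm or gkr_iop; field elements are modeled as plain `u64` values purely for illustration.

```rust
/// Hypothetical default-aware MLE: only the occupied prefix is stored, and
/// every logical row past that prefix reads as `tail_default`.
#[derive(Debug, Clone, PartialEq)]
struct CompactMle {
    num_vars: u32,     // the logical domain has 2^num_vars rows
    values: Vec<u64>,  // occupied rows only
    tail_default: u64, // value of every row at index >= values.len()
}

impl CompactMle {
    fn new(num_vars: u32, values: Vec<u64>, tail_default: u64) -> Self {
        assert!(
            values.len() <= 1usize << num_vars,
            "more rows than the logical domain"
        );
        Self { num_vars, values, tail_default }
    }

    /// A constant polynomial stores no rows at all, only the default.
    fn constant(num_vars: u32, c: u64) -> Self {
        Self::new(num_vars, Vec::new(), c)
    }

    fn logical_len(&self) -> usize {
        1usize << self.num_vars
    }

    /// Read a logical row: stored value if occupied, default otherwise.
    fn get(&self, i: usize) -> u64 {
        assert!(i < self.logical_len(), "index outside the logical domain");
        self.values.get(i).copied().unwrap_or(self.tail_default)
    }

    /// Materialize the padded form; useful only for checks on small inputs.
    fn to_dense(&self) -> Vec<u64> {
        (0..self.logical_len()).map(|i| self.get(i)).collect()
    }
}
```

Under these assumptions, `CompactMle::constant(20, 1)` plays the role of the single scalar ONE polynomial: it answers reads over 2^20 logical rows while allocating no per-row buffer.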

Change Highlights

  • ceno_zkvm: compact tower memory estimates and compact logup ones allocation.
  • gkr_iop: GPU utility plumbing for compact/default-aware MLEs.
  • summary.md: records executed e2e commands and outcomes.
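The difference between a padded and a compact tower memory estimate can be sketched as follows. This is an illustration only: `tower_bytes` is a hypothetical helper, and the assumption that each tower layer halves the row count (rounding up for the occupied rows) is mine, not a description of the actual estimator in ceno_zkvm.

```rust
/// Bytes for one dense buffer covering the full logical domain of 2^num_vars rows.
fn dense_bytes(num_vars: u32, elem_bytes: usize) -> usize {
    (1usize << num_vars) * elem_bytes
}

/// Bytes for one compact buffer holding only the occupied rows.
fn compact_bytes(occupied_rows: usize, elem_bytes: usize) -> usize {
    occupied_rows * elem_bytes
}

/// Returns (dense_total, compact_total) summed over all tower layers.
/// Assumption: layer k+1 has half the rows of layer k, and a compact
/// layer keeps ceil(rows / 2) occupied rows.
fn tower_bytes(num_vars: u32, occupied_rows: usize, elem_bytes: usize) -> (usize, usize) {
    assert!(occupied_rows <= 1usize << num_vars);
    let mut dense = 0;
    let mut compact = 0;
    let mut rows = occupied_rows;
    for layer_vars in (0..=num_vars).rev() {
        dense += dense_bytes(layer_vars, elem_bytes);
        compact += compact_bytes(rows, elem_bytes);
        rows = (rows + 1) / 2; // ceil(rows / 2)
    }
    (dense, compact)
}
```

For example, with 8 logical rows, 5 occupied rows, and 16-byte elements, the dense layers hold 8 + 4 + 2 + 1 = 15 rows (240 bytes) while the compact layers hold 5 + 3 + 2 + 1 = 11 rows (176 bytes). The gap widens as the occupied fraction shrinks.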

Benchmark / Performance Impact

Operation

| Operation | master (s) | this PR (s) | Improve (master -> this PR) |
| --- | --- | --- | --- |
| keccak e2e, serial mem-tracking | not captured | 9.63 | validates estimator |
| keccak e2e, concurrent GPU witgen | not captured | 377.54 incl. CUDA rebuild | validates compact GPU path |

Layer

| Layer | master (s) | this PR (s) | Improve (master -> this PR) |
| --- | --- | --- | --- |
| tower build/prove | not captured | n/a | expected lower VRAM from compact buffers |

Benchmark command(s):

CENO_GPU_MEM_TRACKING=1 CENO_CONCURRENT_CHIP_PROVING=0 CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --features gpu --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
CENO_GPU_MEM_TRACKING=0 CENO_CONCURRENT_CHIP_PROVING=1 CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --features gpu --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Environment: GPU cc=12.0, 70 SMs, CUDA build via local cuda_hal, head 12453f6e.

raw data:

  • master: not captured
  • this PR: serial 0:09.63; concurrent 6:17.54 including release rebuild; pool peak 262MB

Testing

cargo check --features gpu --package ceno_zkvm --bin e2e
RUST_LOG=error CENO_CONCURRENT_CHIP_PROVING=0 target/release/e2e --platform=ceno --max-cycle-per-shard=1000 --hints=2 --public-io=5 --shard-id=0 examples/target/riscv32im-ceno-zkvm-elf/release/examples/fibonacci
CENO_GPU_MEM_TRACKING=1 CENO_CONCURRENT_CHIP_PROVING=0 CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --features gpu --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
CENO_GPU_MEM_TRACKING=0 CENO_CONCURRENT_CHIP_PROVING=1 CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --features gpu --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
git diff --check
git -C ../ceno-gpu diff --check

All passed.

Risks and Rollout

  • Review focus: compact tail defaults must match logical zero/one padding in prover-only GPU paths.
  • Verifier/protocol transcript behavior is not intentionally changed; reviewers should still inspect the verifier-adjacent diff for parity.
  • Rollback: revert this PR together with the matching ceno-gpu and gkr-backend compact MLE PRs.
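The first review-focus item can be checked mechanically on small inputs: build a layer from the padded witness, build the same layer from the compact witness, expand the compact result, and compare. The sketch below does this for one product-tower layer. All names are hypothetical, the adjacent-pair layout is an assumption, and the Mersenne31 modulus is a toy stand-in for whichever field the prover actually uses.

```rust
/// Toy prime modulus (2^31 - 1); an assumption for illustration only.
const P: u64 = (1 << 31) - 1;

fn mul(a: u64, b: u64) -> u64 {
    (a % P) * (b % P) % P
}

/// Pad the occupied rows out to the full logical domain.
fn expand(values: &[u64], tail_default: u64, logical_len: usize) -> Vec<u64> {
    assert!(values.len() <= logical_len);
    let mut dense = values.to_vec();
    dense.resize(logical_len, tail_default);
    dense
}

/// Dense reference: multiply adjacent pairs of the fully padded layer.
fn dense_product_layer(padded: &[u64]) -> Vec<u64> {
    assert!(padded.len() % 2 == 0, "padded layer must have even length");
    padded.chunks(2).map(|p| mul(p[0], p[1])).collect()
}

/// Compact version: only pairs touching an occupied row are computed.
/// Returns the next layer's occupied rows and its tail default.
fn compact_product_layer(values: &[u64], tail_default: u64) -> (Vec<u64>, u64) {
    let out: Vec<u64> = values
        .chunks(2)
        // An odd trailing row is paired with the default, not with zero.
        .map(|p| mul(p[0], p.get(1).copied().unwrap_or(tail_default)))
        .collect();
    (out, mul(tail_default, tail_default))
}

/// True when the compact layer expands to exactly the dense reference.
fn layers_match(values: &[u64], padding: u64, tail_default: u64, num_vars: u32) -> bool {
    let logical_len = 1usize << num_vars;
    let reference = dense_product_layer(&expand(values, padding, logical_len));
    let (out, next_default) = compact_product_layer(values, tail_default);
    expand(&out, next_default, logical_len / 2) == reference
}
```

With occupied rows `[3, 5, 7]`, 8 logical rows, and ONE padding, the dense layer is `[15, 7, 1, 1]`; a compact layer with `tail_default = 1` expands to the same vector, whereas `tail_default = 0` yields `[15, 0, 0, 0]` and the check fails. That failure mode is the kind of mismatch the review focus is meant to catch.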

Follow-ups (optional)

Capture clean master-vs-PR benchmark numbers on a stable GPU runner.

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

hero78119 changed the title from "Feat/prover mle zero padding" to "Use compact GPU tower witnesses" Apr 26, 2026
hero78119 force-pushed the feat/prover_mle_zero_padding branch from 506a380 to df88dec April 27, 2026 03:09