Use compact GPU tower witnesses by hero78119 · Pull Request #1330 · scroll-tech/ceno

hero78119 · 2026-04-26T02:31:34Z

Problem

GPU prover MLE/tower paths still carried logical-domain padding assumptions, which increased VRAM usage and made compact witness memory accounting inaccurate.

Design Rationale

Keep protocol-facing behavior unchanged while making prover-side GPU witnesses default-aware and compact by occupied rows. Missing logup numerators are represented by one scalar ONE polynomial with tail_default=ONE, avoiding per-chunk ones buffers.
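To illustrate the idea, here is a minimal sketch of a default-aware compact witness. It is not the actual ceno or gkr_iop type: the `CompactMle` struct, its field names, and the use of plain `u64` values in place of field elements are all assumptions made for the example. Only the occupied rows are stored; every other row of the logical domain reads as `tail_default`.

```rust
/// Hypothetical compact, default-aware evaluation table (illustration only,
/// not the real ceno / gkr_iop type). Values are plain u64, not field elements.
#[derive(Debug, Clone, PartialEq)]
pub struct CompactMle {
    /// Evaluations stored for the occupied rows only.
    pub occupied: Vec<u64>,
    /// Value implied for every logical row at index >= occupied.len().
    pub tail_default: u64,
    /// log2 of the logical (padded) domain size.
    pub num_vars: usize,
}

impl CompactMle {
    /// Size of the logical domain the protocol sees.
    pub fn logical_len(&self) -> usize {
        1usize << self.num_vars
    }

    /// Read a logical row: a stored value if occupied, else the tail default.
    pub fn get(&self, index: usize) -> u64 {
        assert!(index < self.logical_len(), "index outside logical domain");
        self.occupied.get(index).copied().unwrap_or(self.tail_default)
    }

    /// A missing logup numerator, sketched as one stored scalar ONE with
    /// tail_default = ONE (assumed layout), instead of a full ones buffer.
    pub fn constant_one(num_vars: usize) -> Self {
        Self { occupied: vec![1], tail_default: 1, num_vars }
    }

    /// Expand to the fully padded table, e.g. for a parity check in tests.
    pub fn to_padded(&self) -> Vec<u64> {
        (0..self.logical_len()).map(|i| self.get(i)).collect()
    }
}
```

In this sketch a constant-one numerator over 2^20 logical rows stores a single value, while `to_padded` still reproduces the logical table the protocol-facing code expects.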

Change Highlights

ceno_zkvm: compact tower memory estimates and compact logup ones allocation.
gkr_iop: GPU utility plumbing for compact/default-aware MLEs.
summary.md: records executed e2e commands and outcomes.
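The effect on memory estimates can be sketched as follows. This is not the estimator in ceno_zkvm; the function names are hypothetical, and the sketch assumes a tower whose layers halve in size, with the occupied rows of each layer halving as well (rounded up). It compares the row count of a fully padded tower with one that only counts occupied rows.

```rust
/// Rows across all tower layers when each layer is padded to a power of two.
/// Assumes num_vars < 63 so the shift cannot overflow.
pub fn padded_tower_rows(num_vars: u32) -> u64 {
    // 2^n + 2^(n-1) + ... + 1 = 2^(n+1) - 1
    (1u64 << (num_vars + 1)) - 1
}

/// Rows across all tower layers when each layer keeps only occupied rows.
/// Assumption: layer k holds ceil(occupied_rows / 2^k) occupied rows.
pub fn compact_tower_rows(occupied_rows: u64, num_vars: u32) -> u64 {
    assert!(occupied_rows <= 1u64 << num_vars, "more rows than the domain");
    (0..=num_vars)
        .map(|k| (occupied_rows + (1u64 << k) - 1) >> k) // ceiling division
        .sum()
}

/// Convert a row count to bytes for a given element size.
pub fn estimate_bytes(rows: u64, elem_bytes: u64) -> u64 {
    rows * elem_bytes
}
```

For example, with 5 occupied rows in a domain of 8, the compact count is 5 + 3 + 2 + 1 = 11 rows against 15 rows for the padded tower.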

Benchmark / Performance Impact

Operation

Operation	master (s)	this PR (s)	Improve (master -> this PR)
keccak e2e, serial mem-tracking	not captured	9.63	validates estimator
keccak e2e, concurrent GPU witgen	not captured	377.54 incl. CUDA rebuild	validates compact GPU path

Layer

Layer	master (s)	this PR (s)	Improve (master -> this PR)
tower build/prove	not captured	n/a	expected lower VRAM from compact buffers

Benchmark command(s):

CENO_GPU_MEM_TRACKING=1 CENO_CONCURRENT_CHIP_PROVING=0 CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --features gpu --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
CENO_GPU_MEM_TRACKING=0 CENO_CONCURRENT_CHIP_PROVING=1 CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --features gpu --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Environment: GPU cc=12.0, 70 SMs, CUDA build via local cuda_hal, head 12453f6e.

raw data:

master: not captured
this PR: serial 0:09.63; concurrent 6:17.54 including release rebuild; pool peak 262MB

Testing

cargo check --features gpu --package ceno_zkvm --bin e2e
RUST_LOG=error CENO_CONCURRENT_CHIP_PROVING=0 target/release/e2e --platform=ceno --max-cycle-per-shard=1000 --hints=2 --public-io=5 --shard-id=0 examples/target/riscv32im-ceno-zkvm-elf/release/examples/fibonacci
CENO_GPU_MEM_TRACKING=1 CENO_CONCURRENT_CHIP_PROVING=0 CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --features gpu --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
CENO_GPU_MEM_TRACKING=0 CENO_CONCURRENT_CHIP_PROVING=1 CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --features gpu --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
git diff --check
git -C ../ceno-gpu diff --check

All passed.

Risks and Rollout

Review focus: compact tail defaults must match logical zero/one padding in prover-only GPU paths.
Verifier/protocol transcript behavior is not intentionally changed; reviewers should still inspect the verifier-adjacent diff for parity.
Rollback: revert this PR together with the matching ceno-gpu and gkr-backend compact MLE PRs.
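The parity property named in the review focus can be sketched as a small check: evaluating a multilinear extension from the compact form must give the same value as evaluating the fully padded table. The code below is an illustration under stated assumptions, not the PR's GPU code: it uses a toy prime field (2^61 - 1) rather than the prover's field, folds the lowest variable first, and all function names are hypothetical.

```rust
/// Toy prime modulus for illustration (2^61 - 1), not the prover's field.
const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    add(a, P - (b % P))
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Fix one variable to r: lo + r * (hi - lo).
fn fold_pair(lo: u64, hi: u64, r: u64) -> u64 {
    add(lo, mul(r, sub(hi, lo)))
}

/// Evaluate the MLE of a fully padded table at `point`.
pub fn eval_padded(table: &[u64], point: &[u64]) -> u64 {
    assert_eq!(table.len(), 1usize << point.len());
    let mut cur = table.to_vec();
    for &r in point {
        cur = cur.chunks(2).map(|c| fold_pair(c[0], c[1], r)).collect();
    }
    cur[0]
}

/// Evaluate the same MLE from occupied rows plus a tail default.
/// A pair made of two default rows folds back to the default, so only
/// occupied pairs are touched; an odd last row pairs with the default.
pub fn eval_compact(occupied: &[u64], tail_default: u64, point: &[u64]) -> u64 {
    assert!(occupied.len() <= 1usize << point.len());
    let mut cur = occupied.to_vec();
    for &r in point {
        cur = cur
            .chunks(2)
            .map(|c| fold_pair(c[0], c.get(1).copied().unwrap_or(tail_default), r))
            .collect();
    }
    cur.first().copied().unwrap_or(tail_default)
}
```

A test in this style would build the padded table with zeros (or ones) in the tail and assert that both functions agree at a random point; using the wrong tail default makes the two evaluations diverge.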

Follow-ups (optional)

Capture clean master-vs-PR benchmark numbers on a stable GPU runner.

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

hero78119 added 3 commits April 25, 2026 23:18

refactor GPU compact tower witness flow

ac49ac6

Fix compact tower memory accounting

84a2631

Optimize compact logup ones allocation

12453f6

hero78119 changed the title ~~Feat/prover mle zero padding~~ Use compact GPU tower witnesses Apr 26, 2026

hero78119 added 8 commits April 26, 2026 11:22

update dep

7d60f01

Merge branch 'master' into feat/prover_mle_zero_padding

925de92

fix main mem estimation

e9fbe9c

Merge branch 'master' of github.com:scroll-tech/ceno into feat/prover_mle_zero_padding

46e87bb

Merge branch 'feat/prover_mle_zero_padding' of github.com:scroll-tech/ceno into feat/prover_mle_zero_padding

b888fbb

fix mem estimator

5ecce04

snapshot compact tower estimator state

be14006

rollback Cargo.toml, Cargo.lock change

df88dec

hero78119 force-pushed the feat/prover_mle_zero_padding branch from 506a380 to df88dec Compare April 27, 2026 03:09

hero78119 added 2 commits April 27, 2026 13:43

fix memory estimation

b57b692

verifier log

c50b793

hero78119 wants to merge 13 commits into master from feat/prover_mle_zero_padding
