k256: endomorphism-aware wNAF for vartime scalar multiplication by 42Pupusas · Pull Request #1745 · RustCrypto/elliptic-curves

42Pupusas · 2026-04-24T01:07:39Z

Summary

Replaces the placeholder MulVartime / MulByGeneratorVartime impls
for k256::ProjectivePoint (which fell back to the constant-time path
and had TODOs to match) with a real endomorphism-aware width-5 wNAF,
then folds the combined mul_by_generator_and_mul_add_vartime into a
single shared-doublings ladder over all 4 GLV sub-scalars.

Closes #1725.

What changes

- Commit 1: wNAF core. GLV-decompose the scalar into two ~128-bit
  halves, compute a width-5 signed-digit NAF of each magnitude, and run
  a standard left-to-right double-and-add with precomputed odd
  multiples [P, 3P, ..., 15P]. The sign of each sub-scalar is folded
  into the precomputed points at setup.
- Commit 2: share doublings. Extract a small WnafSlot
  (odd-multiples table + digits) and a wnaf_ladder helper. The
  combined a*G + b*P variant runs one ladder over all 4 GLV slots
  (G, βG, P, βP), doing one double() per step instead of running two
  independent ladders.
- Commit 3: debug_assert. Guard the fixed 130-entry wNAF digit
  buffer. The bound is currently implicit in WNAF_WIDTH = 5; the
  assertion makes it explicit at test time.
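
The recoding and the shared-doublings ladder described above can be sketched as follows. This is a minimal illustration, not the PR's code: the "points" are plain integers under addition so the result is easy to check, scalars are 64-bit rather than GLV halves, and the names `Slot`, `wnaf`, and `ladder` are hypothetical stand-ins for `WnafSlot`, the digit routine, and `wnaf_ladder`.

```rust
/// Window width used in this sketch (the PR uses width 5).
const WNAF_WIDTH: u32 = 5;
/// Odd multiples P, 3P, ..., 15P: 2^(w-2) = 8 table entries.
const TABLE_SIZE: usize = 1 << (WNAF_WIDTH - 2);

/// Toy stand-in for a curve point: integers under addition, so that
/// "k * P" is ordinary multiplication. An assumption for illustration only.
type Point = i128;

/// Width-5 signed-digit NAF of `k`, least significant digit first.
/// Every nonzero digit is odd and lies in [-15, 15].
fn wnaf(k: u64) -> Vec<i8> {
    let mask = (1u128 << WNAF_WIDTH) - 1;
    let half = 1i8 << (WNAF_WIDTH - 1);
    let mut k = k as u128; // widened so that adding up to 15 cannot overflow
    let mut digits = Vec::new();
    while k != 0 {
        let mut d = 0i8;
        if k & 1 == 1 {
            d = (k & mask) as i8; // k mod 2^w
            if d >= half {
                d -= 2 * half; // recentre into (-16, 16)
            }
            if d < 0 {
                k += (-d) as u128;
            } else {
                k -= d as u128;
            }
        }
        digits.push(d);
        k >>= 1;
    }
    digits
}

/// One ladder input: odd-multiples table plus the digits of |k|.
struct Slot {
    table: [Point; TABLE_SIZE],
    digits: Vec<i8>,
}

impl Slot {
    fn new(p: Point, k: i64) -> Self {
        // Fold the scalar's sign into the base point at setup.
        let base = if k < 0 { -p } else { p };
        let two_p = base + base;
        let mut table = [base; TABLE_SIZE];
        for i in 1..TABLE_SIZE {
            table[i] = table[i - 1] + two_p; // P, 3P, 5P, ..., 15P
        }
        Slot { table, digits: wnaf(k.unsigned_abs()) }
    }

    /// Add or subtract the table entry selected by digit `i`, if any.
    fn apply(&self, acc: Point, i: usize) -> Point {
        match self.digits.get(i).copied().unwrap_or(0) {
            0 => acc,
            d if d > 0 => acc + self.table[(d as usize) >> 1],
            d => acc - self.table[((-d) as usize) >> 1],
        }
    }
}

/// Left-to-right ladder: one doubling per step, shared by every slot.
fn ladder(slots: &[Slot]) -> Point {
    let len = slots.iter().map(|s| s.digits.len()).max().unwrap_or(0);
    let mut acc: Point = 0; // identity element of the toy group
    for i in (0..len).rev() {
        acc = acc + acc; // the shared double()
        for s in slots {
            acc = s.apply(acc, i);
        }
    }
    acc
}
```

With one slot this is an ordinary wNAF multiplication, e.g. `ladder(&[Slot::new(5, 1000)])` evaluates to `5000`. With four slots it computes the sum of all four products while doubling the accumulator only once per digit position, which is the saving the combined a*G + b*P path relies on.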

Perf (Schnorr verify, default features, x86_64)

Stage	time/verify
Master (constant-time fallback)	~62 µs
After commit 1 (wNAF)	~53 µs
After commit 2 (shared ladder)	~50 µs

~19% faster end-to-end. Also speeds up any other user of
MulVartime / MulByGeneratorVartime on the k256 curve.

Test plan

- cargo test -p k256 --lib --features getrandom: 89 passed.
- New randomized tests for mul_vartime and
  mul_and_mul_add_vartime vs. the constant-time reference
  (32 iterations each with ProjectivePoint::generate() and
  Scalar::generate()).
- Edge-case tests: scalar = 0, 1, −1, point = identity.
- cargo bench -p k256 --bench schnorr -- verify on an idle host
  confirms the numbers above (criterion-reported change is stable
  across runs).

Notes

- Not constant time; SECURITY: comments are on the two vartime impls.
  Only reachable via the MulVartime / MulByGeneratorVartime traits
  that callers opt into for non-secret scalars.
- Briefly explored several further optimizations (batched-affine
  odd-multiples via Montgomery's trick; static precomputed G tables
  with mixed-add; wider window for the G side). The first two
  regressed perf at this width/scalar size; the last gave ~4% more but
  added a ~6 KB static, const-generics, and a new LazyLock path,
  which is not worth the complexity for a single-curve specialization.
  This PR sticks to the change that's pure upside.
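
For readers unfamiliar with Montgomery's trick mentioned in the notes, here is a minimal sketch of batch inversion. It is not code from this PR (the optimization was tried and dropped); it works over a toy 61-bit prime field rather than the secp256k1 base field, and every name in it is hypothetical.

```rust
/// Toy prime modulus 2^61 - 1, standing in for the curve's base field.
const P: u64 = (1 << 61) - 1;

/// Modular multiplication via a 128-bit intermediate product.
fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation mod P.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Single inversion by Fermat's little theorem (the expensive operation).
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Replace every element of `xs` with its inverse using one call to `inv`
/// plus a few multiplications per element. All inputs must be nonzero mod P.
fn batch_invert(xs: &mut [u64]) {
    // prefix[i] = xs[0] * ... * xs[i-1]
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = 1u64;
    for &x in xs.iter() {
        prefix.push(acc);
        acc = mul(acc, x);
    }
    // Inverse of the product of all elements.
    let mut inv_acc = inv(acc);
    for i in (0..xs.len()).rev() {
        let x = xs[i];
        xs[i] = mul(inv_acc, prefix[i]); // (x0..xi)^-1 * (x0..x(i-1)) = xi^-1
        inv_acc = mul(inv_acc, x); // drop xi from the running inverse
    }
}
```

In the batched-affine setting the inverted values would be the Z coordinates of the odd multiples, so that all table entries become affine for the price of one field inversion. Whether that pays off depends on table size and ladder length, which is consistent with the regression reported above for this width.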

Replaces the placeholder MulVartime / MulByGeneratorVartime impls (which just called the constant-time path and had TODOs to match) with a width-5 wNAF that uses the GLV endomorphism to split each scalar into two ~128-bit halves.

Schnorr verify: ~62 µs -> ~53 µs (14% faster, no precomputed-tables; ~55 µs with tables).

Addresses RustCrypto#1725.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Folds the combined `mul_by_generator_and_mul_add_vartime` into a single wNAF ladder over all 4 GLV sub-scalars (s1, s2 for G and the endomorphism; e1, e2 for P and the endomorphism). One `double()` per step instead of two independent ladders.

Factors out a small `WnafSlot` (odd-multiples table + digits) and a `wnaf_ladder` helper so the single-point `mul_vartime` and the combined op share the same loop body.

Schnorr verify: ~53 µs -> ~50 µs (no precomputed-tables; ~51 µs with tables). Total vs. pre-wNAF baseline: ~62 µs -> ~50 µs (~19% faster).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`wnaf_128` writes into a fixed 130-entry buffer; the bound holds for the current `WNAF_WIDTH = 5` and the ≤128-bit GLV sub-scalars, but it's implicit. Add a `debug_assert!` in the loop so that any future change to `WNAF_WIDTH` that invalidates the bound is caught at test time rather than silently writing out of bounds in worst-case inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`wnaf_128` tracked the residual scalar in two u64 limbs, but a negative recentered digit adds up to 2^(W-1) − 1 to the value, which can legitimately overflow past bit 127 when the input is close to 2^128 − 1. The old code let `hi.wrapping_add(1)` silently wrap, losing the carried bit and producing a NAF that reconstructs to the wrong value.

The GLV decomposition's `(r1, r2)` each have magnitude strictly less than 2^128, so values in the carry-out window are possible (though vanishingly rare in random scalars, which is why the existing randomized tests never caught it).

Fix by carrying the overflow bit into a third limb `top` that is absorbed back on the next right-shift. Perf impact is in the noise: the `top` branch is almost never taken and the predictor handles it cleanly.

Add two regression tests:

- `test_wnaf_128_reconstruction_adversarial`: reconstructs the NAF of a scalar with low 128 bits = 0xFF..FF and asserts it equals 2^128 − 1.
- `test_mul_vartime_adversarial_scalars`: end-to-end check that `mul_vartime(P, k)` matches the constant-time reference when `k`'s low 128 bits trigger the carry window.

Also add a `debug_assert!` on `idx` in `WnafSlot::apply` to guard the parallel invariant (`idx < WNAF_TABLE_SIZE`) if `WNAF_WIDTH` is ever widened without growing the table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
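
The carry-out case described in this commit can be reproduced in a small sketch. The assumptions are deliberate simplifications: the residual lives in a single `u128` plus a boolean `top` bit instead of the PR's two u64 limbs plus `top`, and the signature of `wnaf_128` shown here is a guess made for illustration.

```rust
const WNAF_WIDTH: u32 = 5;
/// Fixed digit buffer; a 128-bit input yields at most 129 digits.
const WNAF_DIGITS: usize = 130;

/// Width-5 NAF of a 128-bit magnitude, least significant digit first.
/// Returns the digit buffer and the number of digits written.
fn wnaf_128(k: u128) -> ([i8; WNAF_DIGITS], usize) {
    let mask: u128 = (1 << WNAF_WIDTH) - 1;
    let half: i8 = 1 << (WNAF_WIDTH - 1);
    let mut digits = [0i8; WNAF_DIGITS];
    let mut len = 0;
    let mut k = k;

    while k != 0 {
        // Guards the implicit bound if the width or input size ever changes.
        debug_assert!(len < WNAF_DIGITS, "wNAF digit buffer too small");
        let mut top = false; // bit 128 of the residual, set only by a carry-out
        let mut d = 0i8;
        if k & 1 == 1 {
            d = (k & mask) as i8; // k mod 2^w
            if d >= half {
                d -= 2 * half; // recentre into (-16, 16)
            }
            if d < 0 {
                // Subtracting a negative digit adds |d|; keep the carry
                // instead of letting the addition wrap.
                let (sum, carry) = k.overflowing_add(d.unsigned_abs() as u128);
                k = sum;
                top = carry;
            } else {
                k -= d as u128;
            }
        }
        digits[len] = d;
        len += 1;
        // Shift the 129-bit value (top:k) right by one, absorbing `top`.
        k = (k >> 1) | ((top as u128) << 127);
    }
    (digits, len)
}
```

For the input 2^128 − 1 the first digit is −1, which pushes the residual to exactly 2^128. With the carry kept, the recoding ends with a +1 at position 128, so the digits reconstruct to 2^128 − 1. If the carry were dropped, the residual would become 0 after the first digit and the recoding would stop at the single digit −1.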

42Pupusas and others added 4 commits April 23, 2026 18:56


42Pupusas wants to merge 4 commits into RustCrypto:master from 42Pupusas:k256/schnorr-verify-perf
