k256: endomorphism-aware wNAF for vartime scalar multiplication#1745
Open
42Pupusas wants to merge 4 commits intoRustCrypto:masterfrom
Open
k256: endomorphism-aware wNAF for vartime scalar multiplication#174542Pupusas wants to merge 4 commits intoRustCrypto:masterfrom
42Pupusas wants to merge 4 commits intoRustCrypto:masterfrom
Conversation
Replaces the placeholder MulVartime / MulByGeneratorVartime impls (which just called the constant-time path and had TODOs to match) with a width-5 wNAF that uses the GLV endomorphism to split each scalar into two ~128-bit halves. Schnorr verify: ~62 µs -> ~53 µs (14% faster, no precomputed-tables; ~55 µs with tables). Addresses RustCrypto#1725. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Folds the combined `mul_by_generator_and_mul_add_vartime` into a single wNAF ladder over all 4 GLV sub-scalars (s1, s2 for G and the endomorphism; e1, e2 for P and the endomorphism). One `double()` per step instead of two independent ladders. Factors out a small `WnafSlot` (odd-multiples table + digits) and a `wnaf_ladder` helper so the single-point `mul_vartime` and the combined op share the same loop body. Schnorr verify: ~53 µs -> ~50 µs (no precomputed-tables; ~51 µs with tables). Total vs. pre-wNAF baseline: ~62 µs -> ~50 µs (~19% faster). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`wnaf_128` writes into a fixed 130-entry buffer; the bound holds for the current `WNAF_WIDTH = 5` and the ≤128-bit GLV sub-scalars, but it's implicit. Add a `debug_assert!` in the loop so that any future change to `WNAF_WIDTH` that invalidates the bound is caught at test time rather than silently writing out of bounds in worst-case inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`wnaf_128` tracked the residual scalar in two u64 limbs, but a negative recentered digit adds up to 2^(W-1) − 1 to the value, which can legitimately overflow past bit 127 when the input is close to 2^128 − 1. The old code let `hi.wrapping_add(1)` silently wrap, losing the carried bit and producing a NAF that reconstructs to the wrong value. The GLV decomposition's `(r1, r2)` each have magnitude strictly less than 2^128, so values in the carry-out window are possible (though vanishingly rare in random scalars — which is why the existing randomized tests never caught it). Fix by carrying the overflow bit into a third limb `top` that is absorbed back on the next right-shift. Perf impact is in the noise: the `top` branch is almost never taken and the predictor handles it cleanly. Add two regression tests: - `test_wnaf_128_reconstruction_adversarial` — reconstructs the NAF of a scalar with low 128 bits = 0xFF..FF and asserts it equals 2^128 − 1. - `test_mul_vartime_adversarial_scalars` — end-to-end check that `mul_vartime(P, k)` matches the constant-time reference when `k`'s low 128 bits trigger the carry window. Also add a `debug_assert!` on `idx` in `WnafSlot::apply` to guard the parallel invariant (`idx < WNAF_TABLE_SIZE`) if `WNAF_WIDTH` is ever widened without growing the table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the placeholder
MulVartime/MulByGeneratorVartimeimplsfor
k256::ProjectivePoint(which fell back to the constant-time pathand had TODOs to match) with a real endomorphism-aware width-5 wNAF,
then folds the combined
mul_by_generator_and_mul_add_vartimeinto asingle shared-doublings ladder over all 4 GLV sub-scalars.
Closes #1725.
What changes
halves, compute a width-5 signed-digit NAF of each magnitude, and run
a standard left-to-right double-and-add with precomputed odd
multiples
[P, 3P, ..., 15P]. Sign is folded into the precomputedpoints at setup.
WnafSlot(odd-multiples table + digits) and a
wnaf_ladderhelper. Thecombined
a*G + b*Pvariant runs one ladder over all 4 GLV slots(G + βG + P + βP), doing one
double()per step instead of twoindependent ladders.
buffer; the bound is currently implicit in
WNAF_WIDTH = 5, thismakes it explicit at test time.
Perf (Schnorr verify, default features, x86_64)
~19% faster end-to-end. Also speeds up any other user of
MulVartime/MulByGeneratorVartimeon the k256 curve.Test plan
cargo test -p k256 --lib --features getrandom— 89 passedmul_vartimeandmul_and_mul_add_vartimevs. the constant-time reference(32 iterations each with
ProjectivePoint::generate()andScalar::generate()).cargo bench -p k256 --bench schnorr -- verifyon an idle hostconfirms the numbers above (criterion-reported change is stable
across runs).
Notes
SECURITY:comments are on the two vartime impls.Only reachable via the
MulVartime/MulByGeneratorVartimetraitsthat callers opt into for non-secret scalars.
odd-multiples via Montgomery's trick; static precomputed G tables
with mixed-add; wider window for the G side). The first two
regressed perf at this width/scalar size; the last gave ~4% more but
added a ~6 KB static, const-generics, and a new
LazyLockpath —not worth the complexity for a single-curve specialization. This PR
sticks to the change that's pure upside.