Consolidate vmp/svp traits, improve CRT^-1 by ~x2 #135

Merged: Pro7ech merged 1 commit into main from dev_impl_opt on Mar 12, 2026

@Pro7ech (Collaborator) commented on Mar 3, 2026

poulpy-hal

  • Remove the VmpApplyDftToDftAdd and SvpApplyDftToDftAdd traits; merge the additive variant into VmpApplyDftToDft / SvpApplyDftToDft via a new limb_offset parameter.
    The removed traits accumulated VMP results directly into a scattered output buffer, causing severe cache misses. Writing into a contiguous temporary buffer and then folding it into the output with VecZnxDftAddInplace is ~2× faster.
  • Remove all associated OEP (VmpApplyDftToDftAddImpl, VmpApplyDftToDftAddTmpBytesImpl, SvpApplyDftToDftAddImpl), delegate, and bench-suite plumbing.
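
The "write to a contiguous temporary, then fold" pattern described above can be sketched as follows. This is a toy model only: ToyVecDft, toy_accumulate, toy_vmp_apply, and toy_add_inplace are hypothetical stand-ins and not the poulpy-hal API, the "product" is a plain coefficient-wise multiply-accumulate over integers, and the sketch does not model limb_offset or the real memory layout. It only checks that folding a temporary into the result gives the same values as accumulating directly.

```rust
/// Toy stand-in for a DFT-domain vector (hypothetical type, not from poulpy):
/// `data` holds consecutive limbs of `n` coefficients each.
#[derive(Clone, Debug, PartialEq)]
pub struct ToyVecDft {
    pub n: usize,
    pub data: Vec<i64>,
}

impl ToyVecDft {
    pub fn zeros(n: usize, limbs: usize) -> Self {
        Self { n, data: vec![0; n * limbs] }
    }
}

/// Additive variant: res[j] += sum_i a[i] * mat[i][j], coefficient-wise.
/// `a` must have `mat.len()` limbs; each row of `mat` must have as many limbs as `res`.
pub fn toy_accumulate(res: &mut ToyVecDft, a: &ToyVecDft, mat: &[ToyVecDft]) {
    let n = res.n;
    for (i, row) in mat.iter().enumerate() {
        let a_limb = &a.data[i * n..(i + 1) * n];
        for (j, out) in res.data.chunks_mut(n).enumerate() {
            let m_limb = &row.data[j * n..(j + 1) * n];
            for ((o, &x), &m) in out.iter_mut().zip(a_limb).zip(m_limb) {
                *o += x * m;
            }
        }
    }
}

/// Overwrite-only variant: clears `res`, then computes the same product.
pub fn toy_vmp_apply(res: &mut ToyVecDft, a: &ToyVecDft, mat: &[ToyVecDft]) {
    res.data.fill(0);
    toy_accumulate(res, a, mat);
}

/// Element-wise res += tmp (a toy analogue of an add-in-place fold).
pub fn toy_add_inplace(res: &mut ToyVecDft, tmp: &ToyVecDft) {
    for (r, t) in res.data.iter_mut().zip(&tmp.data) {
        *r += *t;
    }
}

/// Runs both strategies over a few toy "digits" and returns (direct, folded).
pub fn toy_fold_demo() -> (ToyVecDft, ToyVecDft) {
    let (n, rows, limbs, digits) = (4usize, 2usize, 3usize, 3usize);
    let mut direct = ToyVecDft::zeros(n, limbs);
    let mut folded = ToyVecDft::zeros(n, limbs);
    // One contiguous temporary, reused for every digit.
    let mut tmp = ToyVecDft::zeros(n, limbs);
    for d in 0..digits {
        // Small deterministic inputs so the integer arithmetic is exact.
        let a = ToyVecDft { n, data: (0..n * rows).map(|x| (x + d) as i64).collect() };
        let mat: Vec<ToyVecDft> = (0..rows)
            .map(|r| ToyVecDft {
                n,
                data: (0..n * limbs).map(|x| (x * (r + 1) + d) as i64).collect(),
            })
            .collect();
        // Old style: accumulate straight into the output.
        toy_accumulate(&mut direct, &a, &mat);
        // New style: overwrite the temporary, then fold it into the output.
        toy_vmp_apply(&mut tmp, &a, &mat);
        toy_add_inplace(&mut folded, &tmp);
    }
    (direct, folded)
}
```

Calling toy_fold_demo() returns the two accumulators, which should compare equal. The toy cannot reproduce the cache behaviour that motivates the change; it only illustrates that the two call shapes are interchangeable in terms of results.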

poulpy-cpu-ref / poulpy-cpu-avx

  • Update FFT64 and NTT120 vmp_apply_dft_to_dft implementations to accept limb_offset directly, replacing the separate _add codepath.
  • NTT120 AVX2 (arithmetic_avx.rs): add reduce_b_and_apply_crt, which fuses the CRT multiply into the Barrett reduction pass using new compile-time constants POW32_CRT and POW16_CRT; use it in compact_all_blocks, roughly halving the instruction count (~2x).
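
One plausible reading of fusing a constant multiply into a reduction pass, shown in scalar form. Everything here is an assumption made for illustration: the modulus Q, the factor CRT, the 16/16/32-bit split, and the function names are hypothetical and are not taken from arithmetic_avx.rs; plain `%` stands in for a real Barrett reduction. The idea is that if the reduction already recombines pieces of the input using precomputed powers of two mod Q, multiplying those constants by the CRT factor ahead of time makes the separate multiply unnecessary.

```rust
/// Illustrative ~30-bit modulus (assumption, not a value from the PR).
pub const Q: u64 = (1 << 30) - 35;
/// Illustrative stand-in for a CRT reconstruction factor (assumption).
pub const CRT: u64 = 123_456_789;

/// (a * b) mod Q, computed through u128 so the product cannot overflow.
pub const fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % Q as u128) as u64
}

/// 2^16 * CRT mod Q and 2^32 * CRT mod Q, evaluated at compile time.
pub const POW16_CRT: u64 = mul_mod(1 << 16, CRT);
pub const POW32_CRT: u64 = mul_mod(1 << 32, CRT);

/// Two-pass baseline: reduce first, then multiply by CRT and reduce again.
pub fn toy_reduce_then_crt(x: u64) -> u64 {
    mul_mod(x % Q, CRT)
}

/// Fused version: split x = x0 + x1 * 2^16 + x2 * 2^32 and recombine with
/// constants that already contain the CRT factor, followed by one reduction.
pub fn toy_reduce_and_apply_crt(x: u64) -> u64 {
    let x0 = x & 0xFFFF; // low 16 bits
    let x1 = (x >> 16) & 0xFFFF; // next 16 bits
    let x2 = x >> 32; // high 32 bits
    // Each product is below 2^62 and the sum is below 2^63, so u64 does not overflow.
    (x0 * CRT + x1 * POW16_CRT + x2 * POW32_CRT) % Q
}
```

Both functions compute x * CRT mod Q, so they can be compared directly on arbitrary 64-bit inputs. Whether the AVX2 kernel splits its inputs this way is not stated in the description above.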

poulpy-core

  • Rewrite the external product (glwe_external_product_internal) and GLWE keyswitching inner loops to write intermediate per-digit VMP results into a dedicated temporary buffer before accumulating with VecZnxDftAddInplace, avoiding scattered-write cache thrashing. The where bounds are updated accordingly.
  • Add bench_suite::keyswitch::gglwe module and keyswitch_glwe criterion benchmark targeting the NTT120 backend; remove the old FFT64-specific keyswitch_glwe_fft64 benchmark.

@Pro7ech added the enhancement (New feature or request) label on Mar 3, 2026
@Pro7ech self-assigned this on Mar 3, 2026
@Pro7ech merged commit 5a99010 into main on Mar 12, 2026
1 check passed
@Pro7ech deleted the dev_impl_opt branch on March 12, 2026 06:51