Conversation
noir-r1cs/src/sparse_matrix.rs
Outdated
    );

    let intermediate_multiplication =
        LockFreeArray::new(vec![(0, FieldElement::zero()); rhs.matrix.num_entries()]);
It's unfortunate that it requires allocating a temporary vector larger than the output.
noir-r1cs/src/sparse_matrix.rs
Outdated
    // wouldn't know the row a value belongs to. That's why the rows drive the
    // iterator below.
    // - Acquiring a mutex per column in the result was too expensive (even with
    //   parking_lot)
What I had in mind was splitting the work over the mutable output vector, and creating an iter_col method on the matrix (which is substantially more complicated than iter_row, but should be doable).
Can we explore this option?
recmo
left a comment
LGTM after comments addressed.
        result[j] += value * self[i];
    }

    let chunk_size = result.len().div_ceil(num_threads);
If the perf delta is small (e.g. <5%) I think I prefer a solution that lets rayon decide the chunk size. I don't like hard dependencies on the number of available threads; the thread count says little on its own, since we might be doing other work in parallel.
Instead we should pick the workload size so that it amortizes the per-chunk overhead while still allowing parallelization for large problem sizes. To approximate this I like to pick 'whatever subproblem fits in L1 cache' as the problem size, and this in turn is approximated with the workload_size::<F>(result.len()) function.
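A rough sketch of the kind of heuristic being described, assuming an L1 size of 32 KiB and a clamp to the input length (both the constant and the exact formula are assumptions; the real workload_size::<F> in the codebase may differ):

```rust
use std::mem::size_of;

// Assumed L1 data-cache size; real hardware varies (32-64 KiB is common).
const L1_CACHE_BYTES: usize = 32 * 1024;

/// Largest chunk (in elements of T) that still fits in L1 cache, so
/// per-chunk scheduling overhead is amortized while large inputs still
/// split into many independently schedulable chunks.
fn workload_size<T>(len: usize) -> usize {
    let per_chunk = L1_CACHE_BYTES / size_of::<T>().max(1);
    per_chunk.clamp(1, len.max(1))
}

fn main() {
    // With 8-byte elements, one chunk holds 32768 / 8 = 4096 elements.
    assert_eq!(workload_size::<u64>(1_000_000), 4096);
    // Small inputs collapse to a single chunk instead of splitting pointlessly.
    assert_eq!(workload_size::<u64>(100), 100);
    println!("chunk sizes look sane");
}
```

The appeal is that the chunk size depends only on the data type and problem size, not on how many threads happen to be free, which composes better when other work is running in parallel.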
    let chunk_size = result.len().div_ceil(num_threads);

    // In microbenchmarks par_iter_mut.chunks outperforms par_chunks_mut slightly.
If it is a small difference this might be a compiler quirk. I'd stick with par_chunks_mut for simplicity and maintainability.
If .par_iter_mut().chunks(..) is faster, then that is basically a bug somewhere, as par_chunks_mut provides the library/compiler with strictly more information (contiguous mutable slices rather than chunks of individual references). This bug is likely to be fixed at some point.
        .enumerate()
        .for_each(|(chunk_number, mut chunk)| {
            let base = chunk_number * chunk_size;
            let col_range = base..base + chunk_size;
let col_range = base..base + chunk.len();
Otherwise col_range will be out of bounds when chunk_size doesn't divide result.len(). You are protected from this bug here because col will only ever be in range, but better to do it right.
let base = chunk_number * chunk_size should still be correct, as only the last chunk can be shorter than chunk_size (but please confirm with the rayon docs, and maybe leave a comment explaining correctness).