opt: parallelize FRI fold with rayon #448
Codex Code Review
No direct security vulnerabilities (unsafe code, memory safety, crypto correctness, or VM privilege issues) stood out in the changed diff.

Review: opt: parallelize FRI fold with rayon

The FRI fold parallelization is correct. Using a temporary buffer to avoid aliasing (reading evals[2*j] while writing evals[j]) is the right approach. Both the sequential and parallel paths produce identical results.

Issues Found:
- [Low] Dead code in bench script
- [Low] Lossy as u64 cast on nanosecond accumulator
- [Low] Misleading variable name r4_merkle_dur
- [Informational] reset_all() only clears main-thread TLS

No correctness or security issues.
/bench
76dd9cc to ddaca1a
The FRI fold loop was sequential over half the evaluation points. Parallelize with rayon par_iter for the first few layers where domain_size is large enough to benefit.
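The layer-by-layer choice described in this commit message can be sketched as below. This is only an illustration: `PAR_THRESHOLD`, `FoldPath`, `choose_path`, and `plan_folds` are hypothetical names, and the cutoff value is an assumption rather than anything taken from the diff. Since the domain halves on every fold, only the first few layers end up on the parallel path.

```rust
/// Hypothetical cutoff (an assumption, not a value from the PR): the minimum
/// number of output elements for which the parallel path is taken.
const PAR_THRESHOLD: usize = 1 << 12;

#[derive(Debug, Clone, Copy, PartialEq)]
enum FoldPath {
    Parallel,
    Sequential,
}

/// A fold over `domain_size` inputs produces `domain_size / 2` outputs;
/// go parallel only when that output count reaches the cutoff.
fn choose_path(domain_size: usize) -> FoldPath {
    if domain_size / 2 >= PAR_THRESHOLD {
        FoldPath::Parallel
    } else {
        FoldPath::Sequential
    }
}

/// List (layer size, chosen path) for every fold until `final_size` is reached.
fn plan_folds(mut domain_size: usize, final_size: usize) -> Vec<(usize, FoldPath)> {
    let mut plan = Vec::new();
    while domain_size > final_size.max(1) {
        plan.push((domain_size, choose_path(domain_size)));
        domain_size /= 2; // each fold halves the evaluation domain
    }
    plan
}
```

With this assumed cutoff, `plan_folds(1 << 14, 1)` marks only the two largest layers as parallel; a real threshold would have to come from benchmarking.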
55b1a19 to 373be35
/bench
Benchmark: fib_iterative_8M (median of 3)
Table parallelism: 32 (auto = cores / 3)
Commit: f4fbb09 · Baseline: cached · Runner: self-hosted bench
/bench 3 1
Closing this: superseded by #597, which is the newer take on the same FRI fold parallelization and adds a size threshold to avoid Rayon overhead on the small final layers. #597 still needs a more thorough review (rebase onto current main, fmt fix, and a real
Summary
Parallelize fold_evaluations_in_place in the FRI commit phase using rayon::par_iter. The fold loop was previously sequential over N/2 extension field elements. Since the in-place fold has aliasing (evals[j] is computed from evals[2*j], so one iteration's write can overwrite an input that another iteration has not read yet), the parallel version writes into a temporary buffer and then copies the result back with clone_from_slice.

Benchmark (Apple Silicon M3 Pro, PARALLEL_TABLES=1, fib_iterative_1M)

The temp buffer for the parallel version is ~36 MB (N/2 extension elements), which is small compared to the 23 GB working set.
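A minimal sketch of the temporary-buffer idea, under several loudly stated assumptions: it uses `std::thread::scope` instead of rayon so that it has no dependencies, a toy prime field (`P`) instead of the real extension field, and a simplified `fold_pair` in place of the actual fold formula. All function names here are hypothetical and are not the ones in the PR.

```rust
use std::thread;

/// Toy prime modulus standing in for the real extension field (assumption).
const P: u64 = (1 << 31) - 1;

/// Simplified stand-in for the fold formula; inputs are assumed to be < P.
fn fold_pair(lo: u64, hi: u64, beta: u64) -> u64 {
    (lo + beta * hi) % P
}

/// Sequential in-place fold: safe because ascending j only reads indices >= j.
fn fold_sequential_in_place(evals: &mut Vec<u64>, beta: u64) {
    let half = evals.len() / 2;
    for j in 0..half {
        evals[j] = fold_pair(evals[2 * j], evals[2 * j + 1], beta);
    }
    evals.truncate(half);
}

/// Parallel fold: workers only read `evals` and write disjoint chunks of
/// `temp`, so no write can clobber an input another worker still needs.
fn fold_parallel_with_temp(evals: &mut Vec<u64>, beta: u64, workers: usize) {
    let half = evals.len() / 2;
    let workers = workers.max(1);
    let chunk = ((half + workers - 1) / workers).max(1);
    let mut temp = vec![0u64; half];
    let src: &[u64] = evals.as_slice();
    thread::scope(|s| {
        for (c, out) in temp.chunks_mut(chunk).enumerate() {
            s.spawn(move || {
                for (k, slot) in out.iter_mut().enumerate() {
                    let j = c * chunk + k; // global output index
                    *slot = fold_pair(src[2 * j], src[2 * j + 1], beta);
                }
            });
        }
    });
    // Copy the folded layer back over the front of the original buffer.
    evals[..half].clone_from_slice(&temp);
    evals.truncate(half);
}
```

Running both functions on copies of the same input should give identical vectors, which is the property the review comment calls out. The cost is the extra N/2-element allocation per parallel layer.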
Test plan
- cargo test --release -p stark (all passed)
- /bench on CI runner