We noticed a ~30% performance regression in our benchmarks when going from Rust 1.90 to Rust 1.91. The benchmarks were run on a MacBook Pro M4. The regression still seems to be present in nightly (2026-02-26).
Code
I created a reduced version of our benchmark, although it is not minimal. Let me know if it needs further refinement. The code is available here (link points to the right branch). To see the regression, compare running
$ cargo bench --bench blake3_1to1_fast --features internal
with Rust 1.90 and with Rust 1.91 (i.e., changing only the version in rust-toolchain.toml).
//! file: miden-vm/benches/blake3_1to1_fast.rs
use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
use miden_core::Felt;
use miden_core_lib::CoreLibrary;
use miden_processor::{FastProcessor, advice::AdviceInputs};
use miden_vm::{Assembler, DefaultHost, StackInputs};
use tokio::runtime::Runtime;

fn blake3_1to1_fast(c: &mut Criterion) {
    let mut group = c.benchmark_group("blake3_1to1_fast");

    // operand_stack: 8 words of 0xFFFFFFFF
    let stack_inputs =
        StackInputs::new(&[Felt::new(u64::from(u32::MAX)); 8]).unwrap();
    // advice_stack: 100 iterations
    let advice_inputs = AdviceInputs::default().with_stack([Felt::new(100)]);

    let mut assembler = Assembler::default();
    assembler
        .link_dynamic_library(CoreLibrary::default())
        .expect("failed to load core library");
    let program = assembler
        .assemble_program(BLAKE3_1TO1_MASM)
        .expect("Failed to compile test source.");

    group.bench_function("blake3_1to1", |bench| {
        bench.to_async(Runtime::new().unwrap()).iter_batched(
            || {
                let host =
                    DefaultHost::default().with_library(&CoreLibrary::default()).unwrap();
                let processor =
                    FastProcessor::new(stack_inputs).with_advice(advice_inputs.clone());
                (host, program.clone(), processor)
            },
            |(mut host, program, processor)| async move {
                processor.execute(&program, &mut host).await.unwrap();
            },
            BatchSize::SmallInput,
        );
    });

    group.finish();
}

const BLAKE3_1TO1_MASM: &str = "\
use miden::core::crypto::hashes::blake3
use miden::core::sys
begin
    # Push the number of iterations on the stack, and assess if we should loop
    adv_push.1 dup neq.0
    while.true
        # Move loop counter down
        movdn.8
        # Execute blake3 hash function
        exec.blake3::hash
        # Decrement counter, and check if we loop again
        movup.8 sub.1 dup neq.0
    end
    # Drop counter
    drop
    # Truncate stack to make constraints happy
    exec.sys::truncate_stack
end
";

criterion_group!(benchmark, blake3_1to1_fast);
criterion_main!(benchmark);
On my MacBook Pro M4, Rust 1.90 yields:
$ cargo bench --bench blake3_1to1_fast --features internal
program_execution_fast/blake3_1to1
time: [2.0877 ms 2.0909 ms 2.0942 ms]
while Rust 1.91 yields:
$ cargo bench --bench blake3_1to1_fast --features internal
program_execution_fast/blake3_1to1
time: [2.7507 ms 2.7549 ms 2.7594 ms]
change: [+31.472% +31.756% +32.074%] (p = 0.00 < 0.05)
Performance has regressed.
Performance is similarly poor on nightly 2026-02-26:
$ cargo bench --bench blake3_1to1_fast --features internal
blake3_1to1_fast/blake3_1to1
time: [2.6636 ms 2.6697 ms 2.6762 ms]
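As a quick cross-check of the figures above, the relative slowdown can be recomputed from the point estimates (the middle value of each `time:` interval). This is only plain arithmetic on the reported numbers, not part of the benchmark; Criterion derives its own `change:` interval from bootstrap statistics, so the values should agree closely but need not match exactly.

```rust
/// Relative change of `new_ms` versus `baseline_ms`, in percent.
fn percent_change(baseline_ms: f64, new_ms: f64) -> f64 {
    (new_ms - baseline_ms) / baseline_ms * 100.0
}

fn main() {
    // Point estimates (in ms) copied from the Criterion output above.
    let rust_1_90 = 2.0909;
    let rust_1_91 = 2.7549;
    let nightly_2026_02_26 = 2.6697;

    // prints "1.91 vs 1.90:    +31.76%"
    println!("1.91 vs 1.90:    {:+.2}%", percent_change(rust_1_90, rust_1_91));
    // prints "nightly vs 1.90: +27.68%"
    println!("nightly vs 1.90: {:+.2}%", percent_change(rust_1_90, nightly_2026_02_26));
}
```

The 1.91 figure is consistent with Criterion's reported +31.756%, and nightly 2026-02-26 remains roughly 28% slower than 1.90.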
Version it worked on
It most recently worked on: Rust 1.90.
rustc --version --verbose:
rustc 1.90.0 (1159e78c4 2025-09-14)
binary: rustc
commit-hash: 1159e78c4747b02ef996e55082b704c09b970588
commit-date: 2025-09-14
host: aarch64-apple-darwin
release: 1.90.0
LLVM version: 20.1.8
Version with regression
rustc --version --verbose:
rustc 1.91.1 (ed61e7d7e 2025-11-07)
binary: rustc
commit-hash: ed61e7d7e242494fb7057f2657300d9e77bb4fcb
commit-date: 2025-11-07
host: aarch64-apple-darwin
release: 1.91.1
LLVM version: 21.1.2