ParallelFrontend scenario added for -Zthreads=8 run, check only #2421

Open
azhogin wants to merge 1 commit into rust-lang:master from azhogin:azhogin/par-front-scenario

Conversation

@azhogin

@azhogin azhogin commented Mar 23, 2026

Adds Scenario::ParallelFrontend for -Zthreads=8 compilation, check profile only.

@azhogin azhogin force-pushed the azhogin/par-front-scenario branch from 00b1a2c to 8b6b683 Compare March 23, 2026 07:07
@nnethercote
Contributor

The problem with this is that parallelism is an entirely separate dimension: parallel Full, parallel IncrFull, parallel IncrUnchanged, parallel IncrPatched are all possible. And it's not even obvious from the name ParallelFrontend which of those scenarios is the one being adapted.

@azhogin
Author

azhogin commented Mar 23, 2026

> The problem with this is that parallelism is an entirely separate dimension: parallel Full, parallel IncrFull, parallel IncrUnchanged, parallel IncrPatched are all possible. And it's not even obvious from the name ParallelFrontend which of those scenarios is the one being adapted.

Yes, the idea was to add only "Full/check" at first to estimate the increased execution costs, and maybe implement additional modes later. Should I rename ParallelFrontend to ParallelFull?

@nnethercote
Contributor

ParallelFull (or ParFull) would be clearer. But again, given that it's really a separate dimension, shoehorning it in as a scenario may cause problems down the road.

@Kobzol, do you have any thoughts/plans about this?

@Kobzol
Member

Kobzol commented Mar 23, 2026

I definitely have thoughts, but I'm sick now and don't have the energy to comment. I'll try to write something by the end of this week.

@petrochenkov

I wish @azhogin had written more about the motivation and goals when posting this.
The goals here are:

  • Get @azhogin familiar with the rustc-perf codebase so he can add more scenarios like this later, or potentially integrate tools like https://github.com/Zoxc/rcb into rustc-perf
  • Have something working to start talking with the infra team about resources available for benchmarking parallel frontend.

Running some of the check benchmarks with the parallel frontend will have the lowest cost/benefit ratio:

  • On one hand, we don't spend too many resources, because the check runs are the cheapest ones
  • On the other hand, the parallel frontend doesn't affect LLVM, so measuring only the check part will show the largest changes if something related to the parallel frontend changes
  • Once some of the benchmarks are running, we can look at the results, see how much the time varies and how the other metrics behave, and think about whether we need something like rcb integrated instead of cloning the regular single-threaded setup.

@lqd
Member

lqd commented Mar 23, 2026

Just as a note: we have I think at least one benchmark running with the parallel frontend (but 4 threads).

This setup doesn’t look like it should be a scenario to me either, but let’s wait for Jakub’s thoughts once he feels better.

@Kobzol
Member

Kobzol commented Mar 24, 2026

So, first, let me say that I agree that we should be preparing for benchmarking the parallel frontend in a better way, and I'm happy to help with that, both on the rustc-perf and the infra side!

Regarding the implementation, we should not be adding a new scenario. As Nick said, -Zthreads is an entirely new dimension, that works essentially with all existing scenarios and profiles. I'm sure that in the future we will want to benchmark most of them with the parallel frontend, and we will also want to benchmark different thread counts, which wouldn't be possible with this being a scenario.
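To make the "separate dimension" point concrete, here is a minimal sketch. The names `Scenario`, `CompileConfig`, `threads`, and `all_configs` are hypothetical and only illustrate the idea; they are not the actual rustc-perf definitions.

```rust
// Hypothetical types for illustration only; not the real rustc-perf code.

/// The existing scenarios named in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scenario {
    Full,
    IncrFull,
    IncrUnchanged,
    IncrPatched,
}

/// Thread count modeled as its own field instead of a `Scenario` variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CompileConfig {
    pub scenario: Scenario,
    /// Value passed to `-Zthreads`; 1 means the serial frontend.
    pub threads: u32,
}

/// Combine every scenario with every requested thread count.
pub fn all_configs(thread_counts: &[u32]) -> Vec<CompileConfig> {
    let scenarios = [
        Scenario::Full,
        Scenario::IncrFull,
        Scenario::IncrUnchanged,
        Scenario::IncrPatched,
    ];
    let mut configs = Vec::new();
    for &scenario in &scenarios {
        for &threads in thread_counts {
            configs.push(CompileConfig { scenario, threads });
        }
    }
    configs
}
```

Under these assumptions, `all_configs(&[1, 4, 8])` yields 12 combinations, including `IncrPatched` with 8 threads. A single `ParallelFrontend` scenario variant could express only one of them.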

That being said, we could do something temporary to benchmark only a few situations with the parallel frontend. And we actually already did that last year 😅 We have the serde benchmark running with 4 threads, across all scenarios and profiles. In terms of variance, it does not seem egregiously higher than the base variant (parallel vs serial), at least for the serde benchmark.

So I guess that if we want to extend this, we have several options:

  1. Add more hardcoded parallel benchmarks for other crates, or more thread counts for serde. This is easy to do and requires almost no code changes, but it's not very "scalable", of course.
  2. Add the thread count as another axis for compilation benchmarks, and either select a few benchmarks for which we will use higher values than 1, or enable e.g. 4 threads for all check profiles, or something like that. I don't think that we can afford enabling 2/4/8 for everything though, that would be too much work to benchmark, even with the two benchmarking machines that we have today.
  3. Go all-in and just enable -Zthreads=<value that will be used on stable> for all benchmarks, and treat the parallel frontend as the default mode. We might want to go this way eventually, but I'm not sure if the parallel frontend is ready for this today.

So I think that 2) is the way forward. It's not completely trivial to add a new axis to the compilation benchmarks, but it's also not super hard, it "just" needs some boilerplate code that propagates the new config option throughout the collector, the database, the website backend, and then also the frontend. We have a bunch of PRs from last year that added the target compilation benchmark axis, so it would be mostly a copy-paste of that.
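A rough way to compare the cost of these options is to count the runs each policy produces per benchmark. The sketch below is illustrative only: the `Profile` list is an assumed subset, the count of four scenarios is taken from the scenarios named earlier in this thread, and all function names are hypothetical.

```rust
// Illustrative cost model; names and the profile list are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Check,
    Debug,
    Opt,
}

pub const PROFILES: [Profile; 3] = [Profile::Check, Profile::Debug, Profile::Opt];

/// Full, IncrFull, IncrUnchanged, IncrPatched.
pub const SCENARIOS_PER_PROFILE: usize = 4;

/// Number of runs for one benchmark, given a policy that maps each
/// profile to the list of thread counts it should be measured with.
pub fn runs_per_benchmark(threads_for: impl Fn(Profile) -> Vec<u32>) -> usize {
    PROFILES
        .iter()
        .map(|&profile| SCENARIOS_PER_PROFILE * threads_for(profile).len())
        .sum()
}

/// Serial frontend everywhere.
pub fn serial_only(_profile: Profile) -> Vec<u32> {
    vec![1]
}

/// Add a 4-thread run, but only for the check profile.
pub fn check_only_parallel(profile: Profile) -> Vec<u32> {
    match profile {
        Profile::Check => vec![1, 4],
        Profile::Debug | Profile::Opt => vec![1],
    }
}

/// Serial plus 2/4/8 threads for every profile.
pub fn everything(_profile: Profile) -> Vec<u32> {
    vec![1, 2, 4, 8]
}
```

With these assumed numbers, the serial baseline is 12 runs per benchmark, the check-only policy is 16, and 2/4/8 threads for everything is 48, which illustrates why a selective policy is attractive.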

One additional thing to note is that we haven't yet figured out the best metric to check here, I think. I'm not sure if cycles really work, as they will grow with the number of threads. That being said, we mostly use rustc-perf to compare historical data for the same configuration, rather than to compare across configurations, so it doesn't matter that much. But it would still be good to have a metric that can show small changes in parallel code.

Regarding

> Have something working to start talking with the infra team about resources available for benchmarking parallel frontend.

Happy to talk about that in the #t-infra stream on Zulip. I think that we have enough capacity now to add more benchmarks, so that shouldn't be a big issue.

@petrochenkov

> or enable e.g. 4 threads for all check profiles

> enabling 2/4/8 for everything though, that would be too much work to benchmark

That's what this PR attempts to achieve, covering all the benchmarks, but only for check profiles, because using all the profiles would be too much work.

The number of threads depends on the number of cores on the benchmarking machines. If they have 8+ cores, I'd prefer to use -Zthreads=8, because that's considered the best-performing setup at the moment (even for machines with more cores); otherwise we can use -Zthreads=4.
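The selection rule described here can be sketched as a small helper. The function names are hypothetical; only `-Zthreads` and the 8-core cutoff come from this thread, and `std::thread::available_parallelism` is used as one possible way to detect the core count.

```rust
use std::thread::available_parallelism;

/// Pick the frontend thread count from the machine's core count:
/// 8 threads on machines with 8 or more cores, otherwise 4.
pub fn frontend_threads(cores: usize) -> usize {
    if cores >= 8 {
        8
    } else {
        4
    }
}

/// Build the compiler flag for the chosen thread count.
pub fn zthreads_flag(cores: usize) -> String {
    format!("-Zthreads={}", frontend_threads(cores))
}

/// Detect the parallelism available to this process, falling back to 1
/// if it cannot be determined. This is an assumption about how a
/// collector might detect cores, not how rustc-perf does it.
pub fn detected_cores() -> usize {
    available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

For example, `zthreads_flag(16)` produces `-Zthreads=8` and `zthreads_flag(6)` produces `-Zthreads=4`.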

> Go all-in and just enable -Zthreads=<value that will be used on stable> for all benchmarks, and treat the parallel frontend as the default mode. We might want to go this way eventually

I don't think we should go this way eventually: it would ruin icount-based benchmarking, which is used as a source of truth for any changes that don't involve redistributing work between threads.

> One additional thing to note is that we haven't yet figured out the best metric to check here, I think. I'm not sure if cycles really work, as they will grow with the number of threads.

Only wall time, I guess.
Perhaps icounts/cycles too, as a secondary metric, but read in the opposite direction: when core utilization improves (some work may need to be duplicated to achieve that), icounts grow. So if icounts don't change in the single-threaded setup but increase in the multi-threaded setup, it may be a good sign.
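As a rough sketch of that heuristic (not a validated metric): the `NOISE` threshold, the `Signal` names, and the `classify` function are assumptions made up for illustration, and wall time would still need to confirm any conclusion.

```rust
/// Hypothetical interpretation of a pair of icount changes.
#[derive(Debug, PartialEq, Eq)]
pub enum Signal {
    /// Serial icounts moved, so the change is not purely about parallelism.
    SingleThreadedChange,
    /// Serial icounts are flat but parallel icounts grew: possibly better
    /// core utilization, to be confirmed against wall time.
    PossiblyBetterUtilization,
    /// Nothing can be read from icounts alone.
    Inconclusive,
}

/// Assumed noise threshold for relative changes (0.5%); purely illustrative.
pub const NOISE: f64 = 0.005;

/// Classify relative icount changes (e.g. 0.03 means +3%) measured in the
/// single-threaded and multi-threaded setups.
pub fn classify(serial_delta: f64, parallel_delta: f64) -> Signal {
    let serial_changed = serial_delta.abs() > NOISE;
    let parallel_grew = parallel_delta > NOISE;
    if serial_changed {
        Signal::SingleThreadedChange
    } else if parallel_grew {
        Signal::PossiblyBetterUtilization
    } else {
        Signal::Inconclusive
    }
}
```

For instance, flat serial icounts with a 3% increase under multiple threads would be classified as `PossiblyBetterUtilization`.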

@petrochenkov

> So I think that 2) is the way forward. It's not completely trivial to add a new axis to the compilation benchmarks, but it's also not super hard, it "just" needs some boilerplate code that propagates the new config option throughout the collector, the database, the website backend, and then also the frontend. We have a bunch of PRs from last year that added the target compilation benchmark axis, so it would be mostly a copy-paste of that.

@azhogin Could you implement this ^^^ suggestion?


5 participants