Skip to content

scripts: serialize PyPy/Rust install and verify toolchain is usable#13828

Open
avi-starkware wants to merge 3 commits intomain-v0.14.2from
avi/privacy/fix-install-build-tools-toolchain
Open

scripts: serialize PyPy/Rust install and verify toolchain is usable#13828
avi-starkware wants to merge 3 commits intomain-v0.14.2from
avi/privacy/fix-install-build-tools-toolchain

Conversation

@avi-starkware
Copy link
Copy Markdown
Collaborator

@avi-starkware avi-starkware commented Apr 20, 2026

Summary

Fixes a flaky Docker build failure of the form:

#14 60.73 [install_build_tools] Installing sccache 0.14.0...
#14 60.92 error: 'cargo' is not installed for the toolchain '1.92-x86_64-unknown-linux-gnu'.
#14 ERROR: process "/bin/sh -c scripts/install_build_tools.sh" did not complete successfully: exit code: 1

What we observed

  • 'cargo' is not installed for the toolchain '1.9x-…' on several runs (PR scripts: install project Rust toolchain before cargo tools #13795's reproducer, remove-mock-forest-storage's sequencer docker build, and the most recent report on einat/proof-flow-1-state-reader-fixes).
  • rustup panicking mid-install with thread 'CloseHandle' panicked at src/diskio/mod.rs:…: called Result::unwrap() on an Err value: RecvError on two separate runs (one on this branch, one on main's native-blockifier-artifacts-push). In both cases, buildx then hung for ~50 minutes until its gRPC keepalive timed out.

Both observations share a root pattern: rustup is asked to install the rust-toolchain.toml-pinned channel, the install is interrupted mid-flight, and rustup is left with a stub entry for the channel (marked "installed" / "active") while its components (rustc, cargo, …) never finished downloading. On the next invocation, cargo install … fails with "cargo is not installed for the toolchain".

I don't have a single log that shows the panic directly causing a subsequent "cargo not installed" error — the buildx hang usually masks what would have surfaced — but the panic is at least one plausible mid-install failure mode, and any mid-install exit (panic, OOM, timeout, etc.) produces the same partial-install state.

Fixes

  1. Serialize install_pypy and install_rust. Concurrent disk I/O from a sibling installer is a known trigger for rustup's multi-threaded downloader panic (rust-lang/rustup#2926). Serial execution removes that trigger. The extra wall-time cost is ~10s per fresh base-image build (PyPy's runtime), and the step is Docker-cached most of the time.
  2. Probe the pinned toolchain after install; force-reinstall on failure. This is the real correctness guarantee — it catches any partial-install state regardless of what caused it. cargo --version in the project dir (where rust-toolchain.toml is the active override) is a ~10ms no-op on a healthy install, so devs who already have the toolchain fully installed pay nothing. --force is only invoked when the check actually fails.

Why not unconditional rustup toolchain install --force?

It works in Docker (layer cache absorbs the cost) but is disruptive on a developer machine: --force re-downloads ~200 MB even when the toolchain is perfectly fine. The conditional probe avoids that.

Test plan

  • CI (Docker builds in particular) passes on this branch
  • On a dev machine with 1.92 already installed, the verification step short-circuits (no unnecessary reinstall in logs)

Two related fixes for flaky docker builds:

1. Serialize install_pypy and install_rust. rustup-init's multi-threaded
   component downloader has a known race where a worker thread panics
   ("thread 'CloseHandle' panicked at src/diskio/mod.rs: RecvError").
   The race is triggered or aggravated by concurrent I/O from parallel
   installers; when it fires, rustup leaves a partial toolchain stub
   behind.

2. After installation, probe the active toolchain with `cargo --version`
   in the project dir (where rust-toolchain.toml pins the channel). On
   a healthy install this is a ~10ms no-op. If it fails — meaning
   rustup created a stub for the pinned channel but never finished
   populating its components — retry with `rustup toolchain install
   --force` to actually download rustc/cargo/etc.

Without these fixes, install_cargo_tools.sh later runs `cargo install
sccache`, cargo proxies to the pinned (partial) toolchain, and the
build fails with "'cargo' is not installed for the toolchain '1.92-…'".
@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 20, 2026

PR Summary

Low Risk
Changes are limited to build tooling scripts and primarily add defensive checks; main risk is longer install time or unexpected --force reinstall if the probe misfires.

Overview
Stops running PyPy and Rust setup in parallel in scripts/install_build_tools.sh, switching to serial installation to avoid flaky rustup partial installs.

Adds a post-install probe (cargo --version in the repo root) and conditionally forces rustup toolchain install --force when the project-pinned toolchain is missing components, then logs the resolved rustc version before proceeding.

Reviewed by Cursor Bugbot for commit 416eff2. Bugbot is set up for automated code reviews on this repo. Configure here.

@reviewable-StarkWare
Copy link
Copy Markdown

This change is Reviewable

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Copy link
Copy Markdown
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dorimedini-starkware reviewed 1 file and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on avi-starkware, idan-starkware, and yoavGrs).


scripts/install_build_tools.sh line 106 at r1 (raw file):

log_step "install_build_tools" "Installing PyPy and Rust..."
# Intentionally serial: rustup-init's multi-threaded component downloader has a
# known race (`thread 'CloseHandle' panicked at src/diskio/mod.rs: RecvError`)

is there some github issue you can link to here?

Code quote:

known race

scripts/install_build_tools.sh line 112 at r1 (raw file):

install_pypy
install_rust
log_step "install_build_tools" "PyPy and Rust installed"

Suggestion:

log_step "install_build_tools" "Installing PyPy..."
# Intentionally serial: rustup-init's multi-threaded component downloader has a
# known race (`thread 'CloseHandle' panicked at src/diskio/mod.rs: RecvError`)
# that's triggered or aggravated by concurrent I/O from parallel installers.
# When the race fires, rustup leaves a partial toolchain stub behind and later
# `cargo install` calls fail with "'cargo' is not installed for the toolchain".
install_pypy
log_step "install_build_tools" "PyPy installed. Installing Rust..."
install_rust
log_step "install_build_tools" "Rust installed."

- Link rust-lang/rustup#2926 for the CloseHandle/RecvError panic
- Split the PyPy and Rust log_step messages so each installer's progress
  is visible (per dorimedini-starkware's suggestion)
rust-lang/rustup#2926 is one known trigger for a mid-install rustup
failure, not proven to be the mechanism behind every partial-install
occurrence we've seen. Reword the block comment to describe the pattern
(any mid-install failure can leave a registered stub without components)
rather than claiming a specific causal chain, and note that the verify +
force-reinstall step is what actually guarantees correctness.
Copy link
Copy Markdown
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dorimedini-starkware reviewed 1 file and all commit messages, and resolved 2 discussions.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on idan-starkware and yoavGrs).

Copy link
Copy Markdown
Collaborator Author

@avi-starkware avi-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avi-starkware made 2 comments.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on idan-starkware and yoavGrs).


scripts/install_build_tools.sh line 106 at r1 (raw file):

Previously, dorimedini-starkware wrote…

is there some github issue you can link to here?

Done.


scripts/install_build_tools.sh line 112 at r1 (raw file):

install_pypy
install_rust
log_step "install_build_tools" "PyPy and Rust installed"

Done.

Copy link
Copy Markdown
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on idan-starkware and yoavGrs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants