Rollup of 6 pull requests #155878
Conversation
Rust packs rlib metadata into a `lib.rmeta` archive member encoded as a Mach-O object. For Apple arm64e, extend the existing metadata-object subtype special case from bare `CPU_SUBTYPE_ARM64E` to `CPU_SUBTYPE_ARM64E | CPU_SUBTYPE_PTRAUTH_ABI`.
triagebot: Mention 'subtree' in clippy message (rust-lang#154396 (comment)). This makes it more obvious that a subtree is being changed, similar to Miri and `rustc-dev-guide`.
Fix pathological performance in trait solver cycles with errors

Fuchsia's Starnix system has had a multi-year bug where occasionally a typo could cause the Rust compiler to take 10+ hours to report an error (see rust-lang#136516 and rust-lang#150907). This was particularly hard to track down since Starnix's codebase is massive, over 384 thousand lines as of writing. With the help of treereduce, cargo-minimize, and rustmerge, after about a month of running we reduced it down to a couple of [lines of code], which take only about 35 seconds to report an error on my machine. The bug also appears with `-Z next-solver=no` and `-Z next-solver=coherence`, but does not occur with `-Z next-solver` or `-Z next-solver=globally`.

I used Gemini to help diagnose the problem and the proposed solution (which is the one in this patch):

1. The trait solver gets stuck in an exponential loop evaluating auto-trait bounds (like `Send` and `Sync`) on cyclic types that contain compilation errors (`TyKind::Error`).
2. Normally, if the solver detects a cycle, it prevents the result from being stored in the global cache, because the result depends on the current evaluation stack. However, when an error is involved, the depth tracking gets pinned to a low value, forcing the solver to rely on the short-lived provisional cache. Since the provisional cache is cleared between high-level iterations of the fulfillment loop, the solver ends up re-discovering and re-evaluating the same large cycle thousands of times.
3. The fix: allow global caching of results even if they appear stack-dependent, provided that the inference context is already tainted by errors (`self.infcx.tainted_by_errors().is_some()`). This violates the strict invariant that global cache entries shouldn't depend on the stack, but it is safe because compilation is already guaranteed to fail due to the presence of errors.

Prioritizing compiler responsiveness and termination over perfect correctness in error states is the correct trade-off here.

I added the reduction as the test case for this. However, I don't see an easy way to catch this bug if it comes back. Should we add some way to time out the test if it takes longer than 10 seconds to compile? That could be a source of flakes, though.

I don't have any experience with the trait solver code, but I did try to review the code to the best of my ability. This approach seems a bit of a band-aid, but I don't see a better solution. We could try to teach the solver not to clear the provisional cache in this circumstance, but I suspect that would be a pretty invasive change. If this fix does cause problems, I'd guess it might report an incorrect error, but I (and Gemini) were unable to come up with an example that reported a different error with and without this fix.

Resolves rust-lang#150907

[lines of code]: https://gist.github.com/erickt/255bc4006292cac88de906bd6bd9220a
…4e-ptrauth-core-diagnostics-2026-04-24-u9836b06, r=madsmtm

arm64e: set ptrauth ABI subtype on lib.rmeta Mach-O objects

Set `CPU_SUBTYPE_PTRAUTH_ABI` (as well as the existing `CPU_SUBTYPE_ARM64E`) on ARM64e object files that `rustc` creates, to match Clang/LLVM-generated ARM64e objects. This corresponds to `cpusubtype == 0x80000002`. Before this change, rustc emitted the bare `CPU_SUBTYPE_ARM64E` subtype for the metadata wrapper objects / `symbols.o` file, producing `0x00000002`, which can be reported by Apple's linker as `arm64e.old`.

Fixes rust-lang#130085. Fixes rust-lang#143844. Fixes rust-lang#150046. Fixes rust-lang#139218.
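The subtype arithmetic can be checked directly. A minimal sketch, using the constant values from Apple's `<mach/machine.h>` (`CPU_SUBTYPE_ARM64E` = 2, `CPU_SUBTYPE_PTRAUTH_ABI` = 0x80000000):

```rust
// Subtype constants as defined in Apple's <mach/machine.h>.
const CPU_SUBTYPE_ARM64E: u32 = 2;
const CPU_SUBTYPE_PTRAUTH_ABI: u32 = 0x8000_0000;

fn main() {
    // What rustc emitted before this change for the metadata wrapper
    // objects / symbols.o ("arm64e.old" in Apple's linker):
    let old = CPU_SUBTYPE_ARM64E;
    // What Clang/LLVM emit, and what rustc emits after this change:
    let new = CPU_SUBTYPE_ARM64E | CPU_SUBTYPE_PTRAUTH_ABI;
    println!("old cpusubtype: {old:#010x}"); // 0x00000002
    println!("new cpusubtype: {new:#010x}"); // 0x80000002
}
```

Since `CPU_SUBTYPE_PTRAUTH_ABI` is the top bit of the subtype word, the combined value is the `0x80000002` mentioned above.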
…uillaumeGomez Fix panic for doc attributes on where predicates Fixes rust-lang#155804 r? @GuillaumeGomez
add test for accidentally fixed `binius_field` issue cc rust-lang#153614 (comment) r? @lqd
Render `ConstContext` for diagnostics once instead of duplicating it in every diagnostic. Probably just a left-over from the `ftl` file removal.
@bors r+ rollup=never p=5
📌 Perf builds for each rolled up PR:

previous master: 2f43fe4303

In the case of a perf regression, run the following command for each PR you suspect might be the cause:
A job failed! Check out the build log.
Finished benchmarking commit (345a975): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count: this perf run didn't have relevant results for this metric.

Max RSS (memory usage): this perf run didn't have relevant results for this metric.

Cycles: Results (secondary -2.6%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size: Results (primary 0.0%, secondary 0.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 487.764s -> 489.767s (0.41%)
Successful merges:
- #155865 (add test for accidentally fixed `binius_field` issue)
- #155866 (Render `ConstContext` for diagnostics once)

r? @ghost