Conversation
messense
left a comment
There was a problem hiding this comment.
Thanks for the detailed write-up! Unfortunately I believe this approach introduces a regression for the bug fixed in #2969.
The problem: shared inodes during repair
stage_artifact renames the cargo output into target/maturin/, then creates a copy-back at the original cargo output path. With a hard link, both paths share the same inode. The auditwheel repair and in-tree patchers then mutate the staged artifact in-place, which means those mutations also corrupt the cargo output path — exactly the bug #2969 fixed.
Specifically:
- Linux:
patchelfedits files in-place (truncate + write), not via temp-file + rename. - macOS:
patch_machousesfs_err::write(file, &data)— also an in-place overwrite. - Windows:
pe_patch::replace_neededusesfs::write(file_path, data)— also in-place.
The table in the PR description claims patchelf uses "temp file + rename over staged", but that is not how patchelf works in the invocations maturin uses. All three platforms mutate via truncate-and-write on the same path, so the original cargo output (now a hard link to the same inode) sees the patched content too.
Interaction with #2969
PR #2969 originally introduced copy_artifact_for_repair for bin artifacts to prevent auditwheel/patchelf from corrupting cargo outputs. That logic was later refactored into the current stage_artifact, which now handles both cdylib and bin artifacts. Replacing the copy-back with a hard link would undermine the isolation for all artifact types, not just bins.
On a subsequent build where cargo skips recompilation, it would find already-patched DT_NEEDED entries / modified Mach-O load commands, causing the failures described in #2680.
What would work instead
The existing reflink_or_copy is the correct approach — it creates a separate inode (via CoW or full copy) so the staged artifact can be safely mutated without affecting the cargo output.
If hard-link performance is desired, it would only be safe to hard-link after all repair/patching steps are complete, but that requires more complex control-flow changes that aren't worth the complexity.
The editable-install hard link (binding_generator/mod.rs) has the same concern: the source is the staged artifact that may be modified by auditwheel repair afterward.
|
For the original issue (#3111), the artifact only needs to be staged (renamed away) for isolation during repair — the copy-back to the cargo output path is purely a convenience so users can find the artifact at the expected location. For projects like Polars that have no external shared lib deps, A simpler and universally fast fix: skip the copy-back in
This eliminates the expensive copy for the common case on all filesystems, not just ext4, while preserving the isolation invariant that #2969 established. |
|
Thanks @messense — you're right on both counts. The modifier-semantics table I posted was wrong: Pushed three new commits implementing your suggested approach:
Three new unit tests cover finalize: same-device rename, overwrite-existing-destination, and source-missing error. Local gates green:
The two CI failures on the previous push were both pre-existing on
The branch name |
85b3275 to
e459bbb
Compare
messense
left a comment
There was a problem hiding this comment.
Thanks for the quick rework — the deferred rename-back approach is the right direction! A few issues:
1. patch_editable returns true even when nothing was modified
The editable path unconditionally returns Ok(true) after calling patch_editable, but:
- The default
patch_editabletrait impl is a no-op (non-Linux platforms) - On Linux, it skips artifacts with empty
linked_paths
So for a macOS/Windows editable build, or a Linux editable build with no linked_paths, the artifact is never modified but was_patched = true prevents the rename-back. This is safe (conservative) but unnecessarily forces cargo to recompile on the next build. Consider having patch_editable return bool indicating whether it actually modified anything, or at minimum rename was_patched to may_have_been_patched so the intent is clear.
2. Error path leaves cargo output missing
If the build errors after stage_artifact but before finalize_staged_artifacts (e.g. auditwheel() fails, writer.finish() fails), the cargo output path stays absent because the original was renamed/removed. The wheel is the deliverable so this isn't a correctness bug, but it's a UX regression from the current behavior where the copy-back happens immediately. Worth documenting, or consider an RAII guard that restores on drop.
3. Cross-device finalize works but is a double copy
In the cross-device case, stage_artifact does reflink_or_copy + remove_file, then finalize_staged_artifact does another reflink_or_copy + remove_file back. That's two full copies instead of the current one. Might be worth a comment acknowledging this is the expected tradeoff (cross-device is rare).
4. Minor: finalize_staged_artifact is pub(crate) but only used in build_orchestrator
The finalize_staged_artifacts wrapper in build_orchestrator.rs is the real entry point. Could keep finalize_staged_artifact private to the repair module and only export the wrapper, but this is cosmetic.
- patch_editable now returns whether it modified the artifact, so the cargo-output restore happens for editable builds on macOS/Windows and on Linux when linked_paths is empty (was unconditionally skipped). - document the cross-device double-copy tradeoff and the build-error path where the cargo output stays absent until the next compile. - make finalize_staged_artifact private to the repair module; expose only the finalize_staged_artifacts wrapper from build_context. - drop manual Changelog.md entry — cargo-cliff regenerates it on release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the thorough review @messense! Pushed 1.
|
- patch_editable now returns whether it modified the artifact, so the cargo-output restore happens for editable builds on macOS/Windows and on Linux when linked_paths is empty (was unconditionally skipped). - document the cross-device double-copy tradeoff and the build-error path where the cargo output stays absent until the next compile. - make finalize_staged_artifact private to the repair module; expose only the finalize_staged_artifacts wrapper from build_context. - drop manual Changelog.md entry — cargo-cliff regenerates it on release.
c130029 to
a5ccde9
Compare
There was a problem hiding this comment.
Pull request overview
This PR adjusts maturin’s artifact staging/restoration flow to reduce expensive post-compile file operations by deferring restoration of the cargo output artifact until after the wheel is successfully written, and by tracking whether auditwheel/editable patching occurred.
Changes:
- Added
cargo_output_pathtoBuildArtifactso the original cargo output location can be restored later. - Changed
BuildContext::add_external_libs/WheelRepairer::patch_editableto return whether any in-place patching occurred, and wired that through the build orchestrator. - Added
finalize_staged_artifacts/finalize_staged_artifactto restore staged artifacts back to the cargo output path when safe, plus unit tests for finalize.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/compile.rs |
Extends BuildArtifact with cargo_output_path to support deferred restoration. |
src/build_orchestrator.rs |
Captures a was_patched flag and calls finalize_staged_artifacts after writer.finish(). |
src/build_context/repair.rs |
Changes add_external_libs to return patch status; updates stage_artifact to record original path and adds finalize helpers + tests. |
src/build_context/mod.rs |
Re-exports finalize_staged_artifacts for orchestrator use. |
src/auditwheel/repair.rs |
Updates patch_editable API to return bool indicating in-place modifications. |
src/auditwheel/linux.rs |
Implements the new patch_editable -> Result<bool> behavior and tracks whether patching was attempted. |
Avoids a full copy on ext4 and other filesystems without reflink support. Adds link_or_copy helper (hard_link -> reflink -> copy) next to the existing reflink_or_copy and uses it for the stage_artifact copy-back. Refs PyO3#3111
Replaces the full fs::copy used to install the compiled extension into the Python source tree with link_or_copy. The existing remove_file call right above it guarantees the destination is absent so the hard link path always engages. Refs PyO3#3111
Reverts 6c862c4, 4069caf, and e9f7c48. Hard-linking aliases inodes between the staged artifact and the cargo output path. patchelf, patch_macho, and pe_patch all mutate the staged file in place (truncate-and-write), so the cargo output path sees the patched bytes too. This re-introduces the bug fixed by PyO3#2969 / PyO3#2680. Reverting cleanly so the follow-up commit can implement the alternative proposed by @messense: defer the copy-back and only restore the cargo output path via rename when no patching was needed.
Removes the unconditional copy-back from stage_artifact and replaces it with a post-wheel-write rename when no auditwheel patching has occurred. The cargo output path is now restored via fs::rename — O(1) on every mainstream filesystem (ext4, xfs, btrfs, zfs, ntfs, refs, apfs, hfs+) — instead of a full byte copy, eliminating the ~8s post-compile cost reported on multi-GiB Polars-class artifacts. In the patching branch (rare — only projects that bundle external shared libs, e.g. lib_with_disallowed_lib), the cargo output path is left empty rather than copied back. Cargo recompiles on the next invocation, which it had to do anyway because the patched artifact's DT_NEEDED / Mach-O load commands / PE imports no longer reflect cargo's view. Strict improvement over today's stale-unpatched-copy state, which can confuse cargo's incremental cache. add_external_libs now returns Result<bool> so the orchestrator can decide whether to finalize. BuildArtifact carries cargo_output_path through compile_cdylib / build_bin_wheel / AuditedArtifact so the finalize step knows where to restore each artifact. Closes PyO3#3111.
- patch_editable now returns whether it modified the artifact, so the cargo-output restore happens for editable builds on macOS/Windows and on Linux when linked_paths is empty (was unconditionally skipped). - document the cross-device double-copy tradeoff and the build-error path where the cargo output stays absent until the next compile. - make finalize_staged_artifact private to the repair module; expose only the finalize_staged_artifacts wrapper from build_context. - drop manual Changelog.md entry — cargo-cliff regenerates it on release.
patch_editable returns the indices it actually modified (set_rpath Ok only). add_external_libs clears cargo_output_path on patched artifacts. finalize_staged_artifacts iterates and restores any that still have it, so a mixed editable build no longer skips restore for unpatched artifacts.
Extract stage_file from BuildContext::stage_artifact so the move-to-staging primitive is testable without a BuildContext, and add three tests covering the move, the Windows stale-dest path, and the stage→finalize roundtrip.
This reverts commit 9af7346.
8e1cb6e to
73b84b4
Compare
…#3159) `fs::rename` in `finalize_staged_artifacts` to skip the full byte copy on ext4. The trade-off was that whenever auditwheel / delocate / delvewheel actually rewrote bytes (DT_NEEDED, Mach-O load commands, PE imports, RPATH), the cargo output path was left empty (same-fs) or holding the unpatched original (cross-device), and the staged copy was never restored — see #2969 for why patched bytes must not flow back to cargo's incremental cache. Tools and CI scripts that read `target/<target>/release/<bin>` after a patched build saw a missing or stale file. Restore the invariant that the cargo output path always holds the clean unpatched bytes after a successful build, while keeping the rename-back fast-path for the unpatched case. The repair pipeline now splits into a detect step and a patch step: - New `PatchKind<'a>` enum carries the per-call context (`Repair { grafted, libs_dir, artifact_dir }` or `Editable`) and is used by both prediction and execution. - New `WheelRepairer::patch_required(audited, kind) -> Vec<bool>` predicts, before any bytes are rewritten, which audited artifacts the upcoming `patch` call will modify in place. Conservative defaults: all-true on `Repair`, all-false on `Editable`. `ElfRepairer` overrides the editable arm to track `linked_paths.is_empty()` per-artifact so multi-bin editable builds where only some bins carry cargo-target search paths still get the cheap rename-back for the rest. - `WheelRepairer::patch_editable` is folded into `patch` via `PatchKind::Editable`, removing the parallel API surface. - New `copy_back_cargo_outputs` helper in `build_context/repair.rs` reflinks/copies the still-clean staged artifact back to its cargo output path for every `will_patch[i] == true` entry, then `take()`s `cargo_output_path` so finalize won't overwrite the unpatched bytes with the patched ones. Cross-device staging is a no-op (the original is already at the cargo path); same-filesystem staging pays one reflink per genuinely-patched artifact (O(1) on apfs/btrfs/xfs/refs, a real copy on ext4) — same cost as the pre-#3155 worst case, but scoped only to artifacts that actually need patching. - `add_external_libs` calls `patch_required` then `copy_back_cargo_outputs` then `patch` on both the editable and non-editable branches. The unconditional `cargo_output_path = None` clearing loop is gone — clearing is now per-artifact and happens in the helper. Net effect: - Unpatched artifacts: identical to today — single rename-back in `finalize_staged_artifacts`. - Patched artifacts (same-fs): one reflink/copy at patch time instead of the rename-back; cargo output ends up with unpatched bytes. - Patched artifacts (cross-device): zero extra I/O — `stage_file` already left the unpatched original at cargo output. Public auditwheel types (`AuditedArtifact`, `GraftedLib`, `AuditResult`, `PatchKind`) now derive `Debug` for consistency with the rest of the public surface. The follow-up commit tightens each platform's `patch` impl to skip artifacts whose `external_libs` is empty (ELF: skip `set_rpath`; Mach-O: filter `install_name_changes` per-artifact and skip `patch_macho` + `ad_hoc_sign` when the filter is empty; PE: filter `replacements` per-artifact case-insensitively against `aa.external_libs[*].name` and skip `pe_patch::replace_needed` when empty), and loosens the default `patch_required` for `Repair` to `!aa.external_libs.is_empty()`. The two changes ship together — tightening only the default would regress #2969, while tightening only the impls would just be dead-code elimination with `copy_back_cargo_outputs` still paying an unnecessary reflink for the now-unaffected artifacts. ### Known limitation: dlopen-only dependencies `external_libs` is populated by static dependency analysis (lddtree on the artifact's LC_LOAD_DYLIB / DT_NEEDED / PE import table). A library reached only via runtime loading — `dlopen("libfoo.so", ...)`, `dlsym`, `LoadLibrary` — won't appear in `external_libs`, so its host artifact gets `will_patch = false` and is skipped by the tightened `patch` impls. For Mach-O install-name rewriting and PE import-table rewriting this is benign: a `dlopen` doesn't go through the load command / import table, so rewriting them was a no-op anyway. ELF DT_NEEDED rewriting was already gated on `aa.external_libs` pre-PR for the same reason. The new behavior in this PR widens the gating to **ELF RPATH append**: pre-PR, every audited artifact got `$ORIGIN/<dist>.libs` appended unconditionally; post-PR, only those with non-empty `external_libs` do. Practically: a Linux extension that statically depends on nothing but `dlopen`s a library bundled by some other artifact in the same wheel will now lack the RPATH and fail at runtime. Workaround: declare a static dependency on at least one bundled lib (so the artifact gets the auto-RPATH), or set the RPATH manually via `cargo:rustc-link-arg=-Wl,-rpath,$ORIGIN/<dist>.libs`.
Closes #3111.
What
Replace the synchronous
reflink_or_copythat ran after every staging step with a deferred rename that happens once the wheel has been written and only when the staged artifact bytes still match the cargo output. On the reporter's setup (Polars + debug info on ext4) the previous flow was ~8 s of post-compile copying; the new flow is onefs::rename(O(1) on every mainstream filesystem) when no auditwheel patching occurred, and is a no-op otherwise.How
stage_artifact(src/build_context/repair.rs) renames the cargo output intotarget/maturin/and now records the original path on theBuildArtifactinstead of materialising a copy back at the cargo path. Cross-device falls back toreflink_or_copy+remove_fileas before.finalize_staged_artifactsruns afterwriter.finish()and renames each staged file back to its recorded cargo output path — but only for artifacts whose bytes were not modified in place. The decision is per-artifact:add_external_libsclearscargo_output_path: Option<PathBuf>on any artifact that auditwheel / patchelf / patch_macho / pe_patch touches; finalize iterates and only restores those that still have the marker.WheelRepairer::patch_editablereturns the indices of artifacts it actually modified (a successfulpatchelf::set_rpathis the threshold; failures don't count, since patchelf writes via temp-file + rename and an error almost always means the original is untouched). The default trait impl is no-op so non-Linux editable builds, and Linux editable builds wherelinked_pathsis empty, get the cargo-output rename for free.Why
The previous reflink-based copy-back was O(1) on btrfs / xfs-with-reflink / apfs but degraded to a full byte copy on ext4, the most common Linux filesystem. The deferred approach side-steps the copy entirely on the happy path, leaving cargo's output exactly where cargo expects it without any extra I/O.
Tradeoffs
Documented inline on
stage_artifact's rustdoc:reflink_or_copy + remove_filehere and again in finalize, two full copies vs one. Only matters when the cargo target dir andtarget/maturin/are on different filesystems — uncommon.stage_artifactandfinalize_staged_artifactsfails, the cargo output stays absent until the next compile. The wheel is the deliverable so this is not a correctness bug, and the staged artifact stays recoverable fromtarget/maturin/.Tests
finalize_renames_atomically_on_same_devicefinalize_overwrites_existing_destinationfinalize_errors_when_source_missingstage_file_removes_sourcestage_file_overwrites_stale_destinationstage_then_finalize_roundtrips_contentsChecklist
cc @messense — extends the reflink fast-path you added in #3054 to ext4 by deferring the copy-back step entirely.