Skip to content

perf: defer stage_artifact copy-back, finalize via rename when unpatched#3155

Merged
messense merged 11 commits intoPyO3:mainfrom
pratyush618:fix/stage-artifact-ext4-hardlink
Apr 26, 2026
Merged

perf: defer stage_artifact copy-back, finalize via rename when unpatched#3155
messense merged 11 commits intoPyO3:mainfrom
pratyush618:fix/stage-artifact-ext4-hardlink

Conversation

@pratyush618
Copy link
Copy Markdown
Contributor

@pratyush618 pratyush618 commented Apr 24, 2026

Closes #3111.

What

Replace the synchronous reflink_or_copy that ran after every staging step with a deferred rename that happens once the wheel has been written and only when the staged artifact bytes still match the cargo output. On the reporter's setup (Polars + debug info on ext4) the previous flow was ~8 s of post-compile copying; the new flow is one fs::rename (O(1) on every mainstream filesystem) when no auditwheel patching occurred, and is a no-op otherwise.

How

stage_artifact (src/build_context/repair.rs) renames the cargo output into target/maturin/ and now records the original path on the BuildArtifact instead of materialising a copy back at the cargo path. Cross-device falls back to reflink_or_copy + remove_file as before.

finalize_staged_artifacts runs after writer.finish() and renames each staged file back to its recorded cargo output path — but only for artifacts whose bytes were not modified in place. The decision is per-artifact: add_external_libs clears cargo_output_path: Option<PathBuf> on any artifact that auditwheel / patchelf / patch_macho / pe_patch touches; finalize iterates and only restores those that still have the marker.

WheelRepairer::patch_editable returns the indices of artifacts it actually modified (a successful patchelf::set_rpath is the threshold; failures don't count, since patchelf writes via temp-file + rename and an error almost always means the original is untouched). The default trait impl is no-op so non-Linux editable builds, and Linux editable builds where linked_paths is empty, get the cargo-output rename for free.

Why

The previous reflink-based copy-back was O(1) on btrfs / xfs-with-reflink / apfs but degraded to a full byte copy on ext4, the most common Linux filesystem. The deferred approach side-steps the copy entirely on the happy path, leaving cargo's output exactly where cargo expects it without any extra I/O.

Tradeoffs

Documented inline on stage_artifact's rustdoc:

  • Cross-device: unpatched + finalize does reflink_or_copy + remove_file here and again in finalize, two full copies vs one. Only matters when the cargo target dir and target/maturin/ are on different filesystems — uncommon.
  • Build error before finalize: if any step between stage_artifact and finalize_staged_artifacts fails, the cargo output stays absent until the next compile. The wheel is the deliverable so this is not a correctness bug, and the staged artifact stays recoverable from target/maturin/.

Tests

  • finalize_renames_atomically_on_same_device
  • finalize_overwrites_existing_destination
  • finalize_errors_when_source_missing
  • stage_file_removes_source
  • stage_file_overwrites_stale_destination
  • stage_then_finalize_roundtrips_contents

Checklist

  • `cargo fmt --all -- --check` ✓
  • `cargo clippy --all-targets -- -D warnings` ✓
  • `cargo test --lib` — 190 existing + 6 new, all passing ✓

cc @messense — extends the reflink fast-path you added in #3054 to ext4 by deferring the copy-back step entirely.

Copy link
Copy Markdown
Member

@messense messense left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed write-up! Unfortunately I believe this approach introduces a regression for the bug fixed in #2969.

The problem: shared inodes during repair

stage_artifact renames the cargo output into target/maturin/, then creates a copy-back at the original cargo output path. With a hard link, both paths share the same inode. The auditwheel repair and in-tree patchers then mutate the staged artifact in-place, which means those mutations also corrupt the cargo output path — exactly the bug #2969 fixed.

Specifically:

  • Linux: patchelf edits files in-place (truncate + write), not via temp-file + rename.
  • macOS: patch_macho uses fs_err::write(file, &data) — also an in-place overwrite.
  • Windows: pe_patch::replace_needed uses fs::write(file_path, data) — also in-place.

The table in the PR description claims patchelf uses "temp file + rename over staged", but that is not how patchelf works in the invocations maturin uses. All three platforms mutate via truncate-and-write on the same path, so the original cargo output (now a hard link to the same inode) sees the patched content too.

Interaction with #2969

PR #2969 originally introduced copy_artifact_for_repair for bin artifacts to prevent auditwheel/patchelf from corrupting cargo outputs. That logic was later refactored into the current stage_artifact, which now handles both cdylib and bin artifacts. Replacing the copy-back with a hard link would undermine the isolation for all artifact types, not just bins.

On a subsequent build where cargo skips recompilation, it would find already-patched DT_NEEDED entries / modified Mach-O load commands, causing the failures described in #2680.

What would work instead

The existing reflink_or_copy is the correct approach — it creates a separate inode (via CoW or full copy) so the staged artifact can be safely mutated without affecting the cargo output.

If hard-link performance is desired, it would only be safe to hard-link after all repair/patching steps are complete, but that requires more complex control-flow changes that aren't worth the complexity.

The editable-install hard link (binding_generator/mod.rs) has the same concern: the source is the staged artifact that may be modified by auditwheel repair afterward.

@messense
Copy link
Copy Markdown
Member

messense commented Apr 25, 2026

For the original issue (#3111), the artifact only needs to be staged (renamed away) for isolation during repair — the copy-back to the cargo output path is purely a convenience so users can find the artifact at the expected location. For projects like Polars that have no external shared lib deps, add_external_libs returns early without modifying anything, making the copy-back entirely wasted work.

A simpler and universally fast fix: skip the copy-back in stage_artifact, and rename back only when auditwheel did NOT modify the artifact:

  • If no external libs were bundled (the common case for projects like Polars): rename the staged artifact back to the cargo output path after the wheel is written — O(1) on any filesystem.
  • If auditwheel repair DID modify the artifact: keep the current behavior (reflink/copy back the original, or just skip the copy-back). Renaming back here would put the patched artifact at the cargo output path, which is exactly the bug fix: copy bin artifacts before auditwheel repair to avoid rerun failures #2969 fixed.

This eliminates the expensive copy for the common case on all filesystems, not just ext4, while preserving the isolation invariant that #2969 established.

@pratyush618
Copy link
Copy Markdown
Contributor Author

Thanks @messense — you're right on both counts. The modifier-semantics table I posted was wrong: patchelf, patch_macho, and pe_patch::replace_needed all truncate-and-write in place, so a hard link aliases the cargo output path with the patched bytes and re-introduces #2969 / #2680. Reverted in 621b37b.

Pushed three new commits implementing your suggested approach:

  • 621b37b — clean revert of the three hard-link commits.
  • 6d29cacstage_artifact no longer copies back; the original cargo path is recorded on BuildArtifact::cargo_output_path and add_external_libs now returns Result<bool> so the orchestrator can decide. After writer.finish() succeeds in write_wheel and build_bin_wheel, finalize_staged_artifacts does an O(1) fs::rename(staged → cargo_output) per artifact, but only when was_patched == false. When patching occurred the cargo output path is left empty (cargo will recompile next time anyway, since the patched artifact's DT_NEEDED / load commands / PE imports don't match cargo's view) — strict improvement over today's stale-unpatched-copy at the cargo path.
  • 85b3275 — changelog entry under 1.13.0.

Three new unit tests cover finalize: same-device rename, overwrite-existing-destination, and source-missing error.

Local gates green:

  • cargo fmt --all — clean
  • cargo clippy --tests --all-features -- -D warnings — clean
  • cargo test --lib — 193/193 pass (incl. 3 new finalize tests)
  • cargo test — 84/86 integration; the 2 failures (integration_pyo3_bin, integration_wasm_hello_world) reproduce on main on this machine — local toolchain issues unrelated to this change.

The two CI failures on the previous push were both pre-existing on main and not caused by this PR:

  • Lint: cargo deny advisory RUSTSEC-2026-0104 (rustls-webpki 0.103.12); fix is to bump to ≥0.103.13 — happy to include in this PR or leave it as a separate cleanup, whichever you prefer.
  • Test (windows-11-arm, 3.14t): hosted-toolcache failed to set up Python 3.14.4 freethreaded for arm64 — runner infra.

The branch name fix/stage-artifact-ext4-hardlink is now slightly stale relative to the implementation; let me know if you'd like me to rename or rebase before re-review.

@pratyush618 pratyush618 force-pushed the fix/stage-artifact-ext4-hardlink branch from 85b3275 to e459bbb Compare April 25, 2026 09:15
Copy link
Copy Markdown
Member

@messense messense left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick rework — the deferred rename-back approach is the right direction! A few issues:

1. patch_editable returns true even when nothing was modified

The editable path unconditionally returns Ok(true) after calling patch_editable, but:

  • The default patch_editable trait impl is a no-op (non-Linux platforms)
  • On Linux, it skips artifacts with empty linked_paths

So for a macOS/Windows editable build, or a Linux editable build with no linked_paths, the artifact is never modified but was_patched = true prevents the rename-back. This is safe (conservative) but unnecessarily forces cargo to recompile on the next build. Consider having patch_editable return bool indicating whether it actually modified anything, or at minimum rename was_patched to may_have_been_patched so the intent is clear.

2. Error path leaves cargo output missing

If the build errors after stage_artifact but before finalize_staged_artifacts (e.g. auditwheel() fails, writer.finish() fails), the cargo output path stays absent because the original was renamed/removed. The wheel is the deliverable so this isn't a correctness bug, but it's a UX regression from the current behavior where the copy-back happens immediately. Worth documenting, or consider an RAII guard that restores on drop.

3. Cross-device finalize works but is a double copy

In the cross-device case, stage_artifact does reflink_or_copy + remove_file, then finalize_staged_artifact does another reflink_or_copy + remove_file back. That's two full copies instead of the current one. Might be worth a comment acknowledging this is the expected tradeoff (cross-device is rare).

4. Minor: finalize_staged_artifact is pub(crate) but only used in build_orchestrator

The finalize_staged_artifacts wrapper in build_orchestrator.rs is the real entry point. Could keep finalize_staged_artifact private to the repair module and only export the wrapper, but this is cosmetic.

Comment thread Changelog.md Outdated
pratyush618 added a commit to pratyush618/maturin that referenced this pull request Apr 25, 2026
- patch_editable now returns whether it modified the artifact, so the
  cargo-output restore happens for editable builds on macOS/Windows and
  on Linux when linked_paths is empty (was unconditionally skipped).
- document the cross-device double-copy tradeoff and the build-error
  path where the cargo output stays absent until the next compile.
- make finalize_staged_artifact private to the repair module; expose
  only the finalize_staged_artifacts wrapper from build_context.
- drop manual Changelog.md entry — cargo-cliff regenerates it on
  release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pratyush618
Copy link
Copy Markdown
Contributor Author

pratyush618 commented Apr 25, 2026

Thanks for the thorough review @messense! Pushed a5ccde97 addressing each item:

1. patch_editable no-op no longer blocks the restore

Made WheelRepairer::patch_editable return Result<bool> indicating whether it actually modified anything. The default trait impl returns Ok(false); the Linux impl returns Ok(true) once it touches a binary (conservatively even on set_rpath error, since the bytes may have been partially written). add_external_libs propagates that bool, so on macOS/Windows editable builds and on Linux editable builds with empty linked_paths, was_patched is now false and the cargo output path is restored — no unnecessary recompiles.

2. Error path

Documented in the stage_artifact doc comment as a tradeoff: if a step between stage_artifact and finalize_staged_artifacts errors (auditwheel, writer.finish(), etc.), the cargo output path stays absent until the next compile. The wheel is the deliverable so this isn't a correctness bug, and the staged artifact is still recoverable from target/maturin/. I went with documentation rather than an RAII guard since the failure mode is benign and a guard would conflate "should restore on success" with "should restore on error" (where the bytes' state is uncertain).

3. Cross-device double copy

Acknowledged in the same doc-comment tradeoffs section — two reflink_or_copy + remove_file operations instead of one, only when the cargo target dir and maturin staging dir are on different filesystems (uncommon).

4. Visibility

Moved the finalize_staged_artifacts wrapper into build_context::repair; finalize_staged_artifact is now private to that module and only finalize_staged_artifacts is exported via pub(crate) from build_context.

Changelog

Dropped — thanks for the heads-up that cargo-cliff regenerates it on release.

pratyush618 added a commit to pratyush618/maturin that referenced this pull request Apr 25, 2026
- patch_editable now returns whether it modified the artifact, so the
  cargo-output restore happens for editable builds on macOS/Windows and
  on Linux when linked_paths is empty (was unconditionally skipped).
- document the cross-device double-copy tradeoff and the build-error
  path where the cargo output stays absent until the next compile.
- make finalize_staged_artifact private to the repair module; expose
  only the finalize_staged_artifacts wrapper from build_context.
- drop manual Changelog.md entry — cargo-cliff regenerates it on
  release.
@pratyush618 pratyush618 force-pushed the fix/stage-artifact-ext4-hardlink branch from c130029 to a5ccde9 Compare April 25, 2026 09:48
@messense messense requested a review from Copilot April 25, 2026 10:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts maturin’s artifact staging/restoration flow to reduce expensive post-compile file operations by deferring restoration of the cargo output artifact until after the wheel is successfully written, and by tracking whether auditwheel/editable patching occurred.

Changes:

  • Added cargo_output_path to BuildArtifact so the original cargo output location can be restored later.
  • Changed BuildContext::add_external_libs / WheelRepairer::patch_editable to return whether any in-place patching occurred, and wired that through the build orchestrator.
  • Added finalize_staged_artifacts / finalize_staged_artifact to restore staged artifacts back to the cargo output path when safe, plus unit tests for finalize.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/compile.rs Extends BuildArtifact with cargo_output_path to support deferred restoration.
src/build_orchestrator.rs Captures a was_patched flag and calls finalize_staged_artifacts after writer.finish().
src/build_context/repair.rs Changes add_external_libs to return patch status; updates stage_artifact to record original path and adds finalize helpers + tests.
src/build_context/mod.rs Re-exports finalize_staged_artifacts for orchestrator use.
src/auditwheel/repair.rs Updates patch_editable API to return bool indicating in-place modifications.
src/auditwheel/linux.rs Implements the new patch_editable -> Result<bool> behavior and tracks whether patching was attempted.

Comment thread src/build_context/repair.rs
Comment thread src/build_orchestrator.rs
Comment thread src/build_context/repair.rs Outdated
Comment thread src/auditwheel/linux.rs
Comment thread src/auditwheel/linux.rs
@pratyush618 pratyush618 changed the title perf: hard-link instead of copy for stage_artifact and editable install perf: defer stage_artifact copy-back, finalize via rename when unpatched Apr 25, 2026
pratyush618 and others added 11 commits April 26, 2026 10:26
Avoids a full copy on ext4 and other filesystems without reflink support.
Adds link_or_copy helper (hard_link -> reflink -> copy) next to the
existing reflink_or_copy and uses it for the stage_artifact copy-back.

Refs PyO3#3111
Replaces the full fs::copy used to install the compiled extension into
the Python source tree with link_or_copy. The existing remove_file call
right above it guarantees the destination is absent so the hard link
path always engages.

Refs PyO3#3111
Reverts 6c862c4, 4069caf, and e9f7c48.

Hard-linking aliases inodes between the staged artifact and the cargo
output path. patchelf, patch_macho, and pe_patch all mutate the staged
file in place (truncate-and-write), so the cargo output path sees the
patched bytes too. This re-introduces the bug fixed by PyO3#2969 / PyO3#2680.

Reverting cleanly so the follow-up commit can implement the alternative
proposed by @messense: defer the copy-back and only restore the cargo
output path via rename when no patching was needed.
Removes the unconditional copy-back from stage_artifact and replaces
it with a post-wheel-write rename when no auditwheel patching has
occurred. The cargo output path is now restored via fs::rename — O(1)
on every mainstream filesystem (ext4, xfs, btrfs, zfs, ntfs, refs,
apfs, hfs+) — instead of a full byte copy, eliminating the ~8s
post-compile cost reported on multi-GiB Polars-class artifacts.

In the patching branch (rare — only projects that bundle external
shared libs, e.g. lib_with_disallowed_lib), the cargo output path is
left empty rather than copied back. Cargo recompiles on the next
invocation, which it had to do anyway because the patched artifact's
DT_NEEDED / Mach-O load commands / PE imports no longer reflect
cargo's view. Strict improvement over today's stale-unpatched-copy
state, which can confuse cargo's incremental cache.

add_external_libs now returns Result<bool> so the orchestrator can
decide whether to finalize. BuildArtifact carries cargo_output_path
through compile_cdylib / build_bin_wheel / AuditedArtifact so the
finalize step knows where to restore each artifact.

Closes PyO3#3111.
- patch_editable now returns whether it modified the artifact, so the
  cargo-output restore happens for editable builds on macOS/Windows and
  on Linux when linked_paths is empty (was unconditionally skipped).
- document the cross-device double-copy tradeoff and the build-error
  path where the cargo output stays absent until the next compile.
- make finalize_staged_artifact private to the repair module; expose
  only the finalize_staged_artifacts wrapper from build_context.
- drop manual Changelog.md entry — cargo-cliff regenerates it on
  release.
patch_editable returns the indices it actually modified (set_rpath Ok
only). add_external_libs clears cargo_output_path on patched artifacts.
finalize_staged_artifacts iterates and restores any that still have it,
so a mixed editable build no longer skips restore for unpatched
artifacts.
Extract stage_file from BuildContext::stage_artifact so the
move-to-staging primitive is testable without a BuildContext, and add
three tests covering the move, the Windows stale-dest path, and the
stage→finalize roundtrip.
@messense messense force-pushed the fix/stage-artifact-ext4-hardlink branch from 8e1cb6e to 73b84b4 Compare April 26, 2026 02:27
@messense messense merged commit 6bef854 into PyO3:main Apr 26, 2026
42 of 44 checks passed
messense added a commit that referenced this pull request Apr 26, 2026
…#3159)

`fs::rename` in `finalize_staged_artifacts` to skip the full byte copy
on ext4. The trade-off was that whenever auditwheel / delocate /
delvewheel actually rewrote bytes (DT_NEEDED, Mach-O load commands, PE
imports, RPATH), the cargo output path was left empty (same-fs) or
holding the unpatched original (cross-device), and the staged copy was
never restored — see #2969 for why patched bytes must not flow back to
cargo's incremental cache. Tools and CI scripts that read
`target/<target>/release/<bin>` after a patched build saw a missing or
stale file.

Restore the invariant that the cargo output path always holds the clean
unpatched bytes after a successful build, while keeping the rename-back
fast-path for the unpatched case.

The repair pipeline now splits into a detect step and a patch step:

- New `PatchKind<'a>` enum carries the per-call context (`Repair {
grafted, libs_dir, artifact_dir }` or `Editable`) and is used by both
prediction and execution.
- New `WheelRepairer::patch_required(audited, kind) -> Vec<bool>`
predicts, before any bytes are rewritten, which audited artifacts the
upcoming `patch` call will modify in place. Conservative defaults:
all-true on `Repair`, all-false on `Editable`. `ElfRepairer` overrides
the editable arm to track `linked_paths.is_empty()` per-artifact so
multi-bin editable builds where only some bins carry cargo-target search
paths still get the cheap rename-back for the rest.
- `WheelRepairer::patch_editable` is folded into `patch` via
`PatchKind::Editable`, removing the parallel API surface.
- New `copy_back_cargo_outputs` helper in `build_context/repair.rs`
reflinks/copies the still-clean staged artifact back to its cargo output
path for every `will_patch[i] == true` entry, then `take()`s
`cargo_output_path` so finalize won't overwrite the unpatched bytes with
the patched ones. Cross-device staging is a no-op (the original is
already at the cargo path); same-filesystem staging pays one reflink per
genuinely-patched artifact (O(1) on apfs/btrfs/xfs/refs, a real copy on
ext4) — same cost as the pre-#3155 worst case, but scoped only to
artifacts that actually need patching.
- `add_external_libs` calls `patch_required` then
`copy_back_cargo_outputs` then `patch` on both the editable and
non-editable branches. The unconditional `cargo_output_path = None`
clearing loop is gone — clearing is now per-artifact and happens in the
helper.

Net effect:

- Unpatched artifacts: identical to today — single rename-back in
`finalize_staged_artifacts`.
- Patched artifacts (same-fs): one reflink/copy at patch time instead of
the rename-back; cargo output ends up with unpatched bytes.
- Patched artifacts (cross-device): zero extra I/O — `stage_file`
already left the unpatched original at cargo output.

Public auditwheel types (`AuditedArtifact`, `GraftedLib`, `AuditResult`,
`PatchKind`) now derive `Debug` for consistency with the rest of the
public surface.

The follow-up commit tightens each platform's `patch` impl to skip
artifacts whose `external_libs` is empty (ELF: skip `set_rpath`; Mach-O:
filter `install_name_changes` per-artifact and skip `patch_macho` +
`ad_hoc_sign` when the filter is empty; PE: filter `replacements`
per-artifact case-insensitively against `aa.external_libs[*].name` and
skip `pe_patch::replace_needed` when empty), and loosens the default
`patch_required` for `Repair` to `!aa.external_libs.is_empty()`. The two
changes ship together — tightening only the default would regress #2969,
while tightening only the impls would just be dead-code elimination with
`copy_back_cargo_outputs` still paying an unnecessary reflink for the
now-unaffected artifacts.

### Known limitation: dlopen-only dependencies

`external_libs` is populated by static dependency analysis (lddtree on
the artifact's LC_LOAD_DYLIB / DT_NEEDED / PE import table). A library
reached only via runtime loading — `dlopen("libfoo.so", ...)`, `dlsym`,
`LoadLibrary` — won't appear in `external_libs`, so its host artifact
gets `will_patch = false` and is skipped by the tightened `patch` impls.

For Mach-O install-name rewriting and PE import-table rewriting this is
benign: a `dlopen` doesn't go through the load command / import table,
so rewriting them was a no-op anyway. ELF DT_NEEDED rewriting was
already gated on `aa.external_libs` pre-PR for the same reason.

The new behavior in this PR widens the gating to **ELF RPATH append**:
pre-PR, every audited artifact got `$ORIGIN/<dist>.libs` appended
unconditionally; post-PR, only those with non-empty `external_libs` do.
Practically: a Linux extension that statically depends on nothing but
`dlopen`s a library bundled by some other artifact in the same wheel
will now lack the RPATH and fail at runtime. Workaround: declare a
static dependency on at least one bundled lib (so the artifact gets the
auto-RPATH), or set the RPATH manually via
`cargo:rustc-link-arg=-Wl,-rpath,$ORIGIN/<dist>.libs`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Slow post-compilation file copy steps

3 participants