Skip to content

auditwheel: copy unpatched cargo output back before in-place patching#3159

Merged
messense merged 4 commits intoPyO3:mainfrom
messense:auditwheel-copy-back-on-patch
Apr 26, 2026
Merged

auditwheel: copy unpatched cargo output back before in-place patching#3159
messense merged 4 commits intoPyO3:mainfrom
messense:auditwheel-copy-back-on-patch

Conversation

@messense
Copy link
Copy Markdown
Member

@messense messense commented Apr 26, 2026

fs::rename in finalize_staged_artifacts to skip the full byte copy on ext4. The trade-off was that whenever auditwheel / delocate / delvewheel actually rewrote bytes (DT_NEEDED, Mach-O load commands, PE imports, RPATH), the cargo output path was left empty (same-fs) or holding the unpatched original (cross-device), and the staged copy was never restored — see #2969 for why patched bytes must not flow back to cargo's incremental cache. Tools and CI scripts that read target/<target>/release/<bin> after a patched build saw a missing or stale file.

Restore the invariant that the cargo output path always holds the clean unpatched bytes after a successful build, while keeping the rename-back fast-path for the unpatched case.

The repair pipeline now splits into a detect step and a patch step:

  • New PatchKind<'a> enum carries the per-call context (Repair { grafted, libs_dir, artifact_dir } or Editable) and is used by both prediction and execution.
  • New WheelRepairer::patch_required(audited, kind) -> Vec<bool> predicts, before any bytes are rewritten, which audited artifacts the upcoming patch call will modify in place. Conservative defaults: all-true on Repair, all-false on Editable. ElfRepairer overrides the editable arm to track linked_paths.is_empty() per-artifact so multi-bin editable builds where only some bins carry cargo-target search paths still get the cheap rename-back for the rest.
  • WheelRepairer::patch_editable is folded into patch via PatchKind::Editable, removing the parallel API surface.
  • New copy_back_cargo_outputs helper in build_context/repair.rs reflinks/copies the still-clean staged artifact back to its cargo output path for every will_patch[i] == true entry, then take()s cargo_output_path so finalize won't overwrite the unpatched bytes with the patched ones. Cross-device staging is a no-op (the original is already at the cargo path); same-filesystem staging pays one reflink per genuinely-patched artifact (O(1) on apfs/btrfs/xfs/refs, a real copy on ext4) — same cost as the pre-perf: defer stage_artifact copy-back, finalize via rename when unpatched #3155 worst case, but scoped only to artifacts that actually need patching.
  • add_external_libs calls patch_required then copy_back_cargo_outputs then patch on both the editable and non-editable branches. The unconditional cargo_output_path = None clearing loop is gone — clearing is now per-artifact and happens in the helper.

Net effect:

  • Unpatched artifacts: identical to today — single rename-back in finalize_staged_artifacts.
  • Patched artifacts (same-fs): one reflink/copy at patch time instead of the rename-back; cargo output ends up with unpatched bytes.
  • Patched artifacts (cross-device): zero extra I/O — stage_file already left the unpatched original at cargo output.

Public auditwheel types (AuditedArtifact, GraftedLib, AuditResult, PatchKind) now derive Debug for consistency with the rest of the public surface.

The follow-up commit tightens each platform's patch impl to skip artifacts whose external_libs is empty (ELF: skip set_rpath; Mach-O: filter install_name_changes per-artifact and skip patch_macho + ad_hoc_sign when the filter is empty; PE: filter replacements per-artifact case-insensitively against aa.external_libs[*].name and skip pe_patch::replace_needed when empty), and loosens the default patch_required for Repair to !aa.external_libs.is_empty(). The two changes ship together — tightening only the default would regress #2969, while tightening only the impls would just be dead-code elimination with copy_back_cargo_outputs still paying an unnecessary reflink for the now-unaffected artifacts.

Known limitation: dlopen-only dependencies

external_libs is populated by static dependency analysis (lddtree on the artifact's LC_LOAD_DYLIB / DT_NEEDED / PE import table). A library reached only via runtime loading — dlopen("libfoo.so", ...), dlsym, LoadLibrary — won't appear in external_libs, so its host artifact gets will_patch = false and is skipped by the tightened patch impls.

For Mach-O install-name rewriting and PE import-table rewriting this is benign: a dlopen doesn't go through the load command / import table, so rewriting them was a no-op anyway. ELF DT_NEEDED rewriting was already gated on aa.external_libs pre-PR for the same reason.

The new behavior in this PR widens the gating to ELF RPATH append: pre-PR, every audited artifact got $ORIGIN/<dist>.libs appended unconditionally; post-PR, only those with non-empty external_libs do. Practically: a Linux extension that statically depends on nothing but dlopens a library bundled by some other artifact in the same wheel will now lack the RPATH and fail at runtime. Workaround: declare a static dependency on at least one bundled lib (so the artifact gets the auto-RPATH), or set the RPATH manually via cargo:rustc-link-arg=-Wl,-rpath,$ORIGIN/<dist>.libs.

messense and others added 4 commits April 26, 2026 18:19
`fs::rename` in `finalize_staged_artifacts` to skip the full byte copy
on ext4. The trade-off was that whenever auditwheel / delocate /
delvewheel actually rewrote bytes (DT_NEEDED, Mach-O load commands, PE
imports, RPATH), the cargo output path was left empty (same-fs) or
holding the unpatched original (cross-device), and the staged copy was
never restored — see PyO3#2969 for why patched bytes must not flow back to
cargo's incremental cache. Tools and CI scripts that read
`target/<target>/release/<bin>` after a patched build saw a missing or
stale file.

Restore the invariant that the cargo output path always holds the clean
unpatched bytes after a successful build, while keeping the rename-back
fast-path for the unpatched case.

The repair pipeline now splits into a detect step and a patch step:

- New `PatchKind<'a>` enum carries the per-call context (`Repair {
  grafted, libs_dir, artifact_dir }` or `Editable`) and is used by both
  prediction and execution.
- New `WheelRepairer::patch_required(audited, kind) -> Vec<bool>`
  predicts, before any bytes are rewritten, which audited artifacts the
  upcoming `patch` call will modify in place. Conservative defaults:
  all-true on `Repair`, all-false on `Editable`. `ElfRepairer`
  overrides the editable arm to track `linked_paths.is_empty()`
  per-artifact so multi-bin editable builds where only some bins carry
  cargo-target search paths still get the cheap rename-back for the
  rest.
- `WheelRepairer::patch_editable` is folded into `patch` via
  `PatchKind::Editable`, removing the parallel API surface.
- New `copy_back_cargo_outputs` helper in `build_context/repair.rs`
  reflinks/copies the still-clean staged artifact back to its cargo
  output path for every `will_patch[i] == true` entry, then `take()`s
  `cargo_output_path` so finalize won't overwrite the unpatched bytes
  with the patched ones. Cross-device staging is a no-op (the original
  is already at the cargo path); same-filesystem staging pays one
  reflink per genuinely-patched artifact (O(1) on apfs/btrfs/xfs/refs,
  a real copy on ext4) — same cost as the pre-PyO3#3155 worst case, but
  scoped only to artifacts that actually need patching.
- `add_external_libs` calls `patch_required` then
  `copy_back_cargo_outputs` then `patch` on both the editable and
  non-editable branches. The unconditional
  `cargo_output_path = None` clearing loop is gone — clearing is now
  per-artifact and happens in the helper.

Net effect:

- Unpatched artifacts: identical to today — single rename-back in
  `finalize_staged_artifacts`.
- Patched artifacts (same-fs): one reflink/copy at patch time instead
  of the rename-back; cargo output ends up with unpatched bytes.
- Patched artifacts (cross-device): zero extra I/O — `stage_file`
  already left the unpatched original at cargo output.

Public auditwheel types (`AuditedArtifact`, `GraftedLib`, `AuditResult`,
`PatchKind`) now derive `Debug` for consistency with the rest of the
public surface.
Tightens the per-platform `patch` impls so artifacts whose `external_libs`
is empty are left alone, and loosens the default
`WheelRepairer::patch_required` to `!aa.external_libs.is_empty()` for
`Repair`. Together this saves one reflink in `copy_back_cargo_outputs`
plus one `patchelf::set_rpath` / `patch_macho` + `ad_hoc_sign` /
`pe_patch::replace_needed` invocation per "audited but actually
unaffected" artifact in mixed multi-bin wheels.

Before this change every artifact in the audited slice was rewritten:
- ELF: the final RPATH-set loop ran unconditionally on every artifact.
- Mach-O: `install_name_changes` was built from the full grafted name
  map for every artifact, then `patch_macho` + `ad_hoc_sign` were
  invoked even when the artifact's load commands didn't reference any
  grafted install name.
- PE: `pe_patch::replace_needed` was invoked with the full grafted
  replacement list for every artifact, regardless of whether its
  import table actually referenced any of those names.

The new behaviour, scoped per-artifact:
- ELF: skip the RPATH-set when `external_libs.is_empty()`. DT_NEEDED
  rewriting was already gated on per-artifact deps.
- Mach-O: filter `install_name_changes` to entries the artifact
  actually imports (mirroring the existing ELF DT_NEEDED logic), and
  only call `patch_macho` + `ad_hoc_sign` when the filtered list is
  non-empty.
- PE: filter `replacements` per-artifact case-insensitively against
  `aa.external_libs[*].name` and skip `pe_patch::replace_needed` when
  empty.

The new default `patch_required` for `Repair` relies on the invariant
that `prepare_grafted_libs` builds `grafted` from
`audited.iter().flat_map(|a| &a.external_libs)`, so an artifact with
empty `external_libs` is guaranteed to share no install names with the
grafted set and is therefore left untouched by every current `patch`
implementation. `ElfRepairer::patch_required` keeps the override
(duplicating the default Repair arm because Rust traits can't override
a single match arm) only because its `Editable` arm still needs the
`linked_paths`-based prediction.

The two changes ship together: tightening only the default would
regress PyO3#2969 (finalize would rename patched bytes back to the cargo
output path); tightening only the impls without the prediction would
just be dead-code elimination with `copy_back_cargo_outputs` still
paying an unnecessary reflink for the now-unaffected artifacts.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…r-artifact filter

The new per-artifact `install_name_changes` filter in `MacOSRepairer::patch`
relies on `lddtree`'s `Library::name` being byte-exact equal to the
install-name string in the binary's `LC_LOAD_DYLIB` (no leaf-vs-`@rpath/`
normalization). That contract is currently held by `lddtree-0.5.x` (see
`find_macho_library` -> `try_single_candidate`, which sets
`name: lib_name.to_string()` from the unmodified load-command bytes), but
nothing in this crate enforces it. Document the dependency so a future
`lddtree` bump that normalizes the form trips on the comment instead of
silently making the filter narrower than `patch_macho`'s
`change_install_name` lookup.

No behavior change.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…putState` enum

`stage_file`, `copy_back_cargo_outputs`, and `finalize_staged_artifacts`
were communicating through an implicit invariant: cross-device staging
leaves the original at `cargo_output`, same-fs staging leaves it empty,
and post-patch artifacts have `cargo_output_path` cleared via `take()`.
`copy_back_cargo_outputs` and `finalize_staged_artifacts` then re-derived
the staging mode at runtime by stat-ing `cargo_output.exists()`.

That worked but was fragile: any future refactor of `stage_file` that
broke the "cross-device leaves the original" contract would silently
break the copy-back path (cargo path ends up holding stale or zero bytes
because finalize no-ops on a `take()`'d `cargo_output_path` while
`copy_back_cargo_outputs` skipped the reflink based on a wrong
`exists()` reading). Spotted in code review of PyO3#3159.

Replace `Option<PathBuf>` with a four-variant `CargoOutputState` enum
that captures the staging mode explicitly:

```rust
pub enum CargoOutputState {
    NotStaged,
    Renamed { cargo_output: PathBuf },
    Mirrored { cargo_output: PathBuf },
    Patched,
}
```

- `stage_file` now returns `(PathBuf, CargoOutputState)` and emits
  `Renamed` on the `fs::rename` path / `Mirrored` on the cross-device
  reflink-or-copy path, eliminating the `exists()`-based inference.
- `copy_back_cargo_outputs` matches on the state and transitions
  `Renamed` -> `Patched` (with the reflink) or `Mirrored` -> `Patched`
  (no I/O); `NotStaged` / `Patched` are no-ops. The runtime
  `cargo_output.exists()` check is gone.
- `finalize_staged_artifacts` dispatches on the same state: `Renamed`
  goes through `restore_renamed` (rename, fallback to reflink+remove
  if `target/` straddled a mount point added after staging),
  `Mirrored` drops the staged duplicate, `Patched` / `NotStaged`
  no-op. The `exists()` early-return is gone.

Future refactors of `stage_file` must now update the enum variants
explicitly, making the contract impossible to violate silently. The
behavior at every call site is identical to the previous logic for
every reachable state today.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@messense messense merged commit 6305dfc into PyO3:main Apr 26, 2026
78 of 83 checks passed
@messense messense deleted the auditwheel-copy-back-on-patch branch April 26, 2026 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant