python-wheel-build · pavank63 · Apr 21, 2026 · rd4398 · Apr 22, 2026 · rd4398
@@ -0,0 +1,300 @@
+# Build isolation for sandboxing build backends
+
+- Author: Pavan Kalyan Reddy Cherupally
+- Created: 2026-04-21
+- Status: Open
+- Issue: [#1019](https://github.com/python-wheel-build/fromager/issues/1019)
+
+## What
+
+A `--build-isolation` flag that sandboxes PEP 517 build backend
+subprocesses (`build_sdist`, `build_wheel`) so they cannot read
+credentials, access the network, or interfere with the host system.
+
+## Why
+
+Fromager executes upstream-controlled code (setup.py, build backends)
+during wheel builds. A compromised or malicious package can:
+
+- Read credential files like `$HOME/.netrc` and exfiltrate tokens
+- Access sensitive environment variables (registry keys, API tokens)
+- Reach the network to upload stolen data or download payloads
+- Signal or inspect other processes via `/proc` or shared IPC
+- Interfere with parallel builds through shared `/tmp`
+- Leave persistent backdoors: `.pth` files that run on every Python
+  startup, shell profile entries that run on every login, or
+  background daemons that survive the build
+
+The existing `--network-isolation` flag blocks network access but does
+not protect against credential theft, process/IPC visibility, or
+persistent backdoors.
+
+Build isolation wraps each build backend invocation in a sandbox that
+combines file-level credential protection with OS-level namespace
+isolation. Only the PEP 517 hook calls are sandboxed; download,
+installation, and upload steps run normally.
+
+## Goals
+
+- A `--build-isolation/--no-build-isolation` CLI flag (default off)
+  that supersedes `--network-isolation` for build steps
+- Credential protection: build processes cannot read `.netrc` or
+  other root-owned credential files
+- Network isolation: no routing in the build namespace
+- Process isolation: build cannot see or signal other processes
+- IPC isolation: separate shared memory, semaphores, message queues
+- Persistence protection: build cannot drop `.pth` backdoors, modify
+  shell profiles, or leave background daemons running after the build
+- Environment scrubbing: downstream build systems can strip sensitive
+  environment variables via `FROMAGER_SCRUB_ENV_VARS`
+- Works in unprivileged containers (Podman/Docker) without
+  `--privileged` or `--cap-add SYS_ADMIN`
+- Minimal overhead (< 50ms per build invocation)
+
+## Non-goals
+
+- **Mount namespace isolation.** Mounting tmpfs over `$HOME` or
+  making `/usr` read-only was explored but abandoned. The
+  `pyproject_hooks` library creates temporary files in `/tmp` for
+  IPC between the parent process and the build backend
+  (`input.json`/`output.json`). A mount namespace with a fresh
+  `/tmp` hides these files and breaks the build. Bind-mounting the
+  specific IPC directory is fragile and couples fromager to
+  `pyproject_hooks` internals.
+- **bubblewrap (bwrap).** bwrap provides stronger filesystem
+  isolation but requires `CAP_SYS_ADMIN` or a privileged container,
+  which is unavailable in the standard unprivileged Podman/Docker
+  build environment.
+- **Hardcoded list of sensitive environment variables.** Fromager is
+  an upstream tool; the specific variables that are sensitive depend
+  on the downstream build system. Scrubbing is controlled entirely
+  by the deployer via `FROMAGER_SCRUB_ENV_VARS`.
+- **macOS / Windows support.** Linux namespaces and `unshare` are
+  Linux-only. The flag is unavailable on other platforms.
+
+## How
+
+### Isolation mechanism
+
+Build isolation combines two complementary techniques:
+
+#### 1. Ephemeral Unix user
+
+Before each build invocation, the isolation script creates a
+short-lived system user with `useradd` and removes it with `userdel`
+on exit (via `trap EXIT`). The user has:
+
+- No home directory (`-M -d /nonexistent`)
+- No login shell (`-s /sbin/nologin`)
+- A randomized name (`fmr_<random>`) to avoid collisions
+
+This provides file-level credential protection: `.netrc` is owned by
+`root:root` with mode `600`, so the ephemeral user cannot read it.
+The overhead is approximately 10ms for `useradd` and 10ms for
+`userdel`.
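+
+The user lifecycle described above could be driven from Python
+roughly as follows. This is a minimal sketch, not the isolation
+script itself: the helper names are hypothetical, the 8-character
+random suffix is an arbitrary choice, and `--system` is an assumed
+reading of "system user".
+
+```python
+import secrets
+
+
+def ephemeral_user_name(prefix: str = "fmr_") -> str:
+    """Return a randomized user name such as fmr_3f9a1c2b."""
+    return prefix + secrets.token_hex(4)
+
+
+def useradd_argv(name: str) -> list[str]:
+    """Build a useradd command for a user with no home and no login shell."""
+    return [
+        "useradd",
+        "--system",             # assumption: created as a system account
+        "-M",                   # do not create a home directory
+        "-d", "/nonexistent",   # home path that does not exist
+        "-s", "/sbin/nologin",  # no login shell
+        name,
+    ]
+
+
+def userdel_argv(name: str) -> list[str]:
+    """Build the matching cleanup command (the script runs it from trap EXIT)."""
+    return ["userdel", name]
+```
+
+Building the argument lists separately from executing them keeps the
+naming and flag choices testable without root privileges.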
+
+#### 2. Linux namespaces via unshare
+
+After dropping to the ephemeral user with `setpriv`, the script
+enters new namespaces with `unshare`:
+
+| Namespace | Flag | Purpose |
+| -- | -- | -- |
+| Network | `--net` | No routing; blocks all network access |
+| PID | `--pid --fork` | Build sees only its own processes |
+| IPC | `--ipc` | Isolated shared memory and semaphores |
+| UTS | `--uts` | Separate hostname |
+
+`--map-root-user` creates a user namespace and maps the ephemeral
+user to UID 0 inside it. This gives the script enough privilege to
+bring up the loopback interface and set the hostname without
+requiring real root on the host.
+
+#### Why setpriv instead of runuser
+
+`runuser` calls `setgroups()`, which is denied inside user namespaces
+(the kernel blocks it to prevent group membership escalation).
+`setpriv --reuid --regid --clear-groups` avoids this call entirely.
+
+#### Order of operations
+
+```
+useradd fmr_<random>          # create ephemeral user (outside namespace)
+  └─ setpriv --reuid --regid  # drop to ephemeral user
+       └─ unshare --uts --net --pid --ipc --fork --map-root-user
+            ├─ ip link set lo up
+            ├─ hostname localhost
+            └─ exec <build command>
+userdel fmr_<random>          # cleanup (trap EXIT)
+```
+
+The user is created before entering the namespace because `useradd`
+needs access to `/etc/passwd` and `/etc/shadow` on the real
+filesystem. `setpriv` drops privileges before `unshare` so the UID
+switch happens outside the namespace where the real UID is mapped.
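+
+The nesting shown above can be expressed as a single argument
+vector. The sketch below is an illustration under assumptions: the
+function name `isolated_argv` is hypothetical, and the real
+isolation script may order or quote things differently. Only flags
+named in this proposal are used.
+
+```python
+import shlex
+
+
+def isolated_argv(uid: int, gid: int, build_cmd: list[str]) -> list[str]:
+    """Wrap build_cmd so it runs as uid/gid inside fresh namespaces."""
+    # This shell snippet runs inside the namespaces as mapped root, then
+    # replaces itself with the build command.
+    inner = "ip link set lo up && hostname localhost && exec " + shlex.join(build_cmd)
+    return [
+        # 1. Drop to the ephemeral user outside the namespaces.
+        "setpriv", f"--reuid={uid}", f"--regid={gid}", "--clear-groups",
+        # 2. Enter the new namespaces.
+        "unshare", "--uts", "--net", "--pid", "--ipc", "--fork",
+        "--map-root-user",
+        # 3. Set up loopback and hostname, then exec the build.
+        "sh", "-c", inner,
+    ]
+```
+
+The resulting list can be passed to `subprocess.run()` by a caller
+that has already created the ephemeral user.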
+
+### Environment variable scrubbing
+
+Downstream build systems may have sensitive environment variables
+(registry tokens, CI credentials) that should not be visible to
+build backends. Rather than hardcoding a list in fromager, scrubbing
+is controlled by the deployer:
+
+```bash
+# In the container image or CI environment
+export FROMAGER_SCRUB_ENV_VARS="NGC_API_KEY,TWINE_PASSWORD,CI_JOB_TOKEN"
+```
+
+When `--build-isolation` is active, `external_commands.run()` reads
+this comma-separated list and removes the named variables from the
+subprocess environment before invoking the build.
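+
+A minimal sketch of that scrubbing step follows. The function name
+`scrub_env` is hypothetical, and tolerating whitespace and empty
+entries in the list is an assumption; the proposal only specifies a
+comma-separated list of names.
+
+```python
+from collections.abc import Mapping
+from typing import Optional
+
+
+def scrub_env(env: Mapping[str, str], spec: Optional[str] = None) -> dict[str, str]:
+    """Return a copy of env without the variables named in spec.
+
+    When spec is None, the list is read from FROMAGER_SCRUB_ENV_VARS in env.
+    """
+    if spec is None:
+        spec = env.get("FROMAGER_SCRUB_ENV_VARS", "")
+    # Ignore surrounding whitespace and empty entries (e.g. a trailing comma).
+    names = {name.strip() for name in spec.split(",") if name.strip()}
+    return {key: value for key, value in env.items() if key not in names}
+```
+
+Returning a new dictionary leaves the parent process environment
+untouched, so download and upload steps still see their credentials.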
+
+### Integration points
+
+#### CLI (`__main__.py`)
+
+- Build isolation availability is detected at import time (same
+  pattern as network isolation)
+- `--build-isolation/--no-build-isolation` option on the `main`
+  group, stored on `WorkContext`
+- Fails early with a clear message if the platform does not support
+  build isolation
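+
+One way the availability check could look is sketched below. The
+function name, the `REQUIRED_TOOLS` tuple, and the decision to check
+only for tools on `PATH` are all assumptions; the actual detection
+may instead try to run `unshare` and inspect the result.
+
+```python
+import shutil
+import sys
+from typing import Callable, Optional
+
+# Tools named in this proposal; treated here as the required set.
+REQUIRED_TOOLS = ("useradd", "userdel", "setpriv", "unshare")
+
+
+def build_isolation_supported(
+    platform: str = sys.platform,
+    which: Callable[[str], Optional[str]] = shutil.which,
+) -> bool:
+    """Return True on Linux when every helper tool is found on PATH."""
+    if not platform.startswith("linux"):
+        return False
+    return all(which(tool) is not None for tool in REQUIRED_TOOLS)
+```
+
+The `platform` and `which` parameters exist only so the check can be
+exercised in tests without changing the host.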
+
+#### WorkContext (`context.py`)
+
+- New `build_isolation: bool` field (default `False`)
+
+#### BuildEnvironment (`build_environment.py`)
+
+- `run()` method accepts `build_isolation` parameter, defaults to
+  `ctx.build_isolation`
+- `install()` method explicitly passes `build_isolation=False`
+  because dependency installation needs access to the local PyPI
+  mirror
+
+#### Build backend hooks (`dependencies.py`)
+
+- `_run_hook_with_extra_environ` passes `ctx.build_isolation` to
+  `build_env.run()`
+
+#### Subprocess runner (`external_commands.py`)
+
+- `run()` accepts `build_isolation: bool` parameter
+- When active, prepends the isolation script to the command,
+  sets `FROMAGER_BUILD_DIR` so the script can `chmod` the build
+  directory for the ephemeral user, applies env scrubbing, and sets
+  `CARGO_NET_OFFLINE=true`
+- Build isolation supersedes network isolation but reuses the
+  `NetworkIsolationError` detection for consistent error reporting
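+
+The command and environment preparation listed above might look
+like the following sketch. `prepare_isolated_run` and
+`ISOLATION_SCRIPT` are hypothetical names, and the script path is a
+placeholder; this is not the actual body of
+`external_commands.run()`.
+
+```python
+import os
+
+# Hypothetical location; the real script path is not given in this proposal.
+ISOLATION_SCRIPT = "/usr/libexec/fromager/build-isolation.sh"
+
+
+def prepare_isolated_run(cmd, env, build_dir):
+    """Return (cmd, env) adjusted for an isolated build invocation."""
+    # Apply env scrubbing from the deployer-provided list.
+    spec = env.get("FROMAGER_SCRUB_ENV_VARS", "")
+    scrub = {name.strip() for name in spec.split(",") if name.strip()}
+    new_env = {key: value for key, value in env.items() if key not in scrub}
+    # Tell the script which directory the ephemeral user must be able to use.
+    new_env["FROMAGER_BUILD_DIR"] = os.fspath(build_dir)
+    # Keep cargo from attempting network access inside the sandbox.
+    new_env["CARGO_NET_OFFLINE"] = "true"
+    # Prepend the isolation script to the original command.
+    return [ISOLATION_SCRIPT, *cmd], new_env
+```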
+
+### What is and is not isolated
+
+| Aspect | Protected | Notes |
+| -- | -- | -- |
+| `.netrc` / credentials | Yes | Ephemeral user cannot read root:root 600 files |
+| Network access | Yes | No routing in network namespace |
+| Process visibility | Yes | PID namespace; only build processes visible |
+| IPC (shm, semaphores) | Yes | IPC namespace |
+| Env var leakage | Configurable | Via `FROMAGER_SCRUB_ENV_VARS` |
+| `.pth` / shell profile backdoors | Yes | Ephemeral user cannot write to site-packages or home directory |
+| Persistent background process | Yes | When the PID namespace's init process exits, the kernel kills every process remaining in the namespace |
+| `/tmp` cross-build leakage | Partial | Sticky bit stops other users from deleting or renaming files, but world-readable files stay visible; no mount namespace |
+| Filesystem write access | No | Ephemeral user keeps write access to the world-writable build directory |
+| Trojan in build output | No | Malicious code in the built wheel is not detected |
+
+### Compatibility
+
+Works in unprivileged Podman and Docker containers without
+`--privileged` or `--cap-add SYS_ADMIN`. Docker's default seccomp
+profile may block the `unshare` system call, while Podman's default
+policy allows it. Ubuntu 24.04 restricts unprivileged user
+namespaces through AppArmor, so
+`sysctl kernel.apparmor_restrict_unprivileged_userns=0` is required
+on those hosts.
+
+## Examples
+
+```bash
+# Build with full isolation
+fromager --build-isolation bootstrap -r requirements.txt
+
+# Build with isolation and env scrubbing
+FROMAGER_SCRUB_ENV_VARS="NGC_API_KEY,TWINE_PASSWORD" \
+  fromager --build-isolation bootstrap -r requirements.txt
+```
+
+## Findings
+
+A proof-of-concept package
+([build-attack-test](https://github.com/pavank63/build-attack-test))
+was used to validate the attack surface. It runs security probes from
+`setup.py` during `build_sdist` / `build_wheel` to test what a
+malicious build backend can access. Testing was performed with
+`--network-isolation` enabled.
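+
+To illustrate the kind of probe such a package can run, the sketch
+below checks credential-file and environment visibility. It is a
+hypothetical example written for this document, not code taken from
+build-attack-test, and the function names and result labels are
+assumptions.
+
+```python
+import os
+from pathlib import Path
+
+
+def probe_netrc(home: Path) -> str:
+    """Report whether the current user can open .netrc under home."""
+    try:
+        with open(home / ".netrc", "rb"):
+            return "Vulnerable"
+    except FileNotFoundError:
+        return "Absent"
+    except PermissionError:
+        return "Blocked"
+
+
+def probe_env(names, env=os.environ):
+    """Return the listed variable names that are visible to this process."""
+    return sorted(name for name in names if name in env)
+```
+
+Run as the ephemeral user, the first probe is expected to hit
+`PermissionError` on a `root:root` mode `600` file, which matches
+the credential protection described under "How".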
+
+### Results without build isolation
+
+| Attack vector | Result | Risk |
+| -- | -- | -- |
+| Credential file access (`.netrc`) | **Vulnerable** | Build process can read credential files containing auth tokens |
+| Sensitive environment variables | **Vulnerable** | Build system variables (registry paths, tokens) visible to backends |
+| Network access | Blocked | Already mitigated by `--network-isolation` |
+| Process visibility (PID) | **Vulnerable** | Build can see all running processes including fromager, parallel builds, and their command-line arguments |
+| IPC (shared memory, semaphores) | **Vulnerable** | Build can see and potentially attach to shared memory segments from other processes |
+| Hostname | **Vulnerable** | Real hostname visible, leaks build infrastructure identity |
+| Build cache read/write | **Vulnerable** | Build can read and write to shared compiler caches like ccache and cargo, enabling cache poisoning |
+| Package settings files | **Vulnerable** | Build can read all package override configuration files |
+| Persistent background process | **Vulnerable** | Build can spawn a daemon that continues running after the build finishes |
+| Python `.pth` backdoor | **Vulnerable** | Build can drop a `.pth` file into site-packages that runs code on every Python startup |
+| Shell profile injection | **Vulnerable** | Build can append to `.bashrc` / `.profile` to run code on every shell login |
+| pip config poisoning | **Vulnerable** | Build can write `pip.conf` to redirect dependency installs to an attacker-controlled index |
+
+### Key takeaways
+
+1. **Network isolation alone is insufficient.** A build can steal
+   credentials from `.netrc` and embed them in the built wheel. The
+   credentials leave the build system when the wheel is distributed,
+   bypassing network controls entirely.
+
+2. **Builds can leave persistent backdoors.** `.pth` files, shell
+   profile entries, pip config changes, and background daemons all
+   survive the build and can compromise subsequent builds or the
+   host.
+
+3. **Build cache poisoning is possible.** A poisoned compiler cache
+   entry (ccache, cargo) can inject malicious code into future
+   builds of unrelated packages.
+
+### Supply-chain amplification
+
+The persistence attacks above are especially dangerous because
+fromager builds many packages sequentially in the same environment.
+A single malicious package built early in the bootstrap can
+compromise every package built after it:
+
+- A `.pth` file dropped into site-packages runs on every subsequent
+  Python invocation, including fromager building the next package.
+  It can silently modify source files or inject code into build
+  outputs.
+- A poisoned `pip.conf` redirects dependency installs for all
+  subsequent builds to an attacker-controlled index.
+- A poisoned compiler cache entry (ccache/cargo) injects malicious
+  code into any later package that compiles the same source file.
+- A background daemon can watch the build directory and modify
+  source code for the next package before its build starts.
+
+The published wheels for those downstream packages would contain
+the injected code even though their source is clean.
+
+Build isolation breaks most of this chain. Each build runs as a
+separate ephemeral user in its own PID, IPC, and network namespace,
+so it cannot write to site-packages, modify pip config, or leave
+daemons behind. Shared compiler caches remain exposed (see
+Remaining gaps). When fromager runs parallel builds, each gets its
+own ephemeral user (`fmr_<random>`) and its own set of namespaces,
+so parallel builds cannot see or signal each other's processes.
+
+### Remaining gaps
+
+Build cache poisoning and package settings access are **not fully
+addressed** by this proposal, as the ephemeral user still needs
+write access to the build directory. Addressing these would require
+mount namespace isolation, which is incompatible with the current
+`pyproject_hooks` IPC mechanism (see Non-goals).
@@ -4,6 +4,7 @@ Fromager Enhancement Proposals
 .. toctree::
    :maxdepth: 1
 
+   build-isolation
    new-patcher-config
    new-resolver-config
    release-cooldown