From 5fb0f31c5954a539d59f3a16fefc1c2e61ff7a2d Mon Sep 17 00:00:00 2001
From: Pavan Kalyan Reddy Cherupally
Date: Tue, 21 Apr 2026 14:38:47 -0500
Subject: [PATCH] docs(proposal): add build isolation design for sandboxed builds

Add design proposal for --build-isolation flag that sandboxes PEP 517
build backend subprocesses using ephemeral Unix users and Linux
namespaces. Includes security findings from proof-of-concept testing
with build-attack-test package.

Signed-off-by: Pavan Kalyan Reddy Cherupally
Co-Authored-By: Claude
---
 docs/proposals/build-isolation.md | 300 ++++++++++++++++++++++++++++++
 docs/proposals/index.rst          |   1 +
 2 files changed, 301 insertions(+)
 create mode 100644 docs/proposals/build-isolation.md

diff --git a/docs/proposals/build-isolation.md b/docs/proposals/build-isolation.md
new file mode 100644
index 00000000..12f6dcbb
--- /dev/null
+++ b/docs/proposals/build-isolation.md
@@ -0,0 +1,300 @@
+# Build isolation for sandboxing build backends
+
+- Author: Pavan Kalyan Reddy Cherupally
+- Created: 2026-04-21
+- Status: Open
+- Issue: [#1019](https://github.com/python-wheel-build/fromager/issues/1019)
+
+## What
+
+A `--build-isolation` flag that sandboxes PEP 517 build backend
+subprocesses (`build_sdist`, `build_wheel`) so they cannot read
+credentials, access the network, or interfere with the host system.
+
+## Why
+
+Fromager executes upstream-controlled code (setup.py, build backends)
+during wheel builds. A compromised or malicious package can:
+
+- Read credential files like `$HOME/.netrc` and exfiltrate tokens
+- Access sensitive environment variables (registry keys, API tokens)
+- Reach the network to upload stolen data or download payloads
+- Signal or inspect other processes via `/proc` or shared IPC
+- Interfere with parallel builds through shared `/tmp`
+- Leave persistent backdoors: `.pth` files that run on every Python
+  startup, shell profile entries that run on every login, or
+  background daemons that survive the build
+
+The existing `--network-isolation` flag blocks network access but does
+not protect against credential theft, process/IPC visibility, or
+persistent backdoors.
+
+Build isolation wraps each build backend invocation in a sandbox that
+combines file-level credential protection with OS-level namespace
+isolation. Only the PEP 517 hook calls are sandboxed; download,
+installation, and upload steps run normally.
+
+## Goals
+
+- A `--build-isolation/--no-build-isolation` CLI flag (default off)
+  that supersedes `--network-isolation` for build steps
+- Credential protection: build processes cannot read `.netrc` or
+  other root-owned credential files
+- Network isolation: no routing in the build namespace
+- Process isolation: build cannot see or signal other processes
+- IPC isolation: separate shared memory, semaphores, message queues
+- Persistence protection: build cannot drop `.pth` backdoors, modify
+  shell profiles, or leave background daemons running after the build
+- Environment scrubbing: downstream build systems can strip sensitive
+  environment variables via `FROMAGER_SCRUB_ENV_VARS`
+- Works in unprivileged containers (Podman/Docker) without
+  `--privileged` or `--cap-add SYS_ADMIN`
+- Minimal overhead (< 50ms per build invocation)
+
+## Non-goals
+
+- **Mount namespace isolation.** Mounting tmpfs over `$HOME` or
+  making `/usr` read-only was explored but abandoned. The
+  `pyproject_hooks` library creates temporary files in `/tmp` for
+  IPC between the parent process and the build backend
+  (`input.json`/`output.json`). A mount namespace with a fresh
+  `/tmp` hides these files and breaks the build. Bind-mounting the
+  specific IPC directory is fragile and couples fromager to
+  `pyproject_hooks` internals.
+- **bubblewrap (bwrap).** bwrap provides stronger filesystem
+  isolation but requires `CAP_SYS_ADMIN` or a privileged container,
+  which is unavailable in the standard unprivileged Podman/Docker
+  build environment.
+- **Hardcoded list of sensitive environment variables.** Fromager is
+  an upstream tool; the specific variables that are sensitive depend
+  on the downstream build system. Scrubbing is controlled entirely
+  by the deployer via `FROMAGER_SCRUB_ENV_VARS`.
+- **macOS / Windows support.** Linux namespaces and `unshare` are
+  Linux-only. The flag is unavailable on other platforms.
+
+## How
+
+### Isolation mechanism
+
+Build isolation combines two complementary techniques:
+
+#### 1. Ephemeral Unix user
+
+Before each build invocation, the isolation script creates a
+short-lived system user with `useradd` and removes it with `userdel`
+on exit (via `trap EXIT`). The user has:
+
+- No home directory (`-M -d /nonexistent`)
+- No login shell (`-s /sbin/nologin`)
+- A randomized name (`fmr_`) to avoid collisions
+
+This provides file-level credential protection: `.netrc` is owned by
+`root:root` with mode `600`, so the ephemeral user cannot read it.
+The overhead is approximately 10ms for `useradd` and 10ms for
+`userdel`.
+
+#### 2. Linux namespaces via unshare
+
+After dropping to the ephemeral user with `setpriv`, the script
+enters new namespaces with `unshare`:
+
+| Namespace | Flag | Purpose |
+| -- | -- | -- |
+| Network | `--net` | No routing; blocks all external network access |
+| PID | `--pid --fork` | Build sees only its own processes |
+| IPC | `--ipc` | Isolated shared memory and semaphores |
+| UTS | `--uts` | Separate hostname |
+
+`--map-root-user` maps the ephemeral user to UID 0 inside the
+namespace, giving it enough privilege to bring up the loopback
+interface and set the hostname without requiring real root.
+
+#### Why setpriv instead of runuser
+
+`runuser` calls `setgroups()`, which is denied inside user namespaces
+(the kernel blocks it to prevent group membership escalation).
+`setpriv --reuid --regid --clear-groups` avoids this call entirely.
+
+#### Order of operations
+
+```
+useradd fmr_                          # create ephemeral user (outside namespace)
+  └─ setpriv --reuid --regid          # drop to ephemeral user
+       └─ unshare --uts --net --pid --ipc --fork --map-root-user
+            ├─ ip link set lo up
+            ├─ hostname localhost
+            └─ exec
+userdel fmr_                          # cleanup (trap EXIT)
+```
+
+The user is created before entering the namespace because `useradd`
+needs access to `/etc/passwd` and `/etc/shadow` on the real
+filesystem. `setpriv` drops privileges before `unshare` so the UID
+switch happens outside the namespace where the real UID is mapped.
+
+### Environment variable scrubbing
+
+Downstream build systems may have sensitive environment variables
+(registry tokens, CI credentials) that should not be visible to
+build backends. Rather than hardcoding a list in fromager, scrubbing
+is controlled by the deployer:
+
+```bash
+# In the container image or CI environment
+export FROMAGER_SCRUB_ENV_VARS="NGC_API_KEY,TWINE_PASSWORD,CI_JOB_TOKEN"
+```
+
+When `--build-isolation` is active, `external_commands.run()` reads
+this comma-separated list and removes the named variables from the
+subprocess environment before invoking the build.
+
+### Integration points
+
+#### CLI (`__main__.py`)
+
+- Build isolation availability is detected at import time (same
+  pattern as network isolation)
+- `--build-isolation/--no-build-isolation` option on the `main`
+  group, stored on `WorkContext`
+- Fails early with a clear message if the platform does not support
+  build isolation
+
+#### WorkContext (`context.py`)
+
+- New `build_isolation: bool` field (default `False`)
+
+#### BuildEnvironment (`build_environment.py`)
+
+- `run()` method accepts `build_isolation` parameter, defaults to
+  `ctx.build_isolation`
+- `install()` method explicitly passes `build_isolation=False`
+  because dependency installation needs access to the local PyPI
+  mirror
+
+#### Build backend hooks (`dependencies.py`)
+
+- `_run_hook_with_extra_environ` passes `ctx.build_isolation` to
+  `build_env.run()`
+
+#### Subprocess runner (`external_commands.py`)
+
+- `run()` accepts `build_isolation: bool` parameter
+- When active, prepends the isolation script to the command,
+  sets `FROMAGER_BUILD_DIR` so the script can `chmod` the build
+  directory for the ephemeral user, applies env scrubbing, and sets
+  `CARGO_NET_OFFLINE=true`
+- Build isolation supersedes network isolation but reuses the
+  `NetworkIsolationError` detection for consistent error reporting
+
+### What is and is not isolated
+
+| Aspect | Protected | Notes |
+| -- | -- | -- |
+| `.netrc` / credentials | Yes | Ephemeral user cannot read root:root 600 files |
+| Network access | Yes | No routing in network namespace |
+| Process visibility | Yes | PID namespace; only build processes visible |
+| IPC (shm, semaphores) | Yes | IPC namespace |
+| Env var leakage | Configurable | Via `FROMAGER_SCRUB_ENV_VARS` |
+| `.pth` / shell profile backdoors | Yes | Ephemeral user cannot write to site-packages or home directory |
+| Persistent background process | Yes | PID namespace kills all processes when the build exits |
+| `/tmp` cross-build leakage | Partial | Sticky bit stops other users from deleting or renaming files, but reads depend on file modes; no mount namespace |
+| Filesystem write access | No | Ephemeral user has world-writable access to build dir |
+| Trojan in build output | No | Malicious code in the built wheel is not detected |
+
+### Compatibility
+
+Works in unprivileged Podman and Docker containers without
+`--privileged` or `--cap-add SYS_ADMIN`. Docker's default seccomp
+profile may block `unshare`; Podman's policy allows it. On Ubuntu
+24.04, `sysctl kernel.apparmor_restrict_unprivileged_userns=0` is
+required.
+
+## Examples
+
+```bash
+# Build with full isolation
+fromager --build-isolation bootstrap -r requirements.txt
+
+# Build with isolation and env scrubbing
+FROMAGER_SCRUB_ENV_VARS="NGC_API_KEY,TWINE_PASSWORD" \
+  fromager --build-isolation bootstrap -r requirements.txt
+```
+
+## Findings
+
+A proof-of-concept package
+([build-attack-test](https://github.com/pavank63/build-attack-test))
+was used to validate the attack surface. It runs security probes from
+`setup.py` during `build_sdist` / `build_wheel` to test what a
+malicious build backend can access. Testing was performed with
+`--network-isolation` enabled.
+
+### Results without build isolation
+
+| Attack vector | Result | Risk |
+| -- | -- | -- |
+| Credential file access (`.netrc`) | **Vulnerable** | Build process can read credential files containing auth tokens |
+| Sensitive environment variables | **Vulnerable** | Build system variables (registry paths, tokens) visible to backends |
+| Network access | Blocked | Already mitigated by `--network-isolation` |
+| Process visibility (PID) | **Vulnerable** | Build can see all running processes including fromager, parallel builds, and their command-line arguments |
+| IPC (shared memory, semaphores) | **Vulnerable** | Build can see and potentially attach to shared memory segments from other processes |
+| Hostname | **Vulnerable** | Real hostname visible, leaks build infrastructure identity |
+| Build cache read/write | **Vulnerable** | Build can read and write to shared compiler caches like ccache and cargo, enabling cache poisoning |
+| Package settings files | **Vulnerable** | Build can read all package override configuration files |
+| Persistent background process | **Vulnerable** | Build can spawn a daemon that continues running after the build finishes |
+| Python `.pth` backdoor | **Vulnerable** | Build can drop a `.pth` file into site-packages that runs code on every Python startup |
+| Shell profile injection | **Vulnerable** | Build can append to `.bashrc` / `.profile` to run code on every shell login |
+| pip config poisoning | **Vulnerable** | Build can write `pip.conf` to redirect dependency installs to an attacker-controlled index |
+
+### Key takeaways
+
+1. **Network isolation alone is insufficient.** A build can steal
+   credentials from `.netrc` and embed them in the built wheel. The
+   credentials leave the build system when the wheel is distributed,
+   bypassing network controls entirely.
+
+2. **Builds can leave persistent backdoors.** `.pth` files, shell
+   profile entries, pip config changes, and background daemons all
+   survive the build and can compromise subsequent builds or the
+   host.
+
+3. **Build cache poisoning is possible.** A poisoned compiler cache
+   entry (ccache, cargo) can inject malicious code into future
+   builds of unrelated packages.
+
+### Supply-chain amplification
+
+The persistence attacks above are especially dangerous because
+fromager builds many packages sequentially in the same environment.
+A single malicious package built early in the bootstrap can
+compromise every package built after it:
+
+- A `.pth` file dropped into site-packages runs on every subsequent
+  Python invocation, including fromager building the next package.
+  It can silently modify source files or inject code into build
+  outputs.
+- A poisoned `pip.conf` redirects dependency installs for all
+  subsequent builds to an attacker-controlled index.
+- A poisoned compiler cache entry (ccache/cargo) injects malicious
+  code into any later package that compiles the same source file.
+- A background daemon can watch the build directory and modify
+  source code for the next package before its build starts.
+
+The published wheels for those downstream packages would contain
+the injected code even though their source is clean.
+
+Build isolation breaks most of this chain. Each build runs as a
+separate ephemeral user in its own PID, IPC, and network namespace,
+so it cannot write to site-packages, modify pip config, or leave
+daemons behind; cache poisoning is only partly mitigated (see
+Remaining gaps). When fromager runs parallel builds, each gets its
+own ephemeral user (`fmr_`) and its own set of namespaces, so
+parallel builds cannot see or interfere with each other.
+
+### Remaining gaps
+
+Build cache poisoning and package settings access are **not fully
+addressed** by this proposal, as the ephemeral user still needs
+write access to the build directory. Addressing these would require
+mount namespace isolation, which is incompatible with the current
+`pyproject_hooks` IPC mechanism (see Non-goals).
diff --git a/docs/proposals/index.rst b/docs/proposals/index.rst
index 75c368b0..f7451eb9 100644
--- a/docs/proposals/index.rst
+++ b/docs/proposals/index.rst
@@ -4,6 +4,7 @@ Fromager Enhancement Proposals
 .. toctree::
    :maxdepth: 1
 
+   build-isolation
    new-patcher-config
    new-resolver-config
    release-cooldown