300 changes: 300 additions & 0 deletions docs/proposals/build-isolation.md
@@ -0,0 +1,300 @@
# Build isolation for sandboxing build backends

- Author: Pavan Kalyan Reddy Cherupally
- Created: 2026-04-21
- Status: Open
- Issue: [#1019](https://github.com/python-wheel-build/fromager/issues/1019)

## What

A `--build-isolation` flag that sandboxes PEP 517 build backend
subprocesses (`build_sdist`, `build_wheel`) so they cannot read
credentials, access the network, or interfere with the host system.

## Why

Fromager executes upstream-controlled code (setup.py, build backends)
during wheel builds. A compromised or malicious package can:

- Read credential files like `$HOME/.netrc` and exfiltrate tokens
- Access sensitive environment variables (registry keys, API tokens)
- Reach the network to upload stolen data or download payloads
- Signal or inspect other processes via `/proc` or shared IPC
- Interfere with parallel builds through shared `/tmp`
- Leave persistent backdoors: `.pth` files that run on every Python
startup, shell profile entries that run on every login, or
background daemons that survive the build

The existing `--network-isolation` flag blocks network access but does
not protect against credential theft, process/IPC visibility, or
persistent backdoors.

Build isolation wraps each build backend invocation in a sandbox that
combines file-level credential protection with OS-level namespace
isolation. Only the PEP 517 hook calls are sandboxed; download,
installation, and upload steps run normally.

## Goals

- A `--build-isolation/--no-build-isolation` CLI flag (default off)
that supersedes `--network-isolation` for build steps
Contributor

Clarification question: What happens with these combinations?

    • --build-isolation --network-isolation — redundant? Does build isolation absorb network isolation for build steps while network isolation still applies to non-build steps?
    • --build-isolation --no-network-isolation — does the user get network isolation for builds anyway (since build isolation includes it)?
    • --no-build-isolation --network-isolation — today's behavior?

Looking at the current code, network_isolation is passed to _run_hook_with_extra_environ for build hooks but also to _createenv for venv creation. Does build isolation apply to venv creation too, or
only PEP 517 hooks?

- Credential protection: build processes cannot read `.netrc` or
other root-owned credential files
- Network isolation: no routing in the build namespace
- Process isolation: build cannot see or signal other processes
- IPC isolation: separate shared memory, semaphores, message queues
- Persistence protection: build cannot drop `.pth` backdoors, modify
shell profiles, or leave background daemons running after the build
- Environment scrubbing: downstream build systems can strip sensitive
environment variables via `FROMAGER_SCRUB_ENV_VARS`
- Works in unprivileged containers (Podman/Docker) without
`--privileged` or `--cap-add SYS_ADMIN`
- Minimal overhead (< 50ms per build invocation)

## Non-goals

- **Mount namespace isolation.** Mounting tmpfs over `$HOME` or
making `/usr` read-only was explored but abandoned. The
`pyproject_hooks` library creates temporary files in `/tmp` for
IPC between the parent process and the build backend
(`input.json`/`output.json`). A mount namespace with a fresh
`/tmp` hides these files and breaks the build. Bind-mounting the
specific IPC directory is fragile and couples fromager to
`pyproject_hooks` internals.
- **bubblewrap (bwrap).** bwrap provides stronger filesystem
isolation but requires `CAP_SYS_ADMIN` or a privileged container,
which is unavailable in the standard unprivileged Podman/Docker
build environment.
- **Hardcoded list of sensitive environment variables.** Fromager is
an upstream tool; the specific variables that are sensitive depend
on the downstream build system. Scrubbing is controlled entirely
by the deployer via `FROMAGER_SCRUB_ENV_VARS`.
- **macOS / Windows support.** Linux namespaces and `unshare` are
Linux-only. The flag is unavailable on other platforms.

## How

### Isolation mechanism

Build isolation combines two complementary techniques:

#### 1. Ephemeral Unix user

Before each build invocation, the isolation script creates a
short-lived system user with `useradd` and removes it with `userdel`
Contributor

As far as I know, useradd and userdel modify /etc/passwd and /etc/shadow. This means fromager (or the isolation script) must run as root inside the container. That's a major assumption that isn't mentioned anywhere. What happens if fromager runs as a non-root user?

on exit (via `trap EXIT`). The user has:
Contributor

The userdel runs in a trap EXIT handler, but SIGKILL cannot be trapped. If a build gets OOM-killed or force-killed, the ephemeral user leaks. Over a long bootstrap run with hundreds of packages, this could accumulate orphaned fmr_* users in /etc/passwd.

Can we add that as a limitation?


- No home directory (`-M -d /nonexistent`)
- No login shell (`-s /sbin/nologin`)
- A randomized name (`fmr_<random>`) to avoid collisions

This provides file-level credential protection: `.netrc` is owned by
`root:root` with mode `600`, so the ephemeral user cannot read it.
The overhead is approximately 10ms for `useradd` and 10ms for
`userdel`.
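As a rough illustration of the user setup described above, the following Python sketch builds the `useradd` and `userdel` argument lists with the flags listed in this section. The proposal performs these steps in an isolation script; the function names and the 8-character random suffix here are assumptions made for the example, not part of the proposal.

```python
import secrets


def ephemeral_user_name(prefix: str = "fmr_") -> str:
    """Return a randomized user name with the fmr_ prefix."""
    # 4 random bytes give 8 hex characters; the suffix length is an assumption.
    return prefix + secrets.token_hex(4)


def useradd_command(name: str) -> list:
    """Build the useradd argv using the flags listed in the proposal."""
    return [
        "useradd",
        "-M",                   # do not create a home directory
        "-d", "/nonexistent",   # point the home path at a missing directory
        "-s", "/sbin/nologin",  # no login shell
        name,
    ]


def userdel_command(name: str) -> list:
    """Build the matching cleanup argv."""
    return ["userdel", name]
```

Both commands edit the system account database, so whatever runs them needs the privileges to do so.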

#### 2. Linux namespaces via unshare

After dropping to the ephemeral user with `setpriv`, the script
enters new namespaces with `unshare`:

| Namespace | Flag | Purpose |
| -- | -- | -- |
| Network | `--net` | No routing; blocks all network access |
| PID | `--pid --fork` | Build sees only its own processes |
| IPC | `--ipc` | Isolated shared memory and semaphores |
| UTS | `--uts` | Separate hostname |

`--map-root-user` maps the ephemeral user to UID 0 inside the
namespace, giving it enough privilege to bring up the loopback
interface and set the hostname without requiring real root.

#### Why setpriv instead of runuser

`runuser` calls `setgroups()`, which is denied inside user namespaces
(the kernel blocks it to prevent group membership escalation).
`setpriv --reuid --regid --clear-groups` avoids this call entirely.

#### Order of operations

```
useradd fmr_<random>                 # create ephemeral user (outside namespace)
└─ setpriv --reuid --regid           # drop to ephemeral user
   └─ unshare --uts --net --pid --ipc --fork --map-root-user
      ├─ ip link set lo up
      ├─ hostname localhost
      └─ exec <build command>
userdel fmr_<random>                 # cleanup (trap EXIT)
```

The user is created before entering the namespace because `useradd`
needs access to `/etc/passwd` and `/etc/shadow` on the real
filesystem. `setpriv` drops privileges before `unshare` so the UID
switch happens outside the namespace where the real UID is mapped.
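The nesting described above can be sketched as a single argument list. This is a minimal illustration, assuming the inner steps run through `sh -c`; the helper name `isolated_command` and the use of numeric UID/GID values are hypothetical, and the real isolation script may be structured differently.

```python
import shlex


def isolated_command(build_cmd: list, uid: int, gid: int) -> list:
    """Wrap build_cmd in setpriv (outer) and unshare (inner)."""
    # Steps that run inside the new namespaces, in the order shown above.
    inner = (
        "ip link set lo up && hostname localhost && exec "
        + shlex.join(build_cmd)
    )
    return [
        # Drop to the ephemeral user first, outside any new namespace.
        "setpriv", f"--reuid={uid}", f"--regid={gid}", "--clear-groups",
        # Then enter the namespaces listed in the table above.
        "unshare", "--uts", "--net", "--pid", "--ipc", "--fork",
        "--map-root-user",
        "sh", "-c", inner,
    ]
```

Calling `isolated_command(["python", "-m", "build"], 1001, 1001)` yields a list that starts with `setpriv` and ends with the shell snippet that execs the build command.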

### Environment variable scrubbing

Downstream build systems may have sensitive environment variables
(registry tokens, CI credentials) that should not be visible to
build backends. Rather than hardcoding a list in fromager, scrubbing
is controlled by the deployer:

```bash
# In the container image or CI environment
export FROMAGER_SCRUB_ENV_VARS="NGC_API_KEY,TWINE_PASSWORD,CI_JOB_TOKEN"
```

When `--build-isolation` is active, `external_commands.run()` reads
this comma-separated list and removes the named variables from the
subprocess environment before invoking the build.
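A minimal sketch of the scrubbing step, assuming whitespace around names is ignored and empty entries are skipped (the proposal does not specify either). The function name `scrub_environ` is hypothetical and is not the actual `external_commands.run()` code.

```python
from __future__ import annotations

from collections.abc import Mapping


def scrub_environ(
    environ: Mapping[str, str],
    scrub_list: str | None = None,
) -> dict[str, str]:
    """Return a copy of environ without the variables named in scrub_list."""
    if scrub_list is None:
        # Fall back to the deployer-controlled setting.
        scrub_list = environ.get("FROMAGER_SCRUB_ENV_VARS", "")
    # Comma-separated names; tolerate spaces and trailing commas.
    names = {name.strip() for name in scrub_list.split(",") if name.strip()}
    return {key: value for key, value in environ.items() if key not in names}
```

The input mapping is left untouched, so the parent process keeps its own credentials for download and upload steps.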

### Integration points

#### CLI (`__main__.py`)

- Build isolation availability is detected at import time (same
pattern as network isolation)
- `--build-isolation/--no-build-isolation` option on the `main`
group, stored on `WorkContext`
- Fails early with a clear message if the platform does not support
build isolation
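One way the availability check could look is sketched below. The source does not show how network isolation is detected, so this is an assumption: it only checks for Linux and for the four tools named in this proposal, and the names `REQUIRED_TOOLS` and `build_isolation_available` are hypothetical.

```python
from __future__ import annotations

import shutil
import sys

# Tools this proposal relies on.
REQUIRED_TOOLS = ("useradd", "userdel", "setpriv", "unshare")


def build_isolation_available(
    platform: str = sys.platform,
    which=shutil.which,
) -> tuple[bool, str]:
    """Return (available, reason); reason is empty when available."""
    if not platform.startswith("linux"):
        return False, f"build isolation requires Linux, not {platform}"
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if missing:
        return False, "missing required tools: " + ", ".join(missing)
    return True, ""
```

Finding the tools on `PATH` does not prove that `unshare` is permitted at runtime (for example under a restrictive seccomp or AppArmor policy), so a real check may also need to try it.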

#### WorkContext (`context.py`)

- New `build_isolation: bool` field (default `False`)

#### BuildEnvironment (`build_environment.py`)

- `run()` method accepts `build_isolation` parameter, defaults to
Contributor

This doesn't look right. Looking at the actual code in dependencies.py:547-553, _run_hook_with_extra_environ calls external_commands.run() directly — it doesn't go
through BuildEnvironment.run(). This matters because BuildEnvironment.run() is where env var setup (like CARGO_NET_OFFLINE) happens. The proposal needs to either change the hook runner to go through BuildEnvironment.run(), or duplicate that logic.

`ctx.build_isolation`
- `install()` method explicitly passes `build_isolation=False`
because dependency installation needs access to the local PyPI
mirror

#### Build backend hooks (`dependencies.py`)

- `_run_hook_with_extra_environ` passes `ctx.build_isolation` to
`build_env.run()`

#### Subprocess runner (`external_commands.py`)

- `run()` accepts `build_isolation: bool` parameter
- When active, prepends the isolation script to the command,
sets `FROMAGER_BUILD_DIR` so the script can `chmod` the build
directory for the ephemeral user, applies env scrubbing, and sets
`CARGO_NET_OFFLINE=true`
- Build isolation supersedes network isolation but reuses the
`NetworkIsolationError` detection for consistent error reporting
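The command and environment preparation listed above might look roughly like this. It is a sketch under assumptions: the script path `ISOLATION_SCRIPT` and the helper name `prepare_isolated_run` are hypothetical, and environment scrubbing and error detection are left out.

```python
from __future__ import annotations

import os
from collections.abc import Mapping, Sequence

# Hypothetical location; the proposal does not name the script's path.
ISOLATION_SCRIPT = "/usr/local/libexec/fromager-build-isolation"


def prepare_isolated_run(
    cmd: Sequence[str],
    env: Mapping[str, str],
    build_dir: str | os.PathLike,
    script: str = ISOLATION_SCRIPT,
) -> tuple[list[str], dict[str, str]]:
    """Return the wrapped command and the environment to run it with."""
    new_env = dict(env)  # copy so the caller's environment is not modified
    # Lets the script chmod the build directory for the ephemeral user.
    new_env["FROMAGER_BUILD_DIR"] = os.fspath(build_dir)
    # Tell cargo not to attempt network access.
    new_env["CARGO_NET_OFFLINE"] = "true"
    return [script, *cmd], new_env
```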

### What is and is not isolated

| Aspect | Protected | Notes |
| -- | -- | -- |
| `.netrc` / credentials | Yes | Ephemeral user cannot read root:root 600 files |
| Network access | Yes | No routing in network namespace |
| Process visibility | Yes | PID namespace; only build processes visible |
| IPC (shm, semaphores) | Yes | IPC namespace |
| Env var leakage | Configurable | Via `FROMAGER_SCRUB_ENV_VARS` |
| `.pth` / shell profile backdoors | Yes | Ephemeral user cannot write to site-packages or home directory |
| Persistent background process | Yes | When the PID namespace's init process exits, the kernel kills every process left in the namespace |
| `/tmp` cross-build leakage | Partial | Sticky bit stops one user from deleting or renaming another user's files; reads still depend on file modes; no mount namespace |
| Filesystem write access | No | The build directory is made writable for the ephemeral user |
| Trojan in build output | No | Malicious code in the built wheel is not detected |

### Compatibility

Works in unprivileged Podman and Docker containers without
`--privileged` or `--cap-add SYS_ADMIN`. Docker's default seccomp
profile may block `unshare`; Podman's policy allows it. On Ubuntu
24.04, `sysctl kernel.apparmor_restrict_unprivileged_userns=0` is
required.

## Examples

```bash
# Build with full isolation
fromager --build-isolation bootstrap -r requirements.txt

# Build with isolation and env scrubbing
FROMAGER_SCRUB_ENV_VARS="NGC_API_KEY,TWINE_PASSWORD" \
fromager --build-isolation bootstrap -r requirements.txt
```

## Findings

A proof-of-concept package
([build-attack-test](https://github.com/pavank63/build-attack-test))
was used to validate the attack surface. It runs security probes from
`setup.py` during `build_sdist` / `build_wheel` to test what a
malicious build backend can access. Testing was performed with
`--network-isolation` enabled.

### Results without build isolation

| Attack vector | Result | Risk |
| -- | -- | -- |
| Credential file access (`.netrc`) | **Vulnerable** | Build process can read credential files containing auth tokens |
| Sensitive environment variables | **Vulnerable** | Build system variables (registry paths, tokens) visible to backends |
| Network access | Blocked | Already mitigated by `--network-isolation` |
| Process visibility (PID) | **Vulnerable** | Build can see all running processes including fromager, parallel builds, and their command-line arguments |
| IPC (shared memory, semaphores) | **Vulnerable** | Build can see and potentially attach to shared memory segments from other processes |
| Hostname | **Vulnerable** | Real hostname visible, leaks build infrastructure identity |
| Build cache read/write | **Vulnerable** | Build can read and write to shared compiler caches like ccache and cargo, enabling cache poisoning |
| Package settings files | **Vulnerable** | Build can read all package override configuration files |
| Persistent background process | **Vulnerable** | Build can spawn a daemon that continues running after the build finishes |
| Python `.pth` backdoor | **Vulnerable** | Build can drop a `.pth` file into site-packages that runs code on every Python startup |
| Shell profile injection | **Vulnerable** | Build can append to `.bashrc` / `.profile` to run code on every shell login |
| pip config poisoning | **Vulnerable** | Build can write `pip.conf` to redirect dependency installs to an attacker-controlled index |

### Key takeaways

1. **Network isolation alone is insufficient.** A build can steal
credentials from `.netrc` and embed them in the built wheel. The
credentials leave the build system when the wheel is distributed,
bypassing network controls entirely.

2. **Builds can leave persistent backdoors.** `.pth` files, shell
profile entries, pip config changes, and background daemons all
survive the build and can compromise subsequent builds or the
host.

3. **Build cache poisoning is possible.** A poisoned compiler cache
entry (ccache, cargo) can inject malicious code into future
builds of unrelated packages.

### Supply-chain amplification

The persistence attacks above are especially dangerous because
fromager builds many packages sequentially in the same environment.
A single malicious package built early in the bootstrap can
compromise every package built after it:

- A `.pth` file dropped into site-packages runs on every subsequent
Python invocation, including fromager building the next package.
It can silently modify source files or inject code into build
outputs.
- A poisoned `pip.conf` redirects dependency installs for all
subsequent builds to an attacker-controlled index.
- A poisoned compiler cache entry (ccache/cargo) injects malicious
code into any later package that compiles the same source file.
- A background daemon can watch the build directory and modify
source code for the next package before its build starts.

The published wheels for those downstream packages would contain
the injected code even though their source is clean.

Build isolation breaks most of this chain. Each build runs as a
separate ephemeral user in its own PID, IPC, and network namespace,
so it cannot write to site-packages, modify pip config, or leave
daemons behind. Shared compiler caches are the exception (see
Remaining gaps). When fromager runs parallel builds, each gets its
own ephemeral user (`fmr_<random>`) and its own set of namespaces,
so parallel builds cannot see or interfere with each other.

### Remaining gaps

Build cache poisoning and package settings access are **not fully
addressed** by this proposal, as the ephemeral user still needs
write access to the build directory. Addressing these would require
mount namespace isolation, which is incompatible with the current
`pyproject_hooks` IPC mechanism (see Non-goals).
1 change: 1 addition & 0 deletions docs/proposals/index.rst
@@ -4,6 +4,7 @@ Fromager Enhancement Proposals
.. toctree::
   :maxdepth: 1

   build-isolation
   new-patcher-config
   new-resolver-config
   release-cooldown