docs(proposal): add build isolation design for sandboxed builds #1077
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,300 @@ | ||
| # Build isolation for sandboxing build backends | ||
|
|
||
| - Author: Pavan Kalyan Reddy Cherupally | ||
| - Created: 2026-04-21 | ||
| - Status: Open | ||
| - Issue: [#1019](https://github.com/python-wheel-build/fromager/issues/1019) | ||
|
|
||
| ## What | ||
|
|
||
| A `--build-isolation` flag that sandboxes PEP 517 build backend | ||
| subprocesses (`build_sdist`, `build_wheel`) so they cannot read | ||
| credentials, access the network, or interfere with the host system. | ||
|
|
||
| ## Why | ||
|
|
||
| Fromager executes upstream-controlled code (setup.py, build backends) | ||
| during wheel builds. A compromised or malicious package can: | ||
|
|
||
| - Read credential files like `$HOME/.netrc` and exfiltrate tokens | ||
| - Access sensitive environment variables (registry keys, API tokens) | ||
| - Reach the network to upload stolen data or download payloads | ||
| - Signal or inspect other processes via `/proc` or shared IPC | ||
| - Interfere with parallel builds through shared `/tmp` | ||
| - Leave persistent backdoors: `.pth` files that run on every Python | ||
| startup, shell profile entries that run on every login, or | ||
| background daemons that survive the build | ||
|
|
||
| The existing `--network-isolation` flag blocks network access but does | ||
| not protect against credential theft, process/IPC visibility, or | ||
| persistent backdoors. | ||
|
|
||
| Build isolation wraps each build backend invocation in a sandbox that | ||
| combines file-level credential protection with OS-level namespace | ||
| isolation. Only the PEP 517 hook calls are sandboxed; download, | ||
| installation, and upload steps run normally. | ||
|
|
||
| ## Goals | ||
|
|
||
| - A `--build-isolation/--no-build-isolation` CLI flag (default off) | ||
| that supersedes `--network-isolation` for build steps | ||
| - Credential protection: build processes cannot read `.netrc` or | ||
| other root-owned credential files | ||
| - Network isolation: no routing in the build namespace | ||
| - Process isolation: build cannot see or signal other processes | ||
| - IPC isolation: separate shared memory, semaphores, message queues | ||
| - Persistence protection: build cannot drop `.pth` backdoors, modify | ||
| shell profiles, or leave background daemons running after the build | ||
| - Environment scrubbing: downstream build systems can strip sensitive | ||
| environment variables via `FROMAGER_SCRUB_ENV_VARS` | ||
| - Works in unprivileged containers (Podman/Docker) without | ||
| `--privileged` or `--cap-add SYS_ADMIN` | ||
| - Minimal overhead (< 50ms per build invocation) | ||
|
|
||
| ## Non-goals | ||
|
|
||
| - **Mount namespace isolation.** Mounting tmpfs over `$HOME` or | ||
| making `/usr` read-only was explored but abandoned. The | ||
| `pyproject_hooks` library creates temporary files in `/tmp` for | ||
| IPC between the parent process and the build backend | ||
| (`input.json`/`output.json`). A mount namespace with a fresh | ||
| `/tmp` hides these files and breaks the build. Bind-mounting the | ||
| specific IPC directory is fragile and couples fromager to | ||
| `pyproject_hooks` internals. | ||
| - **bubblewrap (bwrap).** bwrap provides stronger filesystem | ||
| isolation but requires `CAP_SYS_ADMIN` or a privileged container, | ||
| which is unavailable in the standard unprivileged Podman/Docker | ||
| build environment. | ||
| - **Hardcoded list of sensitive environment variables.** Fromager is | ||
| an upstream tool; the specific variables that are sensitive depend | ||
| on the downstream build system. Scrubbing is controlled entirely | ||
| by the deployer via `FROMAGER_SCRUB_ENV_VARS`. | ||
| - **macOS / Windows support.** Linux namespaces and `unshare` are | ||
| Linux-only. The flag is unavailable on other platforms. | ||
|
|
||
| ## How | ||
|
|
||
| ### Isolation mechanism | ||
|
|
||
| Build isolation combines two complementary techniques: | ||
|
|
||
| #### 1. Ephemeral Unix user | ||
|
|
||
| Before each build invocation, the isolation script creates a | ||
| short-lived system user with `useradd` and removes it with `userdel` | ||
|
|
||
| on exit (via `trap EXIT`). The user has: | ||
|
Contributor: The Can we add that as a limitation?
|
||
|
|
||
| - No home directory (`-M -d /nonexistent`) | ||
| - No login shell (`-s /sbin/nologin`) | ||
| - A randomized name (`fmr_<random>`) to avoid collisions | ||
|
|
||
| This provides file-level credential protection: `.netrc` is owned by | ||
| `root:root` with mode `600`, so the ephemeral user cannot read it. | ||
| The overhead is approximately 10ms for `useradd` and 10ms for | ||
| `userdel`. | ||
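The user-creation step above can be sketched in Python as a pair of argument lists. This is only an illustration: the proposal's isolation script is a shell script, the helper name `ephemeral_user_commands` is hypothetical, and `--system` is an assumption based on the phrase "system user". The `-M`, `-d /nonexistent`, and `-s /sbin/nologin` flags are the ones listed above.

```python
import secrets


def ephemeral_user_commands(prefix: str = "fmr_") -> tuple[str, list[str], list[str]]:
    """Return (name, useradd argv, userdel argv) for a throwaway build user.

    Hypothetical helper: it only builds the commands and does not run them.
    """
    # Eight random hex characters keep collisions between parallel builds unlikely.
    name = prefix + secrets.token_hex(4)
    useradd = [
        "useradd",
        "--system",             # assumption: create a system account
        "-M",                   # do not create a home directory
        "-d", "/nonexistent",   # home path that does not exist
        "-s", "/sbin/nologin",  # no login shell
        name,
    ]
    userdel = ["userdel", name]
    return name, useradd, userdel
```

A caller would run the `useradd` list before the build and the `userdel` list in a `finally` block, which plays the role of `trap EXIT` in the shell version.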
|
|
||
| #### 2. Linux namespaces via unshare | ||
|
|
||
| After dropping to the ephemeral user with `setpriv`, the script | ||
| enters new namespaces with `unshare`: | ||
|
|
||
| | Namespace | Flag | Purpose | | ||
| | -- | -- | -- | | ||
| | Network | `--net` | No routing; blocks all network access | | ||
| | PID | `--pid --fork` | Build sees only its own processes | | ||
| | IPC | `--ipc` | Isolated shared memory and semaphores | | ||
| | UTS | `--uts` | Separate hostname | | ||
|
|
||
| `--map-root-user` creates a user namespace (it implies `--user`) and | ||
| maps the ephemeral user to UID 0 inside it, giving the process enough | ||
| privilege to bring up the loopback interface and set the hostname | ||
| without requiring real root. | ||
|
|
||
| #### Why setpriv instead of runuser | ||
|
|
||
| `runuser` calls `setgroups()`, which is denied inside user namespaces | ||
| (the kernel blocks it to prevent group membership escalation). | ||
| `setpriv --reuid --regid --clear-groups` avoids this call entirely. | ||
|
|
||
| #### Order of operations | ||
|
|
||
| ``` | ||
| useradd fmr_<random> # create ephemeral user (outside namespace) | ||
| └─ setpriv --reuid --regid # drop to ephemeral user | ||
| └─ unshare --uts --net --pid --ipc --fork --map-root-user | ||
| ├─ ip link set lo up | ||
| ├─ hostname localhost | ||
| └─ exec <build command> | ||
| userdel fmr_<random> # cleanup (trap EXIT) | ||
| ``` | ||
|
|
||
| The user is created before entering the namespace because `useradd` | ||
| needs access to `/etc/passwd` and `/etc/shadow` on the real | ||
| filesystem. `setpriv` drops privileges before `unshare` so the UID | ||
| switch happens outside the namespace where the real UID is mapped. | ||
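As a rough sketch of the ordering described above, the following Python function assembles a single argument list in which `setpriv` runs first and `unshare` runs inside it. The function name `sandbox_argv` and the use of `sh -c` for the inner steps are assumptions for illustration; the proposal itself implements this in a shell script.

```python
import shlex


def sandbox_argv(uid: int, gid: int, build_cmd: list[str]) -> list[str]:
    """Build the setpriv -> unshare -> build command chain (hypothetical helper)."""
    # Inner steps run inside the new namespaces, as mapped root:
    # bring up loopback, set the hostname, then exec the real build command.
    inner = (
        "ip link set lo up && hostname localhost && exec "
        + shlex.join(build_cmd)
    )
    return [
        # Drop to the ephemeral user outside the namespace.
        "setpriv", f"--reuid={uid}", f"--regid={gid}", "--clear-groups",
        # Then enter the new namespaces.
        "unshare", "--uts", "--net", "--pid", "--ipc", "--fork", "--map-root-user",
        "sh", "-c", inner,
    ]
```

Quoting the build command with `shlex.join` keeps arguments that contain spaces intact when they pass through `sh -c`.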
|
|
||
| ### Environment variable scrubbing | ||
|
|
||
| Downstream build systems may have sensitive environment variables | ||
| (registry tokens, CI credentials) that should not be visible to | ||
| build backends. Rather than hardcoding a list in fromager, scrubbing | ||
| is controlled by the deployer: | ||
|
|
||
| ```bash | ||
| # In the container image or CI environment | ||
| export FROMAGER_SCRUB_ENV_VARS="NGC_API_KEY,TWINE_PASSWORD,CI_JOB_TOKEN" | ||
| ``` | ||
|
|
||
| When `--build-isolation` is active, `external_commands.run()` reads | ||
| this comma-separated list and removes the named variables from the | ||
| subprocess environment before invoking the build. | ||
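A minimal sketch of the scrubbing step, assuming the list is read from the same environment mapping that is being filtered. The function name `scrub_env` is hypothetical, and trimming whitespace around names is an assumption that the proposal does not specify.

```python
def scrub_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env without the variables named in FROMAGER_SCRUB_ENV_VARS."""
    raw = env.get("FROMAGER_SCRUB_ENV_VARS", "")
    # Assumption: tolerate spaces and empty entries in the comma-separated list.
    names = {name.strip() for name in raw.split(",") if name.strip()}
    # Build a new mapping so the caller's environment is left untouched.
    return {key: value for key, value in env.items() if key not in names}
```

Passing `scrub_env(dict(os.environ))` as the `env` argument of a subprocess call would hide the listed variables from the build backend while leaving the parent process unchanged.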
|
|
||
| ### Integration points | ||
|
|
||
| #### CLI (`__main__.py`) | ||
|
|
||
| - Build isolation availability is detected at import time (same | ||
| pattern as network isolation) | ||
| - `--build-isolation/--no-build-isolation` option on the `main` | ||
| group, stored on `WorkContext` | ||
| - Fails early with a clear message if the platform does not support | ||
| build isolation | ||
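One way the availability check could look is sketched below. This is not fromager's actual detection code: the name `detect_build_isolation` and the tool list are assumptions, and a real check would likely also need to try running `unshare`, since a seccomp or AppArmor policy can block it even when the binary is present.

```python
import shutil
import sys
from typing import Callable, Optional

# Tools the isolation steps in this proposal rely on.
REQUIRED_TOOLS = ("useradd", "userdel", "setpriv", "unshare")


def detect_build_isolation(
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[bool, str]:
    """Return (supported, reason); reason is empty when supported."""
    if not platform.startswith("linux"):
        return False, f"build isolation requires Linux, not {platform}"
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if missing:
        return False, "missing required tools: " + ", ".join(missing)
    return True, ""
```

The CLI could call this once at import time and use the returned reason as the early failure message when `--build-isolation` is requested on an unsupported platform.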
|
|
||
| #### WorkContext (`context.py`) | ||
|
|
||
| - New `build_isolation: bool` field (default `False`) | ||
|
|
||
| #### BuildEnvironment (`build_environment.py`) | ||
|
|
||
| - `run()` method accepts `build_isolation` parameter, defaults to | ||
|
Contributor: This doesn't look right. Looking at the actual code in dependencies.py:547-553,
|
||
| `ctx.build_isolation` | ||
| - `install()` method explicitly passes `build_isolation=False` | ||
| because dependency installation needs access to the local PyPI | ||
| mirror | ||
|
|
||
| #### Build backend hooks (`dependencies.py`) | ||
|
|
||
| - `_run_hook_with_extra_environ` passes `ctx.build_isolation` to | ||
| `build_env.run()` | ||
|
|
||
| #### Subprocess runner (`external_commands.py`) | ||
|
|
||
| - `run()` accepts `build_isolation: bool` parameter | ||
| - When active, prepends the isolation script to the command, | ||
| sets `FROMAGER_BUILD_DIR` so the script can `chmod` the build | ||
| directory for the ephemeral user, applies env scrubbing, and sets | ||
| `CARGO_NET_OFFLINE=true` | ||
| - Build isolation supersedes network isolation but reuses the | ||
| `NetworkIsolationError` detection for consistent error reporting | ||
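The command and environment preparation described above might look roughly like this. The helper name `prepare_isolated_call` and the script path are hypothetical placeholders, and env scrubbing and error detection are left out to keep the sketch short.

```python
import os
from typing import Union


def prepare_isolated_call(
    cmd: list[str],
    env: dict[str, str],
    build_dir: Union[str, os.PathLike],
    script: str = "/path/to/isolation-script",  # hypothetical location
) -> tuple[list[str], dict[str, str]]:
    """Return the wrapped command and environment for an isolated build."""
    new_env = dict(env)  # copy so the caller's mapping is not modified
    # Lets the script chmod the build directory for the ephemeral user.
    new_env["FROMAGER_BUILD_DIR"] = os.fspath(build_dir)
    # Tell cargo not to attempt network access.
    new_env["CARGO_NET_OFFLINE"] = "true"
    # Prepend the isolation script so it wraps the original command.
    return [script, *cmd], new_env
```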
|
|
||
| ### What is and is not isolated | ||
|
|
||
| | Aspect | Protected | Notes | | ||
| | -- | -- | -- | | ||
| | `.netrc` / credentials | Yes | Ephemeral user cannot read root:root 600 files | | ||
| | Network access | Yes | No routing in network namespace | | ||
| | Process visibility | Yes | PID namespace; only build processes visible | | ||
| | IPC (shm, semaphores) | Yes | IPC namespace | | ||
| | Env var leakage | Configurable | Via `FROMAGER_SCRUB_ENV_VARS` | | ||
| | `.pth` / shell profile backdoors | Yes | Ephemeral user cannot write to site-packages or home directory | | ||
| Persistent background process | Yes | Kernel kills every process in the PID namespace when its init process exits | | ||
| | `/tmp` cross-build leakage | Partial | Sticky bit prevents cross-user access; no mount namespace | | ||
| Filesystem write access | No | Build directory is world-writable so the ephemeral user can write to it | | ||
| | Trojan in build output | No | Malicious code in the built wheel is not detected | | ||
|
|
||
| ### Compatibility | ||
|
|
||
| Works in unprivileged Podman and Docker containers without | ||
| `--privileged` or `--cap-add SYS_ADMIN`. Docker's default seccomp | ||
| profile may block `unshare`; Podman's policy allows it. On Ubuntu | ||
| 24.04, `sysctl kernel.apparmor_restrict_unprivileged_userns=0` is | ||
| required. | ||
|
|
||
|
|
||
| ## Examples | ||
|
|
||
| ```bash | ||
| # Build with full isolation | ||
| fromager --build-isolation bootstrap -r requirements.txt | ||
|
|
||
| # Build with isolation and env scrubbing | ||
| FROMAGER_SCRUB_ENV_VARS="NGC_API_KEY,TWINE_PASSWORD" \ | ||
| fromager --build-isolation bootstrap -r requirements.txt | ||
| ``` | ||
|
|
||
| ## Findings | ||
|
|
||
| A proof-of-concept package | ||
| ([build-attack-test](https://github.com/pavank63/build-attack-test)) | ||
| was used to validate the attack surface. It runs security probes from | ||
| `setup.py` during `build_sdist` / `build_wheel` to test what a | ||
| malicious build backend can access. Testing was performed with | ||
| `--network-isolation` enabled. | ||
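To give a sense of what such probes involve, here is a minimal sketch of two checks a `setup.py` could run. It is not taken from the build-attack-test repository; the function names `probe_readable` and `probe_env` are hypothetical.

```python
import os
from typing import Iterable, Mapping


def probe_readable(path: str) -> bool:
    """Report whether the current process can open and read the file."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return True
    except OSError:
        # Covers missing files and permission errors alike.
        return False


def probe_env(names: Iterable[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the sorted subset of names that are visible in the environment."""
    return sorted(name for name in names if name in environ)
```

Under build isolation, `probe_readable(os.path.expanduser("~/.netrc"))` would be expected to return `False`, and `probe_env` would return no names that appear in `FROMAGER_SCRUB_ENV_VARS`.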
|
|
||
| ### Results without build isolation | ||
|
|
||
| | Attack vector | Result | Risk | | ||
| | -- | -- | -- | | ||
| | Credential file access (`.netrc`) | **Vulnerable** | Build process can read credential files containing auth tokens | | ||
| | Sensitive environment variables | **Vulnerable** | Build system variables (registry paths, tokens) visible to backends | | ||
| | Network access | Blocked | Already mitigated by `--network-isolation` | | ||
| | Process visibility (PID) | **Vulnerable** | Build can see all running processes including fromager, parallel builds, and their command-line arguments | | ||
| | IPC (shared memory, semaphores) | **Vulnerable** | Build can see and potentially attach to shared memory segments from other processes | | ||
| | Hostname | **Vulnerable** | Real hostname visible, leaks build infrastructure identity | | ||
| | Build cache read/write | **Vulnerable** | Build can read and write to shared compiler caches like ccache and cargo, enabling cache poisoning | | ||
| | Package settings files | **Vulnerable** | Build can read all package override configuration files | | ||
| | Persistent background process | **Vulnerable** | Build can spawn a daemon that continues running after the build finishes | | ||
| | Python `.pth` backdoor | **Vulnerable** | Build can drop a `.pth` file into site-packages that runs code on every Python startup | | ||
| | Shell profile injection | **Vulnerable** | Build can append to `.bashrc` / `.profile` to run code on every shell login | | ||
| | pip config poisoning | **Vulnerable** | Build can write `pip.conf` to redirect dependency installs to an attacker-controlled index | | ||
|
|
||
| ### Key takeaways | ||
|
|
||
| 1. **Network isolation alone is insufficient.** A build can steal | ||
| credentials from `.netrc` and embed them in the built wheel. The | ||
| credentials leave the build system when the wheel is distributed, | ||
| bypassing network controls entirely. | ||
|
|
||
| 2. **Builds can leave persistent backdoors.** `.pth` files, shell | ||
| profile entries, pip config changes, and background daemons all | ||
| survive the build and can compromise subsequent builds or the | ||
| host. | ||
|
|
||
| 3. **Build cache poisoning is possible.** A poisoned compiler cache | ||
| entry (ccache, cargo) can inject malicious code into future | ||
| builds of unrelated packages. | ||
|
|
||
| ### Supply-chain amplification | ||
|
|
||
| The persistence attacks above are especially dangerous because | ||
| fromager builds many packages sequentially in the same environment. | ||
| A single malicious package built early in the bootstrap can | ||
| compromise every package built after it: | ||
|
|
||
| - A `.pth` file dropped into site-packages runs on every subsequent | ||
| Python invocation, including fromager building the next package. | ||
| It can silently modify source files or inject code into build | ||
| outputs. | ||
| - A poisoned `pip.conf` redirects dependency installs for all | ||
| subsequent builds to an attacker-controlled index. | ||
| - A poisoned compiler cache entry (ccache/cargo) injects malicious | ||
| code into any later package that compiles the same source file. | ||
| - A background daemon can watch the build directory and modify | ||
| source code for the next package before its build starts. | ||
|
|
||
| The published wheels for those downstream packages would contain | ||
| the injected code even though their source is clean. | ||
|
|
||
| Build isolation breaks most of this chain. Each build runs as a | ||
| separate ephemeral user in its own PID, IPC, and network namespace, | ||
| so it cannot write to site-packages, modify pip config, or leave | ||
| daemons behind. Compiler caches remain exposed (see Remaining gaps). | ||
| When fromager runs parallel builds, each gets its own ephemeral user | ||
| (`fmr_<random>`) and its own set of namespaces, so parallel builds | ||
| cannot see or interfere with each other. | ||
|
|
||
| ### Remaining gaps | ||
|
|
||
| Build cache poisoning and package settings access are **not fully | ||
| addressed** by this proposal, as the ephemeral user still needs | ||
| write access to the build directory. Addressing these would require | ||
| mount namespace isolation, which is incompatible with the current | ||
| `pyproject_hooks` IPC mechanism (see Non-goals). | ||
Clarification question: What happens with these combinations?
Looking at the current code, `network_isolation` is passed to
`_run_hook_with_extra_environ` for build hooks but also to `_createenv` for venv creation. Does build isolation apply to venv creation too, or only PEP 517 hooks?