38 changes: 38 additions & 0 deletions README.md
@@ -62,6 +62,42 @@ CO-RE relocation, map/program/attach support), not a heuristic. Each run leaves
per-target evidence — `serial.log` (the guest kernel boot), `qemu.log`, and
`validator-result.json`.

### Validate via your own loader (command mode)

The bundled validator answers "does this `.bpf.o` load/attach?" Sometimes you
want to answer "does **my project's actual loader** come up on this kernel?" —
which also exercises your userspace path and needs no manifest kept in sync with
that loader. Command mode does exactly that: it runs a command (optionally a
binary you ship in) **inside each matrix kernel VM**, and the per-kernel verdict
is its exit code.

```bash
# Run your statically-built loader across the matrix; pass == exit 0 per kernel.
bpfcompat test --command '$BPFCOMPAT_BIN --self-test' \
--command-binary ./build/myloader --matrix matrices/mvp.yaml --out report.json

# Or drive a shipped loader against a shipped .bpf.o (both are staged into the guest).
bpfcompat test --command '$BPFCOMPAT_BIN --obj $BPFCOMPAT_ARTIFACT' \
--command-binary ./build/loader --artifact ./build/probe.bpf.o \
--matrix matrices/mvp.yaml --out report.json
```

The command runs as root in the disposable guest with `$BPFCOMPAT_BIN` (your
shipped binary), `$BPFCOMPAT_ARTIFACT` (a staged `.bpf.o`, if given), and
`$BPFCOMPAT_REMOTE_ROOT` exported. See
[docs/command-validation.md](docs/command-validation.md).

Point either flow at the **library of known-tricky vendor kernels** — the ones
where "version ≠ feature support" bites (ring-buffer boundary, enterprise
backports, no-BTF, vendor rebases, variant bands):

```bash
bpfcompat test --command '$BPFCOMPAT_BIN --self-test' --command-binary ./build/loader \
--matrix matrices/quirk-library.yaml --out report.json
```

See [docs/kernel-quirk-library.md](docs/kernel-quirk-library.md).

### Distributions covered

A curated, multi-distro, multi-architecture matrix of the kernels enterprises
@@ -604,6 +640,8 @@ User guide — start here:
- [`docs/architecture.md`](docs/architecture.md)
- [`docs/project-compatibility-suite.md`](docs/project-compatibility-suite.md) — suites and collection matrices
- [`docs/validator.md`](docs/validator.md) — what the in-guest validator checks
- [`docs/command-validation.md`](docs/command-validation.md) — validate via your own loader binary/command (exit-code verdict)
- [`docs/kernel-quirk-library.md`](docs/kernel-quirk-library.md) — curated library of known-tricky vendor kernels (version ≠ feature support)
- [`docs/profile-catalog.md`](docs/profile-catalog.md) — kernel/distro profiles and image maintenance
- [`docs/image-pipeline.md`](docs/image-pipeline.md) — where images come from, integrity, adding profiles
- [`docs/upstream-kernel-virtme-ng.md`](docs/upstream-kernel-virtme-ng.md)
6 changes: 5 additions & 1 deletion cmd/bpfcompat/main.go
@@ -164,6 +164,9 @@ func runTest(args []string) int {

var cfg runner.Config
fs.StringVar(&cfg.ArtifactPath, "artifact", "", "Compiled .bpf.o artifact: a local ELF file, an OCI gadget (registry ref e.g. ghcr.io/org/gadget:tag), or an OCI image archive/layout")
fs.StringVar(&cfg.Command, "command", "", "Command-mode validation: shell command run inside each kernel VM (verdict = exit code). Exposes $BPFCOMPAT_BIN/$BPFCOMPAT_ARTIFACT. Use instead of --artifact to validate via a real loader binary/command.")
fs.StringVar(&cfg.CommandBinary, "command-binary", "", "Command mode: local executable shipped into each guest and exposed to --command as $BPFCOMPAT_BIN")
fs.IntVar(&cfg.CommandExpectExit, "command-expect-exit", 0, "Command mode: exit code that counts as a pass (default 0)")
fs.StringVar(&cfg.ArtifactURI, "artifact-uri", "", "Optional remote URI for artifact retrieval metadata (http|https|file)")
fs.StringVar(&cfg.ArtifactName, "artifact-name", "", "Logical artifact family name for version history (optional)")
fs.StringVar(&cfg.ArtifactVersion, "artifact-version", "", "Artifact version label for version history (optional)")
@@ -182,7 +185,7 @@ func runTest(args []string) int {
keepVMOnFailure := fs.Bool("keep-vm-on-failure", false, "Keep VM overlays/logs on failure")

fs.Usage = func() {
-fmt.Fprintf(fs.Output(), "Usage:\n bpfcompat test --artifact <file> --matrix <file> --out <file> [flags]\n\n")
+fmt.Fprintf(fs.Output(), "Usage:\n bpfcompat test --artifact <file> --matrix <file> --out <file> [flags]\n bpfcompat test --command <cmd> --matrix <file> --out <file> [--command-binary <file>] [flags]\n\n")
fs.PrintDefaults()
}

@@ -1353,6 +1356,7 @@ func printRootUsage() {
fmt.Println()
fmt.Println("Usage:")
fmt.Println(" bpfcompat test --artifact <file> --matrix <file> --out <file> [flags]")
fmt.Println(" bpfcompat test --command <cmd> --matrix <file> --out <file> [--command-binary <file>] [flags]")
fmt.Println(" bpfcompat suite --suite <file> --out <file> [flags]")
fmt.Println(" bpfcompat profile list --matrix <file>")
fmt.Println(" bpfcompat history list [flags]")
99 changes: 99 additions & 0 deletions docs/command-validation.md
@@ -0,0 +1,99 @@
# Command-mode validation (validate via a binary/command)

The default `bpfcompat test` flow ships a `.bpf.o` plus the bundled C/libbpf
validator into each kernel VM and answers *"does this object load and attach?"*.

**Command mode** answers a different question: *"does my project's own loader —
the real userspace path — come up on this kernel?"* Instead of the bundled
validator, it runs a command (optionally a binary you ship into the guest)
inside each matrix-kernel VM, and the per-kernel verdict is the command's **exit
code**.

This is useful when:

- you want to exercise the **userspace loader path**, not just the kernel's
acceptance of the object;
- you'd rather **not maintain a manifest** (map fixups, program-variant groups)
that has to stay in sync with how your loader configures the object — your
loader already encodes that;
- your artifact isn't a single `.bpf.o` (multiple objects, skeletons, a CLI that
loads several programs).

It is the analog of running your binary under a per-kernel VM harness (e.g.
`vimto exec`), wired into the same multi-distro matrix, evidence, and history
that the `.bpf.o` flow uses.

## Usage

```bash
# Ship a statically-linked loader and run it on every matrix kernel.
# Pass == exit 0 (override with --command-expect-exit N).
bpfcompat test \
--command '$BPFCOMPAT_BIN --self-test' \
--command-binary ./build/myloader \
--matrix matrices/mvp.yaml \
--out report.json
```

```bash
# Drive a loader against a shipped .bpf.o (both are staged into the guest).
bpfcompat test \
--command '$BPFCOMPAT_BIN --obj $BPFCOMPAT_ARTIFACT' \
--command-binary ./build/loader \
--artifact ./build/probe.bpf.o \
--matrix matrices/mvp.yaml \
--out report.json
```

```bash
# No shipped binary: use a tool already present in the guest image and stage
# the object via --artifact.
bpfcompat test \
--command 'bpftool prog load $BPFCOMPAT_ARTIFACT /sys/fs/bpf/x' \
--artifact ./build/x.bpf.o \
--matrix matrices/mvp.yaml \
--out report.json
```

### Flags

| Flag | Meaning |
|---|---|
| `--command <cmd>` | Shell command run inside each kernel VM. Required to enter command mode. |
| `--command-binary <file>` | Local executable shipped into each guest, `chmod +x`, exposed as `$BPFCOMPAT_BIN`. |
| `--command-expect-exit <N>` | Exit code that counts as a pass (default `0`). |
| `--artifact <file>` | Optional in command mode; when given it is staged and exposed as `$BPFCOMPAT_ARTIFACT`. |

### Environment available to the command

The command runs **as root** inside the disposable guest with:

- `BPFCOMPAT_BIN` — absolute path to the `--command-binary` you shipped (empty if none);
- `BPFCOMPAT_ARTIFACT` — absolute path to the staged `--artifact` (empty if none);
- `BPFCOMPAT_REMOTE_ROOT` — the per-run scratch root inside the guest.

The command string is executed as a single `bash -lc` operand (it is
shell-quoted, so it cannot break out to inject host-side syntax). Use real shell
inside it freely: pipes, `&&`, redirects.
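
Because the verdict depends only on the command's exit status, a command string can be rehearsed on the host before spending a full matrix run. The sketch below is one way to do that. `dry_run_command` is a hypothetical helper, not part of bpfcompat, and it only approximates the guest: it uses `bash -c` rather than `bash -lc`, runs as the current user, and sees the host kernel.

```bash
# Hypothetical local rehearsal of a --command string (not a bpfcompat feature).
# Usage: dry_run_command '<command string>' [binary path] [artifact path]
dry_run_command() {
  local cmd=$1 bin=${2:-} artifact=${3:-}
  # Export the three documented variables to the child shell only.
  # The scratch directory from mktemp is left behind for inspection.
  BPFCOMPAT_BIN=$bin \
  BPFCOMPAT_ARTIFACT=$artifact \
  BPFCOMPAT_REMOTE_ROOT=$(mktemp -d) \
    bash -c "$cmd"   # the guest uses `bash -lc`; the login flag is dropped here
}
```

For example, `dry_run_command '$BPFCOMPAT_BIN --self-test' ./build/myloader; echo $?` prints the exit code the loader returns on the host kernel, which catches quoting and path mistakes early.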

## Verdict and report

- The kernel **passes** iff the command exits with `--command-expect-exit`
(default `0`); otherwise it **fails** with classification
`COMMAND_VALIDATION_FAILURE`.
- The libbpf load/attach phase is **skipped** (`validation.load_status:
"skipped"`); the outcome is recorded in the report's `functional` section as a
single synthetic `command` test carrying the exit code and bounded
stdout/stderr tails.
- A command that *fails to execute at all* (VM didn't boot, SSH failed) is an
**infra error**, not a compatibility failure — exactly as in the `.bpf.o`
flow.
- The run is still recorded in artifact version history; with no `.bpf.o` the
artifact identity is content-addressed from the command string
(`command://<name>`), so `compare`/history still work.
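
The verdict rules above can be summarised as a small decision function. This is a sketch of the documented behaviour, not the runner's actual code: `classify_command_result` is a hypothetical name, and the `infra_error` label is an assumption standing in for however the report marks an infra error.

```bash
# classify_command_result EXIT_CODE [EXPECTED_EXIT] [EXECUTED]
# EXECUTED is "yes" when the command actually ran in the guest.
classify_command_result() {
  local exit_code=$1 expected=${2:-0} executed=${3:-yes}
  if [ "$executed" != "yes" ]; then
    # VM did not boot, SSH failed, etc.: not a compatibility verdict.
    echo "infra_error"   # hypothetical label
  elif [ "$exit_code" -eq "$expected" ]; then
    echo "pass"
  else
    echo "COMMAND_VALIDATION_FAILURE"
  fi
}
```

With `--command-expect-exit 3`, for instance, an exit code of `3` maps to a pass and an exit code of `0` maps to `COMMAND_VALIDATION_FAILURE`.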

## Scope / limitations (first cut)

- Command mode currently supports the **`vm`** runner only (the default). It is
rejected for `virtme-ng`/`firecracker`.
- The verdict is the **exit code**. Richer assertions (stdout/stderr matchers,
per-program expectations) remain available through the manifest
`functional_tests` + `--validation-mode behavior` path, which layers commands
*on top of* a `.bpf.o` load.
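
For a single, simple output check, the exit-code verdict can still be made to depend on what the command prints, because the command string is ordinary shell. The sketch below builds such a string. `command_with_marker` is a hypothetical helper and the marker text is an assumption about your loader's output. Anything richer belongs in the manifest path described above.

```bash
# Build a --command string that passes only if the loader exits 0 AND its
# output contains a marker line. Hypothetical helper, not part of bpfcompat.
command_with_marker() {
  local loader_cmd=$1 marker=$2
  # pipefail keeps a failing loader from being masked by a successful grep.
  # grep reads all input (no -q) so the loader never sees an early SIGPIPE.
  # %q quotes the marker for bash, which is the shell the guest uses.
  printf 'set -o pipefail; %s 2>&1 | grep -F -- %q >/dev/null' \
    "$loader_cmd" "$marker"
}
```

One possible use is `bpfcompat test --command "$(command_with_marker '$BPFCOMPAT_BIN --self-test' 'loader ready')" --command-binary ./build/myloader --matrix matrices/mvp.yaml --out report.json`, where `loader ready` stands in for whatever line your loader prints on success.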
99 changes: 99 additions & 0 deletions docs/kernel-quirk-library.md
@@ -0,0 +1,99 @@
# Library of known-tricky vendor kernels

A curated catalog of real distro kernels where **"kernel version ≠ eBPF feature
support."** These are the kernels that surprise you in production: upstream
feature boundaries, enterprise backports that carry new features onto old bases,
no-BTF kernels, vendor rebases, and program-variant fallback bands.

Every entry is a kernel bpfcompat **actually boots** (a real vendor cloud image in
a disposable VM) and has recorded evidence for, not a verdict inferred from a
version string. Run the whole library against a `.bpf.o` *or* your own loader
(command mode):

```bash
# A compiled object, the default validator:
bpfcompat test --artifact build/probe.bpf.o \
--matrix matrices/quirk-library.yaml --out report.json --markdown report.md

# Your own loader binary/command (see docs/command-validation.md):
bpfcompat test --command '$BPFCOMPAT_BIN --self-test' --command-binary ./build/loader \
--matrix matrices/quirk-library.yaml --out report.json
```

The matrix is [`matrices/quirk-library.yaml`](../matrices/quirk-library.yaml).
Profiles whose pass/fail is artifact-dependent (genuine feature boundaries) are
`required: false`, so the library never forces a verdict the kernel itself
decides.

## The catalog

The "verdict" column below is the **observed** result of running a ring-buffer
probe (`examples/ringbuf-modern/ringbuf_modern.bpf.o`) across the library — see
[Fresh evidence](#fresh-evidence-2026-06-29).

| Profile | Real kernel | The quirk | Ring-buffer probe |
|---|---|---|---|
| `ubuntu-20.04-5.4` | 5.4.0-216 | Ring-buffer maps land upstream in **5.8** — not present here | ❌ `UNSUPPORTED_MAP_TYPE` (high) — *correct*, not a bug |
| `ubuntu-20.10-5.8` | 5.8.0-63 | First upstream kernel **with** ring buffer | ✅ pass — the other side of the line |
| `almalinux-8-4.18` | 4.18.0-553.el8 | **Version lies:** RHEL backports ring buffer onto 4.18, so it works on a kernel numbered *older* than the 5.4 that failed | ✅ **pass despite 4.18** |
| `rocky-8-4.18` | 4.18.0-…el8 | Same RHEL-8 backport base (ABI-compatible rebuild) | ✅ pass |
| `centos-stream-9-5.14` | 5.14.0-706.el9 | RHEL-9 base: 5.14 carrying many 6.x BPF features | ✅ pass |
| `amazon-linux-2-4.14` | 4.14.26-…amzn2 | **Backports are not uniform:** Amazon's 4.14 (no embedded BTF) does **not** carry the ring-buffer backport that RHEL's 4.18 does | ❌ `UNSUPPORTED_MAP_TYPE` — a *simple* program loads here, ring buffer does not |
| `amazon-linux-2-5.10` | 5.10.247-…amzn2 | Amazon backport tier | ✅ pass |
| `oracle-linux-9-uek7-5.15` | **6.12.0**-…el9uek | **Version-string trap:** the `uek7-5.15` profile actually boots a **6.12** UEK kernel on an EL9 userspace | ✅ pass (test, don't assume from the name) |
| `opensuse-leap-15.6-6.4` | 6.4.0-…default | SUSE backport tier | ✅ pass |
| `ubuntu-22.04-5.15` | 5.15.0-173 | **Program-variant fallback band:** a loader must select the `*_old_x` syscall variants; `dump_task` is unsupported | ✅ pass *with the right variant selection* |
| `debian-12-6.1` | 6.1.0-47 | Newer band: `bpf_loop` variants + both BPF iterators available | ✅ pass with the modern variants |

The two ❌ rows are `required: false`: a ring-buffer probe *should* be rejected
there, and that rejection is the evidence, not a failure of the run.
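
A CI gate built on this idea might look like the sketch below. It assumes that a failure on a `required: false` profile should be reported but should not block, which is one reading of the paragraph above. The `profile required verdict` line format is hypothetical and is not the `report.json` schema.

```bash
# gate: read "profile required verdict" lines on stdin and return non-zero
# only when a required profile failed. Input format is hypothetical.
gate() {
  local profile required verdict rc=0
  while read -r profile required verdict; do
    if [ -z "$profile" ]; then
      continue                                   # skip blank lines
    fi
    if [ "$verdict" = "fail" ] && [ "$required" = "true" ]; then
      echo "blocking: $profile" >&2
      rc=1
    elif [ "$verdict" = "fail" ]; then
      echo "expected boundary: $profile" >&2     # evidence, not a failure
    fi
  done
  return "$rc"
}
```

Fed the two ❌ rows as `required false fail` lines, the gate still returns `0` and only notes them on stderr.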

## Fresh evidence (2026-06-29)

Run of `ringbuf_modern.bpf.o` across the library (`load_attach`, real QEMU/KVM
VMs, run `20260629T145413Z-5b261e`). The standout results:

- **`ubuntu-20.04-5.4` ❌ vs `almalinux-8-4.18` ✅** — the *higher* version number
(5.4) fails ring buffer while the *lower* one (RHEL 4.18) passes it, because
RHEL backported the feature. Version number predicts the wrong answer.
- **`almalinux-8-4.18` ✅ vs `amazon-linux-2-4.14` ❌** — two old-numbered
enterprise kernels disagree on the *same* probe: RHEL backported ring buffer to
4.18, Amazon did not backport it to 4.14. **"Enterprise backport" is not a
blanket guarantee — it's per-vendor, per-feature, and has to be tested.**
- `oracle-linux-9-uek7-5.15` actually booted `6.12.0-…el9uek` — the UEK rebase
ran a far newer kernel than the profile name implies, and still passed.

All other kernels (Rocky 8's 4.18, plus 5.8, 5.10, 5.14, 5.15, 6.1, 6.4) loaded
and attached the ring-buffer probe. The failures carry
`classification_code: UNSUPPORTED_MAP_TYPE` with the underlying error detail
(`map "events" failed to create — Invalid argument (-22)`), so a CI gate or a
human gets the *why*, not just a ❌.
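
To make the "version ≠ feature support" point concrete, the sketch below compares a naive version check (ring buffer if the kernel is 5.8 or newer) against a few observed rows copied from the catalog table. Both function names are hypothetical, and `sort -V` assumes GNU coreutils or a compatible `sort`.

```bash
# Naive heuristic: succeed when kernel version $1 sorts at or above 5.8.
naive_has_ringbuf() {
  [ "$(printf '%s\n' "5.8" "$1" | sort -V | head -n1)" = "5.8" ]
}

# Print every row where the heuristic disagrees with the observed result.
mismatches() {
  local profile version observed predicted
  while read -r profile version observed; do
    if naive_has_ringbuf "$version"; then predicted=pass; else predicted=fail; fi
    [ "$predicted" = "$observed" ] ||
      echo "$profile: version says $predicted, kernel says $observed"
  done <<'EOF'
ubuntu-20.04-5.4 5.4.0 fail
ubuntu-20.10-5.8 5.8.0 pass
almalinux-8-4.18 4.18.0 pass
amazon-linux-2-4.14 4.14.26 fail
amazon-linux-2-5.10 5.10.247 pass
EOF
}
```

On these five rows the heuristic is wrong only for `almalinux-8-4.18`. It is right about `amazon-linux-2-4.14`, which is exactly why a version check cannot stand in for a boot-and-load test: whether an old-numbered kernel carries the backport differs by vendor.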

## Why these specific kernels

The catalog is assembled from kernels with documented, reproduced behavior:

- **Ring-buffer boundary + the backport that breaks the rule** — the 5.4 fail vs
5.8 pass vs AlmaLinux-8 *4.18* pass is the canonical "version ≠ support"
demonstration. Evidence:
[case-study-falco-modern-bpf.md](case-study-falco-modern-bpf.md),
[case-study-enterprise-kernels.md](case-study-enterprise-kernels.md).
- **No-BTF backport (Amazon Linux 2 / 4.14)** — boots and validates
`load_attach` on a real `4.14.26-54.32.amzn2`; see the enterprise case study.
- **Program-variant bands (5.15 vs 6.1+)** — which syscall variants and
iterators each kernel accepts, recorded per profile in the Falco modern_bpf
study.
- **Enterprise/rebase tier (RHEL-family, Oracle UEK, SUSE)** — the 14/14
backported-tier run in the enterprise case study.

## Scope and honesty

- These are **independent tests of public distro images**; not affiliated with or
endorsed by Red Hat, AlmaLinux, Rocky, Oracle, Amazon, or SUSE.
- "Tricky" means *behavior is not predictable from the version number*, not that
the kernel is defective. A ❌ on `ubuntu-20.04-5.4` for a ring-buffer probe is
the kernel correctly rejecting an unsupported feature.
- RHEL itself is subscription-walled; AlmaLinux/Rocky/CentOS Stream are the free,
ABI-compatible rebuilds used as the reproducible RHEL stand-in. RHCOS/OpenShift
is a separate opt-in, operator-supplied path
([rhcos-openshift.md](rhcos-openshift.md)).
- The list grows as new quirks are reproduced with evidence. It is deliberately
small and verified rather than large and inferred.