diff --git a/CHANGELOG.md b/CHANGELOG.md index e3c05e4..f4e8db3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,12 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) once a ## [Unreleased] ### Added +- RHCOS evidence matrix: profiles for OpenShift 4.14 / 4.16 / 4.18 (`matrices/rhcos.yaml`) + and a recorded multi-version, multi-artifact run in `docs/evidence-rhcos.md` — + baseline load and ring-buffer load+attach pass on every release (RHEL 9.2 and + 9.4 backported 5.14 kernels), and a CO-RE failure is correctly rejected on + every release (the discriminator). `make rhcos-image` takes `RHCOS_VERSION` to + stage per-version images. x86_64 only; opt-in via `BPFCOMPAT_ENABLE_RHCOS=1`. - CoreOS (Ignition) boot support. CoreOS-family images boot via Ignition, not cloud-init, so the executor now writes a minimal Ignition config (SSH key for the `core` user) and passes it to QEMU via `-fw_cfg name=opt/com.coreos/config` diff --git a/Makefile b/Makefile index aeb0b5f..58e9806 100644 --- a/Makefile +++ b/Makefile @@ -175,13 +175,15 @@ vm-image-fcos: # Stage an operator-supplied RHEL CoreOS (OpenShift) image for the rhcos-4.16 # profile. RHCOS ships with an OpenShift release, not a public cloud-image URL, -# so the operator provides it: +# so the operator provides it. RHCOS_VERSION picks the profile/cache slot +# (default 4.16); the image is staged at vm/cache/rhcos-$(RHCOS_VERSION).qcow2: # make rhcos-image RHCOS_IMAGE=/path/to/rhcos-qemu.x86_64.qcow2 -# make rhcos-image RHCOS_IMAGE_URL=https://internal-mirror/rhcos.qcow2.gz +# make rhcos-image RHCOS_VERSION=4.18 RHCOS_IMAGE_URL=https://mirror/rhcos.qcow2.gz # Then run with BPFCOMPAT_ENABLE_RHCOS=1 to enable the profile. 
+RHCOS_VERSION ?= 4.16 rhcos-image: RHCOS_IMAGE='$(RHCOS_IMAGE)' RHCOS_IMAGE_URL='$(RHCOS_IMAGE_URL)' \ - bash vm/scripts/fetch-rhcos-image.sh vm/cache/rhcos-4.16.qcow2 + bash vm/scripts/fetch-rhcos-image.sh vm/cache/rhcos-$(RHCOS_VERSION).qcow2 vm-ubuntu-22-arm64: bash vm/scripts/fetch-cloud-image.sh \ diff --git a/README.md b/README.md index bbbfa30..d1056b7 100644 --- a/README.md +++ b/README.md @@ -114,11 +114,11 @@ different bootstrap. bpfcompat implements it (Ignition config over QEMU -matrix matrices/rhcos.yaml -runner vm -out report.json ``` - Recorded run: RHCOS `416.94…` on kernel `5.14.0-427.93.1.el9_4` (OpenShift - 4.16), ring-buffer artifact load + attach **pass** — + Recorded evidence matrix: **3 OpenShift releases (4.14 / 4.16 / 4.18)** × 3 + artifacts, real boots — baseline + ring-buffer load+attach **pass**, and a + CO-RE failure correctly **rejected** on every release — [docs/evidence-rhcos.md](docs/evidence-rhcos.md). Without an image, the **RHEL / - AlmaLinux 9 (5.14)** profiles are the interim kernel approximation (RHCOS for - 4.16 is the RHEL 9.4 kernel). Full guide: + AlmaLinux 9 (5.14)** profiles are the interim kernel approximation. Full guide: [docs/rhcos-openshift.md](docs/rhcos-openshift.md). 
## Try it in CI without your own KVM box @@ -583,7 +583,7 @@ Reference matrices (real, reproducible artifacts): - [`docs/case-study-falco-modern-bpf.md`](docs/case-study-falco-modern-bpf.md) — Falco `modern_bpf` across 5 kernels - [`docs/case-study-enterprise-kernels.md`](docs/case-study-enterprise-kernels.md) — RHEL/Oracle/Amazon/SUSE backported tier - [`docs/case-study-inspektor-gadget.md`](docs/case-study-inspektor-gadget.md) — published gadgets from OCI, zero config -- [`docs/evidence-rhcos.md`](docs/evidence-rhcos.md) — RHEL CoreOS / OpenShift 4.16, load + attach inside a real RHCOS guest +- [`docs/evidence-rhcos.md`](docs/evidence-rhcos.md) — RHEL CoreOS / OpenShift 4.14·4.16·4.18 matrix, load + attach inside real RHCOS guests Internal evidence and program docs (acceptance records, runbooks, and planning notes — useful for contributors, not needed to use the tool): diff --git a/docs/evidence-rhcos.md b/docs/evidence-rhcos.md index a743ec0..37a62ce 100644 --- a/docs/evidence-rhcos.md +++ b/docs/evidence-rhcos.md @@ -1,94 +1,125 @@ -# Evidence: RHEL CoreOS (OpenShift) validation +# Evidence: RHEL CoreOS (OpenShift) validation matrix -A real validation run of a compiled eBPF artifact **inside a booted RHEL CoreOS -guest**, recorded here as committed evidence. This is the proof behind the opt-in -RHCOS path documented in [docs/rhcos-openshift.md](rhcos-openshift.md). Reproduce -it with the steps at the bottom. +Real validation runs of compiled eBPF artifacts **inside booted RHEL CoreOS +guests**, across multiple OpenShift releases. Committed as evidence behind the +opt-in RHCOS path documented in [docs/rhcos-openshift.md](rhcos-openshift.md). +Reproduce with the steps at the bottom. 
-> The raw run artifacts (full `report.json`, `validator-result.json`, serial log) +> Raw run artifacts (full `report.json`, `validator-result.json`, serial logs) > are written under `evidence/rhcos/` locally; that path is git-ignored as -> high-churn output, so the decisive fields are inlined below. +> high-churn output, so the decisive fields are inlined here. -## Result +## Releases under test -| Field | Value | -|---|---| -| Profile | `rhcos-4.16-5.14` | -| Booted OS | Red Hat Enterprise Linux CoreOS `416.94.202510081640-0` (OpenShift 4.16) | -| Kernel | `5.14.0-427.93.1.el9_4.x86_64` (RHEL 9.4 base, heavily backported) | -| Arch | x86_64 | -| Boot path | Ignition via QEMU `-fw_cfg name=opt/com.coreos/config`; SSH as `core` | -| Artifact | `examples/ringbuf-modern/ringbuf_modern.bpf.o` (sha256 `569df554…21728`) | -| Load | **pass** (errno 0) | -| Attach | **pass** (1/1, best-effort) | -| Kernel BTF | present (`/sys/kernel/btf/vmlinux`, 4876642 bytes) | -| Overall | **pass** | +Each row is a real RHCOS bootimage from the public OpenShift mirror, booted via +Ignition (`-fw_cfg name=opt/com.coreos/config`), SSH as `core`. The RHCOS +version encodes the RHEL base (e.g. `416.94` = OpenShift 4.16 on RHEL 9.4), and +the kernel column is the **in-guest `uname -r`** captured at run time. -The artifact uses a BPF ring buffer, upstream since **5.8**. It loads on this -**5.14** RHCOS kernel because RHEL backports the feature — "kernel version ≠ -feature support," tested by booting the real vendor kernel rather than inferred. 
+| OpenShift | RHCOS bootimage | RHEL base | Kernel (`uname -r`) | Kernel BTF | +|---|---|---|---|---| +| 4.14 | `414.92.202407091253` | 9.2 | `5.14.0-284.73.1.el9_2.x86_64` | present | +| 4.16 | `416.94.202510081640` | 9.4 | `5.14.0-427.93.1.el9_4.x86_64` | present | +| 4.18 | `418.94.202510081222` | 9.4 | `5.14.0-427.93.1.el9_4.x86_64` | present | -## In-guest validator output (`validator.v0.4`, key fields) +Note the OCP minor does **not** track the kernel linearly: 4.16 and 4.18 share +the RHEL 9.4 `-427` kernel, while 4.14 is RHEL 9.2 `-284`. That is exactly the +"version number predicts nothing" property bpfcompat tests by booting the real +vendor kernel. (The mirror's `4.18/latest` bootimage is RHEL-9.4-based; later +4.18 z-streams may move to 9.6 — the table records the bootimage actually run.) + +## Matrix result (3 artifacts × 3 releases, real boots) + +| Artifact | What it exercises | 4.14 | 4.16 | 4.18 | +|---|---|---|---|---| +| `simple-pass` | baseline program load | ✅ load | ✅ load | ✅ load | +| `ringbuf-modern` | BPF ring buffer (upstream ≥ 5.8) + attach | ✅ load + attach 1/1 | ✅ load + attach 1/1 | ✅ load + attach 1/1 | +| `core-relocation-fail` | CO-RE relocation to a non-existent type | ❌ **rejected** | ❌ **rejected** | ❌ **rejected** | + +Two things this proves: + +1. **Backports work, tested not inferred.** The ring buffer lands upstream in + 5.8, yet `ringbuf-modern` loads *and attaches* on RHCOS's backported 5.14 + (both RHEL 9.2 and 9.4) — because the verdict comes from the real kernel. +2. **The verdict discriminates.** `core-relocation-fail` is **rejected on every + release** with `errno -22` and classification `CORE_RELOCATION_FAILURE` — so + the passes above are real acceptances, not a rubber stamp. (Its matrix targets + are non-blocking, so they record a per-target failure without failing the run.) 
+ +## In-guest validator output (representative — 4.16, ring buffer) ```json { "schema_version": "validator.v0.4", "status": "pass", - "host": { - "release": "5.14.0-427.93.1.el9_4.x86_64", - "version": "#1 SMP PREEMPT_DYNAMIC Wed Oct 1 11:45:46 EDT 2025", - "machine": "x86_64" - }, + "host": { "release": "5.14.0-427.93.1.el9_4.x86_64", "machine": "x86_64" }, "load": { "status": "pass", "error_code": 0, "error": "" }, "attach": { "mode": "best-effort", "status": "pass", "attempted": 1, "passed": 1, "failed": 0 }, - "btf": { "kernel_btf_available": true, "kernel_btf_size": 4876642, - "artifact_has_btf": true, "artifact_has_btf_ext": true } + "btf": { "kernel_btf_available": true, "artifact_has_btf": true } } ``` -## Guest serial console (excerpt, ANSI stripped) +Rejection record (representative — 4.14, CO-RE failure): +```json +{ + "status": "fail", + "host": { "release": "5.14.0-284.73.1.el9_2.x86_64" }, + "load": { "status": "fail", "error_code": -22 }, + "classification_code": "CORE_RELOCATION_FAILURE" +} ``` -GRUB: Booting `Red Hat Enterprise Linux CoreOS 416.94.202510081640-0 (ostree:0)' -[0.000000] Linux version 5.14.0-427.93.1.el9_4.x86_64 - (mockbuild@x86-64-03.build.eng.rdu2.redhat.com) #1 SMP PREEMPT_DYNAMIC Wed Oct 1 ... -[0.000000] Command line: ... vmlinuz-5.14.0-427.93.1.el9_4.x86_64 rw ignition.firstboot - ostree=/ostree/boot.1/rhcos/... ignition.platform.id=qemu console=ttyS0,115200n8 - -Welcome to Red Hat Enterprise Linux CoreOS 416.94.202510081640-0 - dracut-057-54.git20250423.el9_4.1 (Initramfs)! +## Guest serial console (excerpt, ANSI stripped — 4.16) +``` +GRUB: Booting `Red Hat Enterprise Linux CoreOS 416.94.202510081640-0 (ostree:0)' +[0.000000] Linux version 5.14.0-427.93.1.el9_4.x86_64 ... #1 SMP PREEMPT_DYNAMIC +[0.000000] Command line: ... ignition.firstboot ... ignition.platform.id=qemu console=ttyS0,115200n8 +Welcome to Red Hat Enterprise Linux CoreOS 416.94.202510081640-0 (Initramfs)! 
[1.291367] systemd[1]: Starting CoreOS Ignition User Config Setup... [ OK ] Finished CoreOS Ignition User Config Setup. ``` -`ignition.platform.id=qemu` + "CoreOS Ignition User Config Setup" confirm the -boot used the Ignition config bpfcompat delivered over `-fw_cfg`. - ## Provenance -- Image: `rhcos-4.16.51-x86_64-qemu.x86_64.qcow2.gz`, public OpenShift mirror - (`mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.16/latest/`). - Published sha256 of the `.gz`: - `92880764c1b3b61940bc209ee021b97474c4db2d9a36abcece55ddd6d8c17c95`. -- Decompressed qcow2 base-image sha256 (recorded in `report.json`): - `d03128234c5dc6217bd37ee0caf6f192107d42d39a8a6b5c9b6148b0f4f92399`. -- The pull secret is required for the OpenShift *container release payload*, not - the RHCOS boot qcow2 used here. +Images: public OpenShift mirror, +`mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos//latest/`. +The pull secret gates the container release payload, **not** these boot qcow2s. +Decompressed qcow2 sha256 (as run): -## Reproduce +| OpenShift | image | sha256 (decompressed qcow2) | +|---|---|---| +| 4.14 | `rhcos-4.14.34-x86_64-qemu.x86_64.qcow2` | `6d271daf23242570520891cc8013d7ab3e2fa5ab8ab9d37485b28b72ab61e99f` | +| 4.16 | `rhcos-4.16.51-x86_64-qemu.x86_64.qcow2` | `d03128234c5dc6217bd37ee0caf6f192107d42d39a8a6b5c9b6148b0f4f92399` | +| 4.18 | `rhcos-4.18.27-x86_64-qemu.x86_64.qcow2` | `a6f870c3fb8f5039962978980cf6a5a11cd2973a35fc2b2938106658983b18d6` | -```sh -base=https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.16/latest -URL=$(curl -fsSL "$base/sha256sum.txt" | awk '/qemu.x86_64.qcow2.gz$/{print "'"$base"'/"$2; exit}') +## Honest limits + +- **x86_64 only.** OpenShift on ARM (aarch64) is real but not covered here — it + needs an ARM64-capable KVM host and an aarch64 RHCOS bootimage. Not yet run. 
+- **Not in public CI.** RHCOS is operator-supplied by design (no bundled image), + so it does not run on every PR like the Ubuntu/FCOS lanes; this matrix is a + recorded, reproducible operator run. +- **Bootimage, not a live cluster.** This validates the node OS + kernel, not + OpenShift-cluster-specific MachineConfig state. -make rhcos-image RHCOS_IMAGE_URL="$URL" +## Reproduce -BPFCOMPAT_ENABLE_RHCOS=1 ./bin/bpfcompat test \ - -artifact examples/ringbuf-modern/ringbuf_modern.bpf.o \ - -matrix matrices/rhcos.yaml -runner vm -out report.json +```sh +b=https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos +for v in 4.14 4.16 4.18; do + url=$(curl -fsSL "$b/$v/latest/sha256sum.txt" | awk '/qemu.x86_64.qcow2.gz$/{print "'"$b"'/'"$v"'/latest/"$2; exit}') + make rhcos-image RHCOS_VERSION="$v" RHCOS_IMAGE_URL="$url" # → vm/cache/rhcos-$v.qcow2 +done + +for art in simple-pass/simple_pass ringbuf-modern/ringbuf_modern core-relocation-fail/core_relocation_fail; do + BPFCOMPAT_ENABLE_RHCOS=1 ./bin/bpfcompat test \ + -artifact examples/$art.bpf.o -matrix matrices/rhcos.yaml -runner vm \ + -concurrency 3 -out report-$(basename $art).json +done ``` -Enterprises with an internal mirror or an `openshift-install`-extracted image -pass `RHCOS_IMAGE=/path/to/rhcos.qcow2` instead of `RHCOS_IMAGE_URL`. +`RHCOS_VERSION` selects both the cache slot (`vm/cache/rhcos-.qcow2`) and the +matching profile in `matrices/rhcos.yaml`. `core-relocation-fail` is expected to +be rejected — that is the discriminator, not a regression. diff --git a/matrices/rhcos.yaml b/matrices/rhcos.yaml index 8b4e84d..6790195 100644 --- a/matrices/rhcos.yaml +++ b/matrices/rhcos.yaml @@ -1,7 +1,12 @@ -# RHEL CoreOS (OpenShift) — opt-in, operator-supplied image. -# Stage the image with `make rhcos-image RHCOS_IMAGE=... ` (or RHCOS_IMAGE_URL=...) -# and run with BPFCOMPAT_ENABLE_RHCOS=1. See docs/rhcos-openshift.md. +# RHEL CoreOS (OpenShift) evidence matrix — opt-in, operator-supplied images. 
+# Stage each image with `make rhcos-image RHCOS_VERSION=... RHCOS_IMAGE_URL=...` (one per version, +# staged at vm/cache/rhcos-.qcow2) and run with BPFCOMPAT_ENABLE_RHCOS=1. +# See docs/rhcos-openshift.md and docs/evidence-rhcos.md. name: rhcos profiles: + - id: rhcos-4.14-5.14 + required: false - id: rhcos-4.16-5.14 required: false + - id: rhcos-4.18-5.14 + required: false diff --git a/vm/profiles/rhcos-4.14-5.14.yaml b/vm/profiles/rhcos-4.14-5.14.yaml new file mode 100644 index 0000000..8041f5b --- /dev/null +++ b/vm/profiles/rhcos-4.14-5.14.yaml @@ -0,0 +1,23 @@ +# RHEL CoreOS (OpenShift 4.14) — runnable with an operator-supplied image. +# +# Part of the RHCOS evidence matrix (see docs/evidence-rhcos.md). Same Ignition +# boot path as Fedora CoreOS (internal/vm/ignition.go); off by default, enable +# with BPFCOMPAT_ENABLE_RHCOS=1 once the image is staged with +# make rhcos-image RHCOS_VERSION=4.14 RHCOS_IMAGE_URL=.../rhcos-4.14...-qemu.x86_64.qcow2.gz +# (or RHCOS_IMAGE=/path instead of the URL; stage target: vm/cache/rhcos-4.14.qcow2) +# RHCOS for OpenShift 4.14 is a RHEL 9.x kernel (5.14, heavily backported); the +# real booted kernel is captured at runtime in the report. +id: rhcos-4.14-5.14 +distro: rhcos +version: "4.14" +kernel_family: "5.14" +arch: x86_64 +image: + local_path: "vm/cache/rhcos-4.14.qcow2" +boot: + memory_mb: 2048 + cpus: 2 +validator: + path: "/usr/local/bin/bpfcompat-validator" +capabilities: + expected_btf: true diff --git a/vm/profiles/rhcos-4.18-5.14.yaml b/vm/profiles/rhcos-4.18-5.14.yaml new file mode 100644 index 0000000..6ea5464 --- /dev/null +++ b/vm/profiles/rhcos-4.18-5.14.yaml @@ -0,0 +1,23 @@ +# RHEL CoreOS (OpenShift 4.18) — runnable with an operator-supplied image. +# +# Part of the RHCOS evidence matrix (see docs/evidence-rhcos.md). 
Same Ignition +# boot path as Fedora CoreOS (internal/vm/ignition.go); off by default, enable +# with BPFCOMPAT_ENABLE_RHCOS=1 once the image is staged with +# make rhcos-image RHCOS_VERSION=4.18 RHCOS_IMAGE_URL=.../rhcos-4.18...-qemu.x86_64.qcow2.gz +# (or RHCOS_IMAGE=/path instead of the URL; stage target: vm/cache/rhcos-4.18.qcow2) +# RHCOS for OpenShift 4.18 is a RHEL 9.x kernel (5.14, heavily backported); the +# real booted kernel is captured at runtime in the report. +id: rhcos-4.18-5.14 +distro: rhcos +version: "4.18" +kernel_family: "5.14" +arch: x86_64 +image: + local_path: "vm/cache/rhcos-4.18.qcow2" +boot: + memory_mb: 2048 + cpus: 2 +validator: + path: "/usr/local/bin/bpfcompat-validator" +capabilities: + expected_btf: true
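Reviewer note on the version encoding that `docs/evidence-rhcos.md` describes (`416.94` = OpenShift 4.16 on RHEL 9.4). The following is a minimal POSIX shell sketch of that decoding, assuming the three-field shape of the bootimage versions in the releases table and single-digit major versions. `rhcos_decode` is a hypothetical helper written for illustration; it is not part of this change and is not known to exist in bpfcompat.

```sh
#!/bin/sh
# Hypothetical helper (illustration only, not shipped by bpfcompat):
# decode an RHCOS bootimage version such as 416.94.202510081640 into the
# OpenShift minor and RHEL base it encodes.
# Assumption: first field = OpenShift version, second field = RHEL version,
# each with a single-digit major, as in the three releases recorded above.
rhcos_decode() {
    ver=$1
    case $ver in
        *.*.*) ;;   # require at least three dot-separated fields
        *) echo "unrecognized RHCOS version: $ver" >&2; return 1 ;;
    esac
    ocp=${ver%%.*}      # first field, e.g. "416"
    rest=${ver#*.}      # everything after the first dot
    rhel=${rest%%.*}    # second field, e.g. "94"
    case $ocp$rhel in
        *[!0-9]*) echo "unrecognized RHCOS version: $ver" >&2; return 1 ;;
    esac
    if [ "${#ocp}" -lt 2 ] || [ "${#rhel}" -lt 2 ]; then
        echo "unrecognized RHCOS version: $ver" >&2
        return 1
    fi
    # ${x#?} drops the first character; ${x%"${x#?}"} keeps only the first.
    printf 'OpenShift %s.%s on RHEL %s.%s\n' \
        "${ocp%"${ocp#?}"}" "${ocp#?}" "${rhel%"${rhel#?}"}" "${rhel#?}"
}

# Versions taken from the releases table in docs/evidence-rhcos.md.
for v in 414.92.202407091253 416.94.202510081640 418.94.202510081222; do
    rhcos_decode "$v"
done
```

The decoded string only labels a row. The kernel column in the evidence table still comes from the in-guest `uname -r`, which stays the authority for what was actually booted.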