
feat(vm): close RHCOS coverage gaps — aarch64 boot + broader artifact matrix#54

Merged
ErenAri merged 2 commits into
mainfrom
feat/rhcos-coverage-aarch64
Jun 27, 2026

@ErenAri ErenAri commented Jun 27, 2026


Closes the two gaps called out on the RHCOS evidence matrix: aarch64 and a broader artifact set. Both are now covered with real boots, and the work uncovered two executor bugs that are fixed here.

Executor fixes (latent bugs for any aarch64 cloud-image profile)

  1. aarch64 UEFI firmware. aarch64 virt has no built-in firmware (x86 has SeaBIOS), so the executor now supplies AAVMF as pflash — read-only CODE + a per-VM writable VARS copy — overridable via BPFCOMPAT_AARCH64_UEFI_CODE/_VARS. Without this an aarch64 guest never boots.
  2. KVM only for same-arch guests. /dev/kvm on an x86_64 host can't run an aarch64 guest, so qemuMachineArgs now falls back to TCG when guest arch ≠ host arch instead of passing an invalid accel=kvm.
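The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the executor's actual code: `accelFor` and `pflashArgs` are hypothetical names, the architecture strings use Go's `GOARCH` spelling as an assumption, and the firmware paths are placeholders. Only the QEMU option syntax (`accel=`, `-drive if=pflash,...`) is standard.

```go
package main

import (
	"fmt"
	"runtime"
)

// accelFor picks a QEMU accelerator (hypothetical helper).
// KVM can only run a guest whose architecture matches the host,
// so any mismatch, or a missing /dev/kvm, falls back to TCG.
func accelFor(guestArch, hostArch string, kvmAvailable bool) string {
	if kvmAvailable && guestArch == hostArch {
		return "kvm"
	}
	return "tcg"
}

// pflashArgs builds the two pflash drives for UEFI firmware (hypothetical helper):
// unit 0 is the read-only CODE image, unit 1 is the per-VM writable VARS copy.
func pflashArgs(codePath, varsPath string) []string {
	return []string{
		"-drive", "if=pflash,format=raw,readonly=on,file=" + codePath,
		"-drive", "if=pflash,format=raw,file=" + varsPath,
	}
}

func main() {
	// Assumption: the guest is arm64 and the host arch is whatever this binary runs on.
	accel := accelFor("arm64", runtime.GOARCH, true)
	fmt.Println("-machine", "virt,accel="+accel)

	// Placeholder paths; real locations depend on the distribution's AAVMF package.
	fmt.Println(pflashArgs("/path/to/AAVMF_CODE.fd", "/tmp/vm-1/AAVMF_VARS.fd"))
}
```

Keeping the accelerator decision in a pure function of guest arch, host arch, and KVM availability makes it easy to unit test without a hypervisor.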

Broader artifacts — x86_64, 6 × 3, real boots

| Artifact | Exercises | 4.14 (9.2) | 4.16 (9.4) | 4.18 (9.4) |
|---|---|---|---|---|
| simple-pass | baseline | | | |
| ringbuf-modern | tp + ring buffer (≥5.8) | ✅ load+attach | | |
| perfbuf-fallback | tp + perf buffer | ✅ load+attach | | |
| attach-warn | kprobe (missing symbol) | ✅ load / attach warn | | |
| aegis | BPF-LSM (4 hooks) + tp | ❌ rejected (CAPABILITY_FAILURE) | load+attach 4/4 | 4/4 |
| core-relocation-fail | CO-RE negative | ❌ rejected | ❌ rejected | ❌ rejected |

The LSM row is the headline: a real backport boundary — BPF-LSM is active in RHEL 9.4 but not 9.2, on the same 5.14 line. Version inference can't see that; a real boot does.
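To illustrate why a runtime check sees what version inference cannot, here is a small sketch of one way to ask a running kernel whether the BPF LSM is active. It is not how this PR detects the boundary (the evidence comes from the actual load being rejected); `lsmActive` is a hypothetical helper. The kernel lists its active LSMs as a comma-separated string in `/sys/kernel/security/lsm`, which may not be readable in every environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// lsmActive reports whether name appears in a comma-separated LSM list,
// the format the kernel exposes in /sys/kernel/security/lsm.
func lsmActive(list, name string) bool {
	for _, entry := range strings.Split(strings.TrimSpace(list), ",") {
		if strings.TrimSpace(entry) == name {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/sys/kernel/security/lsm")
	if err != nil {
		// securityfs may be unmounted or unreadable; report it rather than guess.
		fmt.Println("cannot read LSM list:", err)
		return
	}
	fmt.Println("bpf LSM active:", lsmActive(string(data), "bpf"))
}
```

Two kernels that report the same 5.14 base version can give different answers here, which is the point of the row above.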

aarch64 — real cross-arch boot

RHCOS 4.16 aarch64 booted on this x86_64 host under TCG: kernel 5.14.0-427.50.1.el9_4.aarch64, ring-buffer load+attach 1/1 — exercising the cross-compiled aarch64 validator, EDK II (AAVMF) UEFI, Ignition over -fw_cfg, SSH as core. On a native ARM64 KVM host the same run uses hardware acceleration automatically.
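For readers unfamiliar with the Ignition hand-off mentioned above: on QEMU, CoreOS-family images read their Ignition config from the firmware config key `opt/com.coreos/config`. A minimal sketch of building that argument follows; `ignitionArgs` is a hypothetical helper name and the config path is a placeholder, not something taken from this repository.

```go
package main

import "fmt"

// ignitionArgs returns the QEMU arguments that expose an Ignition config
// to the guest through fw_cfg (hypothetical helper).
// Note: QEMU treats "," as an option separator, so a path containing
// commas would need them doubled; this sketch assumes a comma-free path.
func ignitionArgs(configPath string) []string {
	return []string{"-fw_cfg", "name=opt/com.coreos/config,file=" + configPath}
}

func main() {
	fmt.Println(ignitionArgs("/path/to/config.ign"))
}
```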

Honest scoping (unchanged)

  • aarch64 here ran under TCG (slow, but a genuine aarch64 kernel + real bpf()); native ARM64 KVM is the fast path.
  • RHCOS stays opt-in (BPFCOMPAT_ENABLE_RHCOS) and out of the README "Distributions covered" table.
  • Not in public CI, because the images are operator-supplied; the evidence is a recorded, reproducible run.

Tests / verify

  • New firmware unit tests (pflash args + writable-vars staging).
  • go build/vet, gofmt, env-docs-check, go test ./internal/... green.

Adds profile rhcos-4.16-arm64, matrices/rhcos-arm64.yaml, env vars in internal/envref + docs/env-reference.md; full data in docs/evidence-rhcos.md.

🤖 Generated with Claude Code

ErenAri and others added 2 commits June 27, 2026 15:29
… matrix

Two real fixes unblock aarch64 VM validation (latent bugs for any aarch64
cloud-image profile, not just RHCOS):
- aarch64 UEFI firmware: aarch64 `virt` has no built-in firmware, so the
  executor now supplies AAVMF as pflash (read-only CODE + a per-VM writable VARS
  copy), overridable via BPFCOMPAT_AARCH64_UEFI_CODE/_VARS. Without it an
  aarch64 guest never boots.
- KVM only accelerates a same-arch guest: /dev/kvm on an x86_64 host cannot run
  an aarch64 guest, so qemuMachineArgs now falls back to TCG when guest arch !=
  host arch instead of passing an invalid accel=kvm.

Evidence (docs/evidence-rhcos.md) expanded to enterprise scope, all real boots:
- x86_64: 6 artifacts × OpenShift 4.14/4.16/4.18. Adds perf-buffer, kprobe, and
  a BPF-LSM artifact (aegis). The LSM case is a real backport boundary — rejected
  on 4.14 (RHEL 9.2, EPERM/CAPABILITY_FAILURE) but load+attach all 4 hooks on
  4.16/4.18 (RHEL 9.4). core-relocation-fail rejected everywhere (discriminator).
- aarch64: real RHCOS 4.16 boot (5.14.0-427.50.1.el9_4.aarch64) under TCG,
  ring-buffer load+attach 1/1 — exercising the cross-compiled aarch64 validator,
  EDK II UEFI, Ignition, and SSH.

Adds profile rhcos-4.16-arm64 and matrices/rhcos-arm64.yaml; env vars cataloged
in envref + env-reference.md. RHCOS stays opt-in (BPFCOMPAT_ENABLE_RHCOS) and out
of the README "Distributions covered" table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ErenAri ErenAri merged commit 1b932f4 into main Jun 27, 2026
7 of 8 checks passed
@ErenAri ErenAri deleted the feat/rhcos-coverage-aarch64 branch June 27, 2026 12:33