Skip to content

fix(sensor): capture execve parent PID in-kernel so the fileless-systemd gate works#1123

Merged
maiconburn merged 1 commit into
mainfrom
fix/azure-fp-5-ebpf-execve-ppid
Jun 27, 2026
Merged

fix(sensor): capture execve parent PID in-kernel so the fileless-systemd gate works#1123
maiconburn merged 1 commit into
mainfrom
fix/azure-fp-5-ebpf-execve-ppid

Conversation

@maiconburn

Copy link
Copy Markdown
Collaborator

Problem (0.15.28 post-deploy re-audit)

fileless:systemd was still firing on Azure after 0.15.28. Root cause: the eBPF execve handler hardcoded event.ppid = 0 (sensor-ebpf/src/main.rs). Every prod execve ppid came from a userspace /proc/<pid>/status fallback, which works for long-lived processes (connect events carry ppid) but misses short-lived ones — notably systemd's sealed-executor fexecve of /proc/self/fd/N, whose /proc entry is gone before the ring reader looks. Measured in the audit: execve events were ppid=0 4995/5000. So the spec-PR1 (#1119) fileless-systemd parent-lineage gate, which needs the parent exe, was effectively inert in prod.

Fix — read the parent in-kernel

Read task_struct->real_parent->tgid at execve, mirroring the Execution Gate's BPRM_OFFSETS pattern (this crate has no vmlinux/CO-RE; it reads kernel struct fields at BTF-resolved offsets passed via a map):

  • new TASK_OFFSETS map (key 0 = real_parent, key 1 = tgid), populated by the userspace loader from kernel BTF (btf_offsets::member_offset) — correct across kernels/arches.
  • the execve handler reads the offsets + does two bpf_probe_read_kernel hops (task → real_parent → tgid). Offsets absent / 0 (no BTF) → returns 0 and leaves the /proc fallback in place (unchanged behaviour) — never a guessed offset.

Validated live on test001 (x86_64, kernel 6.x)

  • eBPF compiles; verifier accepts the program (eBPF collector active - 27 hooks).
  • TASK_OFFSETS resolves from BTF (real_parent=2504 tgid=2492).
  • comm=systemd cmd=/proc/self/fd/9 now reports ppid=1 → the fileless gate resolves /proc/1/exe=/usr/lib/systemd/systemd and suppresses it (the exact target).
  • Injected bash → id/true reports the shell's pid.
  • Installed sensor stopped/restored cleanly during the test.

aarch64 task_struct offsets differ but are BTF-resolved identically; CI builds both arches and the verifier-load is checked on deploy (Oracle).

Test

btf_offsets::member_offset_resolves_task_struct_fields pins the exact struct + member names (a rename would silently re-inert the gate). cargo fmt + cargo clippy --workspace -- -D warnings clean; eBPF builds clean on test001. 0 em-dashes.

PR 5/5 of the Azure cloud-platform-FP epic — the eBPF data-gap that made PR1 (#1119) effective in prod. Operator chose the eBPF approach (vs a partial userspace /proc fallback) for a complete fix.

🤖 Generated with Claude Code

…emd gate works

The 0.15.28 post-deploy re-audit found fileless:systemd still firing on
Azure. Root cause: the eBPF execve handler hardcoded `event.ppid = 0`;
every prod execve ppid came from a userspace /proc/<pid>/status fallback,
which works for long-lived processes (connect events) but MISSES the
short-lived ones — notably systemd's sealed-executor `fexecve` of
/proc/self/fd/N, whose /proc entry is gone before the ring reader reads
it. So the spec-PR1 fileless-systemd parent-lineage gate (which needs the
parent exe) was effectively inert in prod: execve events carried ppid=0
4995/5000 in the audit.

Fix: read `task_struct->real_parent->tgid` in-kernel at execve, mirroring
the Execution Gate's BPRM_OFFSETS pattern (this crate has no vmlinux/CO-RE;
it reads kernel struct fields at BTF-resolved offsets passed via a map):
- new TASK_OFFSETS map (key 0 = real_parent, key 1 = tgid), populated by
  the userspace loader from kernel BTF (member_offset), with per-host
  resolution so it is correct across kernels/arches.
- the execve handler reads the offsets + does two bpf_probe_read_kernel
  hops (task -> real_parent -> tgid). Offsets absent / 0 (no BTF) => it
  returns 0 and leaves the userspace /proc fallback in place (unchanged
  behaviour) — it never reads a guessed offset.

Validated live on test001 (x86_64, kernel 6.x): eBPF verifier accepts the
program (27 hooks active), TASK_OFFSETS resolves from BTF
(real_parent=2504 tgid=2492), and `comm=systemd cmd=/proc/self/fd/9` now
reports ppid=1 -> the fileless gate resolves /proc/1/exe=systemd and
suppresses it; injected `bash -> id` reports the shell's pid. aarch64
offsets differ but are BTF-resolved identically; CI builds both arches and
the verifier-load is checked on deploy.

Anchor: btf_offsets::member_offset_resolves_task_struct_fields pins the
exact struct + member names (a rename would silently re-inert the gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maiconburn maiconburn requested a review from esteves-uk as a code owner June 27, 2026 06:26
@codecov

codecov Bot commented Jun 27, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@maiconburn maiconburn merged commit f32bf6d into main Jun 27, 2026
22 checks passed
@maiconburn maiconburn deleted the fix/azure-fp-5-ebpf-execve-ppid branch June 27, 2026 07:16
maiconburn added a commit that referenced this pull request Jun 27, 2026
Carries #1123: the eBPF execve handler now reads task_struct->real_parent->tgid
in-kernel (BTF-resolved TASK_OFFSETS map) so the fileless-systemd
parent-lineage gate (0.15.28) actually engages in prod. The 0.15.28
re-audit found execve events carried ppid=0 in-kernel (4995/5000); every
ppid came from a userspace /proc fallback that misses short-lived execs
(systemd sealed-executor fexecve). Validated live on test001:
comm=systemd /proc/self/fd/9 -> ppid=1 -> gate suppresses.

Version bump (Cargo.toml + workspace lock + agents-install token) +
CHANGELOG [0.15.29]. No code change beyond #1123 already on main.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants