fix(sensor): capture execve parent PID in-kernel so the fileless-systemd gate works #1123
Merged
…emd gate works

The 0.15.28 post-deploy re-audit found `fileless:systemd` still firing on Azure. Root cause: the eBPF execve handler hardcoded `event.ppid = 0`; every prod execve ppid came from a userspace `/proc/<pid>/status` fallback, which works for long-lived processes (connect events) but misses short-lived ones, notably systemd's sealed-executor `fexecve` of `/proc/self/fd/N`, whose `/proc` entry is gone before the ring reader reads it. So the spec-PR1 fileless-systemd parent-lineage gate (which needs the parent exe) was effectively inert in prod: 4995 of 5000 execve events in the audit carried ppid=0.

Fix: read `task_struct->real_parent->tgid` in-kernel at execve, mirroring the Execution Gate's BPRM_OFFSETS pattern (this crate has no vmlinux/CO-RE; it reads kernel struct fields at BTF-resolved offsets passed via a map):

- new TASK_OFFSETS map (key 0 = real_parent, key 1 = tgid), populated by the userspace loader from kernel BTF (member_offset), with per-host resolution so it is correct across kernels/arches.
- the execve handler reads the offsets and does two bpf_probe_read_kernel hops (task -> real_parent -> tgid). Offsets absent / 0 (no BTF) => it returns 0 and leaves the userspace /proc fallback in place (unchanged behaviour); it never reads a guessed offset.

Validated live on test001 (x86_64, kernel 6.x): the eBPF verifier accepts the program (27 hooks active), TASK_OFFSETS resolves from BTF (real_parent=2504 tgid=2492), and `comm=systemd cmd=/proc/self/fd/9` now reports ppid=1 -> the fileless gate resolves /proc/1/exe=systemd and suppresses it; injected `bash -> id` reports the shell's pid. aarch64 offsets differ but are BTF-resolved identically; CI builds both arches and the verifier-load is checked on deploy.

Anchor: btf_offsets::member_offset_resolves_task_struct_fields pins the exact struct + member names (a rename would silently re-inert the gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
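The two-hop read and the "offsets absent => return 0" guard described in this commit message can be sketched in plain Rust. This is a userspace simulation, not the handler itself: kernel memory is modelled as a byte slice, `TaskOffsets`, `read_u64`, `read_u32` and `parent_tgid` are hypothetical names, and it assumes a 64-bit pointer for `real_parent` and a 32-bit `tgid`. In the actual eBPF program each read would be a `bpf_probe_read_kernel` call.

```rust
/// Hypothetical stand-in for the two values held in the TASK_OFFSETS map.
/// A value of 0 means "not resolved" (for example, no BTF on this host).
#[derive(Clone, Copy)]
struct TaskOffsets {
    real_parent: usize, // byte offset of task_struct.real_parent
    tgid: usize,        // byte offset of task_struct.tgid
}

/// Simulated kernel read of a pointer-sized value (assumed 64-bit).
fn read_u64(mem: &[u8], addr: usize) -> Option<u64> {
    let end = addr.checked_add(8)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(mem.get(addr..end)?);
    Some(u64::from_ne_bytes(buf))
}

/// Simulated kernel read of a 32-bit value (tgid is a pid_t).
fn read_u32(mem: &[u8], addr: usize) -> Option<u32> {
    let end = addr.checked_add(4)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(mem.get(addr..end)?);
    Some(u32::from_ne_bytes(buf))
}

/// Returns the parent's tgid, or 0 when the offsets are unresolved or a
/// read fails, so the caller can keep using its userspace fallback.
fn parent_tgid(mem: &[u8], task: usize, off: TaskOffsets) -> u32 {
    // Never read at a guessed offset: bail out if either one is missing.
    if off.real_parent == 0 || off.tgid == 0 {
        return 0;
    }
    // Hop 1: task -> real_parent (a pointer to the parent's task_struct).
    let parent = match read_u64(mem, task + off.real_parent) {
        Some(p) if p != 0 => p as usize,
        _ => return 0,
    };
    // Hop 2: real_parent -> tgid.
    read_u32(mem, parent + off.tgid).unwrap_or(0)
}
```

Returning 0 on every failure path keeps the event format unchanged: a consumer that already treats `ppid == 0` as "unknown, try `/proc`" needs no new case.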
Codecov Report: ✅ All modified and coverable lines are covered by tests.
esteves-uk approved these changes on Jun 27, 2026
maiconburn added a commit that referenced this pull request on Jun 27, 2026
Carries #1123: the eBPF execve handler now reads task_struct->real_parent->tgid in-kernel (BTF-resolved TASK_OFFSETS map) so the fileless-systemd parent-lineage gate (0.15.28) actually engages in prod.

The 0.15.28 re-audit found execve events carried ppid=0 in-kernel (4995/5000); every ppid came from a userspace /proc fallback that misses short-lived execs (systemd sealed-executor fexecve). Validated live on test001: comm=systemd /proc/self/fd/9 -> ppid=1 -> gate suppresses.

Version bump (Cargo.toml + workspace lock + agents-install token) + CHANGELOG [0.15.29]. No code change beyond #1123 already on main.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
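For context on why the userspace fallback misses short-lived execs, here is a minimal sketch of a `/proc/<pid>/status` lookup. The function names `parse_ppid` and `ppid_from_proc` are hypothetical and this is not the sensor's actual fallback code; it only illustrates that the lookup has nothing to read once the process has exited.

```rust
use std::fs;

/// Extracts the `PPid:` field from the text of a /proc/<pid>/status file.
fn parse_ppid(status: &str) -> Option<u32> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("PPid:"))
        .and_then(|rest| rest.trim().parse().ok())
}

/// Userspace lookup: returns None when /proc/<pid> no longer exists,
/// which is the case for a process that exited before the reader ran.
fn ppid_from_proc(pid: u32) -> Option<u32> {
    let text = fs::read_to_string(format!("/proc/{}/status", pid)).ok()?;
    parse_ppid(&text)
}
```

A `None` here is what the in-kernel read avoids: the parent is captured while the task is still executing the `execve`, so there is no window in which the entry can disappear.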
Problem (0.15.28 post-deploy re-audit)
`fileless:systemd` was still firing on Azure after 0.15.28. Root cause: the eBPF execve handler hardcoded `event.ppid = 0` (`sensor-ebpf/src/main.rs`). Every prod execve `ppid` came from a userspace `/proc/<pid>/status` fallback, which works for long-lived processes (connect events carry ppid) but misses short-lived ones, notably systemd's sealed-executor `fexecve` of `/proc/self/fd/N`, whose `/proc` entry is gone before the ring reader looks. Measured in the audit: 4995 of 5000 execve events had `ppid=0`. So the spec-PR1 (#1119) fileless-systemd parent-lineage gate, which needs the parent exe, was effectively inert in prod.

Fix: read the parent in-kernel
Read `task_struct->real_parent->tgid` at execve, mirroring the Execution Gate's `BPRM_OFFSETS` pattern (this crate has no vmlinux/CO-RE; it reads kernel struct fields at BTF-resolved offsets passed via a map):
- New `TASK_OFFSETS` map (key 0 = `real_parent`, key 1 = `tgid`), populated by the userspace loader from kernel BTF (`btf_offsets::member_offset`), so it is correct across kernels/arches.
- The execve handler reads the offsets and does two `bpf_probe_read_kernel` hops (task → real_parent → tgid). Offsets absent / 0 (no BTF) → returns 0 and leaves the `/proc` fallback in place (unchanged behaviour); it never reads a guessed offset.

Validated live on test001 (x86_64, kernel 6.x)
- The eBPF verifier accepts the program (`eBPF collector active - 27 hooks`).
- `TASK_OFFSETS` resolves from BTF (real_parent=2504 tgid=2492).
- `comm=systemd cmd=/proc/self/fd/9` now reports `ppid=1` → the fileless gate resolves `/proc/1/exe=/usr/lib/systemd/systemd` and suppresses it (the exact target).
- Injected `bash → id`/`true` reports the shell's pid.

aarch64 task_struct offsets differ but are BTF-resolved identically; CI builds both arches and the verifier-load is checked on deploy (Oracle).

Test
- `btf_offsets::member_offset_resolves_task_struct_fields` pins the exact struct + member names (a rename would silently re-inert the gate).
- `cargo fmt` + `cargo clippy --workspace -- -D warnings` clean; eBPF builds clean on test001. 0 em-dashes.

PR 5/5 of the Azure cloud-platform-FP epic: it closes the eBPF data gap so that PR1 (#1119) is effective in prod. Operator chose the eBPF approach (vs a partial userspace `/proc` fallback) for a complete fix.

🤖 Generated with Claude Code
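To illustrate why a `ppid` of 0 left the parent-lineage gate inert, the decision can be sketched as below. This is an assumption-laden illustration, not the gate from #1119: `is_fileless_exec`, `should_alert_fileless`, the `/proc/self/fd/N` match and the "parent exe ends in `/systemd`" check are all hypothetical simplifications, and the parent-exe lookup is passed in as a closure so the sketch stays self-contained.

```rust
/// True when the command is an exec of an fd path such as /proc/self/fd/9.
fn is_fileless_exec(cmd: &str) -> bool {
    cmd.strip_prefix("/proc/self/fd/")
        .map_or(false, |n| !n.is_empty() && n.bytes().all(|b| b.is_ascii_digit()))
}

/// Returns true if a fileless-exec alert should fire for this event.
/// `parent_exe` stands in for resolving /proc/<ppid>/exe.
fn should_alert_fileless(
    cmd: &str,
    ppid: u32,
    parent_exe: impl Fn(u32) -> Option<String>,
) -> bool {
    if !is_fileless_exec(cmd) {
        return false; // not a fileless exec, so this rule does not apply
    }
    if ppid == 0 {
        return true; // parent unknown: the lineage gate cannot engage
    }
    match parent_exe(ppid) {
        // Parent is systemd: treat as its sealed-executor re-exec and suppress.
        Some(exe) if exe.ends_with("/systemd") => false,
        // Any other parent, or an unresolvable one, still alerts.
        _ => true,
    }
}
```

With `ppid = 0` the function alerts regardless of who the real parent is, which matches the behaviour seen in the audit; once the event carries `ppid = 1`, the lookup resolves to systemd and the alert is suppressed.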