[2/2] Add CGROUP_SOCK_ADDR Logging Infrastructure#491
[2/2] Add CGROUP_SOCK_ADDR Logging Infrastructure#491yaakov-stein wants to merge 13 commits intofacebook:mainfrom
CGROUP_SOCK_ADDR Logging Infrastructure#491Conversation
Claude review of PR #491 (6cd0236)Suggestions
Nits
|
CGROUP_SOCK_ADDR Logging] Enable LoggingCGROUP_SOCK_ADDR Logging] Add CGROUP_SOCK_ADDR Logging Infrastructure
| } | ||
|
|
||
| if (log_opts & (1 << BF_LOG_OPT_INTERNET)) { | ||
| __builtin_memcpy(log->payload.sock_addr.saddr, ctx->sock_addr.saddr, |
There was a problem hiding this comment.
Claude: must-fix: For IPv4 hooks the prologue writes only 4 bytes into ctx->sock_addr.saddr and ctx->sock_addr.daddr, but here sizeof(ctx->sock_addr.saddr) is 16. The __builtin_memcpy unconditionally copies all 16 bytes, so for IPv4 the remaining 12 bytes per address are uninitialized BPF stack data copied into the ring buffer entry.
For CONNECT4/CONNECT6 hooks, has_saddr is false so ctx->sock_addr.saddr is never written at all — the entire 16 bytes are uninitialized. This is either a BPF verifier rejection (if the verifier tracks the uninitialized read) or uninitialized stack data submitted to userspace via bpf_ringbuf_submit.
Consider either: (a) zero-initializing the sock_addr region in the prologue before writing the partial address, or (b) passing the address size to the stub so it copies only the correct number of bytes.
There was a problem hiding this comment.
Currently, the packet logging does the same thing for headers that are not requested. This is not a verifier rejection in either case, but we do submit uninitialized stack data in both cases. As long as this doesn't become a verifier issue, I think it can be left as is given that both option add some amount of BPF instructions. Happy to make a change though if others think it makes sense.
CGROUP_SOCK_ADDR Logging] Add CGROUP_SOCK_ADDR Logging InfrastructureCGROUP_SOCK_ADDR Logging Infrastructure
Rename the packet logging ELF stub from `log` to `pkt_log` to establish naming symmetry with the upcoming `sock_log` stub for socket-based hooks.
80e428a to
0827510
Compare
| #define static_assert _Static_assert | ||
| #endif | ||
|
|
||
| #include <linux/in6.h> |
There was a problem hiding this comment.
Claude: nit: This is the first <linux/...> include in any public header under include/bpfilter/. It is only needed for sizeof(struct in6_addr) in _bf_log_sock_addr.saddr and .daddr. The sibling struct _bf_runtime_sock_addr in cgen/runtime.h already uses the literal 16 for the same purpose. Using 16 here (with a comment referencing sizeof(struct in6_addr)) would avoid introducing a kernel UAPI header dependency into the public API surface.
0827510 to
3ca48eb
Compare
| /* Copy context fields into the runtime for the logging stub. | ||
| * The verifier restricts context access by attach type, so the shared | ||
| * stub reads from here instead of the context directly. */ | ||
| if (program->runtime.chain->flags & BF_FLAG(BF_CHAIN_LOG)) { |
There was a problem hiding this comment.
Claude: suggestion: The context field copies (dport, saddr, daddr) are guarded by BF_CHAIN_LOG, which is set if any rule in the chain has logging enabled. This means every connection through the chain pays the cost of the BPF loads and stores, even if the matched rule does not have logging or only requests pid,comm.
For chains where logging is rare (e.g., one log rule among many), consider computing the union of all rule->log options at chain creation time and only emitting copies for actually-needed fields. Not a correctness issue, but the number of instructions per invocation could be reduced.
Restructure bf_log as a tagged union with packet and socket address variants, discriminated by bf_log_type.
3ca48eb to
e693f90
Compare
Move packet.c/h from cgen/matcher/ to cgen/ and rename bf_matcher_generate_packet() to bf_packet_gen_inline_matcher(). This file provides shared codegen utilities for all packet-based flavors, not a matcher implementation, and will host additional shared packet codegen like bf_packet_gen_inline_log() in subsequent commits.
Add struct _bf_runtime_sock_addr to bf_runtime with saddr, daddr, and dport fields. The cgroup_sock_addr inline codegen will pre-read context fields into these before calling the sock_addr_log ELF stub, avoiding per-hook-type verifier restrictions on bpf_sock_addr field access.
Add a BPF ELF stub for cgroup_sock_addr logging. Captures process metadata and socket address fields pre-read into bf_runtime by the prologue.
Add BF_LOG_OPT_PID and BF_LOG_OPT_COMM to the bf_log_opt enum for socket-based hooks that log process metadata instead of packet headers.
Reject invalid log options during chain creation: packet hooks only support link/internet/transport, socket hooks only support internet/transport/pid/comm.
Update log action description with the new l2/l3/l4/pid/comm options and document socket hook log output in the chain logs section.
e693f90 to
6cd0236
Compare
qdeslandes
left a comment
There was a problem hiding this comment.
Few comments. Also, can you add a benchmark for logging from sock_addr?
| - ``log``: optional. If set, log the requested data. At least one option is required. Available options depend on the hook type: | ||
|
|
||
| - Packet hooks (XDP, TC, NF, cgroup_skb): ``link`` (layer 2 header), ``internet`` (layer 3 header), ``transport`` (layer 4 header). | ||
| - Socket hooks (``BF_HOOK_CGROUP_SOCK_ADDR_*``): ``internet`` (source and destination addresses), ``transport`` (destination port), ``pid`` (process ID), ``comm`` (process name). |
There was a problem hiding this comment.
I don't think we need to discriminate logged data for socket hooks (c.f. the 1st PR in the stack).
| bfcli will print log entries as they are published by the chain. Only the data requested in the ``log`` action will be printed. Hit ``Ctrl+C`` to quit. | ||
|
|
||
| For each logged packet, bfcli will print the receive timestamp and the packet size, followed by each requested layer (see the ``log`` action below). If one of the requested layer could not be processed by the chain, the corresponding output will be truncated. | ||
| For packet-based hooks, each log entry contains the receive timestamp and the packet size, followed by each requested layer (see the ``log`` action below). If a requested layer could not be processed by the chain, the corresponding output will be truncated. |
There was a problem hiding this comment.
Both these paragraph skip over shared information in bf_log, e.g. the rule's verdict. Please document the logged info fully: shared data and hook-specific data.
| /** | ||
| * @brief Pre-read socket address fields for `cgroup_sock_addr` logging. | ||
| * | ||
| * The BPF verifier restricts which `bpf_sock_addr` context fields a program | ||
| * can access based on its attach type (e.g. `msg_src_ip4` is only valid for | ||
| * `UDP4_SENDMSG`, `user_ip4` only for IPv4 hooks). Because the `sock_addr_log` | ||
| * ELF stub is shared across all hook types, it cannot read these fields | ||
| * directly. | ||
| * | ||
| * Instead, the per-hook inline codegen pre-reads the permitted fields into | ||
| * this struct before calling the stub. The stub then copies from here into | ||
| * the log entry, avoiding any restricted context access. | ||
| */ | ||
| struct bf_runtime_sock_addr | ||
| { | ||
| /** Source address (IPv4: first 4 bytes, IPv6: all 16). */ | ||
| __u8 bf_aligned(8) saddr[16]; | ||
|
|
||
| /** Destination address (IPv4: first 4 bytes, IPv6: all 16). */ | ||
| __u8 bf_aligned(8) daddr[16]; | ||
|
|
||
| /** Destination port in host byte order. */ | ||
| __u16 bf_aligned(8) dport; | ||
| }; |
There was a problem hiding this comment.
While I understand the issue here, this forces pre-fetching data that we might not use. Especially when it's done in the prologue, we'll pay the cost of it for rules that might not even be hit. Having a dedicated elfstub for each sock_addr hook would be better than prefetching this data.
Stack:
CGROUP_SOCK_ADDRLogging Infrastructure #491Summary
This adds the remaining infrastructure to go from the hard-coded packet-specific logging to a flavor-specific logging approach that makes it easy to enable logging for
CGROUP_SOCK_ADDR:We add a new
sock_addr_logELF stub, agen_inline_logcallback (see #486), CLI formatting for socket log entries, and per-hook log option validation (e.g.BF_LOG_OPT_LINKis rejected on socket hooks,BF_LOG_OPT_PIDis rejected on packet hooks). It also introduces two new log options,BF_LOG_OPT_PIDandBF_LOG_OPT_COMM, which use thebpf_get_current_pid_tgid()andbpf_get_current_comm()helper calls in the stub.One of the challenges here is that the BPF verifier restricts which
bpf_sock_addrcontext fields a program can access based on its attach type. For example,msg_src_ip4is only valid for SENDMSG4 anduser_ip4is only valid for IPv4 hooks. The verifier checks all code paths statically regardless of runtime reachability, so a shared ELF stub compiled once and linked into all hook types cannot read these fields directly. To solve this, the per-hook inline codegen in the prologue copies the permitted fields into a new_bf_runtime_sock_addrstruct on the BPF stack, and the stub copies from there into the log entry. The copying is done once per program invocation, guarded by the existingBF_CHAIN_LOGflag, so programs without logging rules pay zero cost.Adding
_bf_runtime_sock_addrto bf_runtime increases the stack frame for all flavors by 40 bytes (with alignment), but the struct remains well within the BPF 512-byte stack limit.Testing
sendmsg4log:Note
connect4readingmsg_src_ip6), so we'll need to add it to the stack frame, similar to how we do with packets. This leads to a new question - we really should have differentbf_runtimefor packet flavors andCGROUP_SOCK_ADDRgiven how different they are and the different data they use (can be done in a similar type of way we did it inbf_log, sharing common fields and then using a union). However, I don't think know is the right time to make this invasive change when a less invasive approach works, albeit in a less clean/proper way. Imo we can add this as a follow-up.