Skip to content

[2/2] Add CGROUP_SOCK_ADDR Logging Infrastructure#491

Open
yaakov-stein wants to merge 13 commits intofacebook:mainfrom
yaakov-stein:add_cgroup_sock_addr_logging
Open

[2/2] Add CGROUP_SOCK_ADDR Logging Infrastructure#491
yaakov-stein wants to merge 13 commits intofacebook:mainfrom
yaakov-stein:add_cgroup_sock_addr_logging

Conversation

@yaakov-stein
Copy link
Copy Markdown
Contributor

@yaakov-stein yaakov-stein commented Mar 29, 2026

Stack:

Summary

This adds the remaining infrastructure to go from the hard-coded packet-specific logging to a flavor-specific logging approach that makes it easy to enable logging for CGROUP_SOCK_ADDR:

BEFORE                                  AFTER

program.c (hardcoded log emission)      program.c (callback dispatch)
──────────────────────────────────      ────────────────────────────────  
if rule->log:                           if rule->log:
  setup R1-R5 for packet headers          ops->gen_inline_log(program, rule)
  EMIT_FIXUP_ELFSTUB(PKT_LOG)

chain.c:                                Packet flavors (XDP, TC, NF, cgroup_skb)
  cgroup_sock_addr? → -ENOTSUP          ────────────────────────────────  
                                        bf_packet_gen_inline_log():
                                          R1=ctx, R2=rule_id, R3=log_opts,
                                          R4=verdict, R5=packed_protos
                                          → ELFSTUB_PKT_LOG

                                        cgroup_sock_addr.c
                                        ──────────────────────────────── 
                                        gen_inline_log():
                                          R1=ctx, R2=rule_id, R3=verdict,
                                          R4=packed_protos
                                          → ELFSTUB_SOCK_ADDR_LOG

We add a new sock_addr_log ELF stub, a gen_inline_log callback (see #486), CLI formatting for socket log entries, and per-hook log option validation (e.g. BF_LOG_OPT_LINK is rejected on socket hooks, BF_LOG_OPT_PID is rejected on packet hooks). It also introduces two new log options, BF_LOG_OPT_PID and BF_LOG_OPT_COMM, which use the bpf_get_current_pid_tgid() and bpf_get_current_comm() helper calls in the stub.

One of the challenges here is that the BPF verifier restricts which bpf_sock_addr context fields a program can access based on its attach type. For example, msg_src_ip4 is only valid for SENDMSG4 and user_ip4 is only valid for IPv4 hooks. The verifier checks all code paths statically regardless of runtime reachability, so a shared ELF stub compiled once and linked into all hook types cannot read these fields directly. To solve this, the per-hook inline codegen in the prologue copies the permitted fields into a new _bf_runtime_sock_addr struct on the BPF stack, and the stub copies from there into the log entry. The copying is done once per program invocation, guarded by the existing BF_CHAIN_LOG flag, so programs without logging rules pay zero cost.

Adding _bf_runtime_sock_addr to bf_runtime increases the stack frame for all flavors by 40 bytes (with alignment), but the struct remains well within the BPF 512-byte stack limit.

Testing

  • Manual testing, example sendmsg4 log:
Screenshot 2026-03-29 at 3 10 01 PM

Note

  • As mentioned above, the verifier doesn't allow accessing certain paths (ex. connect4 reading msg_src_ip6), so we'll need to add it to the stack frame, similar to how we do with packets. This leads to a new question - we really should have different bf_runtime for packet flavors and CGROUP_SOCK_ADDR given how different they are and the different data they use (can be done in a similar type of way we did it in bf_log, sharing common fields and then using a union). However, I don't think know is the right time to make this invasive change when a less invasive approach works, albeit in a less clean/proper way. Imo we can add this as a follow-up.

@meta-cla meta-cla bot added the cla signed label Mar 29, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 29, 2026

Claude review of PR #491 (6cd0236)

Suggestions

  • Missing e2e tests for other sock_addr hooks with loggingtests/e2e/hooks/cgroup_sock_addr_connect4.sh — only connect4 is tested with logging; connect6, sendmsg4, and sendmsg6 hook-specific codegen paths (IPv6 addresses, source address presence) are not exercised
  • Add static_assert for bf_runtime size vs BPF stack limitsrc/libbpfilter/cgen/runtime.h:153 — the new sock_addr field grows bf_runtime for all flavors; a compile-time guard against the 512-byte BPF stack limit would catch future regressions

Nits

  • System include orderingsrc/libbpfilter/include/bpfilter/runtime.h:26 — blank line between <linux/in6.h> and <asm/types.h> splits one system-header group into two
  • "receive timestamp" wordingdoc/usage/bfcli.rst:138 — for CGROUP_SOCK_ADDR hooks, "event timestamp" would be more accurate than "receive timestamp"

Workflow run

@yaakov-stein yaakov-stein changed the title [2/2][CGROUP_SOCK_ADDR Logging] Enable Logging [2/2][CGROUP_SOCK_ADDR Logging] Add CGROUP_SOCK_ADDR Logging Infrastructure Mar 29, 2026
}

if (log_opts & (1 << BF_LOG_OPT_INTERNET)) {
__builtin_memcpy(log->payload.sock_addr.saddr, ctx->sock_addr.saddr,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: must-fix: For IPv4 hooks the prologue writes only 4 bytes into ctx->sock_addr.saddr and ctx->sock_addr.daddr, but here sizeof(ctx->sock_addr.saddr) is 16. The __builtin_memcpy unconditionally copies all 16 bytes, so for IPv4 the remaining 12 bytes per address are uninitialized BPF stack data copied into the ring buffer entry.

For CONNECT4/CONNECT6 hooks, has_saddr is false so ctx->sock_addr.saddr is never written at all — the entire 16 bytes are uninitialized. This is either a BPF verifier rejection (if the verifier tracks the uninitialized read) or uninitialized stack data submitted to userspace via bpf_ringbuf_submit.

Consider either: (a) zero-initializing the sock_addr region in the prologue before writing the partial address, or (b) passing the address size to the stub so it copies only the correct number of bytes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the packet logging does the same thing for headers that are not requested. This is not a verifier rejection in either case, but we do submit uninitialized stack data in both cases. As long as this doesn't become a verifier issue, I think it can be left as is given that both option add some amount of BPF instructions. Happy to make a change though if others think it makes sense.

@yaakov-stein yaakov-stein changed the title [2/2][CGROUP_SOCK_ADDR Logging] Add CGROUP_SOCK_ADDR Logging Infrastructure [2/2] Add CGROUP_SOCK_ADDR Logging Infrastructure Mar 29, 2026
Rename the packet logging ELF stub from `log` to `pkt_log` to
establish naming symmetry with the upcoming `sock_log` stub for
socket-based hooks.
@yaakov-stein yaakov-stein force-pushed the add_cgroup_sock_addr_logging branch 4 times, most recently from 80e428a to 0827510 Compare March 29, 2026 22:31
@yaakov-stein yaakov-stein marked this pull request as ready for review March 29, 2026 22:42
#define static_assert _Static_assert
#endif

#include <linux/in6.h>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: This is the first <linux/...> include in any public header under include/bpfilter/. It is only needed for sizeof(struct in6_addr) in _bf_log_sock_addr.saddr and .daddr. The sibling struct _bf_runtime_sock_addr in cgen/runtime.h already uses the literal 16 for the same purpose. Using 16 here (with a comment referencing sizeof(struct in6_addr)) would avoid introducing a kernel UAPI header dependency into the public API surface.

@yaakov-stein yaakov-stein force-pushed the add_cgroup_sock_addr_logging branch from 0827510 to 3ca48eb Compare March 30, 2026 01:29
/* Copy context fields into the runtime for the logging stub.
* The verifier restricts context access by attach type, so the shared
* stub reads from here instead of the context directly. */
if (program->runtime.chain->flags & BF_FLAG(BF_CHAIN_LOG)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: The context field copies (dport, saddr, daddr) are guarded by BF_CHAIN_LOG, which is set if any rule in the chain has logging enabled. This means every connection through the chain pays the cost of the BPF loads and stores, even if the matched rule does not have logging or only requests pid,comm.

For chains where logging is rare (e.g., one log rule among many), consider computing the union of all rule->log options at chain creation time and only emitting copies for actually-needed fields. Not a correctness issue, but the number of instructions per invocation could be reduced.

Restructure bf_log as a tagged union with packet and socket address
variants, discriminated by bf_log_type.
@yaakov-stein yaakov-stein force-pushed the add_cgroup_sock_addr_logging branch from 3ca48eb to e693f90 Compare March 30, 2026 20:05
Move packet.c/h from cgen/matcher/ to cgen/ and rename
bf_matcher_generate_packet() to bf_packet_gen_inline_matcher(). This
file provides shared codegen utilities for all packet-based flavors,
not a matcher implementation, and will host additional shared packet
codegen like bf_packet_gen_inline_log() in subsequent commits.
Add struct _bf_runtime_sock_addr to bf_runtime with saddr, daddr, and
dport fields. The cgroup_sock_addr inline codegen will pre-read context
fields into these before calling the sock_addr_log ELF stub, avoiding
per-hook-type verifier restrictions on bpf_sock_addr field access.
Add a BPF ELF stub for cgroup_sock_addr logging. Captures process
metadata and socket address fields pre-read into bf_runtime by the
prologue.
Add BF_LOG_OPT_PID and BF_LOG_OPT_COMM to the bf_log_opt enum for
socket-based hooks that log process metadata instead of packet headers.
Reject invalid log options during chain creation: packet hooks only
support link/internet/transport, socket hooks only support
internet/transport/pid/comm.
Update log action description with the new l2/l3/l4/pid/comm options
and document socket hook log output in the chain logs section.
@yaakov-stein yaakov-stein force-pushed the add_cgroup_sock_addr_logging branch from e693f90 to 6cd0236 Compare March 30, 2026 20:38
Copy link
Copy Markdown
Contributor

@qdeslandes qdeslandes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments. Also, can you add a benchmark for logging from sock_addr?

- ``log``: optional. If set, log the requested data. At least one option is required. Available options depend on the hook type:

- Packet hooks (XDP, TC, NF, cgroup_skb): ``link`` (layer 2 header), ``internet`` (layer 3 header), ``transport`` (layer 4 header).
- Socket hooks (``BF_HOOK_CGROUP_SOCK_ADDR_*``): ``internet`` (source and destination addresses), ``transport`` (destination port), ``pid`` (process ID), ``comm`` (process name).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to discriminate logged data for socket hooks (c.f. the 1st PR in the stack).

bfcli will print log entries as they are published by the chain. Only the data requested in the ``log`` action will be printed. Hit ``Ctrl+C`` to quit.

For each logged packet, bfcli will print the receive timestamp and the packet size, followed by each requested layer (see the ``log`` action below). If one of the requested layer could not be processed by the chain, the corresponding output will be truncated.
For packet-based hooks, each log entry contains the receive timestamp and the packet size, followed by each requested layer (see the ``log`` action below). If a requested layer could not be processed by the chain, the corresponding output will be truncated.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both these paragraph skip over shared information in bf_log, e.g. the rule's verdict. Please document the logged info fully: shared data and hook-specific data.

Comment on lines +45 to +68
/**
* @brief Pre-read socket address fields for `cgroup_sock_addr` logging.
*
* The BPF verifier restricts which `bpf_sock_addr` context fields a program
* can access based on its attach type (e.g. `msg_src_ip4` is only valid for
* `UDP4_SENDMSG`, `user_ip4` only for IPv4 hooks). Because the `sock_addr_log`
* ELF stub is shared across all hook types, it cannot read these fields
* directly.
*
* Instead, the per-hook inline codegen pre-reads the permitted fields into
* this struct before calling the stub. The stub then copies from here into
* the log entry, avoiding any restricted context access.
*/
struct bf_runtime_sock_addr
{
/** Source address (IPv4: first 4 bytes, IPv6: all 16). */
__u8 bf_aligned(8) saddr[16];

/** Destination address (IPv4: first 4 bytes, IPv6: all 16). */
__u8 bf_aligned(8) daddr[16];

/** Destination port in host byte order. */
__u16 bf_aligned(8) dport;
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the issue here, this forces pre-fetching data that we might not use. Especially when it's done in the prologue, we'll pay the cost of it for rules that might not even be hit. Having a dedicated elfstub for each sock_addr hook would be better than prefetching this data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants