Skip to content

Add CGROUP_SOCK_ADDR Set Support#493

Open
yaakov-stein wants to merge 8 commits intofacebook:mainfrom
yaakov-stein:add_csa_set_support
Open

Add CGROUP_SOCK_ADDR Set Support#493
yaakov-stein wants to merge 8 commits intofacebook:mainfrom
yaakov-stein:add_csa_set_support

Conversation

@yaakov-stein
Copy link
Copy Markdown
Contributor

@yaakov-stein yaakov-stein commented Mar 31, 2026

Summary

Before this change, sets (hash maps and LPM tries used for bulk matching) were only supported on packet-based hooks. This meant users who wanted to filter against a list of IPs or ports on a CONNECT/SENDMSG hook had to create one rule per value, which scales poorly.

This change adds set support for all four CGROUP_SOCK_ADDR hooks, reusing the same BPF map infrastructure (hash maps for exact match, LPM tries for prefix match) that packet hooks already use.

Refactoring: make set codegen composable/generic

Set codegen previously lived entirely in set.c and was tightly coupled to packet-based header loading. To reuse it for CGROUP_SOCK_ADDR hooks, we extracted the generic pieces (copy field to scratch, map/trie lookup) into shared helpers and moved the packet-specific logic into packet.c. Protocol checks and layer conflict detection were also hoisted out of set codegen into program.c's existing dedup loop, so they now cover set components the same way they cover individual matchers.

As a drive-by, set components are now validated individually against the hook's supported matchers.

CGROUP_SOCK_ADDR set codegen

With the shared helpers in place, CGROUP_SOCK_ADDR set codegen maps each key component to its bpf_sock_addr context field offset, copies the field to the scratch buffer, and calls the shared map lookup. No header parsing needed — r6 already points to the context.

Why can't we use the current set codegen?

Packet-based hooks receive a pointer to raw packet memory. The BPF program parses headers layer by layer (L2 -> L3 -> L4), and r6 points to the current header being inspected.

CGROUP_SOCK_ADDR hooks receive a bpf_sock_addr context struct instead. The kernel pre-extracts socket metadata into typed fields:

struct bpf_sock_addr {
    __u32 user_ip4;        // destination IPv4
    __u32 user_ip6[4];     // destination IPv6
    __u32 user_port;       // destination port (network order)
    __u32 msg_src_ip4;     // source IPv4  (sendmsg only)
    __u32 msg_src_ip6[4];  // source IPv6  (sendmsg only)
    ...
};

This difference has two consequences for set codegen:

  1. No header parsing needed. r6 already points to the context. Instead of loading a header pointer and extracting fields from parsed packet data, we read directly from context field offsets (r6 + offsetof(bpf_sock_addr, field)).

  2. BPF verifier constraints on access width. Packet memory can be read at any width. Context struct fields must be read at widths that respect field alignment. The verifier rejects misaligned reads from context pointers. The new bf_stub_load function handles this by picking the largest safe access width based on both source and destination alignment.

The diagram below shows the two paths:

Packet hooks (XDP/TC/NF/cgroup_skb):
  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
  │ Parse L2/L3  │────>│ Load header  │────>│ Copy field   │───> map lookup
  │ (dynptr)     │     │ ptr into r6  │     │ to scratch   │
  └──────────────┘     └──────────────┘     └──────────────┘

CGROUP_SOCK_ADDR:
  ┌──────────────────┐     ┌──────────────┐
  │ r6 = ctx (set    │────>│ Copy field   │───> map lookup
  │ once in prologue)│     │ to scratch   │
  └──────────────────┘     └──────────────┘

Test plan

  • Unit tests: validation of set component hook compatibility and cross-layer conflict detection
  • E2e tests: each CGROUP_SOCK_ADDR hook has dry-run parsing tests and behavioral tests covering hash sets, trie sets, and multi-component sets

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 31, 2026

Claude review of PR #493 (5ab9f35)

Previous must-fix items (NULL pointer dereferences in chain.c, program.c, cgroup_sock_addr.c) have been addressed in the latest push. The bf_stub_load refactoring also corrects a pre-existing alignment check issue in the copy loop (old code checked only the initial offset, not the advancing offset). Existing inline comments from prior runs cover the remaining open suggestions and nits.

Suggestions

  • Redundant header loads in packet set loopsrc/libbpfilter/cgen/matcher/packet.c:340bf_stub_load_header is called per-component in the hash-map loop even when multiple components share the same protocol layer, emitting redundant BPF instructions

Nits

  • Missing assert for array pointer parametersrc/libbpfilter/chain.c:72_bf_rule_check_layer takes layer_hdr_id (array/pointer) without asserting it, inconsistent with every other function in the file
  • Inconsistent behavioral test orderingtests/e2e/hooks/cgroup_sock_addr_connect6.sh:138 — Test blocks ordered differently (multi-component first) compared to connect4.sh (single-component first)

Workflow run

@yaakov-stein yaakov-stein marked this pull request as draft March 31, 2026 03:49
@yaakov-stein yaakov-stein removed the request for review from qdeslandes March 31, 2026 03:49
@yaakov-stein yaakov-stein force-pushed the add_csa_set_support branch 2 times, most recently from 8a5b67b to 1def2a9 Compare March 31, 2026 17:27
Extract the copy-to-scratch logic from bf_stub_stx_payload into a new
function with explicit parameters. bf_stub_load copies size bytes from
R6 + src_offset to scratch[dst_scratch_offset], using min(src, dst)
alignment for access size selection.

For each chunk, the largest access width is picked where both src_off
and dst_off are aligned and remaining >= width. This ensures optimal
width for packet access (R6 = memory pointer) and verifier-safe width
for context access (R6 = ctx pointer).

bf_stub_stx_payload is rewritten as a thin wrapper around bf_stub_load.
Extract reusable set codegen helpers from the existing packet-specific
set code:

- bf_set_generate_map_lookup: the 5-instruction map-lookup tail shared
  by both trie and hash paths (load set FD, compute key pointer, call
  bpf_map_lookup_elem, jump to next rule on NULL).

- bf_set_generate_trie_lookup: complete LPM trie key assembly and
  lookup. Writes prefixlen at scratch[4], copies the address to
  scratch[8] via bf_stub_load, then calls bf_set_generate_map_lookup.
  Replaces ~30 lines of IPv4/IPv6-branching trie bytecode that would
  otherwise be duplicated between packet and cgroup_sock_addr flavors.

The trie path in bf_matcher_generate_set is inlined since it reduces to
three calls with the new helpers. The hash path calls
bf_set_generate_map_lookup instead of inlining the sequence.
- Notes
* - :rspan:`1` Source address
- :rspan:`1` ``ip6.saddr``
* - :rspan:`2` Source address
Copy link
Copy Markdown
Contributor Author

@yaakov-stein yaakov-stein Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by fix (document ip6.d/saddr works in sets)

@yaakov-stein yaakov-stein marked this pull request as ready for review March 31, 2026 18:55
@yaakov-stein yaakov-stein requested a review from qdeslandes March 31, 2026 18:55
@@ -204,6 +208,98 @@ static int _bf_cgroup_sock_addr_generate_port(struct bf_program *program,
bf_matcher_payload(matcher), 2, BPF_REG_1);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: _bf_cgroup_sock_addr_ctx_offset is a non-trivial function mapping matcher types to bpf_sock_addr field offsets, with a sentinel return convention ((size_t)-1 for unsupported types). A brief Doxygen comment documenting the purpose, the return convention, and the handled matcher types would help maintainability. Other comparable helpers in this file (e.g., _bf_cgroup_sock_addr_generate_port) have documentation.

For BF_MATCHER_SET, iterate set key components through the same
checked_layers dedup logic as regular matchers. Extract
_bf_program_check_proto to share the check between both paths.
Check each set key component against the chain's hook and track set
components in the per-layer compatibility check. Extract
_bf_rule_check_layer to share the logic between regular matchers and
set components.
@yaakov-stein yaakov-stein force-pushed the add_csa_set_support branch from 683d9a7 to 2f9ddba Compare April 1, 2026 16:26
@yaakov-stein yaakov-stein force-pushed the add_csa_set_support branch from 2f9ddba to 5ab9f35 Compare April 1, 2026 16:29

if (!meta) {
return bf_err_r(-EINVAL, "missing meta for '%s'",
bf_matcher_type_to_str(type));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: In the hash-map loop of _bf_matcher_pkt_generate_set, bf_stub_load_header is called for every set component. When multiple components share the same protocol layer (e.g. ip4.saddr and ip4.daddr are both L3), the second call redundantly reloads R6 from the same stack slot, emitting unnecessary BPF instructions.

Consider tracking the loaded layer (similar to how _bf_program_check_proto uses checked_layers) to skip redundant header loads. This reduces the generated BPF instruction count for multi-component sets with same-layer keys.

*/
static int _bf_rule_check_layer(enum bf_matcher_type type,
uint32_t layer_hdr_id[_BF_MATCHER_LAYER_MAX])
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: _bf_rule_check_layer takes layer_hdr_id as an array parameter (decays to pointer) but does not assert() it. The style guide requires assert() for pointer preconditions, and every other function in this file that takes a pointer asserts on it (e.g. _bf_rule_references_empty_set, _bf_rule_has_incompatible_matchers).

Suggested change
{
{
const struct bf_matcher_meta *meta = bf_matcher_get_meta(type);

Should be:

{
    assert(layer_hdr_id);

    const struct bf_matcher_meta *meta = bf_matcher_get_meta(type);

(! udp6_connect ${HOST_IP6_ADDR} 9991)
${FROM_NS} ${BFCLI} ruleset flush

# (ip6.daddr, udp.dport) multi-component hash set
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: The behavioral test blocks in connect6.sh are ordered: multi-component hash set, then single-component hash set, then trie set. In connect4.sh the order is: single-component hash, trie, multi-component. Consider using a consistent ordering across the IPv4 and IPv6 test files for readability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant