scx_p2dq: fix RCU pointer violation in bpf_for_each dispatch loop #3499

Open
scrossley11 wants to merge 3 commits into sched-ext:main from scrossley11:fix-p2dq-rcu-pointer

@scrossley11

The bpf_for_each(scx_dsq, ...) iterator yields a non-trusted task pointer. Accessing p->cpus_ptr directly on this pointer fails BPF verification on 6.16 kernels, which require RCU pointers for cpus_ptr.

Replace direct cpus_ptr accesses with the existing safe helpers (peek_cpumask_test_cpu, peek_cpumask_any_distribute) that obtain a trusted reference via bpf_task_from_pid() before accessing cpus_ptr.

This also fixes scx_chaos which includes p2dq's code via main.bpf.inc.
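
The lookup-then-release pattern described above can be modeled in plain userspace C. This is only a rough sketch under stated assumptions, not the code from this PR: the signatures of peek_cpumask_test_cpu and peek_cpumask_any_distribute are not shown in this thread, so `sketch_task_allows_cpu` is a hypothetical stand-in, and the `mock_*` functions are userspace stubs for the bpf_task_from_pid(), bpf_task_release() and bpf_cpumask_test_cpu() kfuncs. The struct layouts are simplified models, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types; NOT the kernel definitions. */
struct cpumask { uint64_t bits; };        /* this model covers at most 64 CPUs */
struct task_struct {
	int32_t pid;
	const struct cpumask *cpus_ptr;
	int refs;                         /* models acquired references */
};

/* A tiny fixed task table standing in for the kernel's pid lookup. */
static struct cpumask mask_a = { .bits = 0x5 };  /* CPUs 0 and 2 */
static struct cpumask mask_b = { .bits = 0x2 };  /* CPU 1 only */
static struct task_struct task_table[] = {
	{ .pid = 100, .cpus_ptr = &mask_a, .refs = 0 },
	{ .pid = 200, .cpus_ptr = &mask_b, .refs = 0 },
};

/* Mock of bpf_task_from_pid(): returns an acquired reference, or NULL. */
static struct task_struct *mock_task_from_pid(int32_t pid)
{
	for (size_t i = 0; i < sizeof(task_table) / sizeof(task_table[0]); i++) {
		if (task_table[i].pid == pid) {
			task_table[i].refs++;
			return &task_table[i];
		}
	}
	return NULL;
}

/* Mock of bpf_task_release(): every acquired reference must be dropped. */
static void mock_task_release(struct task_struct *p)
{
	assert(p->refs > 0);
	p->refs--;
}

/* Mock of bpf_cpumask_test_cpu(). */
static bool mock_cpumask_test_cpu(uint32_t cpu, const struct cpumask *mask)
{
	return cpu < 64 && ((mask->bits >> cpu) & 1);
}

/*
 * Hypothetical helper: may the task run on `cpu`?
 * Only the pid is read from the iterator-provided (untrusted) pointer;
 * cpus_ptr is read solely through the reference returned by the lookup.
 */
static bool sketch_task_allows_cpu(const struct task_struct *untrusted,
				   int32_t cpu)
{
	struct task_struct *trusted;
	bool allowed;

	if (!untrusted || cpu < 0)
		return false;

	trusted = mock_task_from_pid(untrusted->pid);
	if (!trusted)
		return false;  /* lookup failed, e.g. the task is gone */

	allowed = mock_cpumask_test_cpu((uint32_t)cpu, trusted->cpus_ptr);
	mock_task_release(trusted);  /* drop the reference on every path */
	return allowed;
}
```

In an actual BPF program, the `mock_*` calls would presumably be the kfuncs themselves, and a helper of this shape would be called on each `p` yielded by bpf_for_each(scx_dsq, ...) in place of dereferencing `p->cpus_ptr` directly. The cost of this approach is an extra pid lookup and reference acquire/release per inspected task.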

scrossley and others added 2 commits March 24, 2026 16:01
@htejun

htejun commented Mar 31, 2026

Contributor

This is for compatibility, right? Newer kernels shouldn't require a trusted pointer. Maybe it'd be useful to add a comment explaining that or move it to compat.bpf.h?

@scrossley11

Author

> This is for compatibility, right? Newer kernels shouldn't require a trusted pointer. Maybe it'd be useful to add a comment explaining that or move it to compat.bpf.h?

Newer than 6.16? If not, then maybe this is not the right fix.

@likewhatevs

Contributor

p2dq should be good on stable kernels going back to 6.13, per CI:

https://github.com/sched-ext/scx/actions/runs/23822398184/job/69455861674?pr=3499#step:9:1463

That job runs the verifier under two configurations: with the topology and everything else (i.e. rodata) left empty (kind of invalid IMO, but it's what we've always done), and with those of a 9950X (pretty valid).

@hodgesds

hodgesds commented Apr 2, 2026

Contributor

I think the problem is that it has trouble verifying on some backported kernels.

@scrossley11 scrossley11 closed this Apr 7, 2026
@scrossley11 scrossley11 reopened this Apr 7, 2026
4 participants