scx_lavd: support large x86_64 systems (raise LAVD_CPU_ID_MAX to 8192) #3634

Open
jkkm wants to merge 2 commits into sched-ext:main from jkkm:lavd-cpu-id-max-8192

@jkkm jkkm commented Jun 8, 2026

Two commits:

  1. scx_lavd: bound per-cpdom cpumask scan loops by nr_cpu_ids
  2. scx_lavd: raise LAVD_CPU_ID_MAX to 8192 to support large x86_64 systems

Problem

scx_lavd panics in Scheduler::init ("Num possible CPU IDs (N) exceeds
maximum of (512)") on any machine with more than 512 possible CPUs, so it
cannot run there at all. LAVD_CPU_ID_MAX sizes the CPU-id-indexed rodata
arrays (cpu_capacity, cpu_sibling, cpu_big, cpu_turbo, pco_table) and
the per-cpdom __cpumask bitmap.

Separately, the per-cpdom bitmap scan loops iterate the full
LAVD_CPU_ID_MAX/64 longs regardless of the actual CPU count, so raising the
cap would add wasted iterations on every system.

Fix

Commit 1 breaks out of each scan loop (collect_sys_stat, the
plan_x_cpdom_migration helper, and the init path) once the first CPU id
covered by the current long is at or past nr_cpu_ids. __cpumask bits are only
set for CPU ids < nr_cpu_ids, so the remaining longs are always zero. The loop
keeps its compile-time LAVD_CPU_ID_MAX/64 bound, so the __cpumask[i] access
stays provably in range for the verifier; the test is just an early exit. The
scan now costs ceil(nr_cpu_ids/64) iterations regardless of LAVD_CPU_ID_MAX.

Commit 2 raises LAVD_CPU_ID_MAX from 512 to 8192 (x86_64 MAXSMP).
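
The early-exit pattern from commit 1 can be illustrated with a minimal userspace sketch. It is not the actual BPF code: struct cpdom_sketch, scan_cpumask() and the visited counter are hypothetical names introduced here, and the exit test (i * 64 >= nr_cpu_ids) is an assumption chosen to match the stated ceil(nr_cpu_ids/64) iteration count.

```c
#include <stdint.h>

#define LAVD_CPU_ID_MAX 8192                    /* compile-time cap from this PR */
#define CPUMASK_LONGS   (LAVD_CPU_ID_MAX / 64)  /* 64-bit words in the bitmap */

/* Hypothetical stand-in for the per-cpdom context; only the bitmap matters. */
struct cpdom_sketch {
	uint64_t cpumask[CPUMASK_LONGS];
};

/*
 * Count the CPUs set in the bitmap and report how many words were visited.
 * The loop bound stays the compile-time constant, so every cpumask[i]
 * access is trivially in range; the nr_cpu_ids test is only an early exit.
 */
static int scan_cpumask(const struct cpdom_sketch *c, unsigned int nr_cpu_ids,
			int *visited)
{
	int nr_set = 0;

	*visited = 0;
	for (int i = 0; i < CPUMASK_LONGS; i++) {
		/* Assumed exit test: word i starts at CPU id i * 64. */
		if ((unsigned int)i * 64 >= nr_cpu_ids)
			break;
		(*visited)++;

		/* Inner bit scan: clear the lowest set bit until none remain. */
		uint64_t word = c->cpumask[i];
		while (word) {
			word &= word - 1;
			nr_set++;
		}
	}
	return nr_set;
}
```

Calling scan_cpumask() with nr_cpu_ids = 316 visits 5 words even though the bitmap holds 128, which is the behavior the description claims for the real loops.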

Testing (NR_CPUS=1024 kernel, veristat)

  • All programs verify; verifier complexity is unchanged by the bump (the
    scan loops are open-coded iterators, verified once regardless of bound).
  • No runtime cost from the larger cap: commit 1 makes the loops scale with
    nr_cpu_ids, so on a 316-CPU host the scan runs 5 iterations whether the cap
    is 512 or 8192 (vs 128 for an unbounded loop at 8192).
  • Static cost of the bump: ~345 KB per scheduler instance (~225 KB rodata
    for the CPU-id-indexed arrays + ~120 KB for the per-cpdom __cpumask).
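
The iteration counts quoted above follow from simple arithmetic, sketched here with hypothetical helper names (scan_iters, full_scan_iters, cpumask_bytes) that do not appear in the patch. The sketch also gives the size of a single bitmap; the ~120 KB total depends on the number of compute domains, which this description does not state, so it is not reproduced here.

```c
/* Words visited by the early-exit scan: ceil(nr_cpu_ids / 64), capped by the bitmap size. */
static unsigned int scan_iters(unsigned int nr_cpu_ids, unsigned int cpu_id_max)
{
	unsigned int bounded = (nr_cpu_ids + 63) / 64;
	unsigned int full = cpu_id_max / 64;

	return bounded < full ? bounded : full;
}

/* Words visited by a scan with no early exit: the whole bitmap. */
static unsigned int full_scan_iters(unsigned int cpu_id_max)
{
	return cpu_id_max / 64;
}

/* Bytes occupied by one bitmap of cpu_id_max bits. */
static unsigned int cpumask_bytes(unsigned int cpu_id_max)
{
	return cpu_id_max / 8;
}
```

For the 316-CPU host, scan_iters(316, 512) and scan_iters(316, 8192) both give 5, while full_scan_iters(8192) gives 128. One bitmap grows from cpumask_bytes(512) = 64 bytes to cpumask_bytes(8192) = 1024 bytes.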

Note

8192 is the x86_64 maximum; architectures with a larger NR_CPUS may need a
higher value (or sizing the cap from the arch maximum).

Kyle McMartin added 2 commits June 8, 2026 12:58

scx_lavd: bound per-cpdom cpumask scan loops by nr_cpu_ids

The per-compute-domain bitmap scan loops in collect_sys_stat(),
plan_x_cpdom_migration()'s helper, and the init path iterate the full
LAVD_CPU_ID_MAX/64 longs of cpdom_ctx.__cpumask regardless of how many
CPUs the system actually has. On a machine with few CPUs (or a large
LAVD_CPU_ID_MAX), the upper longs are always zero, so the inner
bit-scan breaks immediately, but the outer loop still visits every
long.

Break out of the outer loop once the long index is past nr_cpu_ids:
__cpumask bits are only ever set for CPU ids < nr_cpu_ids, so the
remaining longs are guaranteed zero. The loop keeps its compile-time
LAVD_CPU_ID_MAX/64 bound, so the __cpumask[i] access is still provably
in range for the verifier; the added test is just an early exit.

This makes the scan cost scale with the actual CPU count
(ceil(nr_cpu_ids/64) iterations) instead of LAVD_CPU_ID_MAX, with no
change in verifier complexity (the loops are open-coded iterators).

No functional change.

Signed-off-by: Kyle McMartin <jkkm@meta.com>

scx_lavd: raise LAVD_CPU_ID_MAX to 8192 to support large x86_64 systems

scx_lavd refuses to start (panic in Scheduler::init) when the system
has more possible CPUs than LAVD_CPU_ID_MAX, which was 512. Machines
with more than 512 CPUs therefore cannot run scx_lavd at all.

Raise the cap to 8192, matching x86_64's MAXSMP NR_CPUS, so scx_lavd
runs on any x86_64 configuration.

The constant sizes CPU-id-indexed rodata arrays (cpu_capacity,
cpu_sibling, cpu_big, cpu_turbo, pco_table) and the per-cpdom
__cpumask bitmap, so this adds ~345 KB of static data per scheduler
instance. With the preceding change bounding the cpumask scan loops by
nr_cpu_ids, there is no runtime cost: the loops still iterate only
ceil(nr_cpu_ids/64) times, and verifier complexity is unchanged.

Note: 8192 is the x86_64 maximum; architectures with a larger NR_CPUS
may need a higher value.

Signed-off-by: Kyle McMartin <jkkm@meta.com>