fix(scx_lavd): Fix combinatorial state explosion in energy model generation causing absurd memory usage by BHLuotianyi · Pull Request #3548 · sched-ext/scx

BHLuotianyi · 2026-04-30T07:28:37Z

DISCLAIMER: This PR needs a thorough review, as I know nothing about the code the AI wrote! But it tests fine on my laptop.

I know AI may write trash, but this can at least provide some insight into the problem.

Regarding issue #3340

Description:

Problem

The original Energy Model (EM) initialization in scx_lavd used exhaustive subset enumeration. On high-core-count systems, this triggered a $2^n$ complexity explosion, causing the scheduler to hang and consume excessive memory (RSS ballooning to several GiB) during startup.
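
To make the growth concrete, here is a small Rust sketch that compares the number of CPU subsets with the number of per-domain CPU-count combinations. The function names and the two-domain split are assumptions for illustration only; the real performance-domain layout comes from the kernel's energy model and may differ.

```rust
/// Number of states when every subset of `n_cpus` CPUs is enumerated: 2^n.
/// (Illustrative helper, not part of scx_lavd; valid for n_cpus < 128.)
fn subset_states(n_cpus: u32) -> u128 {
    1u128 << n_cpus
}

/// Number of states when only the count of active CPUs in each performance
/// domain matters: the product of (cpus_in_domain + 1) over all domains.
/// (Illustrative helper, not part of scx_lavd.)
fn count_states(cpus_per_domain: &[u32]) -> u128 {
    cpus_per_domain.iter().map(|&c| u128::from(c) + 1).product()
}
```

Under the assumed split of 8 + 12 CPUs, `subset_states(20)` is 1,048,576, while `count_states(&[8, 12])` is 117.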

Solution

I have implemented a two-stage optimization to resolve this:

1. Algorithm Redesign (DP): The initialization logic was refactored from subset enumeration to a Dynamic Programming (DP) approach. Instead of expanding all possible combinations, it now directly considers CPU counts per performance domain and accumulates the lowest-power states for each performance level.
2. Aggressive State Pruning: On symmetric systems (e.g., AMD Zen), many different CPU distributions yield the exact same (performance, power) metrics. Storing all these equivalent permutations would still lead to a state explosion. This fix implements strict pruning: for any given performance/power pair, the optimizer now retains only a single optimal representative. When multiple combinations are equivalent, it prioritizes the one using the fewest performance domains (pd_id_set.len()). This ensures the state table remains small while favoring configurations with better cache locality and reduced leakage power.
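
The count-based DP idea can be sketched roughly as follows. This is not the code in this PR: `PerfDomain`, `State`, and `lowest_power_states` are hypothetical names, and the model assumes every CPU in a domain contributes the same performance and power, which is a simplification of a real energy model. It only illustrates keeping the lowest-power state for each performance level.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified performance domain: each CPU in it adds the
/// same amount of performance and power.
struct PerfDomain {
    n_cpus: usize,
    perf_per_cpu: u64,
    power_per_cpu: u64,
}

/// Best known way to reach one total performance level.
#[derive(Clone, Debug, PartialEq)]
struct State {
    power: u64,
    /// Number of CPUs used in each domain, indexed by domain.
    cpus_used: Vec<usize>,
}

/// For each reachable total performance, keep only the lowest-power state.
/// Domains are folded in one at a time, choosing how many CPUs to use.
fn lowest_power_states(domains: &[PerfDomain]) -> BTreeMap<u64, State> {
    let mut best: BTreeMap<u64, State> = BTreeMap::new();
    best.insert(0, State { power: 0, cpus_used: vec![0; domains.len()] });

    for (i, pd) in domains.iter().enumerate() {
        let mut next: BTreeMap<u64, State> = BTreeMap::new();
        for (&perf, state) in &best {
            for k in 0..=pd.n_cpus {
                let new_perf = perf + pd.perf_per_cpu * k as u64;
                let new_power = state.power + pd.power_per_cpu * k as u64;
                // Store only if this performance level is new or cheaper.
                let better = next.get(&new_perf).map_or(true, |s| new_power < s.power);
                if better {
                    let mut cpus_used = state.cpus_used.clone();
                    cpus_used[i] = k;
                    next.insert(new_perf, State { power: new_power, cpus_used });
                }
            }
        }
        best = next;
    }
    best
}
```

The table size is bounded by the number of distinct performance levels rather than by the number of CPU subsets, which is the property the redesign relies on.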

Results

Performance: Initialization time reduced from a 15s+ hang to under 0.1s.
Memory: Eliminated the multi-GB RSS blowup; the process now starts with a stable, flat memory footprint (~80 MiB).
Correctness: The Energy Model remains fully functional, and CPU preference ordering is correctly generated based on the system's energy profile.

In the prior change that moved the CPU preference generation to a DP algorithm, `EnergyModelOptimizer::insert_best_pdsi()` and `insert_pds_combinations()` unconditionally preserved all identical `(performance, power)` states across identical or symmetric performance domains. This led to a massive combinatorial explosion of tracked `HashSet` states during startup, ballooning RSS to multiple gigabytes and hanging the startup process before the BPF scheduler could initialize.

This fix aggressively prunes equivalent states. For any given `(performance, power)` pair, if the new combination yields the same power profile but uses fewer performance domains (`pd_id_set.len()`), it replaces the old state. If it uses the same number of domains or more, it is discarded.

This strictly bounds the DP state tree per performance bucket to a single optimal representative that favors lower leakage power and better locality, solving the memory explosion and cutting the startup time to under 0.1s.

Resolves sched-ext#3340
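
The replace-or-discard rule described in this commit message could look roughly like the sketch below. `Combo`, `combo`, and `insert_pruned` are hypothetical names chosen for illustration, and the `(u64, u64)` key is an assumed encoding of the `(performance, power)` pair; the actual logic lives in `EnergyModelOptimizer` and may be structured differently.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for one tracked combination of performance domains.
#[derive(Clone, Debug, PartialEq)]
struct Combo {
    pd_id_set: HashSet<usize>,
}

/// Convenience constructor for the sketch.
fn combo(ids: &[usize]) -> Combo {
    Combo { pd_id_set: ids.iter().copied().collect() }
}

/// Keep at most one combination per (performance, power) key.
/// Returns true if `candidate` was stored, false if it was discarded.
fn insert_pruned(
    table: &mut HashMap<(u64, u64), Combo>,
    key: (u64, u64),
    candidate: Combo,
) -> bool {
    // Discard when an existing entry already uses the same number of
    // performance domains or fewer.
    let keep_old = table
        .get(&key)
        .map_or(false, |old| candidate.pd_id_set.len() >= old.pd_id_set.len());
    if keep_old {
        return false;
    }
    // New key, or strictly fewer domains: store or replace.
    table.insert(key, candidate);
    true
}
```

Because ties are discarded rather than stored, each key holds exactly one entry no matter how many symmetric combinations map to it.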

multics69 · 2026-05-02T03:22:19Z

@BHLuotianyi -- Thanks for trying LAVD. Could you share a tarball of /sys/kernel/debug/energy_model? What symptom did you observe? I'd like to understand what the problem is first.

BHLuotianyi · 2026-05-05T02:52:54Z

@multics69

energy_model.tar.gz

Symptom: scx_lavd consumes an absurd amount of memory and pegs one core at 100% until RAM usage hits its ceiling. The system stutters a lot. The symptom does not occur with any other scx scheduler.

According to #3340, the more cores/threads the CPU has, the higher the memory usage (~30GB observed there; ~5GB on my setup).

According to the AI, RAM usage grows exponentially during energy model generation: the more cores the system has, the more RAM scx_lavd takes.

multics69 · 2026-05-05T03:19:46Z

Thanks @BHLuotianyi for sharing the data. Could you share the processor model? If there is a machine that I can access, I will try it on my side too.

BHLuotianyi · 2026-05-05T03:33:49Z

@multics69
It's my pleasure to help. My machine has an Intel(R) Core(TM) Ultra 7 255HX, a laptop processor with an 8P12E core setup. The ~30GB memory usage case in the referenced issue is an Intel Core Ultra 9 275HX with 8P16E.

If needed, I can provide access to my machine for testing via a VNC connection.

multics69 · 2026-05-06T23:11:44Z

Thanks for the extra info. I will try to take a deeper look and come up with another solution (if necessary) this weekend.

xirreal · 2026-06-15T07:53:07Z

Any updates on this?

multics69 · 2026-06-15T08:27:25Z

Sorry, @xirreal! I haven't had time to work on this yet. I will find some time this week.

BHLuotianyi added 2 commits April 30, 2026 11:31

scx_lavd: avoid exponential EM CPU order search

8f075c0

BHLuotianyi changed the title ~~fix(scx_lavd): Fix combinatorial state explosion in DP energy model generation~~ fix(scx_lavd): Fix combinatorial state explosion in energy model generation causing absurd memory usage Apr 30, 2026

This was referenced Jun 11, 2026

scx_lavd: make --no-use-em reach the userspace CPU-order builder #3644

Merged

scx_lavd --performance leaks massive memory (~34GB) and pegs CPU on CachyOS 6.19.2 (Intel Core Ultra 9 275HX) #3340

Open

BHLuotianyi wants to merge 2 commits into sched-ext:main from BHLuotianyi:lavd-mem-fix-GEMINI
