fix(scx_lavd): Fix combinatorial state explosion in energy model generation causing absurd memory usage#3548

Open
BHLuotianyi wants to merge 2 commits into sched-ext:main from BHLuotianyi:lavd-mem-fix-GEMINI

BHLuotianyi commented Apr 30, 2026

DISCLAIMER: This PR needs a thorough review, since the code was written by AI and I don't understand it myself! It does test fine on my laptop, though.

I know AI may write trash, but this can at least provide some insight into the problem.

Regarding issue #3340

Description:

Problem

The original Energy Model (EM) initialization in scx_lavd used exhaustive subset enumeration. On high-core-count systems, the number of enumerated states grows as $2^n$,
causing the scheduler to hang and consume excessive memory (RSS ballooning to several GiB) during startup.
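
To give a feel for the $2^n$ growth, here is a minimal sketch that counts the non-empty subsets of `n` units. The function name `nonempty_subsets` is made up for illustration, and which unit scx_lavd actually enumerated (CPUs, domains, or something else) is an assumption not confirmed by this PR text.

```rust
/// Number of non-empty subsets of `n` units: 2^n - 1.
/// `n` must be below 64 so the shift does not overflow a u64.
fn nonempty_subsets(n: u32) -> u64 {
    (1u64 << n) - 1
}

fn main() {
    // Illustrative sizes only; they are not measurements from scx_lavd.
    for n in [8u32, 16, 20, 24] {
        println!("{n:>2} units -> {} candidate subsets", nonempty_subsets(n));
    }
}
```

Going from 20 to 24 units raises the count from 1,048,575 to 16,777,215, a 16x increase for four extra units, which is why memory use can jump so sharply between similar machines.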

Solution

I have implemented a two-stage optimization to resolve this:

  1. Algorithm Redesign (DP):
    The initialization logic was refactored from subset enumeration to a Dynamic Programming (DP) approach. Instead of expanding all possible combinations, it now
    directly considers CPU counts per performance domain and accumulates the lowest-power states for each performance level.
  2. Aggressive State Pruning:
    On symmetric systems (e.g., AMD Zen), many different CPU distributions yield the exact same (performance, power) metrics. Storing all these equivalent
    permutations would still lead to a state explosion.
    This fix implements strict pruning: for any given performance/power pair, the optimizer now retains only a single optimal representative. When multiple
    combinations are equivalent, it prioritizes the one using the fewest performance domains (pd_id_set.len()). This ensures the state table remains small while
    favoring configurations with better cache locality and reduced leakage power.
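
The DP idea in item 1 can be sketched roughly as follows. This is not the code from the PR: `PerfDomain`, `State`, `build_table`, and the per-CPU `perf`/`power` numbers are all hypothetical, and the sketch assumes that performance and power simply add up across CPUs. It processes one performance domain at a time, chooses how many CPUs to take from it, and keeps only the lowest-power state for each total performance level.

```rust
use std::collections::BTreeMap;

/// Hypothetical description of one performance domain.
struct PerfDomain {
    cpus: usize,
    perf_per_cpu: u64,
    power_per_cpu: u64,
}

/// Cheapest known way to reach one total performance level.
#[derive(Clone, Debug, PartialEq)]
struct State {
    power: u64,
    cpus_used: Vec<usize>, // CPUs taken from each domain, by domain index
}

/// Build a table: total performance -> lowest-power state reaching it.
fn build_table(domains: &[PerfDomain]) -> BTreeMap<u64, State> {
    let mut table = BTreeMap::new();
    table.insert(0u64, State { power: 0, cpus_used: vec![0; domains.len()] });

    for (i, d) in domains.iter().enumerate() {
        let mut next: BTreeMap<u64, State> = BTreeMap::new();
        for (&perf, st) in &table {
            // Try every CPU count for this domain, including zero.
            for k in 0..=d.cpus {
                let new_perf = perf + k as u64 * d.perf_per_cpu;
                let new_power = st.power + k as u64 * d.power_per_cpu;
                // Keep the new state only if it is strictly cheaper.
                let better = next
                    .get(&new_perf)
                    .map_or(true, |old| new_power < old.power);
                if better {
                    let mut cpus_used = st.cpus_used.clone();
                    cpus_used[i] = k;
                    next.insert(new_perf, State { power: new_power, cpus_used });
                }
            }
        }
        table = next;
    }
    table
}

fn main() {
    // Made-up numbers for a two-domain "big/little" system.
    let domains = [
        PerfDomain { cpus: 2, perf_per_cpu: 10, power_per_cpu: 7 },
        PerfDomain { cpus: 2, perf_per_cpu: 5, power_per_cpu: 2 },
    ];
    for (perf, st) in &build_table(&domains) {
        println!("perf {perf:>2}: power {:>2}, cpus per domain {:?}", st.power, st.cpus_used);
    }
}
```

With these made-up numbers the table ends up with 7 entries instead of one entry per subset. Performance level 10, for example, is served by two "little" CPUs at power 4 rather than one "big" CPU at power 7.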

Results

  • Performance: Initialization time reduced from a 15s+ hang to under 0.1s.
  • Memory: Eliminated the multi-GB RSS blowup; the process now starts with a stable, flat memory footprint (~80 MiB).
  • Correctness: The Energy Model remains fully functional, and CPU preference ordering is correctly generated based on the system's energy profile.

In the prior change that moved the CPU preference generation to a DP
algorithm, `EnergyModelOptimizer::insert_best_pdsi()` and
`insert_pds_combinations()` unconditionally preserved all identical
`(performance, power)` states across identical or symmetric performance
domains. This led to a massive combinatorial explosion of tracked
`HashSet` states during startup, severely ballooning RSS memory to
multiple gigabytes and hanging the startup process before the BPF
scheduler could initialize.

This fix aggressively prunes equivalent states. For any given
`(performance, power)` pair, if the new combination yields the same
power profile but uses fewer performance domains (`pd_id_set.len()`), it
replaces the old state. If it uses as many or more domains, it is discarded.
This strictly bounds the DP state tree per performance bucket to a single
optimal representative that favors lower leakage power and better locality,
solving the memory explosion and cutting the startup time to under 0.1s.
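
The replacement rule described above could look roughly like this. It is a sketch under assumptions, not the actual body of `insert_best_pdsi()`: the `Candidate` struct, the `insert_pruned` and `cand` helpers, and the use of a `HashMap` keyed by `(performance, power)` are all hypothetical; only the field name `pd_id_set` comes from the PR text.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for one candidate configuration.
#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    performance: u64,
    power: u64,
    pd_id_set: HashSet<usize>, // IDs of the performance domains in use
}

/// Keep at most one candidate per (performance, power) pair.
/// A newcomer replaces the stored one only if it uses strictly fewer
/// performance domains. Returns true if `cand` was stored.
fn insert_pruned(table: &mut HashMap<(u64, u64), Candidate>, cand: Candidate) -> bool {
    let key = (cand.performance, cand.power);
    let keep_old = table
        .get(&key)
        .map_or(false, |old| cand.pd_id_set.len() >= old.pd_id_set.len());
    if keep_old {
        return false; // equal or more domains: discard the newcomer
    }
    table.insert(key, cand);
    true
}

/// Small helper to build a candidate from a slice of domain IDs.
fn cand(performance: u64, power: u64, pds: &[usize]) -> Candidate {
    Candidate { performance, power, pd_id_set: pds.iter().copied().collect() }
}

fn main() {
    let mut table = HashMap::new();
    insert_pruned(&mut table, cand(100, 40, &[0, 1])); // stored: first for this key
    insert_pruned(&mut table, cand(100, 40, &[2, 3])); // discarded: same domain count
    insert_pruned(&mut table, cand(100, 40, &[2]));    // stored: fewer domains
    insert_pruned(&mut table, cand(100, 45, &[0]));    // stored: different key
    println!("states kept: {}", table.len());
}
```

Because equal-sized candidates are discarded rather than kept side by side, symmetric domains cannot multiply the number of stored states: the table holds one entry per distinct `(performance, power)` key.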

Resolves sched-ext#3340

BHLuotianyi changed the title from "fix(scx_lavd): Fix combinatorial state explosion in DP energy model generation" to "fix(scx_lavd): Fix combinatorial state explosion in energy model generation causing absurd memory usage" on Apr 30, 2026

@multics69

Contributor

@BHLuotianyi -- Thanks for trying LAVD. Could you share a tarball of /sys/kernel/debug/energy_model? What symptom did you observe? I'd like to understand what the problem is first.

BHLuotianyi commented May 5, 2026

Author

@multics69

energy_model.tar.gz

Symptom: scx_lavd uses an absurd amount of memory and keeps one core at 100% until RAM usage hits its ceiling. The system stutters a lot. The symptom does not occur with any other scx scheduler.

According to #3340, the more cores/threads the CPU has, the higher the memory usage (~30GB observed there; ~5GB on my setup).

According to the AI's analysis, RAM usage grows exponentially during energy model generation: the more cores the system has, the more RAM scx_lavd takes.

[image] [image]

@multics69

Contributor

Thanks @BHLuotianyi for sharing the data. Could you share the processor model? If there is a machine I can access, I will try it on my side too.

@BHLuotianyi

Author

@multics69
It's my pleasure to help. My machine has an Intel(R) Core(TM) Ultra 7 255HX, a laptop processor with an 8P12E core setup. The ~30GB memory usage case in the referenced issue is on an Intel Core Ultra 9 275HX with 8P16E.

If needed, I can make my machine available for testing over a VNC connection.

@multics69

Contributor

Thanks for the extra info. I will try to take a deeper look and come up with another solution (if necessary) this weekend.

xirreal commented Jun 15, 2026

Any updates on this?

@multics69

Contributor

Sorry, @xirreal! I haven't had time to work on this yet. I will find some time this week.
