Remove SCHED_EEVDF, restore O(1) priority#12

Merged: jserv merged 1 commit into main from remove-eevdf on May 14, 2026
@jserv jserv commented May 14, 2026

EEVDF's two-pass picker walked every runnable peer under pcpu_runq_lock on every dispatch, which weakens the bounded dispatch-cost invariant the hard-RT contract rests on. Its cross-CPU pcpu_min_vruntime read at the idle-steal and load-balance migration sites was unsynchronized. And fair share is not the property through which PSE51 callers express importance; they use priority. Removing the entire path reduces the picker to a bitmap-and-FIFO dequeue and eliminates the race.
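For readers unfamiliar with the pattern, here is a minimal sketch of a bitmap-and-FIFO picker. It is not the code from this PR: `struct runq`, `runq_enqueue`, `runq_pick_next`, the 32-level layout, and the "lower number is higher priority" convention are all assumptions made for illustration, and locking is omitted. The point it shows is that picking costs one find-first-set plus one list pop, regardless of how many threads are runnable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPRIO 32 /* hypothetical number of priority levels */

struct thread {
    struct thread *next; /* FIFO link within one priority level */
    int prio;            /* 0 is the highest priority in this sketch */
};

struct runq {
    uint32_t bitmap;            /* bit p is set iff level p is non-empty */
    struct thread *head[NPRIO]; /* oldest thread at each level */
    struct thread *tail[NPRIO]; /* newest thread at each level */
};

/* Append t to the tail of its priority level's FIFO. */
static void runq_enqueue(struct runq *rq, struct thread *t)
{
    assert(t->prio >= 0 && t->prio < NPRIO);
    t->next = NULL;
    if (rq->tail[t->prio])
        rq->tail[t->prio]->next = t;
    else
        rq->head[t->prio] = t;
    rq->tail[t->prio] = t;
    rq->bitmap |= UINT32_C(1) << t->prio;
}

/* Pop the oldest thread of the highest non-empty level, or NULL if idle. */
static struct thread *runq_pick_next(struct runq *rq)
{
    if (rq->bitmap == 0)
        return NULL;
    int p = __builtin_ctz(rq->bitmap); /* GCC/Clang builtin: lowest set bit */
    struct thread *t = rq->head[p];
    rq->head[p] = t->next;
    if (rq->head[p] == NULL) {
        rq->tail[p] = NULL;
        rq->bitmap &= ~(UINT32_C(1) << p); /* level drained, clear its bit */
    }
    t->next = NULL;
    return t;
}
```

Under these assumptions, quantum expiry would re-enqueue the running thread at the tail of its own level, which is what yields rotation among equal-priority peers.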

Replace the eevdf_fairness selftest with quantum_rotation_fairness: pin N CPU-bound equal-priority workers to one CPU, busy-loop on time_rdtime in each worker, and assert each worker's worst-case run-to-run latency stays within (N - 1) * quantum + jitter slack and that the worker was descheduled at least once. This exercises the timer-driven quantum expiry path that intra-band rotation actually relies on, not the voluntary-yield path the old test was inadvertently hitting via sleep_ms(0).
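The assertion described above can be sketched as a pure check over timestamps a worker recorded while busy-looping. This is an illustration, not the selftest itself: `scan_gaps`, `rotation_ok`, the `threshold` used to recognize a deschedule, and the time units are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct gap_stats {
    uint64_t worst_gap;   /* largest gap between consecutive samples */
    unsigned deschedules; /* number of gaps larger than the threshold */
};

/* Scan one worker's monotonic timestamps. A gap above `threshold` is
 * treated as evidence that the worker was off the CPU in between. */
static struct gap_stats scan_gaps(const uint64_t *ts, size_t n,
                                  uint64_t threshold)
{
    struct gap_stats s = { 0, 0 };
    for (size_t i = 1; i < n; i++) {
        assert(ts[i] >= ts[i - 1]); /* samples must not go backwards */
        uint64_t gap = ts[i] - ts[i - 1];
        if (gap > s.worst_gap)
            s.worst_gap = gap;
        if (gap > threshold)
            s.deschedules++;
    }
    return s;
}

/* With N equal-priority peers on one CPU, a worker should wait at most
 * (N - 1) quanta plus slack, and must have been descheduled at least once. */
static bool rotation_ok(const struct gap_stats *s, unsigned nworkers,
                        uint64_t quantum, uint64_t slack)
{
    assert(nworkers >= 1);
    uint64_t bound = (uint64_t)(nworkers - 1) * quantum + slack;
    return s->deschedules >= 1 && s->worst_gap <= bound;
}
```

Requiring at least one deschedule matters: a worker that never left the CPU trivially satisfies the gap bound, so without that check the test would pass even if quantum expiry never fired.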


Summary by cubic

Remove EEVDF and return to O(1) priority scheduling with quantum-based FIFO rotation among equal-priority threads. This restores bounded dispatch cost, removes a cross-CPU race, and better aligns with PSE51 priority semantics.

  • Refactors

    • Drop CONFIG_SCHED_EEVDF and all vruntime/deadline code; picker is now bitmap scan + FIFO within a priority level.
    • Equal-priority rotation is driven by timer quanta; no RR/OTHER behavior is exposed beyond SCHED_FIFO.
    • Update docs and configs to remove EEVDF and clarify the single normal-thread policy; sched_{set,get}scheduler accept SCHED_OTHER/SCHED_RR but always map/report SCHED_FIFO.
    • Replace eevdf_fairness with quantum_rotation_fairness: N CPU-bound peers on one CPU must see worst-case service gaps ≤ (N−1) quanta + small jitter, and each must be descheduled at least once.
  • Bug Fixes

    • Eliminate the unsynchronized cross-CPU min-vruntime read in idle-steal and load-balance.
    • Restore predictable, bounded pick-next cost required for the hard-RT contract.
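The policy mapping mentioned in the Refactors list (accept SCHED_OTHER/SCHED_RR, always map and report SCHED_FIFO) might look roughly like the sketch below. The `K_SCHED_*` constants and `k_policy_normalize` are hypothetical names, and rejecting unknown policies with `-EINVAL` is an assumption the summary does not state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical policy constants; the real values may differ. */
enum k_policy { K_SCHED_OTHER = 0, K_SCHED_FIFO = 1, K_SCHED_RR = 2 };

/* Accept any of the three known policies, but store K_SCHED_FIFO as the
 * effective policy. Returns 0 on success, -EINVAL for unknown values
 * (an assumption of this sketch). */
static int k_policy_normalize(int requested, int *effective)
{
    assert(effective != NULL);
    switch (requested) {
    case K_SCHED_OTHER:
    case K_SCHED_FIFO:
    case K_SCHED_RR:
        *effective = K_SCHED_FIFO;
        return 0;
    default:
        return -EINVAL;
    }
}
```

A getter built on the same idea would simply return the stored effective policy, so callers that set SCHED_RR would read back SCHED_FIFO.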

Written for commit 2f260a0. Summary will update on new commits.


@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 13 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tests/tests-sched.c">

<violation number="1" location="tests/tests-sched.c:558">
P1: The failure path returns before canceling `dom.refill_callout`, which can leave a timer referencing stack memory after `test_sched_domain_budget()` exits.</violation>
</file>


Comment thread tests/tests-sched.c
                              SCHED_PRIO_NORMAL, 0);
        if (r.is_error) {
            enable_interrupts();
            return 1;

@cubic-dev-ai cubic-dev-ai Bot May 14, 2026

P1: The failure path returns before canceling dom.refill_callout, which can leave a timer referencing stack memory after test_sched_domain_budget() exits.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/tests-sched.c, line 558:

<comment>The failure path returns before canceling `dom.refill_callout`, which can leave a timer referencing stack memory after `test_sched_domain_budget()` exits.</comment>

<file context>
@@ -482,79 +482,113 @@ static i32 test_watchdog_activity(void)
+                              SCHED_PRIO_NORMAL, 0);
+        if (r.is_error) {
+            enable_interrupts();
+            return 1;
+        }
     }
</file context>
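If the issue is valid, one common way to address it is to route every exit through a single cleanup label that cancels the timer before the stack frame goes away. The sketch below illustrates only that pattern. `struct callout`, `callout_arm`, `callout_cancel`, and `budget_test_sketch` are hypothetical stand-ins, not the project's API, and the real failure path would still need its `enable_interrupts()` call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timer handle; `armed` stands in for "queued on a timer list". */
struct callout { bool armed; };
struct domain { struct callout refill_callout; };

static void callout_arm(struct callout *c) { c->armed = true; }
static void callout_cancel(struct callout *c) { c->armed = false; }

/* Observable for the sketch: was the callout still armed at return time? */
static bool leaked_armed_callout;

static int budget_test_sketch(bool spawn_fails)
{
    struct domain dom = { .refill_callout = { .armed = false } };
    int rc = 0;

    callout_arm(&dom.refill_callout); /* timer now points at stack memory */

    if (spawn_fails) {
        rc = 1;
        goto out; /* do not return directly: the callout is still armed */
    }

    /* ... the rest of the test body would run here ... */

out:
    callout_cancel(&dom.refill_callout); /* always disarm before returning */
    leaked_armed_callout = dom.refill_callout.armed;
    return rc;
}
```

The single-exit shape keeps the invariant local: any early failure added later inherits the cancel as long as it jumps to `out`.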

@jserv jserv merged commit ca45427 into main May 14, 2026
7 checks passed
@jserv jserv deleted the remove-eevdf branch May 14, 2026 03:27