scheds/experimental/scx_flow: v2.2.4 — priority-aware rt_sensitive and relaxed preempt threshold#3561
Merged
…eups
The preempt_ready condition required both last_refill_ns >= preempt_refill_min_ns AND budget_ns >= preempt_budget_min_ns. For short-interval wakeups such as cyclictest at a 200us period, the per-wakeup refill (sleep_ns / 100) is only ~2us, far below the 200us preempt_refill_min_ns threshold. This caused all short-interval wakeups to miss the WAKE_PROFILE_PREEMPT_READY profile bit and fall through to the reserved DSQ path instead of the direct SCX_DSQ_LOCAL_ON fast path, adding measurable dispatch latency.

Removed the last_refill_ns condition so that preempt_ready depends only on budget_ns >= preempt_budget_min_ns. A task's accumulated positive budget is sufficient evidence of responsiveness; the refill history gate was unnecessarily penalizing high-frequency wakeups.

Version bumped to 2.2.2.

Fixes: hard-RT max latency gap vs scx_cosmos (429us vs 188us)
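The effect of dropping the refill gate can be checked with a small standalone sketch. This is not the scheduler's BPF code: the function names are hypothetical, and the threshold values (200us refill, 150us budget) are the ones quoted in the PR summary further down, fixed here as constants for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the PR summary; fixed constants for illustration only. */
#define PREEMPT_REFILL_MIN_NS 200000ULL /* 200 us */
#define PREEMPT_BUDGET_MIN_NS 150000LL  /* 150 us */

/* Per-wakeup refill as described in the commit message: sleep_ns / 100. */
static inline uint64_t wake_refill_ns(uint64_t sleep_ns)
{
    return sleep_ns / 100;
}

/* Gate before this commit: refill history AND accumulated budget. */
static inline bool preempt_ready_old(uint64_t last_refill_ns, int64_t budget_ns)
{
    return last_refill_ns >= PREEMPT_REFILL_MIN_NS &&
           budget_ns >= PREEMPT_BUDGET_MIN_NS;
}

/* Gate after this commit: accumulated budget only. */
static inline bool preempt_ready_new(int64_t budget_ns)
{
    return budget_ns >= PREEMPT_BUDGET_MIN_NS;
}
```

With a 200us sleep the refill is 2000 ns, so the old gate rejects the task no matter how much budget it has accumulated, while the new gate accepts it once the budget reaches the floor.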
Adds a dedicated bypass in flow_enqueue for tasks that have been sleeping for less than FLOW_INTERACTIVE_SLEEP_MIN_NS (750us). Such tasks are extremely high-frequency wakeups (e.g. cyclictest at 200us, timer-driven periodic work) and should skip all lane analysis, wake-profile routing, and budget refill gates.

The fast path checks:
1. is_wakeup && tctx && budget_ns > 0: positive budget task
2. !containment_active: not a hog
3. last_sleep_ns <= 750us: short sleep, clearly responsive
4. valid target CPU: can dispatch locally

When matched, the task is inserted directly to SCX_DSQ_LOCAL_ON with a minimal slice (FLOW_SLICE_MIN_NS) and flow_enqueue returns immediately. This bypasses all of:
- Wake profile recomputation and bit checks
- should_preempt / lane routing
- Urgent, latency, reserved, contained, shared DSQ arbitration
- Per-wakeup counter tracking

Combined with the earlier preempt_ready relaxed refill check and the idle-CPU local-reserved fix, this ensures that high-frequency wakeups consistently take the fastest possible dispatch path regardless of load or lane state.
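The four checks listed in this commit message can be sketched as a single predicate. This is a simplified model rather than the BPF source: struct task_ctx_sketch, short_sleep_fast_path, and the nr_cpus bound used for "valid target CPU" are assumptions made for illustration. Note that the pull request overview later in this thread describes the final series as not reintroducing this bypass.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOW_INTERACTIVE_SLEEP_MIN_NS 750000ULL /* 750 us, from the commit message */

/* Hypothetical, simplified stand-in for the per-task context (tctx). */
struct task_ctx_sketch {
    int64_t  budget_ns;
    uint64_t last_sleep_ns;
    bool     containment_active;
};

/* Returns true when all four fast-path checks from the commit message pass. */
static inline bool short_sleep_fast_path(bool is_wakeup,
                                         const struct task_ctx_sketch *tctx,
                                         int target_cpu, int nr_cpus)
{
    /* 1. wakeup with a task context and positive budget */
    if (!is_wakeup || !tctx || tctx->budget_ns <= 0)
        return false;
    /* 2. not currently contained (not a hog) */
    if (tctx->containment_active)
        return false;
    /* 3. short sleep; uses <= as written in the checklist */
    if (tctx->last_sleep_ns > FLOW_INTERACTIVE_SLEEP_MIN_NS)
        return false;
    /* 4. valid target CPU (assumed to mean 0 <= cpu < nr_cpus) */
    return target_cpu >= 0 && target_cpu < nr_cpus;
}
```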
…reshold
The rt_sensitive_ready condition now separates the refill threshold from the priority check. An RT-priority task (p->prio < 100) only needs any positive refill to qualify for the WAKE_PROFILE_RT_SENSITIVE bit, while non-RT tasks still need the full FLOW_INTERACTIVE_FLOOR_MIN_NS (80 us) threshold.

Without this fix the p->prio check was useless in practice because the common-path refill threshold also required last_refill_ns >= 80 us. Cyclictest at 200 us period produces a per-wakeup refill of only 2 us (sleep_ns / 100), which never satisfied the 80 us gate.

With the threshold separated, RT-priority tasks get the low-latency preempt path through the existing lane machinery while non-RT behaviour is unchanged.

Signed-off-by: Galih Tama <galpt@v.recipes>
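The priority-dependent refill floor described in this commit message can be modelled as below. The sketch covers only the refill part of the condition; the helper names are hypothetical, any other terms in rt_sensitive_ready are omitted, and a later commit in this thread revises how the floor is combined with the RT classification.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOW_INTERACTIVE_FLOOR_MIN_NS 80000ULL /* 80 us, from the commit message */
#define RT_PRIO_LIMIT 100                      /* p->prio < 100 marks an RT-priority task */

/* Refill floor by priority: any positive refill for RT, 80 us otherwise. */
static inline uint64_t rt_sensitive_refill_floor(int prio)
{
    return prio < RT_PRIO_LIMIT ? 1 : FLOW_INTERACTIVE_FLOOR_MIN_NS;
}

/* True when the last refill satisfies the floor for this priority. */
static inline bool refill_meets_floor(int prio, uint64_t last_refill_ns)
{
    return last_refill_ns >= rt_sensitive_refill_floor(prio);
}
```

A 2 us refill (cyclictest at a 200 us period) passes for an RT-priority task and fails for a task at the default priority of 120.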
Bump the benchmark snapshot reference to match the current release line. Signed-off-by: Galih Tama <galpt@v.recipes>
Pull request overview
Updates the experimental scx_flow sched_ext scheduler to v2.2.3 by refining wakeup classification and local dispatch behavior to improve responsiveness (especially for RT-priority tasks) without reintroducing the prior short-sleep lane-analysis bypass.
Changes:
- Make rt_sensitive wake profiling priority-aware by marking RT-priority tasks (p->prio < 100) as WAKE_PROFILE_RT_SENSITIVE on positive refill.
- Relax preempt_ready gating to depend on accumulated budget only (drops the refill-threshold requirement).
- Extend the local-reserved fast path to cover wakeups targeting an idle CPU.
- Bump version references to 2.2.3 in Cargo.toml, Cargo.lock, and README text.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scheds/experimental/scx_flow/src/bpf/main.bpf.c | Adjusts wake-profile logic and enqueue fast-path routing (RT sensitivity, preempt gating, idle-CPU local dispatch). |
| scheds/experimental/scx_flow/README.md | Updates benchmark snapshot label to v2.2.3 (link currently appears inconsistent). |
| scheds/experimental/scx_flow/Cargo.toml | Bumps crate version to 2.2.3. |
| Cargo.lock | Updates locked scx_flow package version to 2.2.3. |
sirlucjan
approved these changes
May 8, 2026
sirlucjan
left a comment
Collaborator
Tested, everything is OK.
The v2.2.3 change that relaxed the rt_sensitive_ready condition had a logical error: last_refill_ns >= FLOW_INTERACTIVE_FLOOR_MIN_NS (80 us) was placed as a standalone OR branch, causing any task with 80 us or more of recent refill to qualify as RT-sensitive regardless of priority or affinity.

This falsely classified SCHED_OTHER periodic tasks as RT-sensitive, blocking them from the latency lane, forcing 50 us preemption slices, and regressing max latency from 173 us (v2.2.0) to 476 us (v2.2.3) in the 100 us benchmark.

Fix by ANDing the refill threshold with the genuine RT classification (pinned or prio < 100) instead of ORing it. Non-RT tasks with good refill retain access to the latency lane and the idle-CPU fast path.

Signed-off-by: Galih Tama <galpt@v.recipes>
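The difference between the OR and AND forms is easiest to see side by side. The two functions below are a literal reading of this commit message, not a copy of the BPF expression: the names are hypothetical and the real condition may contain further terms.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOW_INTERACTIVE_FLOOR_MIN_NS 80000ULL /* 80 us */

/* "Genuine RT classification" as named in the commit message. */
static inline bool is_rt_class(bool pinned, int prio)
{
    return pinned || prio < 100;
}

/* v2.2.3 shape (assumed): the refill floor is a standalone OR branch. */
static inline bool rt_sensitive_or(bool pinned, int prio, uint64_t last_refill_ns)
{
    return is_rt_class(pinned, prio) ||
           last_refill_ns >= FLOW_INTERACTIVE_FLOOR_MIN_NS;
}

/* v2.2.4 shape (assumed): the refill floor is ANDed with the RT classification. */
static inline bool rt_sensitive_and(bool pinned, int prio, uint64_t last_refill_ns)
{
    return is_rt_class(pinned, prio) &&
           last_refill_ns >= FLOW_INTERACTIVE_FLOOR_MIN_NS;
}
```

For an unpinned SCHED_OTHER task (prio 120) with 100 us of refill, the OR form returns true, which is the false classification the commit describes, while the AND form returns false.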
Signed-off-by: Galih Tama <galpt@v.recipes>
…ccess
The idle-CPU local-reserved path added in v2.2.2 caused all wakeups to idle CPUs to short-circuit flow_enqueue via use_local_reserved, which bypassed the latency lane insertion code. This meant tasks like cyclictest that should have been routed through the LATENCY_DSQ (where they get priority dispatch ahead of reserved/shared DSQ entries) instead landed directly on SCX_DSQ_LOCAL_ON, losing their scheduling priority on the next busy-CPU wakeup.

Remove the (tctx->wake_cpu_idle && is_wakeup) condition from use_local_reserved. Idle-CPU wakeups now flow through the normal lane routing: the per-CPU reserved DSQ with SCX_KICK_IDLE for non-latency wakeups, or the LATENCY_DSQ when the task has accumulated latency allowance.

RT-sensitive tasks (SCHED_FIFO, pinned) and IPC tasks still get the fast local-reserved path via should_preempt and ipc_confidence_wakeup respectively, which remain correct.

Signed-off-by: Galih Tama <galpt@v.recipes>
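A rough model of the use_local_reserved change follows. The commit message names only three inputs (should_preempt, ipc_confidence_wakeup, and the removed idle-CPU term), so the sketch assumes the flag is a plain OR of those; the actual expression in main.bpf.c may include other terms, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Before: an idle target CPU on wakeup also selects the local-reserved path. */
static inline bool use_local_reserved_old(bool should_preempt,
                                          bool ipc_confidence_wakeup,
                                          bool wake_cpu_idle, bool is_wakeup)
{
    return should_preempt || ipc_confidence_wakeup ||
           (wake_cpu_idle && is_wakeup);
}

/* After: the idle-CPU term is removed, so such wakeups reach lane routing. */
static inline bool use_local_reserved_new(bool should_preempt,
                                          bool ipc_confidence_wakeup)
{
    return should_preempt || ipc_confidence_wakeup;
}
```

Under this model a plain wakeup to an idle CPU no longer short-circuits, while the should_preempt and ipc_confidence_wakeup cases behave as before.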
hodgesds
approved these changes
May 8, 2026
Summary
This is an incremental update on top of the scx_flow v2.2.0 scheduler that landed in PR #3525. It carries two changes since that submission, both aimed at improving worst-case wakeup latency without adding code paths that bypass the existing lane analysis or containment machinery:
- The rt_sensitive_ready condition now checks p->prio so that kernel RT-class tasks (SCHED_FIFO / SCHED_RR at any priority) qualify for WAKE_PROFILE_RT_SENSITIVE on sufficient refill, rather than needing the combination of being pinned and having the 80us refill floor. Fixed in v2.2.4: the refill threshold is properly ANDed with the genuine RT classification (pinned || prio < 100) so well-refilled SCHED_OTHER tasks are not falsely classified as RT-sensitive.
- The preempt_ready condition required both last_refill_ns >= preempt_refill_min_ns (200us) and budget_ns >= preempt_budget_min_ns (150us). For short-interval wakeups such as cyclictest at 200us the per-wakeup refill is only sleep_ns / 100 = 2us, far below the 200us threshold. Dropped the refill gate so that accumulated positive budget alone qualifies a task for the preempt path.
Changes
The BPF changes are confined to one function in main.bpf.c.
1. Priority-aware rt_sensitive (v2.2.3, fixed in v2.2.4)
p->prio < 100 identifies kernel SCHED_FIFO and SCHED_RR tasks (which have priority 0--99).
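For readers less familiar with the kernel's internal priority scale, the mapping behind the p->prio < 100 test can be sketched as follows. The constants mirror the kernel's MAX_RT_PRIO (100) and default priority (120); the helper names are hypothetical, and temporary priority changes such as priority-inheritance boosts are ignored.

```c
#include <stdbool.h>

#define MAX_RT_PRIO  100 /* kernel priorities 0..99 are the RT range */
#define DEFAULT_PRIO 120 /* nice 0 for SCHED_OTHER tasks */

/* SCHED_FIFO / SCHED_RR: user rt_priority 1..99 maps to kernel prio 98..0. */
static inline int prio_from_rt_priority(int rt_priority)
{
    return MAX_RT_PRIO - 1 - rt_priority;
}

/* SCHED_OTHER: nice -20..19 maps to kernel prio 100..139. */
static inline int prio_from_nice(int nice)
{
    return DEFAULT_PRIO + nice;
}

/* The test used in this PR: anything below 100 is treated as RT priority. */
static inline bool is_rt_prio(int prio)
{
    return prio < MAX_RT_PRIO;
}
```

Even a nice -20 task maps to kernel priority 100, so it stays outside the RT range that the check selects.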
2. Relaxed preempt_ready refill check
The idle-CPU local-reserved path that was present in v2.2.2/v2.2.3 has been
removed — it bypassed the latency lane routing, causing periodic tasks to
lose their priority dispatch slot. The net submission only carries the two
changes above.
Benchmark Results
Full tagged artifacts (PNG, SVG, CSV, report):
https://github.com/galpt/testing-scx_flow/tree/benchmark-archives/20260409_scx_flow_v2.2.0_release/mini/v2.2.4
All runs on CachyOS 7.0.3-1-cachyos, 16-core AMD system, Balanced power profile.
Normal mode (cyclictest -D 30 -t 4 -a 0 -m -v)
Hard RT mode (cyclictest --priority=99 --smp --interval=200 --histogram=20)
Note: hard RT cyclictest runs at SCHED_FIFO 99 which is dispatched by the
kernel's rt_sched_class, not by ext_sched_class. The hard RT results reflect
system noise under SCX background load rather than the scheduler policy
itself, and are consistent across all tested schedulers.
Files Changed
Signed-off-by: Galih Tama <galpt@v.recipes>