scx_lavd: fix an SCX_DSQ_LOCAL_ON runtime error in lavd_enqueue #3642

Open
likewhatevs wants to merge 3 commits into sched-ext:main from likewhatevs:lavd-fix-deferred-local-on

Conversation

@likewhatevs

Contributor

lavd_misplaced_local_on (a new ktstr test, gated behind the ktstr-tests
feature) drives scx_lavd into a runtime error. Against unmodified scx_lavd it
fails consistently:

SCX_DSQ_LOCAL[_ON] target CPU N not allowed for <task>
scx_exit <- task_can_run_on_remote_rq <- dispatch_to_local_dsq

Two changes to lavd_enqueue, described by their effect on that test:

  • Fall back to the task's current CPU when the picked CPU is invalid or not in
    the task's allowed mask. This takes the test from a consistent failure to a
    sporadic one (same error).
  • Direct-dispatch SCX_DSQ_LOCAL_ON only when the picked CPU is the task's
    current CPU; otherwise enqueue to the domain DSQ. Before this change, the
    remaining sporadic failure surfaced as a runnable-task stall:
      runnable task stall (<task> failed to run for N s)
    With this change the test passes.
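
The two changes above can be sketched as a small decision function. This is a plain userspace C model, not the BPF code from the PR: every name in it (decide_enqueue, cpu_allowed, enq_target, the 64-bit allowed_mask) is hypothetical, and the real lavd_enqueue works with kernel cpumasks and sched_ext helpers that this sketch does not try to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these names are from scx_lavd. */
enum enq_target {
	ENQ_LOCAL_ON,   /* direct dispatch to one CPU's local DSQ */
	ENQ_DOMAIN_DSQ, /* enqueue to the shared domain DSQ */
};

struct enq_decision {
	enum enq_target target;
	int cpu; /* picked CPU after the fallback has been applied */
};

/* True when cpu is a valid index and its bit is set in the allowed mask. */
static bool cpu_allowed(int cpu, int nr_cpus, uint64_t allowed_mask)
{
	return cpu >= 0 && cpu < nr_cpus && ((allowed_mask >> cpu) & 1);
}

static struct enq_decision decide_enqueue(int picked_cpu, int task_cpu,
					  int nr_cpus, uint64_t allowed_mask)
{
	struct enq_decision d;

	/* The model stores the allowed mask in a single 64-bit word. */
	assert(nr_cpus > 0 && nr_cpus <= 64);

	/* Change 1: an invalid or disallowed pick falls back to the task's current CPU. */
	if (!cpu_allowed(picked_cpu, nr_cpus, allowed_mask))
		picked_cpu = task_cpu;

	/* Change 2: direct dispatch only when the pick is the task's current CPU. */
	d.cpu = picked_cpu;
	d.target = (picked_cpu == task_cpu) ? ENQ_LOCAL_ON : ENQ_DOMAIN_DSQ;
	return d;
}
```

In this model, a pick that fails the mask check always ends up as a direct dispatch to the task's current CPU, while a valid pick on any other CPU goes to the domain DSQ and is never sent to a remote local DSQ.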

The test is feature-gated, so a normal scx_lavd build is unaffected — the
harness is pulled in only with --features ktstr-tests.

Test plan

  • lavd_misplaced_local_on passes with the fix (cargo ktstr test, --features ktstr-tests).
  • Without the fix, the test reproduces the runtime error above.

Add lavd_misplaced_local_on. Against unmodified scx_lavd it fails
consistently with:
  SCX_DSQ_LOCAL[_ON] target CPU N not allowed for <task>
  scx_exit <- task_can_run_on_remote_rq <- dispatch_to_local_dsq
Gated behind the ktstr-tests feature.

Signed-off-by: Pat Somaru <patso@likewhatevs.io>
…avd_enqueue

When the CPU picked in lavd_enqueue is invalid or not in the task's
allowed mask, fall back to the task's current CPU. This takes
lavd_misplaced_local_on from a consistent failure to a sporadic one:
  SCX_DSQ_LOCAL[_ON] target CPU N not allowed for <task>
  scx_exit <- task_can_run_on_remote_rq <- dispatch_to_local_dsq

Signed-off-by: Pat Somaru <patso@likewhatevs.io>
… CPU

After the off-mask fallback, lavd_misplaced_local_on still failed
sporadically, as a runnable-task stall:
  runnable task stall (<task> failed to run for N s)
Direct-dispatch SCX_DSQ_LOCAL_ON only when the picked CPU is the task's
current CPU; otherwise enqueue to the domain DSQ. The test then passes.

Signed-off-by: Pat Somaru <patso@likewhatevs.io>