scx_mitosis: adaptive stealing, orphan-DSQ drain, migration tracking #3572

Draft
tommy-u wants to merge 1 commit into sched-ext:main from tommy-u:mitosis-steal-revamp
tommy-u commented May 13, 2026

Contributor
  1. Adaptive stealing controller. Each cell gets a queue-depth
    threshold for sibling LLCs (per_cell_steal_min_queued[]), which
    userspace tunes from the observed steal-success ratio.
    try_stealing_work picks the most backed-up sibling LLC and steals
    only if its depth exceeds the threshold, throttling cross-LLC churn
    under low pressure.

  2. Drain for orphaned tasks. When apply_cell_config or a cgroup
    change shrinks a cell out of an LLC, recalc_cell_llc_counts now
    publishes the new per-LLC cpu_cnt and then iterates the stranded
    (cell, LLC) DSQ, moving each task to a sibling LLC's DSQ that the
    cell still owns CPUs in. Callers run publish_prepared_cpumask
    first, then recalc_cell_llc_counts, so that re-pickers triggered by
    the cpu_cnt==0 guard see the new cell membership.

  3. Migration source tracking. A new last_cpu_source field on task_ctx
    plus a 2x3 grid of counters, (same|cross) x (select|enqueue|dispatch).
    track_cpu_migration in mitosis_running attributes each migration to
    whichever path placed the task. Stolen-task tagging collapses into
    maybe_retag_stolen_task, since an LLC mismatch is the unique steal
    signal at running time. This removes the per-CPU stolen_dispatch
    flag and its dispatch-to-running race window.
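
The threshold gate from item 1 can be illustrated with a small userspace C model. This is a sketch under assumptions, not the scheduler's BPF code: `pick_steal_victim`, `tune_threshold`, `MAX_LLCS`, `MAX_THRESHOLD`, and the 50%/90% ratio bands are hypothetical names and values introduced here. The description does not say in which direction the controller moves the threshold, so the update rule is only one plausible policy. What comes from the description is the shape of the decision: pick the deepest sibling LLC and steal only when its depth exceeds the per-cell threshold.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LLCS 8        /* hypothetical bound for this sketch */
#define MAX_THRESHOLD 64  /* hypothetical cap on the tuned threshold */
#define NO_VICTIM (-1)

/*
 * Return the sibling LLC with the deepest queue, or NO_VICTIM when no
 * sibling's depth exceeds steal_min_queued. depth[] holds one queue
 * depth per LLC for a single cell; self_llc is never chosen.
 */
static int pick_steal_victim(const uint32_t depth[], int nr_llcs,
                             int self_llc, uint32_t steal_min_queued)
{
    int best = NO_VICTIM;
    uint32_t best_depth = 0;

    assert(nr_llcs >= 0 && nr_llcs <= MAX_LLCS);

    for (int llc = 0; llc < nr_llcs; llc++) {
        if (llc == self_llc)
            continue;
        /* Ties keep the lowest-numbered LLC. */
        if (best == NO_VICTIM || depth[llc] > best_depth) {
            best = llc;
            best_depth = depth[llc];
        }
    }

    /* "Exceeds" is read as strictly greater than the threshold. */
    if (best == NO_VICTIM || best_depth <= steal_min_queued)
        return NO_VICTIM;
    return best;
}

/*
 * Assumed userspace policy: raise the threshold when fewer than 50% of
 * steal attempts succeed, lower it when more than 90% succeed, and
 * leave it alone otherwise or when there were no attempts.
 */
static uint32_t tune_threshold(uint32_t cur, uint64_t attempts,
                               uint64_t successes)
{
    uint64_t pct;

    if (attempts == 0)
        return cur;
    pct = successes * 100 / attempts;
    if (pct < 50 && cur < MAX_THRESHOLD)
        return cur + 1;
    if (pct > 90 && cur > 0)
        return cur - 1;
    return cur;
}
```

With depths {3, 9, 1, 9} seen from LLC 0, a threshold of 4 selects LLC 1, while a threshold of 9 selects nothing, which is the throttling effect under low pressure.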
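
The publish-then-drain ordering from item 2 can be modeled the same way. The sketch below is a simplified userspace illustration: `struct dsq`, `struct cell`, `first_owned_llc`, `drain_orphans`, and the fixed sizes are hypothetical, and real DSQs are kernel objects rather than arrays. Choosing the first LLC the cell still owns as the target is also an assumption; the description only says tasks move to a sibling LLC's DSQ that the cell still owns CPUs in.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LLCS 4   /* hypothetical topology size */
#define DSQ_CAP 16  /* hypothetical queue capacity */

/* Toy stand-in for a (cell, LLC) dispatch queue holding task ids. */
struct dsq {
    int tasks[DSQ_CAP];
    size_t len;
};

struct cell {
    unsigned int cpu_cnt[NR_LLCS]; /* CPUs the cell owns per LLC */
    struct dsq dsq[NR_LLCS];
};

/* First LLC in which the cell still owns at least one CPU, or -1. */
static int first_owned_llc(const struct cell *c)
{
    for (int llc = 0; llc < NR_LLCS; llc++)
        if (c->cpu_cnt[llc] > 0)
            return llc;
    return -1;
}

/*
 * Publish the new per-LLC CPU counts, then move every task queued on
 * an LLC the cell no longer owns onto an LLC it still owns. Returns
 * the number of tasks moved, or -1 if the cell owns no CPUs at all.
 */
static int drain_orphans(struct cell *c,
                         const unsigned int new_cpu_cnt[NR_LLCS])
{
    int moved = 0;
    int dst_llc;

    /* Step 1: publish, so later readers see the new counts. */
    for (int llc = 0; llc < NR_LLCS; llc++)
        c->cpu_cnt[llc] = new_cpu_cnt[llc];

    dst_llc = first_owned_llc(c);
    if (dst_llc < 0)
        return -1;

    /* Step 2: drain each stranded queue in FIFO order. */
    for (int llc = 0; llc < NR_LLCS; llc++) {
        struct dsq *src = &c->dsq[llc];
        struct dsq *dst = &c->dsq[dst_llc];

        if (c->cpu_cnt[llc] > 0)
            continue; /* still owned, nothing is stranded here */
        for (size_t i = 0; i < src->len; i++) {
            assert(dst->len < DSQ_CAP);
            dst->tasks[dst->len++] = src->tasks[i];
            moved++;
        }
        src->len = 0;
    }
    return moved;
}
```

Publishing before draining matters in this model for the same reason given in the description: anything that consults cpu_cnt while the drain is in progress already sees the shrunken cell.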
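
The 2x3 counter grid from item 3 can be sketched as a two-dimensional array indexed by scope and placement path. The field name last_cpu_source comes from the description; everything else here is assumed for illustration, including the enum names, the `last_cpu` field, the fixed `CPUS_PER_LLC` topology, the reading of same|cross as same-LLC versus cross-LLC, and the function name `sketch_track_cpu_migration`.

```c
#include <stdint.h>

#define CPUS_PER_LLC 4 /* hypothetical fixed topology */

/* Which path last chose the task's CPU. */
enum cpu_source { SRC_SELECT, SRC_ENQUEUE, SRC_DISPATCH, NR_SRC };

/* Whether a migration stayed within one LLC or crossed LLCs (assumed). */
enum llc_scope { SAME_LLC, CROSS_LLC, NR_SCOPE };

struct task_ctx {
    int last_cpu;                    /* -1 until the task first runs */
    enum cpu_source last_cpu_source; /* set by the placing path */
};

/* The 2x3 grid: migrations[scope][source]. */
static uint64_t migrations[NR_SCOPE][NR_SRC];

static int cpu_to_llc(int cpu)
{
    return cpu / CPUS_PER_LLC;
}

/*
 * Called when a task starts running on cpu. If the CPU changed since
 * the last run, charge one migration to the path recorded in
 * last_cpu_source, split by whether the LLC changed too.
 */
static void sketch_track_cpu_migration(struct task_ctx *t, int cpu)
{
    if (t->last_cpu >= 0 && t->last_cpu != cpu) {
        enum llc_scope scope =
            cpu_to_llc(t->last_cpu) == cpu_to_llc(cpu) ? SAME_LLC
                                                       : CROSS_LLC;
        migrations[scope][t->last_cpu_source]++;
    }
    t->last_cpu = cpu;
}
```

In this model each placement path only has to stamp last_cpu_source when it picks a CPU; the attribution happens once, at running time, so no per-CPU flag has to survive between dispatch and running.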

tommy-u requested a review from kkdwvd May 13, 2026 00:09