scx_layered: add per-layer irq_protect with fallback policy #3586

Draft
hodgesds wants to merge 1 commit into sched-ext:main from hodgesds:layered-irq-protect
@hodgesds

Contributor

Layers can now opt in to clearing their CPUs from every IRQ's `/proc/irq/<N>/smp_affinity` mask via a new `irq_protect: bool` field on `LayerCommon`. This is useful for GPU training workloads where collective-communication threads (e.g. NCCL) need very low jitter and must not share CPUs with NIC/GPU IRQ handlers.
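As a rough illustration of what "clearing CPUs from a mask" means, here is a minimal sketch of parsing and formatting an `smp_affinity`-style mask (comma-separated 32-bit hex groups, most significant group first). The helper names `parse_affinity`, `format_affinity`, and `strip_protected` are hypothetical and are not taken from this PR. The kernel may print a shorter, non-padded form than the one emitted here.

```rust
use std::collections::BTreeSet;

/// Parse an smp_affinity-style string (comma-separated 32-bit hex groups,
/// most significant group first) into a set of CPU ids.
/// Hypothetical helper, not the PR's code.
pub fn parse_affinity(s: &str) -> Option<BTreeSet<usize>> {
    let mut cpus = BTreeSet::new();
    let groups: Vec<&str> = s.trim().split(',').collect();
    // The last group holds CPUs 0..31, the one before it CPUs 32..63, etc.
    for (i, group) in groups.iter().rev().enumerate() {
        let word = u32::from_str_radix(group, 16).ok()?;
        for bit in 0..32usize {
            if word & (1u32 << bit) != 0 {
                cpus.insert(i * 32 + bit);
            }
        }
    }
    Some(cpus)
}

/// Format a CPU set as zero-padded 32-bit hex groups covering `nr_cpus`.
/// CPU ids beyond the covered range are ignored.
pub fn format_affinity(cpus: &BTreeSet<usize>, nr_cpus: usize) -> String {
    let nr_words = (nr_cpus.max(1) + 31) / 32;
    let mut words = vec![0u32; nr_words];
    for &cpu in cpus {
        if cpu < nr_words * 32 {
            words[cpu / 32] |= 1u32 << (cpu % 32);
        }
    }
    words
        .iter()
        .rev()
        .map(|w| format!("{:08x}", w))
        .collect::<Vec<_>>()
        .join(",")
}

/// Remove a protected layer's CPUs from an IRQ's affinity set.
pub fn strip_protected(
    affinity: &BTreeSet<usize>,
    protected: &BTreeSet<usize>,
) -> BTreeSet<usize> {
    affinity.difference(protected).copied().collect()
}
```

For example, an IRQ whose mask is `ff` (CPUs 0-7) with CPUs 2 and 3 protected ends up with the mask `000000f3`.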

The protector snapshots every numeric IRQ's affinity at startup, applies fresh masks after each `refresh_cpumasks` based on the union of protected layers' CPUs, and restores the original masks on shutdown. Kernel-managed IRQs that reject affinity writes are recorded after the first failure and never retried.
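The lifecycle described above (snapshot, apply, restore, plus skipping managed IRQs and eliding redundant writes) could be sketched roughly as follows. Every name here (`IrqBackend`, `IrqProtector`, `MockBackend`, `cpuset`) is hypothetical and chosen for illustration; the PR's actual types and its `/proc` I/O are not shown in this description. The fully-protected fallback case is deliberately left out of this sketch.

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type CpuSet = BTreeSet<usize>;

/// Build a CpuSet from a slice of CPU ids (illustration helper).
pub fn cpuset(ids: &[usize]) -> CpuSet {
    ids.iter().copied().collect()
}

/// Hypothetical abstraction over writing /proc/irq/<N>/smp_affinity.
pub trait IrqBackend {
    fn write_affinity(&mut self, irq: u32, cpus: &CpuSet) -> Result<(), String>;
}

pub struct IrqProtector {
    originals: BTreeMap<u32, CpuSet>, // snapshot taken at startup
    current: BTreeMap<u32, CpuSet>,   // last mask known to be in effect
    managed: BTreeSet<u32>,           // IRQs that rejected a write
}

impl IrqProtector {
    pub fn new(snapshot: BTreeMap<u32, CpuSet>) -> Self {
        Self {
            current: snapshot.clone(),
            originals: snapshot,
            managed: BTreeSet::new(),
        }
    }

    fn write(&mut self, backend: &mut dyn IrqBackend, irq: u32, want: CpuSet) {
        // Skip IRQs that failed before, and writes that would change nothing.
        if self.managed.contains(&irq) || self.current.get(&irq) == Some(&want) {
            return;
        }
        match backend.write_affinity(irq, &want) {
            Ok(()) => {
                self.current.insert(irq, want);
            }
            Err(_) => {
                // Assumed to be kernel-managed: record it and never retry.
                self.managed.insert(irq);
            }
        }
    }

    /// Strip the protected CPUs from every IRQ's original affinity.
    pub fn apply(&mut self, backend: &mut dyn IrqBackend, protected: &CpuSet) {
        let targets: Vec<(u32, CpuSet)> = self
            .originals
            .iter()
            .map(|(&irq, home)| (irq, home.difference(protected).copied().collect()))
            .collect();
        for (irq, want) in targets {
            if want.is_empty() {
                continue; // fully covered: fallback policy not modeled here
            }
            self.write(backend, irq, want);
        }
    }

    /// Write the startup snapshot back.
    pub fn restore(&mut self, backend: &mut dyn IrqBackend) {
        let originals: Vec<(u32, CpuSet)> =
            self.originals.iter().map(|(&i, m)| (i, m.clone())).collect();
        for (irq, mask) in originals {
            self.write(backend, irq, mask);
        }
    }
}

/// In-memory stand-in for /proc; `reject` simulates kernel-managed IRQs.
#[derive(Default)]
pub struct MockBackend {
    pub reject: BTreeSet<u32>,
    pub writes: Vec<(u32, CpuSet)>,
    pub attempts: usize,
}

impl IrqBackend for MockBackend {
    fn write_affinity(&mut self, irq: u32, cpus: &CpuSet) -> Result<(), String> {
        self.attempts += 1;
        if self.reject.contains(&irq) {
            return Err(format!("irq {}: write rejected", irq));
        }
        self.writes.push((irq, cpus.clone()));
        Ok(())
    }
}
```

Keeping the write path behind a trait is one way to make the behaviors listed in the test plan (managed-IRQ skip, redundant write elision, restore) checkable without touching `/proc`.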

A new `--irq-protect-fallback={spread,all}` flag selects what happens when an IRQ's home affinity is fully covered by protected layers (or when every CPU on the system is protected):

  • `spread` (default) preserves per-IRQ locality: an IRQ whose home affinity is fully protected spills to the unprotected set, and when no CPU is unprotected, each IRQ is pinned to a single CPU chosen round-robin across all CPUs.
  • `all` gives every IRQ the same mask: the system-wide unprotected set, or all CPUs when nothing is unprotected.
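One possible reading of the two fallback modes, written as a pure function over CPU sets. This is a sketch under assumptions, not the PR's implementation: the names `Fallback`, `target_mask`, and `cpuset` are hypothetical, `idx` stands for the IRQ's position in iteration order, and `all` is interpreted literally as applying the same mask to every IRQ.

```rust
use std::collections::BTreeSet;

pub type CpuSet = BTreeSet<usize>;

/// Build a CpuSet from a slice of CPU ids (illustration helper).
pub fn cpuset(ids: &[usize]) -> CpuSet {
    ids.iter().copied().collect()
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Fallback {
    Spread,
    All,
}

/// Compute the mask to write for one IRQ.
/// `home` is the IRQ's original affinity; `idx` drives the round-robin pin.
pub fn target_mask(
    mode: Fallback,
    home: &CpuSet,
    protected: &CpuSet,
    all_cpus: &CpuSet,
    idx: usize,
) -> CpuSet {
    if all_cpus.is_empty() {
        return CpuSet::new();
    }
    let unprotected: CpuSet = all_cpus.difference(protected).copied().collect();
    match mode {
        Fallback::All => {
            // Same mask for every IRQ: unprotected CPUs, else all CPUs.
            if unprotected.is_empty() {
                all_cpus.clone()
            } else {
                unprotected
            }
        }
        Fallback::Spread => {
            if unprotected.is_empty() {
                // Everything is protected: pin to one CPU, round-robin.
                let pick = all_cpus.iter().nth(idx % all_cpus.len()).copied();
                pick.into_iter().collect()
            } else {
                // Keep locality when possible, otherwise spill.
                let local: CpuSet = home.difference(protected).copied().collect();
                if local.is_empty() {
                    unprotected
                } else {
                    local
                }
            }
        }
    }
}
```

With four CPUs and CPUs 2 and 3 protected, an IRQ homed on {0, 2} keeps {0} under `spread` but gets {0, 1} under `all`; an IRQ homed on {2, 3} spills to {0, 1} in both modes.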

Mode transitions and per-IRQ spill events are logged once. The change includes unit tests covering normal strip + spill, managed-IRQ skip, redundant write elision, restore, spread mode (basic + wrap), spread<->normal transition, and both All-mode behaviors.

@hodgesds hodgesds force-pushed the layered-irq-protect branch from 5bfe7ef to dbfe858 Compare May 18, 2026 19:15
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Daniel Hodges <hodgesd@meta.com>
@hodgesds hodgesds force-pushed the layered-irq-protect branch from dbfe858 to a404122 Compare May 19, 2026 14:54