scx_layered: add per-layer irq_protect with fallback policy #3586
Draft
hodgesds wants to merge 1 commit into
Layers can now opt in to clearing their CPUs from every IRQ's
`/proc/irq/<N>/smp_affinity` mask via a new `irq_protect: bool` field
on `LayerCommon`. This is useful for GPU training workloads where
collective-communication threads (e.g. NCCL) need very low jitter and
must not share CPUs with NIC/GPU IRQ handlers.
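
For reference, `smp_affinity` holds a hexadecimal CPU bitmask, written as comma-separated 32-bit words with the most significant word first. The following is a minimal sketch of parsing such a mask, stripping protected CPUs from it, and formatting the result. The helper names (`parse_affinity`, `format_affinity`, `strip_protected`) are hypothetical and are not taken from this PR.

```rust
/// Parse a `/proc/irq/<N>/smp_affinity`-style mask into a sorted list of CPU ids.
/// The mask is comma-separated 32-bit hex words, most significant word first.
/// Returns `None` if any word is not valid hex.
fn parse_affinity(s: &str) -> Option<Vec<usize>> {
    let mut cpus = Vec::new();
    let groups: Vec<&str> = s.trim().split(',').collect();
    // Walk the words from least significant (last) to most significant (first).
    for (i, g) in groups.iter().rev().enumerate() {
        let word = u32::from_str_radix(g, 16).ok()?;
        for bit in 0..32 {
            if word & (1u32 << bit) != 0 {
                cpus.push(i * 32 + bit);
            }
        }
    }
    Some(cpus)
}

/// Format CPU ids as zero-padded 32-bit hex words covering `nr_cpus` CPUs.
/// CPU ids at or above `nr_cpus` are ignored.
fn format_affinity(cpus: &[usize], nr_cpus: usize) -> String {
    let nr_words = ((nr_cpus + 31) / 32).max(1);
    let mut words = vec![0u32; nr_words];
    for &c in cpus {
        if c < nr_cpus {
            words[c / 32] |= 1u32 << (c % 32);
        }
    }
    words
        .iter()
        .rev()
        .map(|w| format!("{:08x}", w))
        .collect::<Vec<_>>()
        .join(",")
}

/// Remove every protected CPU from an IRQ's CPU list.
fn strip_protected(irq_cpus: &[usize], protected: &[usize]) -> Vec<usize> {
    irq_cpus
        .iter()
        .copied()
        .filter(|c| !protected.contains(c))
        .collect()
}
```

The stripped list can come back empty when a protected layer owns every CPU in the IRQ's mask, which is the case the fallback policy exists for.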
The protector snapshots every numeric IRQ's affinity at startup, applies
fresh masks after each `refresh_cpumasks` based on the union of
protected layers' CPUs, and restores originals on shutdown. Kernel-managed
IRQs that reject affinity writes are recorded after the first
failure and never retried.
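
The lifecycle described above (snapshot, apply, skip managed IRQs, elide redundant writes, restore) could look roughly like the sketch below. This is an illustration under assumptions, not the code in this PR: the `IrqBackend` trait, the `Protector` struct, and the in-memory `FakeBackend` are hypothetical names, masks are limited to 64 CPUs, and IRQs whose stripped mask would be empty are left untouched here because the fallback policy handles them separately.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical abstraction over reading and writing per-IRQ affinity masks.
trait IrqBackend {
    fn read_all(&self) -> BTreeMap<u32, u64>;
    fn write(&mut self, irq: u32, mask: u64) -> Result<(), String>;
}

struct Protector {
    originals: BTreeMap<u32, u64>, // snapshot taken at startup
    current: BTreeMap<u32, u64>,   // last mask known to be in effect
    managed: BTreeSet<u32>,        // IRQs that rejected a write; never retried
}

impl Protector {
    /// Snapshot every IRQ's affinity.
    fn new(backend: &impl IrqBackend) -> Self {
        let originals = backend.read_all();
        Protector { current: originals.clone(), originals, managed: BTreeSet::new() }
    }

    /// Write one mask unless the IRQ is known-managed or the mask is unchanged.
    fn set(&mut self, backend: &mut impl IrqBackend, irq: u32, mask: u64) {
        if self.managed.contains(&irq) || self.current.get(&irq) == Some(&mask) {
            return;
        }
        match backend.write(irq, mask) {
            Ok(()) => { self.current.insert(irq, mask); }
            Err(_) => { self.managed.insert(irq); }
        }
    }

    /// Strip the protected CPUs from every IRQ's original mask.
    fn apply(&mut self, backend: &mut impl IrqBackend, protected: u64) {
        let targets: Vec<(u32, u64)> = self
            .originals
            .iter()
            .map(|(&irq, &home)| (irq, home & !protected))
            .collect();
        for (irq, mask) in targets {
            if mask != 0 {
                self.set(&mut *backend, irq, mask);
            }
        }
    }

    /// Put the snapshotted masks back.
    fn restore(&mut self, backend: &mut impl IrqBackend) {
        let originals: Vec<(u32, u64)> =
            self.originals.iter().map(|(&i, &m)| (i, m)).collect();
        for (irq, mask) in originals {
            self.set(&mut *backend, irq, mask);
        }
    }
}

/// In-memory stand-in for procfs. IRQs listed in `rejects` fail every write,
/// imitating a kernel-managed IRQ. `writes` counts write attempts.
struct FakeBackend {
    masks: BTreeMap<u32, u64>,
    rejects: BTreeSet<u32>,
    writes: usize,
}

impl IrqBackend for FakeBackend {
    fn read_all(&self) -> BTreeMap<u32, u64> {
        self.masks.clone()
    }
    fn write(&mut self, irq: u32, mask: u64) -> Result<(), String> {
        self.writes += 1;
        if self.rejects.contains(&irq) {
            return Err(format!("irq {}: write rejected", irq));
        }
        self.masks.insert(irq, mask);
        Ok(())
    }
}
```

Tracking the last applied mask per IRQ is what lets a repeated `apply` with the same protected set perform no writes at all.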
A new `--irq-protect-fallback={spread,all}` flag selects what happens
when an IRQ's home affinity is fully covered by protected layers (or
when every CPU on the system is protected):
* `spread` (default) preserves per-IRQ locality: only an IRQ whose home
affinity is fully covered spills to the unprotected set, and when no CPU
is unprotected, each IRQ is pinned to a single CPU, assigned round-robin
across all CPUs.
* `all` gives every IRQ the same mask: the system-wide unprotected
set, or all CPUs when nothing is unprotected.
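
One reading of the two fallback modes, sketched as a pure function over 64-bit CPU masks. The names `Fallback` and `target_masks` are hypothetical, and the sketch assumes that `all` applies the shared mask to every IRQ unconditionally and that round-robin pinning follows the order in which IRQs are listed; the actual implementation may differ on both points.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Fallback {
    Spread,
    All,
}

/// Compute one target mask per IRQ (bit i set means CPU i is allowed).
/// `homes` are the IRQs' original masks, `all_cpus` is the mask of usable
/// CPUs, and `protected` is the union of the protected layers' CPUs.
fn target_masks(homes: &[u64], all_cpus: u64, protected: u64, fb: Fallback) -> Vec<u64> {
    let unprotected = all_cpus & !protected;
    let cpus: Vec<u32> = (0..64u32)
        .filter(|&c| (all_cpus & (1u64 << c)) != 0)
        .collect();
    if cpus.is_empty() {
        // Nothing to choose from; leave the masks as they are.
        return homes.to_vec();
    }
    homes
        .iter()
        .enumerate()
        .map(|(i, &home)| match fb {
            // Same mask for every IRQ.
            Fallback::All => {
                if unprotected != 0 { unprotected } else { all_cpus }
            }
            Fallback::Spread => {
                let stripped = home & unprotected;
                if stripped != 0 {
                    // Normal case: keep the IRQ's locality minus protected CPUs.
                    stripped
                } else if unprotected != 0 {
                    // Home fully covered: spill to the unprotected set.
                    unprotected
                } else {
                    // Every CPU is protected: pin to one CPU, round-robin.
                    1u64 << cpus[i % cpus.len()]
                }
            }
        })
        .collect()
}
```

With four CPUs all protected and five IRQs, `Spread` pins the IRQs to CPUs 0, 1, 2, 3 and then wraps back to CPU 0, so no single protected CPU absorbs every interrupt.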
Mode transitions and per-IRQ spill events are logged once. Includes
unit tests covering normal strip + spill, managed-IRQ skip, redundant
write elision, restore, spread mode (basic + wrap), spread<->normal
transition, and both All-mode behaviors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Daniel Hodges <hodgesd@meta.com>