scripts: add scx_smoke_test.py two-phase release-validation harness #3603
rrnewton wants to merge 2 commits into
scx 1.1.1 reference smoke-test run (2026-05-21)
Full output from
Headline: 8 PASS, 3 FAIL out of 11.
Of the three failures, exactly one is a real 1.1.1 release defect
Results table
(Crates not on crates.io at 1.1.1: §1.
|
7e84cc3 to 56d5b8a
Update: improved script, second smoke run on v1.1.1 (2026-05-26)
Per the follow-ups suggested in the original PR body, force-pushed an
Single commit,
Rerun results: v1.1.1, 5 min each
Host:
Movement vs the prior 2026-05-21 run
The 4 new FAILs are all BPF program load failures from the kernel
Proof of
|
Capture from second run of scripts/scx_smoke_test.sh against scx 1.1.1 after applying skeptic-review-driven improvements (PR sched-ext/scx#3603 commit 56d5b8a9). 7 PASS / 4 FAIL / 0 KNOWN_FRAGILE / 0 ERROR. All 3 issues from the prior 2026-05-21 run resolved (scx_layered with bundled default spec, scx_cosmos via VERSION_FALLBACK -> 1.1.2, scx_rlfifo soft-pass via KNOWN_FRAGILE). 4 new FAILs are all BPF program-load failures from the 6.9 fbk-hardened kernel rejecting 1.1.1-era BPF skeletons - not script regressions. Reference run: experiments/scx_1_1_1_smoke_test_20260521/ (8/3 on 6.16). Results posted as PR comment: sched-ext/scx#3603 (comment)
|
Doesn't layered have |
56d5b8a to c9da2d9
|
Force-pushed Two-phase design (
CLI:
Typical flows:
```bash
# Default: discover from workspace + crates.io, then run.
./scx_smoke_test.py run
# Inspect first, hand-edit, then run.
./scx_smoke_test.py discover -o m.json
# Short smoke (single scheduler).
./scx_smoke_test.py run --manifest m.json --schedulers scx_rusty --duration 30
```
Discovery (16 crates on current workspace):
Implementation notes:
Spot-checked locally: The prior |
VM mode added + full 6.16.1 VM run
Added
CLI
Install runs on the host (cargo is not in the VM image); the host then
Full-suite result (scx 1.1.1, kernel 6.16.1, 60s per scheduler)
Tally: 12 PASS · 1 KNOWN_FRAGILE · 1 FAIL (scx_rustland) · 1 ERROR (scx_cake install)
Followups
Branch: |
de03d08 to d1ad12f
Two-phase scx scheduler release-validation harness, replacing the prior
scx_smoke_test.sh single-shot script with a Python tool that separates
manifest discovery from the install+run pipeline so users can inspect or
hand-edit what gets tested.
Phase 1 (discover)
- Walks the scx workspace Cargo.toml for scheduler crates under
scheds/rust/ and scheds/experimental/ (auto-discovers; nothing
hard-coded). 16 crates as of 1.1.1, including scx_beerland and
scx_cake which post-date the .sh script's hand-curated list.
- Cross-checks each crate against the crates.io API to record the
latest stable version actually published. Crates with no published
release (e.g. scx_mitosis as of 1.1.1) appear in the manifest with
"published": false and are skipped by Phase 2 unless explicitly
whitelisted via --schedulers.
- Layers in per-crate metadata: extra runtime args (e.g. scx_layered
--run-example flag), known-fragile classification (e.g.
scx_rlfifo), and fallback versions (e.g. scx_cosmos 1.1.1 ->
1.1.2 packaging-bug workaround).
- Emits a JSON manifest (schema_version: 1). The manifest is the
unit of input/output for Phase 2 and is committable alongside
results for full provenance.
Phase 2 (run)
- Reads a manifest, installs each crate via cargo install --locked
--version (with fallback retry), runs the binary under sudo
timeout for the per-crate duration, classifies as
PASS / FAIL / KNOWN_FRAGILE / ERROR, and writes per-crate
stdout/stderr/install logs plus SUMMARY.tsv into a timestamped
output dir. Copies the effective manifest into the run dir for
provenance.
- Same host-friendly invariants as the .sh: sequential runs;
pre-flight verifies sched_ext is 'disabled', passwordless sudo
available, cargo present; SIGINT/SIGTERM/EXIT trap detaches any
leftover scheduler so Ctrl+C never leaves the host degraded.
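A minimal sketch of the pre-flight checks described above, assuming the state file lives at `/sys/kernel/sched_ext/state` as stated elsewhere in this PR. The function names are hypothetical, and using `sudo -n true` as the passwordless-sudo probe is my assumption, not necessarily what the script does.

```python
import shutil
import subprocess
from pathlib import Path

SCHED_EXT_STATE = Path("/sys/kernel/sched_ext/state")


def preflight_problems(state_path: Path = SCHED_EXT_STATE,
                       required_tools=("cargo", "sudo")) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not state_path.exists():
        problems.append(f"{state_path} missing: kernel lacks sched_ext support")
    else:
        state = state_path.read_text().strip()
        if state != "disabled":
            problems.append(f"sched_ext state is {state!r}, expected 'disabled'")
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


def has_passwordless_sudo() -> bool:
    """Probe sudo non-interactively; -n makes sudo fail instead of prompting."""
    try:
        result = subprocess.run(["sudo", "-n", "true"], capture_output=True)
    except FileNotFoundError:
        return False
    return result.returncode == 0
```

Returning a list of problems instead of raising on the first one lets the harness report every missing prerequisite in a single pass.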
CLI
    scx_smoke_test.py discover [--out FILE|-] [--version VER]
                               [--scx-root DIR] [--no-network]
    scx_smoke_test.py run      [--manifest FILE] [--duration SEC]
                               [--out-dir DIR] [--schedulers "a b c"]
                               [--version VER] [--no-fallback]
    scx_smoke_test.py list     [--manifest FILE]
Typical flows
    # default flow: discover from workspace + crates.io, then run.
    ./scx_smoke_test.py run
    # inspect first, hand-edit, then run.
    ./scx_smoke_test.py discover -o m.json
    $EDITOR m.json    # drop crates, override versions, tweak args
    ./scx_smoke_test.py run --manifest m.json --duration 300
    # short smoke (single scheduler).
    ./scx_smoke_test.py run --manifest m.json --schedulers scx_rusty \
        --duration 30 --out-dir /tmp/smoke
Implementation notes
- Standard library only (urllib for crates.io; no requests
dependency). Works on Python 3.10+.
- Manifest discovery is tolerant: a missing scx_mitosis row in
crates.io produces published=false rather than aborting.
- Per-crate fallback only fires when the manifest's primary
version cargo-install fails; --no-fallback disables it.
- Classification mirrors the .sh: timeout-rc 124/137/143 with
runtime >= dur-5 = PASS; non-zero exit or early exit = FAIL;
stderr panic / BPF-load-error regex demotes PASS to FAIL;
KNOWN_FRAGILE crates soft-pass any FAIL.
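The classification rule above can be sketched as a small pure function. This is an illustration under stated assumptions, not the script's code: the name `classify` and its parameters are hypothetical, and the stderr pattern is the one quoted in the PR description. ERROR (install failure) is decided before the run and is not modelled here.

```python
import re

# Exit codes produced by `timeout` (124) or by SIGKILL/SIGTERM (137/143).
TIMEOUT_RCS = {124, 137, 143}
# Stderr pattern that demotes an otherwise passing run.
ERROR_RE = re.compile(r"BPF program load failed|panicked at|FATAL|libbpf:.*error")


def classify(rc: int, runtime_s: float, duration_s: float,
             stderr: str, fragile: bool = False) -> str:
    """Map one scheduler run to PASS, FAIL, or KNOWN_FRAGILE."""
    # PASS requires surviving (almost) the whole window and dying by timeout.
    survived = rc in TIMEOUT_RCS and runtime_s >= duration_s - 5
    verdict = "PASS" if survived else "FAIL"
    # A panic or BPF load error in stderr demotes PASS to FAIL.
    if verdict == "PASS" and ERROR_RE.search(stderr):
        verdict = "FAIL"
    # Known-fragile crates soft-pass any FAIL.
    if verdict == "FAIL" and fragile:
        verdict = "KNOWN_FRAGILE"
    return verdict
```

For example, `classify(124, 300, 300, "")` yields `"PASS"`, while the same run with `"libbpf: prog load error"` in stderr yields `"FAIL"`.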
…el testing
Adds --vm, --kernel, --vng-arg, --in-vm, --bin-dir flags so that the smoke test can boot a vng VM with a user-specified kernel image and run the full scheduler suite inside it.
Install still happens on the host (cargo is not available inside the VM); the host re-execs this script with --in-vm under vng --run, sharing the out-dir via --rwdir. Pre-install errors written by the host are preserved across the VM boundary; in-VM run rows are merged back via SUMMARY.tsv reload after vng exits.
Usage:
    scx_smoke_test.py run --vm --kernel /boot/vmlinuz-6.16.x ...
This lets PR validators boot a kernel that differs from the host (e.g. the 6.16 kernel needed by scx_lavd / scx_tickless when the host is 6.9) without reboot or hardware reprovisioning.
d1ad12f to 612b36b
Summary
Adds scripts/scx_smoke_test.py, a two-phase release-validation harness for the scx scheduler crates published to crates.io.
- Phase 1 (discover): walks the scx workspace Cargo.toml for crates under scheds/rust/ and scheds/experimental/ (auto-discovery, no hard-coded list), cross-checks each against the crates.io API for the latest stable version, layers in per-crate metadata (extra runtime args, known-fragile classification, packaging-bug fallback versions), and emits a JSON manifest (schema_version: 1) that is the unit of input/output for Phase 2 and is committable alongside results for full provenance.
- Phase 2 (run): reads a manifest, installs each crate via cargo install --locked --version <ver> <crate> (with optional fallback-version retry), sudo-runs the binary for the configured duration, classifies as PASS / FAIL / KNOWN_FRAGILE / ERROR, and writes per-crate stdout/stderr/install logs + SUMMARY.tsv into a timestamped output directory. The effective manifest is copied into the run directory.
The intent is a release gate: every new crates.io release of scx
(1.1.1, 1.1.2, 1.2.0, ...) can be smoke-tested on a representative
host before announcement, and users can inspect and hand-edit the
manifest (override versions, drop crates, tweak args) between Phase 1
and Phase 2.
Usage
Discovered crate set (16 as of 1.1.1)
scx_beerland, scx_bpfland, scx_cake, scx_chaos, scx_cosmos (1.1.2; 1.1.1 packaging-bug fallback), scx_flash, scx_lavd, scx_layered (default layer spec under scx_smoke_test_data/), scx_p2dq, scx_pandemonium (5.9.1, independent versioning), scx_rustland, scx_rusty, scx_tickless, scx_flow (2.2.5, independent versioning), scx_rlfifo (KNOWN_FRAGILE: userspace FIFO demo, 30s soft-pass window), scx_mitosis (published: false; skipped by Phase 2 unless explicitly whitelisted via --schedulers).
Mechanics
For each scheduler in the (filtered) manifest:
- Install with cargo install --locked --version <ver> <crate>; on failure, retry the per-crate fallback version (e.g. scx_cosmos 1.1.1 -> 1.1.2) unless --no-fallback is passed. Skip install if already present at an acceptable version.
- Verify /sys/kernel/sched_ext/state == 'disabled' between runs; only one sched_ext scheduler can attach at a time, so the loop is strictly sequential.
- Run sudo timeout --signal=TERM --kill-after=10 <dur> <bin> <extra_args>.
- PASS: the scheduler ran until the timeout (rc=124, SIGTERM-on-timeout, with runtime >= dur-5).
- FAIL: the scheduler exited before the timeout, or stderr matches BPF program load failed|panicked at|FATAL|libbpf:.*error.
- KNOWN_FRAGILE: a watchdog trip during the window is expected; FAIL outcomes are softened to KNOWN_FRAGILE (does not count toward release-blocking totals).
- A SIGINT/SIGTERM/EXIT trap detaches any leftover scheduler, so Ctrl+C never leaves the host degraded.
Pre-flight requirements
- Linux kernel with CONFIG_SCHED_CLASS_EXT=y (>=6.12); the script checks /sys/kernel/sched_ext/state and refuses to start if it is missing or already active.
- Passwordless sudo (schedulers must run as root).
- cargo on PATH.
- Python 3.10+, standard library only; no requests dependency.
Manifest schema (v1, abbreviated)
{
  "schema_version": 1,
  "generated": "2026-05-26T20:37:16+00:00",
  "scx_root": "/path/to/scx",
  "default_duration_s": 300,
  "default_fragile_duration_s": 30,
  "crates": [
    {
      "name": "scx_lavd",
      "version": "1.1.1",
      "type": "scheduler",          // or "fragile"
      "published": true,            // false => skipped unless whitelisted
      "fallback_version": null,     // or e.g. "1.1.2"
      "extra_args": [],             // e.g. ["file:layered_default.json"]
      "fragile_reason": null,
      "fragile_duration_s": null,
      "source_path": "scheds/rust/scx_lavd",
      "notes": null
    }
    // ... 15 more
  ]
}
Exit code
0 if every non-KNOWN_FRAGILE scheduler PASSed, 1 otherwise. KNOWN_FRAGILE is opt-in soft-pass and never blocks the release gate.