feat(core): pluggable scheduling policies with critical-path analysis and advisory mode by m2papierz · Pull Request #37 · pirxware/pirx

m2papierz · 2026-06-21T12:41:56Z

What

Pluggable scheduling policy framework for the DES engine — replaces the hardcoded FIFO dispatch with a trait-based system, adds critical-path scheduling, decision recording, and an advisory mode for external event-driven simulation.

Why

The engine previously dispatched magic states to stalled gates in insertion order, which is fine for small circuits but leaves performance on the table for deep circuits with uneven critical paths. Profiling scheduling decisions was also impossible - the engine applied them silently with no observability.

Advisory mode is needed to drive the engine from external hardware telemetry or test harnesses without reimplementing the scheduling logic.

How

SchedulingPolicy trait (select_stalled_gate, admit_gate, should_restart_factory) - the engine calls the policy at each decision point. Two implementations: FifoPolicy (preserves existing behavior) and CriticalPathPolicy (prioritizes gates with the longest remaining DAG path).
Enum dispatch via SchedulingPolicyKind - zero vtable overhead, inlineable, Send for parallel sweeps. Same pattern as RoutingKind and BufferKind.
BufferModel trait extracted from the concrete counter buffer, with enum dispatch (BufferKind) for future slotted/priority buffer variants.
DecisionRecorder trait with NoOpDecisionRecorder (compiles to nothing via monomorphization) and FullDecisionRecorder. The engine is generic over D: DecisionRecorder, so recording is zero-cost when disabled.
DAG scheduling_weight: a backward pass from the sinks, in reverse topological order, computes the longest-path weight per node. Stored in SecondaryMap, computed once after DAG construction.
Advisory mode (Engine::observe / Engine::step_to / Engine::snapshot): external events (factory produced/failed, injection outcomes, measurement results) injected at specific cycles. The engine returns TimedDecision records instead of driving its own factory schedule.
RoutingModel trait extended with ActiveRoute and route_completed for contention-aware routing (scaffold - no contention model yet).
IR extensions: priority and deadline_cycles fields on Operation for future deadline-aware policies.
Policy selection wired through EngineConfig, SimulationConfig, CLI (--policy), WASM, and Python bindings.
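
The trait-plus-enum-dispatch pattern from the list above could look roughly like the sketch below. It is a simplified illustration, not the PR's code: only the names `SchedulingPolicy`, `FifoPolicy`, `CriticalPathPolicy`, `SchedulingPolicyKind`, and `select_stalled_gate` come from the description. The `StalledGate` fields, the single-method trait, the `Option<usize>` return type, and the earliest-index tie-break are assumptions.

```rust
/// Hypothetical stalled-gate record; the real struct likely carries more fields.
#[derive(Debug, Clone, Copy)]
pub struct StalledGate {
    pub gate_id: u32,
    pub scheduling_weight: u32,
}

/// Simplified policy trait: only the stalled-gate selection hook is shown.
pub trait SchedulingPolicy {
    /// Returns the index of the stalled gate to serve next, if any.
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> Option<usize>;
}

pub struct FifoPolicy;
pub struct CriticalPathPolicy;

impl SchedulingPolicy for FifoPolicy {
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> Option<usize> {
        // Insertion order: always serve the front of the queue.
        if stalled.is_empty() { None } else { Some(0) }
    }
}

impl SchedulingPolicy for CriticalPathPolicy {
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> Option<usize> {
        // Highest weight wins; the earliest index wins ties (assumed tie-break).
        let mut best: Option<usize> = None;
        for (i, gate) in stalled.iter().enumerate() {
            match best {
                Some(b) if stalled[b].scheduling_weight >= gate.scheduling_weight => {}
                _ => best = Some(i),
            }
        }
        best
    }
}

/// Enum dispatch: a `match` instead of a `dyn` call, so each arm can be inlined.
pub enum SchedulingPolicyKind {
    Fifo(FifoPolicy),
    CriticalPath(CriticalPathPolicy),
}

impl SchedulingPolicy for SchedulingPolicyKind {
    #[inline]
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> Option<usize> {
        match self {
            Self::Fifo(p) => p.select_stalled_gate(stalled),
            Self::CriticalPath(p) => p.select_stalled_gate(stalled),
        }
    }
}
```

Because the enum contains only plain data, it is `Send` without extra work, which is what the parallel sweeps mentioned above need.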

Testing

make ci passes locally (fmt + clippy + test + audit)
New behavior has tests
Hot-path changes have criterion benchmarks

Checklist

PR description explains why, not just what
No new unwrap()/expect() in production code
No new allocations in the simulation hot loop
Crate boundaries respected (pirx-core never imports from pirx-adapters)
New dependencies justified (not "it's popular" — what does it replace?)

…policies Introduce the SchedulingPolicy trait, enum dispatch (SchedulingPolicyKind), and opt-in decision recording (DecisionRecorder) for FTQC gate scheduling. Purely additive — no engine integration yet.

Extend the profiler IR with per-operation scheduling hints:
- `priority: i16` (default 0): higher values signal more urgency
- `deadline_cycles: Option<u64>`: optional hard deadline in QEC cycles

Both fields use serde defaults for backward-compatible deserialization. Adapters, testkit, and all tests updated to supply the new fields.

Add `scheduling_weight: u32` to OpData, computed from IR priority during DAG construction (i16 biased to unsigned range). New `Dag::apply_critical_path_weights()` performs reverse-topological traversal to compute remaining-path-length per op, upgrading weights where the path length exceeds the priority-based value. Fixup nodes inherit their parent gate's weight to preserve scheduling urgency.
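
A minimal sketch of the reverse-topological longest-path computation, assuming the DAG is given as a plain successor list. The function name `critical_path_weights`, the unit cost per node, and the `Vec<Vec<usize>>` representation are illustrative; the PR's `Dag::apply_critical_path_weights()` works on its own node storage and also handles fixup nodes, which this sketch omits.

```rust
/// For every node, computes the number of nodes on the longest path from that
/// node to any sink (a sink has weight 1). `succs[v]` lists the successors of
/// node `v`. Returns `None` if the graph contains a cycle.
pub fn critical_path_weights(succs: &[Vec<usize>]) -> Option<Vec<u32>> {
    let n = succs.len();

    // Kahn's algorithm: build a topological order from in-degrees.
    let mut indegree = vec![0usize; n];
    for outs in succs {
        for &s in outs {
            indegree[s] += 1;
        }
    }
    let mut order: Vec<usize> = (0..n).filter(|&v| indegree[v] == 0).collect();
    let mut head = 0;
    while head < order.len() {
        let v = order[head];
        head += 1;
        for &s in &succs[v] {
            indegree[s] -= 1;
            if indegree[s] == 0 {
                order.push(s);
            }
        }
    }
    if order.len() != n {
        return None; // some nodes were never released: there is a cycle
    }

    // Reverse pass: every successor is final before its predecessors are visited.
    let mut weight = vec![1u32; n];
    for &v in order.iter().rev() {
        for &s in &succs[v] {
            weight[v] = weight[v].max(weight[s] + 1);
        }
    }
    Some(weight)
}
```

For the chain `0 → 1 → 2 → 3` with a side branch `0 → 4`, the side branch gets weight 1 while the chain head gets weight 4, so a weight-ordered policy serves the chain first.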

Wire SchedulingPolicyKind through EngineCore dispatch: policy-driven gate admission, stalled-gate selection, and factory restart decisions. Replace VecDeque with Vec for policy-indexed stalled gate access. Add DecisionRecorder threading for optional scheduling decision capture. Monomorphize run() over EventSink × DecisionSink cross-product.
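
The zero-cost recording claim rests on monomorphization: when the run function is generic over the recorder, the no-op variant's empty method can be inlined away. The sketch below illustrates the idea under assumed names; the `record` method, the `SchedulingDecision` fields, and the toy `run` loop are hypothetical, and only the recorder type names come from the PR.

```rust
/// Hypothetical decision record; the PR's type carries more detail.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SchedulingDecision {
    pub cycle: u64,
    pub gate_id: u32,
}

pub trait DecisionRecorder {
    fn record(&mut self, decision: SchedulingDecision);
}

/// Zero-sized recorder whose `record` body is empty.
pub struct NoOpDecisionRecorder;

impl DecisionRecorder for NoOpDecisionRecorder {
    #[inline(always)]
    fn record(&mut self, _decision: SchedulingDecision) {}
}

/// Recorder that keeps every decision for later inspection.
#[derive(Default)]
pub struct FullDecisionRecorder {
    pub decisions: Vec<SchedulingDecision>,
}

impl DecisionRecorder for FullDecisionRecorder {
    fn record(&mut self, decision: SchedulingDecision) {
        self.decisions.push(decision);
    }
}

/// Toy run loop, generic over the recorder. The compiler emits one copy per
/// recorder type, so the no-op copy contains no recording code at all.
pub fn run<D: DecisionRecorder>(gates: &[u32], recorder: &mut D) -> u64 {
    let mut cycle = 0;
    for &gate_id in gates {
        recorder.record(SchedulingDecision { cycle, gate_id });
        cycle += 1;
    }
    cycle
}
```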

Verify critical-path policy outperforms FIFO under contention, scheduling_weight propagation through DAG chains, fixup node weight inheritance, and decision recording stability. Includes multi-seed statistical validation with stochastic factories.

Replace concrete MagicStateBuffer with a BufferModel trait and CounterBuffer implementation behind BufferKind enum dispatch for zero-vtable-overhead extensibility.

Engine can now operate in advisory mode where factory events arrive from external sources (hardware telemetry, test harnesses) via observe()/step_to() instead of being generated internally. Shares 100% of the scheduling code path with simulation mode; only factory restart behavior differs.
- ExternalEvent enum: FactoryProduced, FactoryFailed, InjectionOutcome, MeasurementResult, ProgramAbort
- EngineSnapshot for serializable state inspection
- EngineMode enum with mode-gated factory restart in dispatch loop
- Advisory overrides for injection/measurement outcomes (zero-cost in simulation mode via Option<Box<AdvisoryOverrides>>)
- PartialEq on SchedulingDecision/TimedDecision for equivalence testing
- 12 integration tests including critical advisory/simulation equivalence
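
A rough sketch of how an external driver might use advisory mode. It uses the `Result`-returning form of `observe()` that a later commit in this PR introduces. Everything beyond the names `EngineMode`, `ExternalEvent`, `EngineError::ObserveRequiresAdvisory`, `observe`, and `step_to` is an assumption: the toy `Engine` only counts buffered magic states, the events carry no payload, and `step_to` returns the buffer occupancy, whereas the real engine returns `TimedDecision` records.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineMode {
    Simulation,
    Advisory,
}

/// Two of the five event kinds named in the commit, without payloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExternalEvent {
    FactoryProduced,
    FactoryFailed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineError {
    ObserveRequiresAdvisory,
}

/// Toy engine: tracks only pending external events and a buffer count.
pub struct Engine {
    mode: EngineMode,
    pending: Vec<(u64, ExternalEvent)>,
    buffer: u32,
}

impl Engine {
    pub fn new(mode: EngineMode) -> Self {
        Self { mode, pending: Vec::new(), buffer: 0 }
    }

    /// Queues an external event for `cycle`; rejected outside advisory mode.
    pub fn observe(&mut self, cycle: u64, event: ExternalEvent) -> Result<(), EngineError> {
        if self.mode != EngineMode::Advisory {
            return Err(EngineError::ObserveRequiresAdvisory);
        }
        self.pending.push((cycle, event));
        Ok(())
    }

    /// Applies every event due at or before `target_cycle` and returns the
    /// resulting buffer occupancy.
    pub fn step_to(&mut self, target_cycle: u64) -> u32 {
        let mut produced = 0u32;
        self.pending.retain(|&(cycle, event)| {
            if cycle > target_cycle {
                return true; // not due yet, keep it queued
            }
            if event == ExternalEvent::FactoryProduced {
                produced += 1;
            }
            false
        });
        self.buffer += produced;
        self.buffer
    }
}
```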

Add Serialize/Deserialize/FromStr/Copy to SchedulingPolicyKind so it can be embedded in MonteCarloConfig JSON and passed by value. Thread policy field into EngineConfig and MonteCarloConfig. Wire decision recording into all EngineResult variants with decisions() accessor. Update all existing call sites and tests to explicitly pass FIFO.

…tends Add --policy (fifo|critical-path) and --decisions flags to CLI profile and monte-carlo commands. Add profile_with_policy/monte_carlo_with_policy to WASM API. Add policy and decisions kwargs to Python profile/trace/ monte_carlo functions with PySchedulingDecision wrapper type.

…ffold Extend RoutingModel to accept cycle and active routes for stateful contention tracking. Add ActiveRoute type and route_completed callback. Introduce SlottedBuffer behind slotted-buffer feature gate.

codspeed-hq · 2026-06-21T12:44:28Z

Merging this PR will degrade performance by 13.85%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 12 regressed benchmarks
✅ 15 untouched benchmarks
🆕 4 new benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|
| ❌ | `sampled_4[100]` | 173.2 µs | 211.8 µs | -18.24% |
| ❌ | `run[1000gates_2000qubits]` | 922.8 µs | 1,121.3 µs | -17.71% |
| ❌ | `run[2000gates_4000qubits]` | 1.8 ms | 2.2 ms | -17.03% |
| ❌ | `run[500gates_1000qubits]` | 477 µs | 562.2 µs | -15.16% |
| ❌ | `run[100gates_200qubits]` | 112.6 µs | 132.5 µs | -15.05% |
| ❌ | `engine_run[500]` | 961.6 µs | 1,114 µs | -13.68% |
| ❌ | `engine_run[10]` | 44.3 µs | 50.5 µs | -12.25% |
| ❌ | `sampled_4[2000]` | 3.4 ms | 3.8 ms | -12.21% |
| ❌ | `streaming[100]` | 363.1 µs | 410.6 µs | -11.56% |
| ❌ | `engine_run[100]` | 208.5 µs | 235 µs | -11.25% |
| ❌ | `full[2000]` | 3.8 ms | 4.3 ms | -11.08% |
| ❌ | `full[500]` | 991.4 µs | 1,108.2 µs | -10.54% |
| 🆕 | `run[1000gates_2000qubits]` | N/A | 1.9 ms | N/A |
| 🆕 | `run[500gates_1000qubits]` | N/A | 773.3 µs | N/A |
| 🆕 | `critical_path` | N/A | 1.2 ms | N/A |
| 🆕 | `lookahead` | N/A | 1.4 ms | N/A |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing feat/scheduler-core (c7485ee) with master (7bffc5c)

…ing scaffolding
- Delete SlottedBuffer placeholder and its slotted-buffer feature gate
- Remove ActiveRoute struct, route_completed trait method, and active_routes parameter from RoutingModel::latency
- BufferKind now only has Counter variant (single-variant enum kept for future extensibility without API churn)

- Restructure stalled gates from flat Vec to Vec<Vec<StalledGate>> indexed by pool, eliminating O(n) cross-pool scanning in the hot loop
- Separate priority (i16) from scheduling_weight (u32) in OpData; critical-path weights now overwrite the priority-based value instead of taking the max with it, fixing the CriticalPath policy being functionally inert
- Narrow pool_idx from usize to u16 throughout scheduling types
- Replace 6-parameter should_restart_factory with FactoryRestartContext
- Delete trivial run_with_decisions! macro, inline direct calls
- Convert observe() from assert-panic to Result<(), EngineError> with ObserveRequiresAdvisory variant

…g tests
- Add 4 CriticalPath policy proptests: determinism, all-gates-complete, monotonic traces, dual-pool determinism (10k seeds each)
- Update observe_panics_in_simulation_mode to assert Result::Err instead of #[should_panic] to match new observe() signature
- Update advisory tests for Result-returning observe()
- Update scheduling_policy tests for FactoryRestartContext and separated weight/priority semantics

priority: i16 was stored on OpData and preserved through DAG construction and fixup injection, but never read by any engine or scheduling code. Removes 2 bytes/node of dead weight from the hot-path struct. The IR Operation::priority field remains for future deadline-aware policies; it will be re-added to OpData and wired into StalledGate when a policy consumes it.
- Remove priority from OpData, from_circuit, inject_fixup
- Add clone justification comment on fixup qubit clone
- Remove dag_build_preserves_priority test (tested dead field)
- Clean stale priority assertions from weight tests

FIFO policy always removes index 0 (front). Vec::remove(0) is O(n) because it shifts all elements. VecDeque gives O(1) front removal. CriticalPath arbitrary-index removal is O(min(idx, len-idx)), same or better than Vec.
- Change stalled_per_pool from Vec<Vec<StalledGate>> to Vec<VecDeque<StalledGate>>
- Use make_contiguous() before passing slice to policy
- VecDeque::remove returns Option, so a bad index is handled gracefully instead of panicking
- push → push_back, first → front, Vec::is_empty → VecDeque::is_empty
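
The two removal paths can be sketched with the standard library alone. The `StalledGate` fields and the helper names `serve_fifo` and `serve_heaviest` are hypothetical; `pop_front`, `make_contiguous`, and `remove` are the actual `VecDeque` methods the commit refers to.

```rust
use std::cmp::Reverse;
use std::collections::VecDeque;

/// Hypothetical stalled-gate record used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StalledGate {
    pub gate_id: u32,
    pub weight: u32,
}

/// FIFO path: O(1) removal from the front.
pub fn serve_fifo(stalled: &mut VecDeque<StalledGate>) -> Option<StalledGate> {
    stalled.pop_front()
}

/// Weighted path: expose the queue as one slice, pick an index, remove it.
pub fn serve_heaviest(stalled: &mut VecDeque<StalledGate>) -> Option<StalledGate> {
    let idx = {
        // make_contiguous() rotates the ring buffer so it can be read as a slice.
        let slice: &[StalledGate] = stalled.make_contiguous();
        // min_by_key returns the first minimum, so Reverse(weight) yields the
        // earliest gate among those with the highest weight.
        slice
            .iter()
            .enumerate()
            .min_by_key(|(_, gate)| Reverse(gate.weight))
            .map(|(i, _)| i)?
    };
    // remove() returns None for an out-of-range index instead of panicking.
    stalled.remove(idx)
}
```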

Replace magic number 16 with state-proportional capacity hint for FullDecisionRecorder in advisory step_to. Uses stalled gate count + factory pool count (min 8) to size the allocation based on actual decisions the step can produce.

- Add EngineResult::into_profile_and_decisions() for zero-copy extraction of both profile and decisions in a single consume
- Replace .to_vec() clone in Python binding with the new method
- Pre-allocate AdvisoryOverrides HashMaps with capacity from circuit metadata
- Remove stale superseded Trurlic decision

buffer.rs → buffer/{mod, counter, dispatch, model}.rs
routing.rs → routing/{mod, dispatch, manhattan, model, scalar}.rs

Separates concerns within each module: counter logic, dispatch helpers, model types, and routing strategies each get their own file. Public API unchanged.

Move DagDemandAnalysis, demand_analysis(), critical_path_t_density(), and static_t_ready_cycles() from dag/mod.rs into dag/demand.rs. Keeps the DAG struct focused on graph structure; demand analysis lives in its own file alongside the topological-sort implementation.

engine/mod.rs → engine/{core, build, run, sink}.rs
- core: EngineCore struct, EngineMode, DecoderState, FactoryPool
- build: construction helpers (hook table, decoder, factory pools)
- run: simulation loop (run_loop, step_inner, process_decoder)
- sink: EventSink enum + trace size estimation

scheduling/mod.rs → scheduling/{types, policy, dispatch}.rs
- types: StalledGate, AdmitDecision, FactoryRestartContext
- policy: SchedulingPolicy trait
- dispatch: SchedulingPolicyKind enum + serde impls

Breaks the two largest files into focused submodules. Public API unchanged.

Introduce SchedulingContext, PoolState, and GateCandidate — structured snapshots of engine state for policy decisions. Add peek_pool_production on EventQueue and peek_magic_demand on FifoReadyQueue to gather per-pool factory activity and ready-set demand without allocation.

Add prepare() phase to SchedulingPolicy, called once per cycle before dispatch decisions. Replace scattered per-call arguments (gate_id, weight, buffer_occupancy, cycle) with GateCandidate struct. Remove cycle parameter from select_stalled_gate — policies access it via prepare().

…licies SchedulingPolicyKind and MonteCarloConfig lose Copy to accommodate policies with cached state (e.g. SmallVec forecasts). All downstream move sites gain .clone(); these are construction-time, not hot-loop.
- policy_reason() now takes &SchedulingPolicyKind (no implicit copy)
- Add PolicyReason::Lookahead variant for upcoming policy
- Lookahead arm added to SchedulingPolicyKind enum dispatch (empty)
- Test files updated for non-Copy MonteCarloConfig

Production-forecast-aware admission control that extends critical-path stall selection with buffer conservation. Uses prepare() to cache per-pool forecasts (SmallVec<[PoolForecast; 2]>, zero heap alloc).

Admission rules:
- Buffer ≤1 state + higher-weight stalled gates → Stall
- Production imminent (≤3 cycles) → Dispatch freely
- Production distant + buffer low + low-weight gate → Stall

Wired into SchedulingPolicyKind enum dispatch, CLI (--policy lookahead), and Python bindings (policy="lookahead").
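
The three admission rules can be read as a small decision function. The sketch below is one possible interpretation, not the PR's implementation: the evaluation order, the `low_weight_cutoff` parameter, treating "buffer low" as the same ≤1 threshold, and the `PoolForecast` fields are all assumptions. Only the ≤1 state and ≤3 cycle thresholds and the `AdmitDecision` and `PoolForecast` names come from the commit messages.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AdmitDecision {
    Dispatch,
    Stall,
}

/// Hypothetical per-pool forecast cached once per cycle.
pub struct PoolForecast {
    pub buffer_occupancy: u32,
    /// Cycles until the next magic state is expected, if any factory is running.
    pub cycles_to_next_production: Option<u32>,
}

const LOW_BUFFER: u32 = 1; // "Buffer ≤1 state"
const IMMINENT_CYCLES: u32 = 3; // "Production imminent (≤3 cycles)"

pub fn admit_gate(
    forecast: &PoolForecast,
    gate_weight: u32,
    max_stalled_weight: Option<u32>,
    low_weight_cutoff: u32, // assumed tuning knob for "low-weight gate"
) -> AdmitDecision {
    let buffer_low = forecast.buffer_occupancy <= LOW_BUFFER;

    // Rule 1: keep the last state for a more urgent gate that is already stalled.
    if buffer_low && matches!(max_stalled_weight, Some(w) if w > gate_weight) {
        return AdmitDecision::Stall;
    }
    // Rule 2: a fresh state is about to arrive, so spending one now is cheap.
    if matches!(forecast.cycles_to_next_production, Some(c) if c <= IMMINENT_CYCLES) {
        return AdmitDecision::Dispatch;
    }
    // Rule 3: production is distant, the buffer is low, and the gate is not urgent.
    if buffer_low && gate_weight < low_weight_cutoff {
        return AdmitDecision::Stall;
    }
    AdmitDecision::Dispatch
}
```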

12 new tests covering:
- Stall selection (highest weight, FIFO tie-break)
- Admission control: holds state for higher priority gates
- Admission control: dispatches freely when production imminent
- Admission control: conserves when production distant
- Engine integration: completes chain circuits
- Outperforms CriticalPath on contention (deterministic + multi-seed)
- Enum dispatch, string roundtrip, JSON serialization
- Identical to CriticalPath without contention

Compute per-gate ASAP/ALAP scheduling slack via topological traversal, deriving slack_ratio (fraction of T-gates with zero slack) as a measure of scheduling flexibility. Add delta_max: worst-case cumulative demand minus supply deficit across time buckets. Thread both metrics through EngineResult, ProfileAnalyzer, MonteCarloSummary, and sensitivity OutputMetric. Add unit and integration tests for slack computation, delta_max bounds, and cross-mode consistency.
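
As an illustration of the ASAP/ALAP slack idea, the sketch below computes slack for a DAG with unit-latency nodes. It assumes node indices are already in topological order (every edge goes from a lower to a higher index) and computes the zero-slack fraction over all nodes, whereas the commit restricts `slack_ratio` to T-gates. The function names are hypothetical.

```rust
/// Per-node slack (ALAP start minus ASAP start) for unit-latency nodes.
/// `succs[v]` lists the successors of `v`; edges must go from lower to
/// higher indices.
pub fn slack(succs: &[Vec<usize>]) -> Vec<u32> {
    let n = succs.len();

    // Forward pass: earliest start is one past the latest predecessor.
    let mut asap = vec![0u32; n];
    for v in 0..n {
        for &s in &succs[v] {
            asap[s] = asap[s].max(asap[v] + 1);
        }
    }

    // Backward pass: latest start that does not delay any successor.
    let makespan = asap.iter().copied().max().unwrap_or(0);
    let mut alap = vec![makespan; n];
    for v in (0..n).rev() {
        for &s in &succs[v] {
            alap[v] = alap[v].min(alap[s] - 1);
        }
    }

    asap.iter().zip(&alap).map(|(early, late)| late - early).collect()
}

/// Fraction of nodes with zero slack, i.e. nodes on a critical path.
pub fn slack_ratio(slack: &[u32]) -> f64 {
    if slack.is_empty() {
        return 0.0;
    }
    let critical = slack.iter().filter(|&&s| s == 0).count();
    critical as f64 / slack.len() as f64
}
```

A ratio near 1.0 means almost every node sits on a critical path, so a scheduler has little room to reorder work.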

Add `routing_cost_cycles: Option<u32>` to Operation in pirx-ir, allowing compilers to provide pre-computed routing costs that bypass model estimation. Propagate through validation with serde default/skip_serializing_if. Add `Congestion` variant to RoutingConfig in pirx-hw with `cycles_per_hop` and `congestion_factor` fields. Validate cycles_per_hop > 0 and congestion_factor >= 0 with is_finite() guard. Add InvalidCongestionFactor error variant. Include Congestion in routing_physical_qubits estimation.

…oute_completed lifecycle CongestionRouting applies latency = manhattan_distance × cycles_per_hop × (1 + α × active_routes) with statistical congestion tracking via an active_count counter incremented on latency() and decremented on route_completed().

Propagate routing_cost_cycles from IR Operation to OpData as routing_cost_override. When present, total_gate_cost() uses the override directly without calling RoutingModel::latency() and without tracking the gate for route_completed() callbacks.

Wire route_completed() into the engine: track model-estimated routed gates in SmallVec<[OpKey; 4]> on EngineCore, call route_completed() on GateCompleted for model-estimated routes only. Add route_completed() default no-op to RoutingModel trait. Forward both latency() and route_completed() through RoutingKind enum dispatch.

Update RoutingModelInfo, ProfileAnalyzer, and DAG construction to handle the new Congestion variant and routing_cost_override field.
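
The latency formula and the counter lifecycle could be sketched as follows. The coordinate representation, the rounding (`ceil`), reading the counter before incrementing it, and the saturating decrement are assumptions; α is taken to be the `congestion_factor` field.

```rust
/// Sketch of a stateful congestion model; field names follow the commit text.
pub struct CongestionRouting {
    cycles_per_hop: u32,
    congestion_factor: f64,
    active_count: u32,
}

impl CongestionRouting {
    pub fn new(cycles_per_hop: u32, congestion_factor: f64) -> Self {
        Self { cycles_per_hop, congestion_factor, active_count: 0 }
    }

    /// latency = manhattan_distance × cycles_per_hop × (1 + α × active_routes)
    pub fn latency(&mut self, from: (i32, i32), to: (i32, i32)) -> u32 {
        let distance = from.0.abs_diff(to.0) + from.1.abs_diff(to.1);
        let scale = 1.0 + self.congestion_factor * f64::from(self.active_count);
        let cycles = f64::from(distance) * f64::from(self.cycles_per_hop) * scale;
        // The new route counts as active until route_completed() is called.
        self.active_count += 1;
        cycles.ceil() as u32
    }

    /// Called when a routed gate completes; releases one active route.
    pub fn route_completed(&mut self) {
        self.active_count = self.active_count.saturating_sub(1);
    }
}
```

With `cycles_per_hop = 2` and `congestion_factor = 0.5`, a distance-4 route costs 8 cycles on an idle fabric and 12 cycles while one other route is active.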

…ture Add routing_cost_cycles: None to all Operation construction sites across adapters, testkit fixtures, and integration tests. Add congestion_hw() fixture to pirx-testkit for congestion routing tests with zero injection error probability and preloaded buffer.

Make DAG adjacency maps, OpData.active, and OpData.scheduling_weight pub(crate) instead of pub — these are internal engine state that should not leak through the public API. Add is_active() and scheduling_weight() accessors for external consumers. Restrict initial_ready_set() to cfg(test) since only unit tests use it; the engine uses the iterator variant.

…lysis Break large functions into focused helpers: check_injection_error, dispatch_magic_gate, serve_one_stalled in engine dispatch; propagate_demand in DAG demand analysis; compute_asap_schedule and compute_alap_schedule in slack analysis. Simplify serve_stalled_for_pool by removing redundant pool_idx parameter. Clean up iterator patterns and formatting throughout.

…ror, and let-chains Add Copy to PoolState, SchedulingDecision, and TimedDecision — these are small value types that benefit from implicit copies over clones. Replace unit error type in SchedulingPolicyKind::from_str with a proper ParsePolicyError. Switch deserialization to a visitor pattern for correct str handling. Use let-chains in LookaheadPolicy for cleaner control flow. Tune SmallVec capacities from 2 to 1 to match typical single-pool usage. Add reserved-field documentation for future policy extensibility.
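
A sketch of what a `FromStr` implementation with a dedicated error type might look like. The string values `fifo`, `critical-path`, and `lookahead` are the ones the CLI flag accepts in this PR; the unit-like enum variants, the `input` field, and the error message are illustrative.

```rust
use std::fmt;
use std::str::FromStr;

/// Unit-like variants for brevity; the PR's enum wraps policy structs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SchedulingPolicyKind {
    Fifo,
    CriticalPath,
    Lookahead,
}

/// Error carrying the rejected input so callers can report it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsePolicyError {
    input: String,
}

impl fmt::Display for ParsePolicyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unknown scheduling policy '{}' (expected fifo, critical-path, or lookahead)",
            self.input
        )
    }
}

impl std::error::Error for ParsePolicyError {}

impl FromStr for SchedulingPolicyKind {
    type Err = ParsePolicyError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fifo" => Ok(Self::Fifo),
            "critical-path" => Ok(Self::CriticalPath),
            "lookahead" => Ok(Self::Lookahead),
            other => Err(ParsePolicyError { input: other.to_owned() }),
        }
    }
}
```

Compared with a unit error type, this lets the CLI and the bindings surface a message that names the offending value.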

… bindings Add --policy flag to the CLI compare subcommand, passing the user's scheduling policy choice through to run_comparison. Change run_comparison to take &MonteCarloConfig (avoids unnecessary clone). Add default_congestion_factor (0.1) for RoutingConfig::Congestion. Update wasm and python bindings for ParsePolicyError and Copy-derived types.

…uting Add proptest properties for the LookaheadPolicy (determinism, all gates complete, monotonic traces, dual-pool determinism) and CongestionRouting (determinism, monotonic traces). Add integration test verifying routing_cost_cycles override bypasses the routing model. Add unit tests for congestion RoutingModelInfo and routing_cost_cycles IR default.

…urations Guard routed_gates tracking behind route_tracking flag — only congestion routing is stateful; Manhattan/Scalar route_completed() are no-ops. Skip build_scheduling_context() for FIFO and CriticalPath policies whose prepare() is a no-op, avoiding O(heap_size) scan per cycle. Skip apply_critical_path_weights() for FIFO policy which ignores weights, saving O(V+E) at simulation start. Add FIFO fast-path in serve_stalled_for_pool using pop_front() instead of make_contiguous() + policy dispatch + remove(idx).

…ngestion routing All existing benchmarks used FIFO policy and scalar/Manhattan routing, leaving CriticalPath, Lookahead, and congestion routing invisible to CodSpeed. Add scheduling_policy group (CriticalPath + Lookahead at 500 gates) and congestion_routing group (500 + 1000 gates) to catch regressions in policy-specific and stateful-routing code paths.

…implementations Extract decoder backpressure modeling into a standalone module with a trait-based design matching factory/buffer/routing patterns. Two implementations: ConstantThroughputDecoder (deterministic, no RNG) and MM1Decoder (stochastic M/M/1 queue with exponential service times). Enum dispatch via DecoderKind avoids vtable overhead.
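
The deterministic variant can be sketched as a bounded backlog drained at a fixed rate. Only the names `ConstantThroughputDecoder`, `DecoderKind`, and `DecoderModel` come from the commit; the trait methods (`enqueue`, `tick`, `pending`, `is_stalled`), the `max_pending` stall threshold, and the constructor are assumptions. The stochastic `MM1Decoder` is left out because it needs a random number source.

```rust
/// Assumed trait surface for a decoder backpressure model.
pub trait DecoderModel {
    /// Adds freshly produced measurements to the decode backlog.
    fn enqueue(&mut self, measurements: u32);
    /// Advances one cycle, draining the backlog.
    fn tick(&mut self);
    fn pending(&self) -> u32;
    /// True while the backlog exceeds what the decoder can absorb.
    fn is_stalled(&self) -> bool;
}

/// Decodes a fixed number of measurements per cycle; no randomness involved.
pub struct ConstantThroughputDecoder {
    throughput_per_cycle: u32,
    max_pending: u32,
    pending: u32,
}

impl ConstantThroughputDecoder {
    pub fn new(throughput_per_cycle: u32, max_pending: u32) -> Self {
        Self { throughput_per_cycle, max_pending, pending: 0 }
    }
}

impl DecoderModel for ConstantThroughputDecoder {
    fn enqueue(&mut self, measurements: u32) {
        self.pending = self.pending.saturating_add(measurements);
    }
    fn tick(&mut self) {
        self.pending = self.pending.saturating_sub(self.throughput_per_cycle);
    }
    fn pending(&self) -> u32 {
        self.pending
    }
    fn is_stalled(&self) -> bool {
        self.pending > self.max_pending
    }
}

/// Enum dispatch wrapper; further variants would add match arms.
pub enum DecoderKind {
    ConstantThroughput(ConstantThroughputDecoder),
}

impl DecoderModel for DecoderKind {
    fn enqueue(&mut self, measurements: u32) {
        match self { Self::ConstantThroughput(d) => d.enqueue(measurements) }
    }
    fn tick(&mut self) {
        match self { Self::ConstantThroughput(d) => d.tick() }
    }
    fn pending(&self) -> u32 {
        match self { Self::ConstantThroughput(d) => d.pending() }
    }
    fn is_stalled(&self) -> bool {
        match self { Self::ConstantThroughput(d) => d.is_stalled() }
    }
}
```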

Remove the monolithic DecoderState struct from engine/core.rs and wire the engine to use DecoderModel trait via DecoderKind enum dispatch. Separate measurements_per_qubit (QEC encoding property) from the decoder model. SchedulingContext now exposes a single DecoderPressure snapshot instead of separate pending/stalled fields. Engine run loop delegates tick/stall logic to the trait, eliminating inline decoder arithmetic.

Add load(&self) method to RoutingModel trait returning a zero-alloc RoutingLoad struct (active_routes: u32, congestion_level: f64). Default implementation returns zero for stateless models (Scalar, Manhattan). CongestionRouting overrides with live active_count and congestion_factor × active_count. RoutingKind enum delegates via the established match-dispatch pattern.
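
A default trait method keeps stateless models free of boilerplate while the stateful model overrides it. The sketch below follows the names in the commit (`RoutingLoad`, `RoutingModel::load`, `RoutingKind`); the struct layouts and the two-variant enum are simplified assumptions.

```rust
/// Plain-data snapshot of routing pressure; cheap to copy, no allocation.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct RoutingLoad {
    pub active_routes: u32,
    pub congestion_level: f64,
}

pub trait RoutingModel {
    /// Stateless models inherit this zero-load default.
    fn load(&self) -> RoutingLoad {
        RoutingLoad::default()
    }
}

pub struct ScalarRouting;
impl RoutingModel for ScalarRouting {}

pub struct CongestionRouting {
    pub active_count: u32,
    pub congestion_factor: f64,
}

impl RoutingModel for CongestionRouting {
    fn load(&self) -> RoutingLoad {
        RoutingLoad {
            active_routes: self.active_count,
            congestion_level: self.congestion_factor * f64::from(self.active_count),
        }
    }
}

/// Enum dispatch delegates to whichever model is configured.
pub enum RoutingKind {
    Scalar(ScalarRouting),
    Congestion(CongestionRouting),
}

impl RoutingModel for RoutingKind {
    fn load(&self) -> RoutingLoad {
        match self {
            Self::Scalar(m) => m.load(),
            Self::Congestion(m) => m.load(),
        }
    }
}
```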

Add routing: RoutingLoad field to SchedulingContext, populated from self.routing.load() in build_scheduling_context(). Re-export RoutingLoad from crate root. Update existing SchedulingContext literals in tests. Add six new tests covering default zero load, congestion load tracking, enum delegation, and context integration.

Allow scheduling policies to hold buffer states instead of always serving a stalled gate. None signals the engine to break out of the dispatch loop, preserving the state for a higher-priority gate.
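
The effect of the `Option<usize>` return on the dispatch loop can be sketched like this. The function name `serve_stalled`, the closure-based policy, and representing each stalled gate by its weight alone are simplifications for illustration.

```rust
use std::collections::VecDeque;

/// Serves stalled gates (represented by their weights) while buffer states
/// remain. `select` sees the stalled slice and the buffer occupancy; returning
/// `None` ends the loop so the remaining states are held back.
pub fn serve_stalled<F>(stalled: &mut VecDeque<u32>, buffer: &mut u32, mut select: F) -> Vec<u32>
where
    F: FnMut(&[u32], u32) -> Option<usize>,
{
    let mut served = Vec::new();
    while *buffer > 0 && !stalled.is_empty() {
        let choice = select(stalled.make_contiguous(), *buffer);
        // None: the policy wants to keep the state for a later, more urgent gate.
        let Some(idx) = choice else { break };
        // An out-of-range index also ends the loop instead of panicking.
        let Some(gate) = stalled.remove(idx) else { break };
        *buffer -= 1;
        served.push(gate);
    }
    served
}
```

A policy that always returns `Some(0)` reproduces FIFO behavior, while one that returns `None` once the buffer drops to a single state leaves that state in place for a later gate.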

…ady gates Track buffer_occupancy in PoolForecast and return None from select_stalled_gate when buffer is critically low and a higher-weight gate is about to arrive in Phase 5.

… hold-back Cover always-Some guarantee for FIFO and CriticalPath, None-returning test policies (AlwaysHold, HoldOnce), and LookaheadPolicy hold/serve decisions based on buffer occupancy and weight comparison.

Fuse demand_analysis + compute_slack into single static_analysis (2 O(V+E) passes instead of 3, sharing the forward ASAP pass). Add defers_admission/defers_restart marker methods on SchedulingPolicy so the engine skips GateCandidate/FactoryRestartContext construction and policy dispatch for FIFO. Defer StalledGate construction to stall path only, guard trace_id lookup behind advisory_overrides check, and add FIFO pop_front fast-path in serve_stalled_for_pool.

m2papierz added 12 commits June 21, 2026 09:31

feat(core): add scheduling policy module with FIFO and critical-path …

49ef8ba

refactor(core): extract BufferModel trait with enum dispatch

34f9f57

chore: formatting and linting

e2fcced

feat(core): add contention-aware routing trait and slotted buffer sca…

4b9dbfc

chore: formatting and linting

a6b41ed

m2papierz added 17 commits June 21, 2026 15:01

m2papierz added 18 commits June 21, 2026 23:31

ci: retrigger PR checks

9cc1d5c

feat(core): make select_stalled_gate return Option<usize>

9d0de42

feat(core): hold buffer state in LookaheadPolicy for higher-weight re…

0716020

test(core): add tests for Option<usize> stall selection and lookahead…

15ab363

m2papierz wants to merge 47 commits into master from feat/scheduler-core
