feat(core): pluggable scheduling policies with critical-path analysis and advisory mode #37

Open
m2papierz wants to merge 47 commits into master from feat/scheduler-core

@m2papierz


What

Pluggable scheduling policy framework for the DES engine — replaces the hardcoded FIFO dispatch with a trait-based system, adds critical-path scheduling, decision recording, and an advisory mode for external event-driven simulation.

Why

The engine previously dispatched magic states to stalled gates in insertion order. That is fine for small circuits but leaves performance on the table for deep circuits with uneven critical paths. Profiling scheduling decisions was also impossible: the engine applied them silently, with no observability.

Advisory mode is needed to drive the engine from external hardware telemetry or test harnesses without reimplementing the scheduling logic.

How

  • SchedulingPolicy trait (select_stalled_gate, admit_gate, should_restart_factory) - the engine calls the policy at each decision point. Two implementations: FifoPolicy (preserves existing behavior) and CriticalPathPolicy (prioritizes gates with the longest remaining DAG path).
  • Enum dispatch via SchedulingPolicyKind - zero vtable overhead, inlineable, Send for parallel sweeps. Same pattern as RoutingKind and BufferKind.
  • BufferModel trait extracted from the concrete counter buffer, with enum dispatch (BufferKind) for future slotted/priority buffer variants.
  • DecisionRecorder trait with NoOpDecisionRecorder (compiles to nothing via monomorphization) and FullDecisionRecorder. The engine is generic over D: DecisionRecorder, so recording is zero-cost when disabled.
  • DAG scheduling_weight: a reverse-topological traversal from the sinks computes the longest-path weight per node. Stored in SecondaryMap, computed once after DAG construction.
  • Advisory mode (Engine::observe / Engine::step_to / Engine::snapshot): external events (factory produced/failed, injection outcomes, measurement results) injected at specific cycles. The engine returns TimedDecision records instead of driving its own factory schedule.
  • RoutingModel trait extended with ActiveRoute and route_completed for contention-aware routing (scaffold - no contention model yet).
  • IR extensions: priority and deadline_cycles fields on Operation for future deadline-aware policies.
  • Policy selection wired through EngineConfig, SimulationConfig, CLI (--policy), WASM, and Python bindings.
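
The enum-dispatch pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the trait is reduced to a single method, and the `StalledGate` fields, the `Option<usize>` return type, and the tie-breaking rule are assumptions made for the example.

```rust
/// A gate waiting for a magic state. Field names are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StalledGate {
    pub gate_id: u32,
    pub weight: u32,
}

/// Decision point consulted by the engine (reduced to one method for this sketch).
pub trait SchedulingPolicy {
    /// Returns the index of the stalled gate to serve next, if any.
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> Option<usize>;
}

pub struct FifoPolicy;
pub struct CriticalPathPolicy;

impl SchedulingPolicy for FifoPolicy {
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> Option<usize> {
        // Insertion order: always serve the front of the list.
        if stalled.is_empty() { None } else { Some(0) }
    }
}

impl SchedulingPolicy for CriticalPathPolicy {
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> Option<usize> {
        // Highest weight wins; ties go to the earliest-inserted gate (assumed rule).
        let mut best: Option<usize> = None;
        for (i, gate) in stalled.iter().enumerate() {
            match best {
                Some(b) if stalled[b].weight >= gate.weight => {}
                _ => best = Some(i),
            }
        }
        best
    }
}

/// Closed set of policies: a `match` replaces the vtable lookup of `dyn SchedulingPolicy`.
pub enum SchedulingPolicyKind {
    Fifo(FifoPolicy),
    CriticalPath(CriticalPathPolicy),
}

impl SchedulingPolicy for SchedulingPolicyKind {
    #[inline]
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> Option<usize> {
        match self {
            Self::Fifo(p) => p.select_stalled_gate(stalled),
            Self::CriticalPath(p) => p.select_stalled_gate(stalled),
        }
    }
}
```

Because every variant is a concrete type, the compiler can inline each arm, which is the property the "zero vtable overhead" claim relies on.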

Testing

  • make ci passes locally (fmt + clippy + test + audit)
  • New behavior has tests
  • Hot-path changes have criterion benchmarks

Checklist

  • PR description explains why, not just what
  • No new unwrap()/expect() in production code
  • No new allocations in the simulation hot loop
  • Crate boundaries respected (pirx-core never imports from pirx-adapters)
  • New dependencies justified (not "it's popular" — what does it replace?)

m2papierz added 12 commits June 21, 2026 09:31
…policies

Introduce the SchedulingPolicy trait, enum dispatch (SchedulingPolicyKind),
and opt-in decision recording (DecisionRecorder) for FTQC gate scheduling.
Purely additive — no engine integration yet.
Extend the profiler IR with per-operation scheduling hints:
- `priority: i16` (default 0) — higher values signal more urgency
- `deadline_cycles: Option<u64>` — optional hard deadline in QEC cycles

Both fields use serde defaults for backward-compatible deserialization.
Adapters, testkit, and all tests updated to supply the new fields.
Add `scheduling_weight: u32` to OpData, computed from IR priority during
DAG construction (i16 biased to unsigned range).

New `Dag::apply_critical_path_weights()` performs reverse-topological
traversal to compute remaining-path-length per op, upgrading weights
where the path length exceeds the priority-based value. Fixup nodes
inherit their parent gate's weight to preserve scheduling urgency.
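
A reverse-topological longest-path pass of the kind described in this commit might look like the sketch below. It assumes unit cost per operation and a plain `Vec`-based adjacency list; the function name and signature are hypothetical, and the real `Dag::apply_critical_path_weights()` works on the engine's own graph types.

```rust
/// For each node, computes the number of nodes on the longest path from it to a sink.
/// `succ[v]` lists the successors of node `v`; `topo` is any topological order.
/// Hypothetical helper for illustration only.
pub fn critical_path_weights(succ: &[Vec<usize>], topo: &[usize]) -> Vec<u32> {
    let mut weight = vec![0u32; succ.len()];
    // Reverse topological order guarantees every successor is final before its parent.
    for &v in topo.iter().rev() {
        let longest_child = succ[v].iter().map(|&s| weight[s]).max().unwrap_or(0);
        weight[v] = longest_child + 1;
    }
    weight
}
```

For the chain 0 → 1 → 2 → 3 with a side branch 0 → 4, node 0 receives weight 4 while the leaf 4 receives weight 1, so a weight-driven policy would serve the chain first.
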
Wire SchedulingPolicyKind through EngineCore dispatch: policy-driven
gate admission, stalled-gate selection, and factory restart decisions.
Replace VecDeque with Vec for policy-indexed stalled gate access.
Add DecisionRecorder threading for optional scheduling decision capture.
Monomorphize run() over EventSink × DecisionSink cross-product.
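
The idea of making the run loop generic over a recorder can be illustrated with a small sketch. The trait shape, the `TimedDecision` fields, and the toy `run` loop below are assumptions for the example, not the engine's real signatures.

```rust
/// A recorded decision. Field names are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimedDecision {
    pub cycle: u64,
    pub gate_id: u32,
}

pub trait DecisionRecorder {
    fn record(&mut self, decision: TimedDecision);
}

/// Zero-sized recorder: after monomorphization the call has an empty body to inline.
pub struct NoOpDecisionRecorder;

impl DecisionRecorder for NoOpDecisionRecorder {
    #[inline(always)]
    fn record(&mut self, _decision: TimedDecision) {}
}

/// Recorder that keeps every decision in memory.
#[derive(Default)]
pub struct FullDecisionRecorder {
    pub decisions: Vec<TimedDecision>,
}

impl DecisionRecorder for FullDecisionRecorder {
    fn record(&mut self, decision: TimedDecision) {
        self.decisions.push(decision);
    }
}

/// Toy loop: serves one gate per cycle and reports each choice to the recorder.
/// Returns the number of cycles elapsed.
pub fn run<D: DecisionRecorder>(gates: &[u32], recorder: &mut D) -> u64 {
    let mut cycle = 0u64;
    for &gate_id in gates {
        recorder.record(TimedDecision { cycle, gate_id });
        cycle += 1;
    }
    cycle
}
```

The compiler generates one copy of `run` per recorder type, so the no-op variant carries no recording cost while sharing the same source.
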
Verify critical-path policy outperforms FIFO under contention,
scheduling_weight propagation through DAG chains, fixup node
weight inheritance, and decision recording stability. Includes
multi-seed statistical validation with stochastic factories.
Replace concrete MagicStateBuffer with a BufferModel trait and
CounterBuffer implementation behind BufferKind enum dispatch for
zero-vtable-overhead extensibility.
Engine can now operate in advisory mode where factory events arrive from
external sources (hardware telemetry, test harnesses) via observe()/step_to()
instead of being generated internally. Shares 100% of the scheduling code
path with simulation mode — only factory restart behavior differs.

- ExternalEvent enum: FactoryProduced, FactoryFailed, InjectionOutcome,
  MeasurementResult, ProgramAbort
- EngineSnapshot for serializable state inspection
- EngineMode enum with mode-gated factory restart in dispatch loop
- Advisory overrides for injection/measurement outcomes (zero-cost in
  simulation mode via Option<Box<AdvisoryOverrides>>)
- PartialEq on SchedulingDecision/TimedDecision for equivalence testing
- 12 integration tests including critical advisory/simulation equivalence
Add Serialize/Deserialize/FromStr/Copy to SchedulingPolicyKind so it
can be embedded in MonteCarloConfig JSON and passed by value. Thread
policy field into EngineConfig and MonteCarloConfig. Wire decision
recording into all EngineResult variants with decisions() accessor.
Update all existing call sites and tests to explicitly pass FIFO.
…tends

Add --policy (fifo|critical-path) and --decisions flags to CLI profile
and monte-carlo commands. Add profile_with_policy/monte_carlo_with_policy
to WASM API. Add policy and decisions kwargs to Python profile/trace/
monte_carlo functions with PySchedulingDecision wrapper type.
…ffold

Extend RoutingModel to accept cycle and active routes for stateful
contention tracking. Add ActiveRoute type and route_completed callback.
Introduce SlottedBuffer behind slotted-buffer feature gate.
@codspeed-hq

codspeed-hq Bot commented Jun 21, 2026


Merging this PR will degrade performance by 13.85%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

❌ 12 regressed benchmarks
✅ 15 untouched benchmarks
🆕 4 new benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| sampled_4[100] | 173.2 µs | 211.8 µs | -18.24% |
| run[1000gates_2000qubits] | 922.8 µs | 1,121.3 µs | -17.71% |
| run[2000gates_4000qubits] | 1.8 ms | 2.2 ms | -17.03% |
| run[500gates_1000qubits] | 477 µs | 562.2 µs | -15.16% |
| run[100gates_200qubits] | 112.6 µs | 132.5 µs | -15.05% |
| engine_run[500] | 961.6 µs | 1,114 µs | -13.68% |
| engine_run[10] | 44.3 µs | 50.5 µs | -12.25% |
| sampled_4[2000] | 3.4 ms | 3.8 ms | -12.21% |
| streaming[100] | 363.1 µs | 410.6 µs | -11.56% |
| engine_run[100] | 208.5 µs | 235 µs | -11.25% |
| full[2000] | 3.8 ms | 4.3 ms | -11.08% |
| full[500] | 991.4 µs | 1,108.2 µs | -10.54% |
| 🆕 run[1000gates_2000qubits] | N/A | 1.9 ms | N/A |
| 🆕 run[500gates_1000qubits] | N/A | 773.3 µs | N/A |
| 🆕 critical_path | N/A | 1.2 ms | N/A |
| 🆕 lookahead | N/A | 1.4 ms | N/A |



Comparing feat/scheduler-core (c7485ee) with master (7bffc5c)


m2papierz added 17 commits June 21, 2026 15:01
…ing scaffolding

- Delete SlottedBuffer placeholder and its slotted-buffer feature gate
- Remove ActiveRoute struct, route_completed trait method, and
  active_routes parameter from RoutingModel::latency
- BufferKind now only has Counter variant (single-variant enum kept
  for future extensibility without API churn)
- Restructure stalled gates from flat Vec to Vec<Vec<StalledGate>>
  indexed by pool — eliminates O(n) cross-pool scanning in hot loop
- Separate priority (i16) from scheduling_weight (u32) in OpData;
  critical-path weights now overwrite instead of max with priority,
  fixing CriticalPath policy being functionally inert
- Narrow pool_idx from usize to u16 throughout scheduling types
- Replace 6-parameter should_restart_factory with FactoryRestartContext
- Delete trivial run_with_decisions! macro, inline direct calls
- Convert observe() from assert-panic to Result<(), EngineError>
  with ObserveRequiresAdvisory variant
…g tests

- Add 4 CriticalPath policy proptests: determinism, all-gates-complete,
  monotonic traces, dual-pool determinism (10k seeds each)
- Update observe_panics_in_simulation_mode to assert Result::Err
  instead of #[should_panic] to match new observe() signature
- Update advisory tests for Result-returning observe()
- Update scheduling_policy tests for FactoryRestartContext and
  separated weight/priority semantics
priority: i16 was stored on OpData and preserved through DAG
construction and fixup injection, but never read by any engine
or scheduling code. Removes 2 bytes/node of dead weight from
the hot-path struct. The IR Operation::priority field remains
for future deadline-aware policies; it will be re-added to
OpData and wired into StalledGate when a policy consumes it.

- Remove priority from OpData, from_circuit, inject_fixup
- Add clone justification comment on fixup qubit clone
- Remove dag_build_preserves_priority test (tested dead field)
- Clean stale priority assertions from weight tests
FIFO policy always removes index 0 (front). Vec::remove(0) is O(n)
— shifts all elements. VecDeque gives O(1) front removal.
CriticalPath arbitrary-index removal is O(min(idx, len-idx)),
same or better than Vec.

- Change stalled_per_pool from Vec<Vec<StalledGate>> to Vec<VecDeque<StalledGate>>
- Use make_contiguous() before passing slice to policy
- VecDeque::remove returns Option — graceful instead of panicking
- push → push_back, first → front, Vec::is_empty → VecDeque::is_empty
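
The VecDeque handling described in this commit can be sketched as below. The helper name `serve_one`, the use of bare `u32` weights in place of `StalledGate`, and the closure-based policy are simplifications assumed for the example.

```rust
use std::collections::VecDeque;

/// Removes the entry at the index chosen by `select`, which sees the queue as a slice.
/// Hypothetical helper; entries are plain weights to keep the sketch short.
pub fn serve_one<F>(stalled: &mut VecDeque<u32>, select: F) -> Option<u32>
where
    F: Fn(&[u32]) -> Option<usize>,
{
    // make_contiguous() rearranges the ring buffer so it can be viewed as one slice.
    let idx = select(stalled.make_contiguous())?;
    // VecDeque::remove returns None for an out-of-range index instead of panicking.
    stalled.remove(idx)
}
```

A FIFO selector that always answers `Some(0)` removes the front in O(1), while an out-of-range index from a buggy policy yields `None` rather than a panic.
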
Replace magic number 16 with state-proportional capacity hint
for FullDecisionRecorder in advisory step_to. Uses stalled gate
count + factory pool count (min 8) to size the allocation based
on actual decisions the step can produce.
- Add EngineResult::into_profile_and_decisions() for zero-copy extraction
  of both profile and decisions in a single consume
- Replace .to_vec() clone in Python binding with the new method
- Pre-allocate AdvisoryOverrides HashMaps with capacity from circuit metadata
- Remove stale superseded Trurlic decision
buffer.rs → buffer/{mod, counter, dispatch, model}.rs
routing.rs → routing/{mod, dispatch, manhattan, model, scalar}.rs

Separates concerns within each module: counter logic, dispatch helpers,
model types, and routing strategies each get their own file. Public API
unchanged.
Move DagDemandAnalysis, demand_analysis(), critical_path_t_density(),
and static_t_ready_cycles() from dag/mod.rs into dag/demand.rs.

Keeps the DAG struct focused on graph structure; demand analysis lives
in its own file alongside the topological-sort implementation.
engine/mod.rs → engine/{core, build, run, sink}.rs
  - core: EngineCore struct, EngineMode, DecoderState, FactoryPool
  - build: construction helpers (hook table, decoder, factory pools)
  - run: simulation loop (run_loop, step_inner, process_decoder)
  - sink: EventSink enum + trace size estimation

scheduling/mod.rs → scheduling/{types, policy, dispatch}.rs
  - types: StalledGate, AdmitDecision, FactoryRestartContext
  - policy: SchedulingPolicy trait
  - dispatch: SchedulingPolicyKind enum + serde impls

Breaks the two largest files into focused submodules. Public API unchanged.
Introduce SchedulingContext, PoolState, and GateCandidate — structured
snapshots of engine state for policy decisions. Add peek_pool_production
on EventQueue and peek_magic_demand on FifoReadyQueue to gather per-pool
factory activity and ready-set demand without allocation.
Add prepare() phase to SchedulingPolicy, called once per cycle before
dispatch decisions. Replace scattered per-call arguments (gate_id,
weight, buffer_occupancy, cycle) with GateCandidate struct. Remove cycle
parameter from select_stalled_gate — policies access it via prepare().
…licies

SchedulingPolicyKind and MonteCarloConfig lose Copy to accommodate
policies with cached state (e.g. SmallVec forecasts). All downstream
move sites gain .clone() — these are construction-time, not hot-loop.

- policy_reason() now takes &SchedulingPolicyKind (no implicit copy)
- Add PolicyReason::Lookahead variant for upcoming policy
- Lookahead arm added to SchedulingPolicyKind enum dispatch (empty)
- Test files updated for non-Copy MonteCarloConfig
Production-forecast-aware admission control that extends critical-path
stall selection with buffer conservation. Uses prepare() to cache
per-pool forecasts (SmallVec<[PoolForecast; 2]>, zero heap alloc).

Admission rules:
- Buffer ≤1 state + higher-weight stalled gates → Stall
- Production imminent (≤3 cycles) → Dispatch freely
- Production distant + buffer low + low-weight gate → Stall

Wired into SchedulingPolicyKind enum dispatch, CLI (--policy lookahead),
and Python bindings (policy="lookahead").
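
The three admission rules listed above could be expressed along the following lines. Only the 3-cycle "imminent" window and the "≤1 state" condition come from the commit message; the `LOW_BUFFER` and `LOW_WEIGHT` thresholds, the input struct, and the order in which the rules are checked are assumptions made purely for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum AdmitDecision {
    Dispatch,
    Stall,
}

/// Inputs to the admission check. This struct is hypothetical.
pub struct AdmitInput {
    pub buffer_occupancy: u32,
    pub gate_weight: u32,
    /// Highest weight among currently stalled gates, if any.
    pub max_stalled_weight: Option<u32>,
    /// Cycles until the next forecast factory output, if one is scheduled.
    pub cycles_to_next_production: Option<u64>,
}

const IMMINENT_CYCLES: u64 = 3; // from the commit message
const LOW_BUFFER: u32 = 2; // assumed threshold
const LOW_WEIGHT: u32 = 4; // assumed threshold

pub fn admit(input: &AdmitInput) -> AdmitDecision {
    // Rule 1: at most one state left and a more urgent gate is already waiting.
    let outranked = input
        .max_stalled_weight
        .map_or(false, |w| w > input.gate_weight);
    if input.buffer_occupancy <= 1 && outranked {
        return AdmitDecision::Stall;
    }
    // Rule 2: a new state arrives soon, so there is no reason to conserve.
    let imminent = input
        .cycles_to_next_production
        .map_or(false, |c| c <= IMMINENT_CYCLES);
    if imminent {
        return AdmitDecision::Dispatch;
    }
    // Rule 3: production is distant, the buffer is low, and the gate is not urgent.
    if input.buffer_occupancy <= LOW_BUFFER && input.gate_weight < LOW_WEIGHT {
        return AdmitDecision::Stall;
    }
    AdmitDecision::Dispatch
}
```
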
12 new tests covering:
- Stall selection (highest weight, FIFO tie-break)
- Admission control: holds state for higher priority gates
- Admission control: dispatches freely when production imminent
- Admission control: conserves when production distant
- Engine integration: completes chain circuits
- Outperforms CriticalPath on contention (deterministic + multi-seed)
- Enum dispatch, string roundtrip, JSON serialization
- Identical to CriticalPath without contention
Compute per-gate ASAP/ALAP scheduling slack via topological traversal,
deriving slack_ratio (fraction of T-gates with zero slack) as a measure
of scheduling flexibility. Add delta_max: the worst-case cumulative deficit
(demand minus supply) across time buckets. Thread both metrics through
EngineResult, ProfileAnalyzer, MonteCarloSummary, and sensitivity
OutputMetric. Add unit and integration tests for slack computation,
delta_max bounds, and cross-mode consistency.
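
As a rough illustration of ASAP/ALAP slack, the sketch below assumes unit latency per operation and computes the ratio over all nodes rather than only T-gates. Both function names and the adjacency-list representation are hypothetical.

```rust
/// Per-node slack (ALAP start minus ASAP start) for a DAG with unit-latency nodes.
/// `succ[v]` lists successors of `v`; `topo` is a topological order.
pub fn compute_slack(succ: &[Vec<usize>], topo: &[usize]) -> Vec<u32> {
    let n = succ.len();
    // Forward pass: earliest start is one past the latest predecessor.
    let mut asap = vec![0u32; n];
    for &v in topo {
        for &s in &succ[v] {
            asap[s] = asap[s].max(asap[v] + 1);
        }
    }
    let makespan = asap.iter().copied().max().unwrap_or(0);
    // Backward pass: latest start that still lets every successor start on time.
    let mut alap = vec![makespan; n];
    for &v in topo.iter().rev() {
        for &s in &succ[v] {
            alap[v] = alap[v].min(alap[s] - 1);
        }
    }
    asap.iter().zip(&alap).map(|(a, l)| l - a).collect()
}

/// Fraction of nodes with zero slack (all nodes here, not only T-gates).
pub fn slack_ratio(slack: &[u32]) -> f64 {
    if slack.is_empty() {
        return 0.0;
    }
    slack.iter().filter(|&&s| s == 0).count() as f64 / slack.len() as f64
}
```

Nodes with zero slack lie on a critical path; a ratio close to 1.0 means the scheduler has little freedom to reorder work.
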
Add `routing_cost_cycles: Option<u32>` to Operation in pirx-ir, allowing
compilers to provide pre-computed routing costs that bypass model estimation.
Propagate through validation with serde default/skip_serializing_if.

Add `Congestion` variant to RoutingConfig in pirx-hw with `cycles_per_hop`
and `congestion_factor` fields. Validate cycles_per_hop > 0 and
congestion_factor >= 0 with is_finite() guard. Add InvalidCongestionFactor
error variant. Include Congestion in routing_physical_qubits estimation.
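
The validation rules stated here might be checked roughly as follows. `InvalidCongestionFactor` is named in the commit; the `ZeroCyclesPerHop` variant, the error enum, and the free-function form are assumptions for the sketch.

```rust
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    /// Hypothetical variant name for the cycles_per_hop > 0 rule.
    ZeroCyclesPerHop,
    InvalidCongestionFactor,
}

pub fn validate_congestion(cycles_per_hop: u32, congestion_factor: f64) -> Result<(), ConfigError> {
    if cycles_per_hop == 0 {
        return Err(ConfigError::ZeroCyclesPerHop);
    }
    // The is_finite() guard rejects NaN and infinities, which a bare `>= 0.0`
    // comparison would not fully cover (positive infinity passes `>= 0.0`).
    if !congestion_factor.is_finite() || congestion_factor < 0.0 {
        return Err(ConfigError::InvalidCongestionFactor);
    }
    Ok(())
}
```
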
m2papierz added 18 commits June 21, 2026 23:31
…oute_completed lifecycle

CongestionRouting applies latency = manhattan_distance × cycles_per_hop ×
(1 + α × active_routes) with statistical congestion tracking via an
active_count counter incremented on latency() and decremented on
route_completed().
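
A minimal sketch of the latency formula and counter lifecycle described above, under several assumptions: coordinates are `(i32, i32)` grid positions, the active count is read before it is incremented, fractional results are rounded up, and `latency` takes `&mut self` (the real trait may use a different receiver).

```rust
pub struct CongestionRouting {
    cycles_per_hop: u32,
    congestion_factor: f64, // the α in the formula
    active_count: u32,
}

impl CongestionRouting {
    pub fn new(cycles_per_hop: u32, congestion_factor: f64) -> Self {
        Self { cycles_per_hop, congestion_factor, active_count: 0 }
    }

    /// latency = manhattan_distance × cycles_per_hop × (1 + α × active_routes)
    pub fn latency(&mut self, from: (i32, i32), to: (i32, i32)) -> u64 {
        let distance = u64::from(from.0.abs_diff(to.0)) + u64::from(from.1.abs_diff(to.1));
        let base = distance as f64 * f64::from(self.cycles_per_hop);
        let scaled = base * (1.0 + self.congestion_factor * f64::from(self.active_count));
        // The new route counts as active for later callers (assumed ordering).
        self.active_count += 1;
        scaled.ceil() as u64
    }

    /// Called when a routed gate completes; saturates so a stray call cannot underflow.
    pub fn route_completed(&mut self) {
        self.active_count = self.active_count.saturating_sub(1);
    }
}
```

With `cycles_per_hop = 2` and `α = 0.5`, a 7-hop route costs 14 cycles on an idle fabric and 21 cycles while one other route is active.
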

Propagate routing_cost_cycles from IR Operation to OpData as
routing_cost_override. When present, total_gate_cost() uses the override
directly without calling RoutingModel::latency() and without tracking
the gate for route_completed() callbacks.

Wire route_completed() into the engine: track model-estimated routed
gates in SmallVec<[OpKey; 4]> on EngineCore, call route_completed() on
GateCompleted for model-estimated routes only. Add route_completed()
default no-op to RoutingModel trait. Forward both latency() and
route_completed() through RoutingKind enum dispatch.

Update RoutingModelInfo, ProfileAnalyzer, and DAG construction to handle
the new Congestion variant and routing_cost_override field.
…ture

Add routing_cost_cycles: None to all Operation construction sites across
adapters, testkit fixtures, and integration tests. Add congestion_hw()
fixture to pirx-testkit for congestion routing tests with zero injection
error probability and preloaded buffer.
Make DAG adjacency maps, OpData.active, and OpData.scheduling_weight
pub(crate) instead of pub — these are internal engine state that should
not leak through the public API. Add is_active() and scheduling_weight()
accessors for external consumers. Restrict initial_ready_set() to
cfg(test) since only unit tests use it; the engine uses the iterator
variant.
…lysis

Break large functions into focused helpers: check_injection_error,
dispatch_magic_gate, serve_one_stalled in engine dispatch; propagate_demand
in DAG demand analysis; compute_asap_schedule and compute_alap_schedule in
slack analysis. Simplify serve_stalled_for_pool by removing redundant
pool_idx parameter. Clean up iterator patterns and formatting throughout.
…ror, and let-chains

Add Copy to PoolState, SchedulingDecision, and TimedDecision — these are
small value types that benefit from implicit copies over clones. Replace
unit error type in SchedulingPolicyKind::from_str with a proper
ParsePolicyError. Switch deserialization to a visitor pattern for correct
str handling. Use let-chains in LookaheadPolicy for cleaner control flow.
Tune SmallVec capacities from 2 to 1 to match typical single-pool usage.
Add reserved-field documentation for future policy extensibility.
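
Replacing a unit error with a dedicated parse error could look like the sketch below. The policy names match the CLI values mentioned in this PR; the unit-variant enum, the `input` field, and the message wording are simplifications assumed for the example.

```rust
use std::fmt;
use std::str::FromStr;

/// Simplified stand-in: the real enum's variants carry policy state.
#[derive(Debug, Clone, PartialEq)]
pub enum SchedulingPolicyKind {
    Fifo,
    CriticalPath,
    Lookahead,
}

/// Carries the rejected input so callers can report it.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsePolicyError {
    pub input: String,
}

impl fmt::Display for ParsePolicyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unknown scheduling policy '{}' (expected fifo, critical-path, or lookahead)",
            self.input
        )
    }
}

impl std::error::Error for ParsePolicyError {}

impl FromStr for SchedulingPolicyKind {
    type Err = ParsePolicyError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fifo" => Ok(Self::Fifo),
            "critical-path" => Ok(Self::CriticalPath),
            "lookahead" => Ok(Self::Lookahead),
            other => Err(ParsePolicyError { input: other.to_owned() }),
        }
    }
}
```

Unlike `Err(())`, this error implements `std::error::Error`, so it composes with `?` and produces a usable message at the CLI boundary.
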
… bindings

Add --policy flag to the CLI compare subcommand, passing the user's
scheduling policy choice through to run_comparison. Change
run_comparison to take &MonteCarloConfig (avoids unnecessary clone). Add
default_congestion_factor (0.1) for RoutingConfig::Congestion. Update
wasm and python bindings for ParsePolicyError and Copy-derived types.
…uting

Add proptest properties for the LookaheadPolicy (determinism, all gates
complete, monotonic traces, dual-pool determinism) and CongestionRouting
(determinism, monotonic traces). Add integration test verifying
routing_cost_cycles override bypasses the routing model. Add unit tests
for congestion RoutingModelInfo and routing_cost_cycles IR default.
…urations

Guard routed_gates tracking behind route_tracking flag — only congestion
routing is stateful; Manhattan/Scalar route_completed() are no-ops.
Skip build_scheduling_context() for FIFO and CriticalPath policies whose
prepare() is a no-op, avoiding O(heap_size) scan per cycle. Skip
apply_critical_path_weights() for FIFO policy which ignores weights,
saving O(V+E) at simulation start. Add FIFO fast-path in
serve_stalled_for_pool using pop_front() instead of make_contiguous()
+ policy dispatch + remove(idx).
…ngestion routing

All existing benchmarks used FIFO policy and scalar/Manhattan routing,
leaving CriticalPath, Lookahead, and congestion routing invisible to
CodSpeed. Add scheduling_policy group (CriticalPath + Lookahead at 500
gates) and congestion_routing group (500 + 1000 gates) to catch
regressions in policy-specific and stateful-routing code paths.
…implementations

Extract decoder backpressure modeling into a standalone module with a
trait-based design matching factory/buffer/routing patterns. Two
implementations: ConstantThroughputDecoder (deterministic, no RNG) and
MM1Decoder (stochastic M/M/1 queue with exponential service times).
Enum dispatch via DecoderKind avoids vtable overhead.
Remove the monolithic DecoderState struct from engine/core.rs and wire
the engine to use DecoderModel trait via DecoderKind enum dispatch.
Separate measurements_per_qubit (QEC encoding property) from the decoder
model. SchedulingContext now exposes a single DecoderPressure snapshot
instead of separate pending/stalled fields. Engine run loop delegates
tick/stall logic to the trait, eliminating inline decoder arithmetic.
Add load(&self) method to RoutingModel trait returning a zero-alloc
RoutingLoad struct (active_routes: u32, congestion_level: f64).
Default implementation returns zero for stateless models (Scalar,
Manhattan). CongestionRouting overrides with live active_count and
congestion_factor × active_count. RoutingKind enum delegates via
the established match-dispatch pattern.
Add routing: RoutingLoad field to SchedulingContext, populated from
self.routing.load() in build_scheduling_context(). Re-export
RoutingLoad from crate root. Update existing SchedulingContext
literals in tests. Add six new tests covering default zero load,
congestion load tracking, enum delegation, and context integration.
Allow scheduling policies to hold buffer states instead of always
serving a stalled gate. None signals the engine to break out of the
dispatch loop, preserving the state for a higher-priority gate.
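
The hold-back behavior can be sketched as a dispatch loop that stops as soon as the policy returns `None`. The function name, the closure-based policy, and the use of bare `u32` weights are assumptions for illustration.

```rust
use std::collections::VecDeque;

/// Serves stalled gates while magic states remain, stopping early if the policy
/// returns `None` to hold the remaining states. Returns the weights served, in order.
pub fn serve_stalled<F>(stalled: &mut VecDeque<u32>, buffer: &mut u32, mut select: F) -> Vec<u32>
where
    F: FnMut(&[u32], u32) -> Option<usize>,
{
    let mut served = Vec::new();
    while *buffer > 0 && !stalled.is_empty() {
        // `None` means "hold": leave the state in the buffer and exit the loop.
        let idx = match select(stalled.make_contiguous(), *buffer) {
            Some(idx) => idx,
            None => break,
        };
        match stalled.remove(idx) {
            Some(weight) => {
                *buffer -= 1;
                served.push(weight);
            }
            // Out-of-range index from the policy: stop rather than panic.
            None => break,
        }
    }
    served
}
```

A policy that always returns `Some` reproduces the earlier behavior, while one that returns `None` when the buffer is down to its last state keeps that state in reserve.
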
…ady gates

Track buffer_occupancy in PoolForecast and return None from
select_stalled_gate when buffer is critically low and a higher-weight
gate is about to arrive in Phase 5.
… hold-back

Cover always-Some guarantee for FIFO and CriticalPath, None-returning
test policies (AlwaysHold, HoldOnce), and LookaheadPolicy hold/serve
decisions based on buffer occupancy and weight comparison.
Fuse demand_analysis + compute_slack into single static_analysis (2
O(V+E) passes instead of 3, sharing the forward ASAP pass). Add
defers_admission/defers_restart marker methods on SchedulingPolicy so
the engine skips GateCandidate/FactoryRestartContext construction and
policy dispatch for FIFO. Defer StalledGate construction to stall path
only, guard trace_id lookup behind advisory_overrides check, and add
FIFO pop_front fast-path in serve_stalled_for_pool.