feat(core): pluggable scheduling policies with critical-path analysis and advisory mode #37
Open
m2papierz wants to merge 47 commits into
…policies Introduce the SchedulingPolicy trait, enum dispatch (SchedulingPolicyKind), and opt-in decision recording (DecisionRecorder) for FTQC gate scheduling. Purely additive — no engine integration yet.
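The trait, enum-dispatch, and recorder pieces named in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `StalledGate` fields, the single-method trait, the `record` signature, and the `serve_one` helper are all assumptions made for the example.

```rust
/// A gate waiting for a magic state (hypothetical, simplified shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StalledGate {
    pub gate_id: u32,
    pub weight: u32,
}

/// Decision point the engine delegates to a policy (one method only, for brevity).
pub trait SchedulingPolicy {
    /// Index of the stalled gate to serve next. Callers must pass a non-empty slice.
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> usize;
}

pub struct FifoPolicy;
pub struct CriticalPathPolicy;

impl SchedulingPolicy for FifoPolicy {
    fn select_stalled_gate(&self, _stalled: &[StalledGate]) -> usize {
        0 // insertion order: always the front
    }
}

impl SchedulingPolicy for CriticalPathPolicy {
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> usize {
        // Highest weight wins; ties go to the earliest entry.
        let mut best = 0;
        for (i, g) in stalled.iter().enumerate() {
            if g.weight > stalled[best].weight {
                best = i;
            }
        }
        best
    }
}

/// Enum dispatch: a `match` instead of a vtable call.
pub enum SchedulingPolicyKind {
    Fifo(FifoPolicy),
    CriticalPath(CriticalPathPolicy),
}

impl SchedulingPolicy for SchedulingPolicyKind {
    #[inline]
    fn select_stalled_gate(&self, stalled: &[StalledGate]) -> usize {
        match self {
            Self::Fifo(p) => p.select_stalled_gate(stalled),
            Self::CriticalPath(p) => p.select_stalled_gate(stalled),
        }
    }
}

/// Opt-in decision capture (hypothetical `record` signature).
pub trait DecisionRecorder {
    fn record(&mut self, cycle: u64, gate_id: u32);
}

pub struct NoOpDecisionRecorder;
impl DecisionRecorder for NoOpDecisionRecorder {
    #[inline]
    fn record(&mut self, _cycle: u64, _gate_id: u32) {} // empty body, trivially inlined
}

#[derive(Default)]
pub struct FullDecisionRecorder {
    pub decisions: Vec<(u64, u32)>,
}
impl DecisionRecorder for FullDecisionRecorder {
    fn record(&mut self, cycle: u64, gate_id: u32) {
        self.decisions.push((cycle, gate_id));
    }
}

/// Hypothetical helper: serve one stalled gate, generic over the recorder type.
pub fn serve_one<D: DecisionRecorder>(
    policy: &SchedulingPolicyKind,
    stalled: &mut Vec<StalledGate>,
    cycle: u64,
    recorder: &mut D,
) -> Option<StalledGate> {
    if stalled.is_empty() {
        return None;
    }
    let idx = policy.select_stalled_gate(stalled.as_slice());
    let gate = stalled.remove(idx);
    recorder.record(cycle, gate.gate_id);
    Some(gate)
}
```

Because `serve_one` is generic over `D`, the compiler emits a separate copy per recorder type, and the copy for `NoOpDecisionRecorder` contains no recording work.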
Extend the profiler IR with per-operation scheduling hints:
- `priority: i16` (default 0): higher values signal more urgency
- `deadline_cycles: Option<u64>`: optional hard deadline in QEC cycles

Both fields use serde defaults for backward-compatible deserialization. Adapters, testkit, and all tests updated to supply the new fields.
Add `scheduling_weight: u32` to OpData, computed from IR priority during DAG construction (i16 biased to unsigned range). New `Dag::apply_critical_path_weights()` performs reverse-topological traversal to compute remaining-path-length per op, upgrading weights where the path length exceeds the priority-based value. Fixup nodes inherit their parent gate's weight to preserve scheduling urgency.
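As a rough sketch of the two computations described here: the priority bias maps `i16` onto an order-preserving unsigned range, and the reverse-topological pass computes the longest remaining path per node. The adjacency-list representation, the function names, and counting path length in nodes are assumptions for illustration; the PR's `Dag` type is not shown on this page.

```rust
/// Map an IR priority onto an unsigned weight, preserving order:
/// i16::MIN -> 0, 0 -> 32768, i16::MAX -> 65535.
pub fn bias_priority(priority: i16) -> u32 {
    (priority as i32 - i16::MIN as i32) as u32
}

/// Longest remaining path per node, counted in nodes and including the node itself.
/// `succs[n]` lists the successors of node `n`; `topo` is any topological order.
pub fn remaining_path_lengths(succs: &[Vec<usize>], topo: &[usize]) -> Vec<u32> {
    let mut len = vec![1u32; succs.len()];
    // Visit sinks first so every successor is final before its predecessors.
    for &n in topo.iter().rev() {
        let longest_child = succs[n].iter().map(|&s| len[s]).max().unwrap_or(0);
        len[n] = 1 + longest_child;
    }
    len
}
```

For the DAG `0 -> 1 -> 2` plus `0 -> 3`, node 0 gets length 3, node 1 gets 2, and both sinks get 1, so a critical-path policy would prefer node 1 over node 3 once node 0 completes.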
Wire SchedulingPolicyKind through EngineCore dispatch: policy-driven gate admission, stalled-gate selection, and factory restart decisions. Replace VecDeque with Vec for policy-indexed stalled gate access. Add DecisionRecorder threading for optional scheduling decision capture. Monomorphize run() over EventSink × DecisionSink cross-product.
Verify critical-path policy outperforms FIFO under contention, scheduling_weight propagation through DAG chains, fixup node weight inheritance, and decision recording stability. Includes multi-seed statistical validation with stochastic factories.
Replace concrete MagicStateBuffer with a BufferModel trait and CounterBuffer implementation behind BufferKind enum dispatch for zero-vtable-overhead extensibility.
Engine can now operate in advisory mode where factory events arrive from external sources (hardware telemetry, test harnesses) via observe()/step_to() instead of being generated internally. Shares 100% of the scheduling code path with simulation mode; only factory restart behavior differs.
- ExternalEvent enum: FactoryProduced, FactoryFailed, InjectionOutcome, MeasurementResult, ProgramAbort
- EngineSnapshot for serializable state inspection
- EngineMode enum with mode-gated factory restart in dispatch loop
- Advisory overrides for injection/measurement outcomes (zero-cost in simulation mode via Option<Box<AdvisoryOverrides>>)
- PartialEq on SchedulingDecision/TimedDecision for equivalence testing
- 12 integration tests including critical advisory/simulation equivalence
Add Serialize/Deserialize/FromStr/Copy to SchedulingPolicyKind so it can be embedded in MonteCarloConfig JSON and passed by value. Thread policy field into EngineConfig and MonteCarloConfig. Wire decision recording into all EngineResult variants with decisions() accessor. Update all existing call sites and tests to explicitly pass FIFO.
…tends Add --policy (fifo|critical-path) and --decisions flags to CLI profile and monte-carlo commands. Add profile_with_policy/monte_carlo_with_policy to WASM API. Add policy and decisions kwargs to Python profile/trace/ monte_carlo functions with PySchedulingDecision wrapper type.
…ffold Extend RoutingModel to accept cycle and active routes for stateful contention tracking. Add ActiveRoute type and route_completed callback. Introduce SlottedBuffer behind slotted-buffer feature gate.
Merging this PR will degrade performance by 13.85%
| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ❌ | sampled_4[100] | 173.2 µs | 211.8 µs | -18.24% |
| ❌ | run[1000gates_2000qubits] | 922.8 µs | 1,121.3 µs | -17.71% |
| ❌ | run[2000gates_4000qubits] | 1.8 ms | 2.2 ms | -17.03% |
| ❌ | run[500gates_1000qubits] | 477 µs | 562.2 µs | -15.16% |
| ❌ | run[100gates_200qubits] | 112.6 µs | 132.5 µs | -15.05% |
| ❌ | engine_run[500] | 961.6 µs | 1,114 µs | -13.68% |
| ❌ | engine_run[10] | 44.3 µs | 50.5 µs | -12.25% |
| ❌ | sampled_4[2000] | 3.4 ms | 3.8 ms | -12.21% |
| ❌ | streaming[100] | 363.1 µs | 410.6 µs | -11.56% |
| ❌ | engine_run[100] | 208.5 µs | 235 µs | -11.25% |
| ❌ | full[2000] | 3.8 ms | 4.3 ms | -11.08% |
| ❌ | full[500] | 991.4 µs | 1,108.2 µs | -10.54% |
| 🆕 | run[1000gates_2000qubits] | N/A | 1.9 ms | N/A |
| 🆕 | run[500gates_1000qubits] | N/A | 773.3 µs | N/A |
| 🆕 | critical_path | N/A | 1.2 ms | N/A |
| 🆕 | lookahead | N/A | 1.4 ms | N/A |
Comparing feat/scheduler-core (c7485ee) with master (7bffc5c)
…ing scaffolding
- Delete SlottedBuffer placeholder and its slotted-buffer feature gate
- Remove ActiveRoute struct, route_completed trait method, and active_routes parameter from RoutingModel::latency
- BufferKind now only has Counter variant (single-variant enum kept for future extensibility without API churn)
- Restructure stalled gates from flat Vec to Vec<Vec<StalledGate>> indexed by pool; eliminates O(n) cross-pool scanning in hot loop
- Separate priority (i16) from scheduling_weight (u32) in OpData; critical-path weights now overwrite instead of max with priority, fixing CriticalPath policy being functionally inert
- Narrow pool_idx from usize to u16 throughout scheduling types
- Replace 6-parameter should_restart_factory with FactoryRestartContext
- Delete trivial run_with_decisions! macro, inline direct calls
- Convert observe() from assert-panic to Result<(), EngineError> with ObserveRequiresAdvisory variant
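The `observe()` change from panic to `Result` might look roughly like the sketch below. Only `EngineMode`, `EngineError::ObserveRequiresAdvisory`, and the `ExternalEvent` variant names come from the commit messages; the `Engine` fields, the event payloads, and `pending_len` are hypothetical.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineMode {
    Simulation,
    Advisory,
}

/// Two of the external event kinds; the `pool_idx` payload is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExternalEvent {
    FactoryProduced { pool_idx: u16 },
    FactoryFailed { pool_idx: u16 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum EngineError {
    ObserveRequiresAdvisory,
}

impl fmt::Display for EngineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ObserveRequiresAdvisory => {
                write!(f, "observe() is only valid in advisory mode")
            }
        }
    }
}

impl std::error::Error for EngineError {}

/// Hypothetical, stripped-down engine: just a mode and a queue of observed events.
pub struct Engine {
    mode: EngineMode,
    pending: Vec<(u64, ExternalEvent)>,
}

impl Engine {
    pub fn new(mode: EngineMode) -> Self {
        Self { mode, pending: Vec::new() }
    }

    /// Record an external event at `cycle`; rejected (not panicking) outside advisory mode.
    pub fn observe(&mut self, cycle: u64, event: ExternalEvent) -> Result<(), EngineError> {
        if self.mode != EngineMode::Advisory {
            return Err(EngineError::ObserveRequiresAdvisory);
        }
        self.pending.push((cycle, event));
        Ok(())
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

Returning an error lets a caller that drives the engine from telemetry recover from a misconfigured mode instead of aborting the process.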
…g tests
- Add 4 CriticalPath policy proptests: determinism, all-gates-complete, monotonic traces, dual-pool determinism (10k seeds each)
- Update observe_panics_in_simulation_mode to assert Result::Err instead of #[should_panic] to match new observe() signature
- Update advisory tests for Result-returning observe()
- Update scheduling_policy tests for FactoryRestartContext and separated weight/priority semantics
priority: i16 was stored on OpData and preserved through DAG construction and fixup injection, but never read by any engine or scheduling code. Removes 2 bytes/node of dead weight from the hot-path struct. The IR Operation::priority field remains for future deadline-aware policies; it will be re-added to OpData and wired into StalledGate when a policy consumes it.
- Remove priority from OpData, from_circuit, inject_fixup
- Add clone justification comment on fixup qubit clone
- Remove dag_build_preserves_priority test (tested dead field)
- Clean stale priority assertions from weight tests
FIFO policy always removes index 0 (front). Vec::remove(0) is O(n) because it shifts all elements. VecDeque gives O(1) front removal. CriticalPath arbitrary-index removal is O(min(idx, len-idx)), same or better than Vec.
- Change stalled_per_pool from Vec<Vec<StalledGate>> to Vec<VecDeque<StalledGate>>
- Use make_contiguous() before passing slice to policy
- VecDeque::remove returns Option: graceful instead of panicking
- push → push_back, first → front, Vec::is_empty → VecDeque::is_empty
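A small sketch of the two removal paths this commit describes, using only standard-library `VecDeque` calls. The `(gate_id, weight)` tuple and the function names are assumptions standing in for `StalledGate` and the engine's dispatch helpers.

```rust
use std::collections::VecDeque;

/// FIFO path: O(1) removal from the front.
pub fn serve_fifo(stalled: &mut VecDeque<(u32, u32)>) -> Option<(u32, u32)> {
    stalled.pop_front()
}

/// Weighted path: expose the queue as one slice, pick an index, then remove it.
/// Entries are (gate_id, weight); ties go to the earliest entry.
pub fn serve_highest_weight(stalled: &mut VecDeque<(u32, u32)>) -> Option<(u32, u32)> {
    // make_contiguous() rearranges the ring buffer so it can be viewed as one slice.
    let slice: &[(u32, u32)] = stalled.make_contiguous();
    let mut best = 0;
    for (i, entry) in slice.iter().enumerate() {
        if entry.1 > slice[best].1 {
            best = i;
        }
    }
    // remove() returns None on an out-of-range index (e.g. an empty queue) instead of panicking.
    stalled.remove(best)
}
```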
Replace magic number 16 with state-proportional capacity hint for FullDecisionRecorder in advisory step_to. Uses stalled gate count + factory pool count (min 8) to size the allocation based on actual decisions the step can produce.
- Add EngineResult::into_profile_and_decisions() for zero-copy extraction of both profile and decisions in a single consume
- Replace .to_vec() clone in Python binding with the new method
- Pre-allocate AdvisoryOverrides HashMaps with capacity from circuit metadata
- Remove stale superseded Trurlic decision
buffer.rs → buffer/{mod, counter, dispatch, model}.rs
routing.rs → routing/{mod, dispatch, manhattan, model, scalar}.rs
Separates concerns within each module: counter logic, dispatch helpers,
model types, and routing strategies each get their own file. Public API
unchanged.
Move DagDemandAnalysis, demand_analysis(), critical_path_t_density(), and static_t_ready_cycles() from dag/mod.rs into dag/demand.rs. Keeps the DAG struct focused on graph structure; demand analysis lives in its own file alongside the topological-sort implementation.
engine/mod.rs → engine/{core, build, run, sink}.rs
- core: EngineCore struct, EngineMode, DecoderState, FactoryPool
- build: construction helpers (hook table, decoder, factory pools)
- run: simulation loop (run_loop, step_inner, process_decoder)
- sink: EventSink enum + trace size estimation
scheduling/mod.rs → scheduling/{types, policy, dispatch}.rs
- types: StalledGate, AdmitDecision, FactoryRestartContext
- policy: SchedulingPolicy trait
- dispatch: SchedulingPolicyKind enum + serde impls
Breaks the two largest files into focused submodules. Public API unchanged.
Introduce SchedulingContext, PoolState, and GateCandidate — structured snapshots of engine state for policy decisions. Add peek_pool_production on EventQueue and peek_magic_demand on FifoReadyQueue to gather per-pool factory activity and ready-set demand without allocation.
Add prepare() phase to SchedulingPolicy, called once per cycle before dispatch decisions. Replace scattered per-call arguments (gate_id, weight, buffer_occupancy, cycle) with GateCandidate struct. Remove cycle parameter from select_stalled_gate — policies access it via prepare().
…licies SchedulingPolicyKind and MonteCarloConfig lose Copy to accommodate policies with cached state (e.g. SmallVec forecasts). All downstream move sites gain .clone(); these are construction-time, not hot-loop.
- policy_reason() now takes &SchedulingPolicyKind (no implicit copy)
- Add PolicyReason::Lookahead variant for upcoming policy
- Lookahead arm added to SchedulingPolicyKind enum dispatch (empty)
- Test files updated for non-Copy MonteCarloConfig
Production-forecast-aware admission control that extends critical-path stall selection with buffer conservation. Uses prepare() to cache per-pool forecasts (SmallVec<[PoolForecast; 2]>, zero heap alloc).

Admission rules:
- Buffer ≤1 state + higher-weight stalled gates → Stall
- Production imminent (≤3 cycles) → Dispatch freely
- Production distant + buffer low + low-weight gate → Stall

Wired into SchedulingPolicyKind enum dispatch, CLI (--policy lookahead), and Python bindings (policy="lookahead").
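The three admission rules can be sketched as a single function, evaluated in the order listed. The commit does not say what counts as "buffer low" or "low-weight", nor the exact evaluation order, so the `LOW_BUFFER` and `LOW_WEIGHT` thresholds, the `PoolForecast` fields other than `buffer_occupancy`, and the ordering are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AdmitDecision {
    Dispatch,
    Stall,
}

/// Cached per-pool forecast (hypothetical fields apart from `buffer_occupancy`).
pub struct PoolForecast {
    pub buffer_occupancy: u32,
    /// Cycles until the next factory output; None if nothing is in flight.
    pub cycles_until_production: Option<u64>,
    /// Highest weight among gates already stalled on this pool, if any.
    pub max_stalled_weight: Option<u32>,
}

pub const IMMINENT_CYCLES: u64 = 3; // from the commit message: production imminent means <= 3 cycles
pub const LOW_BUFFER: u32 = 2; // assumption: threshold for "buffer low"
pub const LOW_WEIGHT: u32 = 4; // assumption: threshold for "low-weight gate"

pub fn admit_gate(weight: u32, f: &PoolForecast) -> AdmitDecision {
    // Rule 1: at most one state left and a heavier gate is already waiting.
    let heavier_waiting = f.max_stalled_weight.map_or(false, |w| w > weight);
    if f.buffer_occupancy <= 1 && heavier_waiting {
        return AdmitDecision::Stall;
    }
    // Rule 2: a new state arrives soon, so spending one now is cheap.
    let imminent = f
        .cycles_until_production
        .map_or(false, |c| c <= IMMINENT_CYCLES);
    if imminent {
        return AdmitDecision::Dispatch;
    }
    // Rule 3: production is distant, the buffer is low, and this gate is not urgent.
    if f.buffer_occupancy <= LOW_BUFFER && weight < LOW_WEIGHT {
        return AdmitDecision::Stall;
    }
    AdmitDecision::Dispatch
}
```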
12 new tests covering:
- Stall selection (highest weight, FIFO tie-break)
- Admission control: holds state for higher priority gates
- Admission control: dispatches freely when production imminent
- Admission control: conserves when production distant
- Engine integration: completes chain circuits
- Outperforms CriticalPath on contention (deterministic + multi-seed)
- Enum dispatch, string roundtrip, JSON serialization
- Identical to CriticalPath without contention
Compute per-gate ASAP/ALAP scheduling slack via topological traversal, deriving slack_ratio (fraction of T-gates with zero slack) as a measure of scheduling flexibility. Add delta_max: worst-case cumulative demand minus supply deficit across time buckets. Thread both metrics through EngineResult, ProfileAnalyzer, MonteCarloSummary, and sensitivity OutputMetric. Add unit and integration tests for slack computation, delta_max bounds, and cross-mode consistency.
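A minimal sketch of ASAP/ALAP slack on a DAG, assuming an adjacency-list graph with a precomputed topological order and a per-node duration in cycles. The function names, the graph representation, and the convention that `slack_ratio` is 0.0 when there are no T-gates are assumptions, not the PR's API.

```rust
/// Per-node slack = ALAP start - ASAP start.
/// `succs[n]` lists successors of `n`, `topo` is a topological order,
/// `dur[n]` is the duration of node `n` in cycles.
pub fn compute_slack(succs: &[Vec<usize>], topo: &[usize], dur: &[u64]) -> Vec<u64> {
    let n = succs.len();

    // Forward pass: earliest start once all predecessors have finished.
    let mut asap = vec![0u64; n];
    for &u in topo {
        for &v in &succs[u] {
            asap[v] = asap[v].max(asap[u] + dur[u]);
        }
    }
    let makespan = (0..n).map(|i| asap[i] + dur[i]).max().unwrap_or(0);

    // Backward pass: latest start that does not delay any successor or the makespan.
    let mut alap = vec![0u64; n];
    for &u in topo.iter().rev() {
        let latest_finish = succs[u].iter().map(|&v| alap[v]).min().unwrap_or(makespan);
        alap[u] = latest_finish - dur[u];
    }

    (0..n).map(|i| alap[i] - asap[i]).collect()
}

/// Fraction of T-gates with zero slack (0.0 if there are no T-gates).
pub fn slack_ratio(slack: &[u64], is_t_gate: &[bool]) -> f64 {
    let mut t_total = 0usize;
    let mut t_zero = 0usize;
    for (&s, &t) in slack.iter().zip(is_t_gate) {
        if t {
            t_total += 1;
            if s == 0 {
                t_zero += 1;
            }
        }
    }
    if t_total == 0 {
        0.0
    } else {
        t_zero as f64 / t_total as f64
    }
}
```

In a diamond `0 -> {1, 2} -> 3` with durations `[1, 3, 1, 1]`, node 2 can start up to two cycles late without extending the makespan, so it is the only node with non-zero slack.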
Add `routing_cost_cycles: Option<u32>` to Operation in pirx-ir, allowing compilers to provide pre-computed routing costs that bypass model estimation. Propagate through validation with serde default/skip_serializing_if. Add `Congestion` variant to RoutingConfig in pirx-hw with `cycles_per_hop` and `congestion_factor` fields. Validate cycles_per_hop > 0 and congestion_factor >= 0 with is_finite() guard. Add InvalidCongestionFactor error variant. Include Congestion in routing_physical_qubits estimation.
…oute_completed lifecycle CongestionRouting applies latency = manhattan_distance × cycles_per_hop × (1 + α × active_routes) with statistical congestion tracking via an active_count counter incremented on latency() and decremented on route_completed(). Propagate routing_cost_cycles from IR Operation to OpData as routing_cost_override. When present, total_gate_cost() uses the override directly without calling RoutingModel::latency() and without tracking the gate for route_completed() callbacks. Wire route_completed() into the engine: track model-estimated routed gates in SmallVec<[OpKey; 4]> on EngineCore, call route_completed() on GateCompleted for model-estimated routes only. Add route_completed() default no-op to RoutingModel trait. Forward both latency() and route_completed() through RoutingKind enum dispatch. Update RoutingModelInfo, ProfileAnalyzer, and DAG construction to handle the new Congestion variant and routing_cost_override field.
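The latency formula and the counter lifecycle can be sketched as below. The formula follows the commit message, with α as `congestion_factor`; the coordinate type, rounding up to whole cycles, and incrementing the counter after computing the latency are assumptions.

```rust
/// Stateful routing model: latency grows with the number of routes in flight.
pub struct CongestionRouting {
    cycles_per_hop: u32,
    congestion_factor: f64, // alpha in the formula
    active_count: u32,
}

impl CongestionRouting {
    pub fn new(cycles_per_hop: u32, congestion_factor: f64) -> Self {
        Self { cycles_per_hop, congestion_factor, active_count: 0 }
    }

    /// latency = manhattan_distance * cycles_per_hop * (1 + alpha * active_routes)
    pub fn latency(&mut self, from: (i32, i32), to: (i32, i32)) -> u64 {
        let dist = (from.0 - to.0).unsigned_abs() + (from.1 - to.1).unsigned_abs();
        let base = dist as f64 * self.cycles_per_hop as f64;
        let scaled = base * (1.0 + self.congestion_factor * self.active_count as f64);
        // This route now counts as active for later callers.
        self.active_count += 1;
        scaled.ceil() as u64 // assumption: round up to whole cycles
    }

    /// Called when a routed gate completes; frees one active route.
    pub fn route_completed(&mut self) {
        self.active_count = self.active_count.saturating_sub(1);
    }

    pub fn active_count(&self) -> u32 {
        self.active_count
    }
}
```

With `cycles_per_hop = 2` and `congestion_factor = 0.5`, a 4-hop route costs 8 cycles on an idle fabric and 12 cycles while one other route is active.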
…ture Add routing_cost_cycles: None to all Operation construction sites across adapters, testkit fixtures, and integration tests. Add congestion_hw() fixture to pirx-testkit for congestion routing tests with zero injection error probability and preloaded buffer.
Make DAG adjacency maps, OpData.active, and OpData.scheduling_weight pub(crate) instead of pub — these are internal engine state that should not leak through the public API. Add is_active() and scheduling_weight() accessors for external consumers. Restrict initial_ready_set() to cfg(test) since only unit tests use it; the engine uses the iterator variant.
…lysis Break large functions into focused helpers: check_injection_error, dispatch_magic_gate, serve_one_stalled in engine dispatch; propagate_demand in DAG demand analysis; compute_asap_schedule and compute_alap_schedule in slack analysis. Simplify serve_stalled_for_pool by removing redundant pool_idx parameter. Clean up iterator patterns and formatting throughout.
…ror, and let-chains Add Copy to PoolState, SchedulingDecision, and TimedDecision — these are small value types that benefit from implicit copies over clones. Replace unit error type in SchedulingPolicyKind::from_str with a proper ParsePolicyError. Switch deserialization to a visitor pattern for correct str handling. Use let-chains in LookaheadPolicy for cleaner control flow. Tune SmallVec capacities from 2 to 1 to match typical single-pool usage. Add reserved-field documentation for future policy extensibility.
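A sketch of `FromStr` with a dedicated error type in place of `()`. The accepted strings are the ones the CLI flags elsewhere in this PR use (`fifo`, `critical-path`, `lookahead`); the fieldless enum and the error's `input` field and message are assumptions.

```rust
use std::fmt;
use std::str::FromStr;

/// Simplified stand-in: the real enum wraps policy values.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SchedulingPolicyKind {
    Fifo,
    CriticalPath,
    Lookahead,
}

/// Error carrying the rejected input (hypothetical field).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsePolicyError {
    input: String,
}

impl fmt::Display for ParsePolicyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unknown scheduling policy '{}' (expected fifo, critical-path, or lookahead)",
            self.input
        )
    }
}

impl std::error::Error for ParsePolicyError {}

impl FromStr for SchedulingPolicyKind {
    type Err = ParsePolicyError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fifo" => Ok(Self::Fifo),
            "critical-path" => Ok(Self::CriticalPath),
            "lookahead" => Ok(Self::Lookahead),
            other => Err(ParsePolicyError { input: other.to_owned() }),
        }
    }
}
```

A named error type implements `Display` and `std::error::Error`, so binding layers can surface a readable message rather than an opaque unit value.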
… bindings Add --policy flag to the CLI compare subcommand, passing the user's scheduling policy choice through to run_comparison. Change run_comparison to take &MonteCarloConfig (avoids unnecessary clone). Add default_congestion_factor (0.1) for RoutingConfig::Congestion. Update wasm and python bindings for ParsePolicyError and Copy-derived types.
…uting Add proptest properties for the LookaheadPolicy (determinism, all gates complete, monotonic traces, dual-pool determinism) and CongestionRouting (determinism, monotonic traces). Add integration test verifying routing_cost_cycles override bypasses the routing model. Add unit tests for congestion RoutingModelInfo and routing_cost_cycles IR default.
…urations Guard routed_gates tracking behind route_tracking flag — only congestion routing is stateful; Manhattan/Scalar route_completed() are no-ops. Skip build_scheduling_context() for FIFO and CriticalPath policies whose prepare() is a no-op, avoiding O(heap_size) scan per cycle. Skip apply_critical_path_weights() for FIFO policy which ignores weights, saving O(V+E) at simulation start. Add FIFO fast-path in serve_stalled_for_pool using pop_front() instead of make_contiguous() + policy dispatch + remove(idx).
…ngestion routing All existing benchmarks used FIFO policy and scalar/Manhattan routing, leaving CriticalPath, Lookahead, and congestion routing invisible to CodSpeed. Add scheduling_policy group (CriticalPath + Lookahead at 500 gates) and congestion_routing group (500 + 1000 gates) to catch regressions in policy-specific and stateful-routing code paths.
…implementations Extract decoder backpressure modeling into a standalone module with a trait-based design matching factory/buffer/routing patterns. Two implementations: ConstantThroughputDecoder (deterministic, no RNG) and MM1Decoder (stochastic M/M/1 queue with exponential service times). Enum dispatch via DecoderKind avoids vtable overhead.
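To illustrate the stochastic variant, here is a sketch of a single-server FIFO queue with exponentially distributed service times, sampled by inverse transform. It is not the PR's `MM1Decoder`: the hand-rolled xorshift generator, the function names, and taking arrival times from the caller (rather than generating Poisson arrivals) are all assumptions chosen to keep the example dependency-free.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        Self(seed.max(1)) // xorshift state must be non-zero
    }

    /// Uniform sample in (0, 1].
    pub fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        ((x >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
}

/// Exponential service time with the given rate, via inverse transform sampling.
pub fn sample_service_time(rng: &mut XorShift64, service_rate: f64) -> f64 {
    -rng.next_f64().ln() / service_rate
}

/// Departure time of each job from a single FIFO server.
/// `arrivals` must be sorted in ascending order.
pub fn departures(arrivals: &[f64], service_rate: f64, seed: u64) -> Vec<f64> {
    let mut rng = XorShift64::new(seed);
    let mut server_free_at = 0.0f64;
    arrivals
        .iter()
        .map(|&t| {
            // A job starts when it has arrived and the server is idle.
            let start = t.max(server_free_at);
            server_free_at = start + sample_service_time(&mut rng, service_rate);
            server_free_at
        })
        .collect()
}
```

Backpressure appears whenever a job arrives before `server_free_at`: its wait is the gap between the two, which is the quantity a decoder model would report to the scheduler.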
Remove the monolithic DecoderState struct from engine/core.rs and wire the engine to use DecoderModel trait via DecoderKind enum dispatch. Separate measurements_per_qubit (QEC encoding property) from the decoder model. SchedulingContext now exposes a single DecoderPressure snapshot instead of separate pending/stalled fields. Engine run loop delegates tick/stall logic to the trait, eliminating inline decoder arithmetic.
Add load(&self) method to RoutingModel trait returning a zero-alloc RoutingLoad struct (active_routes: u32, congestion_level: f64). Default implementation returns zero for stateless models (Scalar, Manhattan). CongestionRouting overrides with live active_count and congestion_factor × active_count. RoutingKind enum delegates via the established match-dispatch pattern.
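The default-method pattern described here might look like the following sketch. `RoutingLoad`, its two fields, `load()`, and the type names come from the commit message; the struct layouts and the omission of the other trait methods are simplifications.

```rust
/// Snapshot of routing pressure; plain Copy data, no allocation.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct RoutingLoad {
    pub active_routes: u32,
    pub congestion_level: f64,
}

pub trait RoutingModel {
    /// Stateless models inherit this zero-load default.
    fn load(&self) -> RoutingLoad {
        RoutingLoad::default()
    }
}

pub struct ScalarRouting;
impl RoutingModel for ScalarRouting {}

/// Simplified stand-in with public fields for the example.
pub struct CongestionRouting {
    pub congestion_factor: f64,
    pub active_count: u32,
}

impl RoutingModel for CongestionRouting {
    fn load(&self) -> RoutingLoad {
        RoutingLoad {
            active_routes: self.active_count,
            congestion_level: self.congestion_factor * self.active_count as f64,
        }
    }
}

/// Enum dispatch forwards to whichever model is configured.
pub enum RoutingKind {
    Scalar(ScalarRouting),
    Congestion(CongestionRouting),
}

impl RoutingModel for RoutingKind {
    fn load(&self) -> RoutingLoad {
        match self {
            Self::Scalar(m) => m.load(),
            Self::Congestion(m) => m.load(),
        }
    }
}
```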
Add routing: RoutingLoad field to SchedulingContext, populated from self.routing.load() in build_scheduling_context(). Re-export RoutingLoad from crate root. Update existing SchedulingContext literals in tests. Add six new tests covering default zero load, congestion load tracking, enum delegation, and context integration.
Allow scheduling policies to hold buffer states instead of always serving a stalled gate. None signals the engine to break out of the dispatch loop, preserving the state for a higher-priority gate.
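A sketch of the hold-back contract: the policy returns `Option<usize>`, and `None` ends the dispatch loop with the remaining buffer states untouched. The trait signature, the `HoldLast` policy, and representing stalled gates as bare IDs are assumptions for illustration.

```rust
pub trait SchedulingPolicy {
    /// Index of the stalled gate to serve, or None to keep the buffered state in reserve.
    fn select_stalled_gate(&mut self, stalled: &[u32], buffer_occupancy: u32) -> Option<usize>;
}

/// Always serves the front gate when there is one.
pub struct Fifo;
impl SchedulingPolicy for Fifo {
    fn select_stalled_gate(&mut self, stalled: &[u32], _buffer_occupancy: u32) -> Option<usize> {
        if stalled.is_empty() { None } else { Some(0) }
    }
}

/// Hypothetical policy: never spends the last buffered state.
pub struct HoldLast;
impl SchedulingPolicy for HoldLast {
    fn select_stalled_gate(&mut self, stalled: &[u32], buffer_occupancy: u32) -> Option<usize> {
        if buffer_occupancy <= 1 || stalled.is_empty() { None } else { Some(0) }
    }
}

/// Serve stalled gates while states remain; stop as soon as the policy holds back.
pub fn serve_stalled<P: SchedulingPolicy>(
    policy: &mut P,
    stalled: &mut Vec<u32>,
    buffer: &mut u32,
) -> Vec<u32> {
    let mut served = Vec::new();
    while *buffer > 0 && !stalled.is_empty() {
        match policy.select_stalled_gate(stalled.as_slice(), *buffer) {
            Some(idx) => {
                served.push(stalled.remove(idx));
                *buffer -= 1;
            }
            // None: leave the remaining states in the buffer for a later gate.
            None => break,
        }
    }
    served
}
```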
…ady gates Track buffer_occupancy in PoolForecast and return None from select_stalled_gate when buffer is critically low and a higher-weight gate is about to arrive in Phase 5.
… hold-back Cover always-Some guarantee for FIFO and CriticalPath, None-returning test policies (AlwaysHold, HoldOnce), and LookaheadPolicy hold/serve decisions based on buffer occupancy and weight comparison.
Fuse demand_analysis + compute_slack into single static_analysis (2 O(V+E) passes instead of 3, sharing the forward ASAP pass). Add defers_admission/defers_restart marker methods on SchedulingPolicy so the engine skips GateCandidate/FactoryRestartContext construction and policy dispatch for FIFO. Defer StalledGate construction to stall path only, guard trace_id lookup behind advisory_overrides check, and add FIFO pop_front fast-path in serve_stalled_for_pool.
What
Pluggable scheduling policy framework for the DES engine — replaces the hardcoded FIFO dispatch with a trait-based system, adds critical-path scheduling, decision recording, and an advisory mode for external event-driven simulation.
Why
The engine previously dispatched magic states to stalled gates in insertion order, which is fine for small circuits but leaves performance on the table for deep circuits with uneven critical paths. Profiling scheduling decisions was also impossible: the engine applied them silently, with no observability.
Advisory mode is needed to drive the engine from external hardware telemetry or test harnesses without reimplementing the scheduling logic.
How
- `SchedulingPolicy` trait (`select_stalled_gate`, `admit_gate`, `should_restart_factory`): the engine calls the policy at each decision point. Two implementations: `FifoPolicy` (preserves existing behavior) and `CriticalPathPolicy` (prioritizes gates with the longest remaining DAG path).
- `SchedulingPolicyKind` enum dispatch: zero vtable overhead, inlineable, `Send` for parallel sweeps. Same pattern as `RoutingKind` and `BufferKind`.
- `BufferModel` trait extracted from the concrete counter buffer, with enum dispatch (`BufferKind`) for future slotted/priority buffer variants.
- `DecisionRecorder` trait with `NoOpDecisionRecorder` (compiles to nothing via monomorphization) and `FullDecisionRecorder`. The engine is generic over `D: DecisionRecorder`, so recording is zero-cost when disabled.
- `scheduling_weight`: backward BFS from sinks computes longest-path weight per node. Stored in `SecondaryMap`, computed once after DAG construction.
- Advisory mode (`Engine::observe`/`Engine::step_to`/`Engine::snapshot`): external events (factory produced/failed, injection outcomes, measurement results) injected at specific cycles. The engine returns `TimedDecision` records instead of driving its own factory schedule.
- `RoutingModel` trait extended with `ActiveRoute` and `route_completed` for contention-aware routing (scaffold, no contention model yet).
- `priority` and `deadline_cycles` fields on `Operation` for future deadline-aware policies.
- Policy selection threaded through `EngineConfig`, `SimulationConfig`, CLI (`--policy`), WASM, and Python bindings.

Testing
- `make ci` passes locally (fmt + clippy + test + audit)

Checklist
- No `unwrap()`/`expect()` in production code