feat(scheduler): autoscaling gauges via metrics sampler (PR 2 M5b) by CatherineSue · Pull Request #1580 · lightseekorg/smg

CatherineSue · 2026-06-01T04:11:30Z

Description

Problem

Autoscalers need point-in-time signals (utilization, per-class pressure) to drive scale up/down decisions. M5a added the operational counters; this adds the capacity/autoscaling gauges.

Solution

A sampler task (5s interval, holds only a Weak<Self>) refreshes the gauges off the admission path, so the hot path stays counter-only. Stacked on #1579.
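The sampler lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it uses `std::thread` instead of the tokio task the PR spawns so that it has no dependencies, and the `Scheduler` struct, its fields, and `sampled_inflight` are hypothetical stand-ins. Only the `spawn_sampler` / `sample_metrics` names and the `Weak<Self>` idea come from the description.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Weak};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the scheduler: only the state the sampler reads.
pub struct Scheduler {
    /// Updated on the hot path.
    inflight: AtomicU64,
    /// Stand-in for a Prometheus gauge: holds the last sampled value.
    inflight_gauge: AtomicU64,
}

impl Scheduler {
    pub fn new() -> Arc<Self> {
        Arc::new(Self {
            inflight: AtomicU64::new(0),
            inflight_gauge: AtomicU64::new(0),
        })
    }

    /// Hot path: a single counter bump, no gauge work.
    pub fn admit(&self) {
        self.inflight.fetch_add(1, Ordering::Relaxed);
    }

    /// Last value written by the sampler.
    pub fn sampled_inflight(&self) -> u64 {
        self.inflight_gauge.load(Ordering::Relaxed)
    }

    /// Off the hot path: copy point-in-time state into the gauge.
    fn sample_metrics(&self) {
        let now = self.inflight.load(Ordering::Relaxed);
        self.inflight_gauge.store(now, Ordering::Relaxed);
    }

    /// The sampler holds only a `Weak`, so it never keeps the scheduler alive.
    pub fn spawn_sampler(self: &Arc<Self>, interval: Duration) -> JoinHandle<()> {
        let weak: Weak<Self> = Arc::downgrade(self);
        thread::spawn(move || loop {
            thread::sleep(interval);
            match weak.upgrade() {
                // Scheduler still alive: refresh the gauges.
                Some(sched) => sched.sample_metrics(),
                // Last strong reference dropped: exit the loop.
                None => break,
            }
        })
    }
}
```

Because the loop upgrades the weak reference on every tick, dropping the last `Arc<Scheduler>` is enough to stop the sampler on its next wake-up; no explicit shutdown signal is needed in this sketch.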

Changes

- spawn_sampler + sample_metrics + class_pressure on PriorityScheduler; sampler spawned from AdmissionMode::from_config alongside the dispatcher.
- Gauges (design §9 capacity/autoscaling table):
  - smg_scheduler_inflight{class}, smg_scheduler_queue_depth{class}
  - smg_scheduler_utilization (Σ inflight / capacity)
  - smg_scheduler_queue_size_limit{class}
  - smg_scheduler_class_capacity_pressure{class}: max(depth/limit, max(0, inflight−reserved)/max(1, capacity−Σ_higher_reserved)), clamped 0.0–1.0
- ClassQueue::capacity() accessor for the queue-size-limit gauge.
- Unit test pinning the class_pressure formula (queue-dominates, slot-dominates, clamp, div-by-zero guard).
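As a rough illustration of the gauge formulas listed above, the pressure and utilization calculations could look like the sketch below. The free-function shape and parameter names are assumptions for illustration (the PR implements `class_pressure` as a method on PriorityScheduler), and treating a zero queue limit or zero capacity as 0.0 is an assumed reading of the "div-by-zero guard", not something the description spells out.

```rust
/// Per-class pressure: the worse of queue pressure and slot pressure, clamped to 0.0..=1.0.
/// `higher_reserved` is the sum of reservations held by higher-priority classes.
pub fn class_pressure(
    inflight: u32,
    reserved: u32,
    depth: u32,
    limit: u32,
    capacity: u32,
    higher_reserved: u32,
) -> f64 {
    // Queue pressure: depth / limit. Assumption: a zero limit contributes no pressure.
    let queue_pressure = if limit == 0 {
        0.0
    } else {
        f64::from(depth) / f64::from(limit)
    };
    // Headroom: max(1, capacity - higher_reserved), so the divisor is never zero.
    let headroom = capacity.saturating_sub(higher_reserved).max(1);
    // Overflow: max(0, inflight - reserved), via saturating arithmetic.
    let overflow = inflight.saturating_sub(reserved);
    let slot_pressure = f64::from(overflow) / f64::from(headroom);
    queue_pressure.max(slot_pressure).clamp(0.0, 1.0)
}

/// Pool utilization: sum of per-class inflight over capacity.
/// Assumption: reported as 0.0 when capacity is zero; not clamped here.
pub fn utilization(inflight_per_class: &[u32], capacity: u32) -> f64 {
    if capacity == 0 {
        return 0.0;
    }
    let total: u32 = inflight_per_class.iter().sum();
    f64::from(total) / f64::from(capacity)
}
```

With no reservations, `class_pressure(10, 0, 8, 10, 100, 0)` is dominated by the queue term (8/10), while `class_pressure(50, 0, 1, 10, 100, 0)` is dominated by the slot term (50/100), matching the cases named in the unit-test bullet.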

Deferred (documented in the deviations doc)

Per-tenant gauges (scheduler_tenant_inflight/queued): require threading a tenant field through every InflightHandle + Waiter + the admit/register_inflight signatures (~20 test call-site edits) for a secondary "noisy-tenant" metric. Deferred pending sign-off on the invasiveness.
scheduler_queue_wait_p95_seconds: derivable from the smg_scheduler_queue_wait_seconds histogram via histogram_quantile in PromQL; the standalone sliding-window gauge isn't worth a bespoke estimator.

Test Plan

cargo test -p smg --lib scheduler:: — 108 unit tests green (incl. the new class_pressure test).
Gauge scrape assertions land with the M6 integration tests (the sampler needs a running runtime).

Checklist

cargo +nightly fmt passes
cargo clippy --all-targets --all-features -- -D warnings passes

Summary by CodeRabbit

Chores
- Added periodic background metrics sampling to the scheduler, collecting capacity, utilization, queue depth, inflight counts, and per-class pressure calculations for improved observability.

coderabbitai · 2026-06-01T04:11:37Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 89688844-ee3f-4415-86d3-4008fbff6570

📥 Commits

Reviewing files that changed from the base of the PR and between 4823d74 and 993a7a8.

📒 Files selected for processing (4)

model_gateway/src/middleware/scheduler/engine.rs
model_gateway/src/middleware/scheduler/metrics.rs
model_gateway/src/middleware/scheduler/queue.rs
model_gateway/src/middleware/scheduler/state.rs

📝 Walkthrough

This PR introduces a periodic metrics sampler to the PriorityScheduler that collects and reports capacity, inflight counts, queue depth, and normalized per-class pressure values to Prometheus gauges. The sampler runs as a background tokio::spawn task at 5-second intervals, holds only a weak reference to the scheduler, and exits automatically when the scheduler is dropped.

Changes

Scheduler Metrics Sampling

Layer / File(s)	Summary
Metrics gauge definitions and setters `model_gateway/src/middleware/scheduler/metrics.rs`	Prometheus gauge constants and public setter functions added for inflight count, queue depth, utilization, per-class queue size limit, and per-class capacity pressure. Module documentation updated to distinguish operational counters from capacity/autoscaling gauges, and `describe()` extends gauge registration.
Queue capacity API exposure `model_gateway/src/middleware/scheduler/queue.rs`	`ClassQueue` trait and `FifoClassQueue` implementation extended with `capacity()` method to expose configured maximum queue depth for metrics sampling.
Sampler background loop and metric computation `model_gateway/src/middleware/scheduler/engine.rs`	`spawn_sampler()` launches periodic background task via `tokio::spawn` with weak self-reference. `sample_metrics()` reads slot pool capacity, per-class inflight/queue/limit state and updates all gauges. `class_pressure()` computes normalized 0–1 per-class pressure from queue pressure and slot overflow pressure with saturating arithmetic, division-by-zero safeguards, and 1.0 clamping. Unit test validates pressure dominance, oversubscription clamping, and zero-limit edge cases.
Sampler initialization wiring `model_gateway/src/middleware/scheduler/state.rs`	Imports `Duration`, defines `SAMPLER_INTERVAL` constant (5 seconds), and wires `scheduler.spawn_sampler(SAMPLER_INTERVAL)` during priority scheduler initialization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

lightseekorg/smg#1546: Directly overlaps on ClassQueue::capacity() and FifoClassQueue::capacity() trait definition and implementation.
lightseekorg/smg#1559: Modifies per-class queue limit semantics that affect the capacity values sampled and reported by this PR's new metrics.
lightseekorg/smg#1577: Concurrent modification of priority scheduler initialization wiring in state.rs for the same scheduler startup flow.

Suggested reviewers

slin1237
claude

Poem

🐰 A sampler hops through the scheduler night,
Gathering metrics with periodic might,
Queue depth and pressure, each class in view,
Weak references vanish when work is through!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely describes the main change: adding autoscaling gauges via a metrics sampler to the scheduler. It directly matches the primary objective of introducing capacity/autoscaling metrics collection.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

gemini-code-assist

Code Review

This pull request introduces a metrics sampler task to the priority scheduler that periodically refreshes point-in-time capacity and autoscaling gauges (such as in-flight requests, queue depth, utilization, queue size limits, and class capacity pressure) off the hot admission path. This ensures that the hot path only performs cheap counter increments. Additionally, corresponding Prometheus metrics, trait methods, and unit tests have been added. There are no review comments, so I have no feedback to provide.

claude

Clean implementation. The pressure formula math checks out (edge cases properly guarded), the Weak-based sampler lifecycle is correct, and the gauge plumbing is straightforward. No issues found.

claude · 2026-06-01T06:22:53Z

    }

+    #[test]
+    fn test_class_pressure_takes_worse_of_queue_and_slot() {
+        // preempt_settings reserves 0 for every class, so higher_reserved=0
+        // and slot pressure is inflight/capacity.
+        let sched = PriorityScheduler::new(&preempt_settings(), 100).unwrap();
+        // Queue pressure (8/10) dominates slot pressure (10/100).
+        assert!((sched.class_pressure(Class::Default, 10, 8, 10, 100) - 0.8).abs() < 1e-9);
+        // Slot pressure (50/100) dominates queue pressure (1/10).
+        assert!((sched.class_pressure(Class::Default, 50, 1, 10, 100) - 0.5).abs() < 1e-9);
+        // Clamped to 1.0 when oversubscribed.
+        assert!((sched.class_pressure(Class::Default, 200, 100, 10, 100) - 1.0).abs() < 1e-9);
+        // No queue limit and no inflight → zero, no div-by-zero.
+        assert_eq!(sched.class_pressure(Class::Bulk, 0, 0, 0, 100), 0.0);
+    }
+
    #[tokio::test(start_paused = true)]
    async fn test_preempt_declined_when_only_victim_past_ttft() {
        // The single lower-class inflight has already emitted its first


🟡 Nit: All four assertions use preempt_settings() which reserves 0 for every class, so higher_reserved is always 0 and the headroom calculation simplifies to capacity.max(1). A test case using settings with non-zero reservations (e.g. Interactive reserved=20, System reserved=10) would exercise the higher_reserved subtraction path in class_pressure — that's the most nuanced part of the formula.
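To make the `higher_reserved` path concrete, here is a small worked sketch using the reservations suggested in the nit (Interactive reserved=20, System reserved=10). The `Class` ordering (Bulk < Default < Interactive < System), the slice-of-pairs representation, and both function names are assumptions made for illustration; the real scheduler reads reservations from its slot pool.

```rust
/// Assumed priority order, lowest to highest, so the derived `Ord` ranks classes.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Class {
    Bulk,
    Default,
    Interactive,
    System,
}

/// Sum of reservations held by classes strictly above `class`.
pub fn higher_reserved(reserved: &[(Class, u32)], class: Class) -> u32 {
    reserved
        .iter()
        .filter(|(c, _)| *c > class)
        .map(|(_, r)| *r)
        .sum()
}

/// Slot pressure only: overflow past the class's own reservation, divided by
/// the headroom left after higher-class reservations, capped at 1.0.
pub fn slot_pressure(
    reserved: &[(Class, u32)],
    class: Class,
    inflight: u32,
    capacity: u32,
) -> f64 {
    // A class with no entry is treated as reserving 0 slots.
    let own = reserved
        .iter()
        .find(|(c, _)| *c == class)
        .map_or(0, |(_, r)| *r);
    let headroom = capacity
        .saturating_sub(higher_reserved(reserved, class))
        .max(1);
    let overflow = inflight.saturating_sub(own);
    (f64::from(overflow) / f64::from(headroom)).min(1.0)
}
```

Under these assumptions and a capacity of 100, Default sees a headroom of 100 − 30 = 70, so 35 inflight requests give a slot pressure of 0.5; Interactive sees a headroom of 90 and an overflow of inflight − 20. A test case in this shape would exercise the subtraction that the zero-reservation settings skip.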

mergify · 2026-06-01T14:11:19Z

Hi @CatherineSue, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease

Per design §9 capacity/autoscaling table. A sampler task (spawn_sampler, 5s interval, holds only a Weak<Self>) refreshes the point-in-time gauges off the admission path so the hot path stays counter-only:

- smg_scheduler_inflight{class}
- smg_scheduler_queue_depth{class}
- smg_scheduler_utilization (Σ inflight / capacity)
- smg_scheduler_queue_size_limit{class}
- smg_scheduler_class_capacity_pressure{class}: max of queue pressure (depth/limit) and slot pressure (overflow past reservation over the headroom not reserved by higher classes), clamped to 0.0-1.0

Adds ClassQueue::capacity() for the queue-size-limit gauge and a unit test pinning the class_pressure formula.

Deferred (documented in the deviations doc): per-tenant gauges (scheduler_tenant_inflight/queued) need tenant threaded onto every inflight handle + waiter, which is disproportionate for a secondary metric, and scheduler_queue_wait_p95_seconds is derivable from the queue-wait histogram via histogram_quantile in PromQL.

Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 993a7a85bc

chatgpt-codex-connector · 2026-06-01T14:34:53Z

+        let overflow =
+            u32::from(inflight).saturating_sub(u32::from(self.slot_pool.reserved(class)));
+        let slot_pressure = f64::from(overflow) / f64::from(headroom);


Account for other classes when computing slot pressure

This slot-pressure calculation only uses this class's own in-flight count, so it can report near-zero pressure while the class has no admissible slots because sibling classes are consuming the shared pool. For example, with capacity 100, System reserved 20, and Default using the other 80 slots, slots_available_to(..., Class::Bulk) is 0, but Bulk gets overflow = 0 here and pressure is driven only by depth / limit; with a large queue limit the autoscaling gauge can stay very low even though Bulk is completely capacity-blocked. The sampler should mirror the admission headroom calculation (total used plus unused higher reservations) rather than considering only same-class inflight.
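The scenario in this comment can be reproduced with a small sketch. Everything here is hypothetical: `ClassState`, `slots_available`, and `pool_aware_slot_pressure` are illustrative names, and the "pool-aware" variant is only one possible reading of "total used plus unused higher reservations", not the repository's `slots_available_to` or a fix the PR adopted.

```rust
/// Hypothetical per-class snapshot; slices below are ordered highest priority first.
pub struct ClassState {
    pub inflight: u32,
    pub reserved: u32,
}

/// Slots class `idx` could still take: capacity minus everything in use,
/// minus reservations that higher classes have not used yet.
pub fn slots_available(classes: &[ClassState], idx: usize, capacity: u32) -> u32 {
    let total_used: u32 = classes.iter().map(|c| c.inflight).sum();
    let unused_higher: u32 = classes[..idx]
        .iter()
        .map(|c| c.reserved.saturating_sub(c.inflight))
        .sum();
    capacity
        .saturating_sub(total_used)
        .saturating_sub(unused_higher)
}

/// Headroom as in the PR formula: max(1, capacity - sum of higher reservations).
fn headroom(classes: &[ClassState], idx: usize, capacity: u32) -> u32 {
    let higher_reserved: u32 = classes[..idx].iter().map(|c| c.reserved).sum();
    capacity.saturating_sub(higher_reserved).max(1)
}

/// Slot pressure as described in the PR: only the class's own overflow counts.
pub fn own_slot_pressure(classes: &[ClassState], idx: usize, capacity: u32) -> f64 {
    let overflow = classes[idx].inflight.saturating_sub(classes[idx].reserved);
    (f64::from(overflow) / f64::from(headroom(classes, idx, capacity))).min(1.0)
}

/// Hypothetical alternative: fraction of the class's headroom that is no longer available.
pub fn pool_aware_slot_pressure(classes: &[ClassState], idx: usize, capacity: u32) -> f64 {
    let room = headroom(classes, idx, capacity);
    let available = slots_available(classes, idx, capacity).min(room);
    1.0 - f64::from(available) / f64::from(room)
}
```

With capacity 100 and the classes `[System {inflight: 0, reserved: 20}, Default {inflight: 80, reserved: 0}, Bulk {inflight: 0, reserved: 0}]`, Bulk (index 2) has zero available slots, yet its own-overflow slot pressure is 0.0, while the pool-aware variant reports 1.0. That gap is the blind spot the comment points at.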

CatherineSue requested a review from slin1237 as a code owner June 1, 2026 04:11

github-actions Bot added the model-gateway Model gateway crate changes label Jun 1, 2026

gemini-code-assist Bot reviewed Jun 1, 2026

claude Bot approved these changes Jun 1, 2026

CatherineSue mentioned this pull request Jun 1, 2026

test(scheduler): integration wiring + fallback guards (PR 2 M6) #1581

Merged

2 tasks

CatherineSue force-pushed the feat/scheduler-autoscaling-metrics branch from a82a6db to 8ff44b9 Compare June 1, 2026 04:46

slin1237 force-pushed the feat/scheduler-autoscaling-metrics branch from 8ff44b9 to 7c3e479 Compare June 1, 2026 06:19

slin1237 force-pushed the feat/scheduler-metrics branch from 22b1a2e to 7ae2c28 Compare June 1, 2026 06:19

claude Bot reviewed Jun 1, 2026

Base automatically changed from feat/scheduler-metrics to main June 1, 2026 14:09

mergify Bot added the needs-rebase PR has merge conflicts that need to be resolved label Jun 1, 2026

slin1237 force-pushed the feat/scheduler-autoscaling-metrics branch from 7c3e479 to 993a7a8 Compare June 1, 2026 14:32

mergify Bot removed the needs-rebase PR has merge conflicts that need to be resolved label Jun 1, 2026

chatgpt-codex-connector Bot reviewed Jun 1, 2026

coderabbitai Bot approved these changes Jun 1, 2026

slin1237 merged commit 27021ab into main Jun 1, 2026
18 checks passed

slin1237 deleted the feat/scheduler-autoscaling-metrics branch June 1, 2026 14:39
