feat(scheduler): autoscaling gauges via metrics sampler (PR 2 M5b) #1580

Merged
slin1237 merged 1 commit into main from feat/scheduler-autoscaling-metrics
Jun 1, 2026

Conversation

Member

@CatherineSue CatherineSue commented Jun 1, 2026

Description

Problem

Autoscalers need point-in-time signals (utilization, per-class pressure) to drive scale up/down decisions. M5a added the operational counters; this adds the capacity/autoscaling gauges.

Solution

A sampler task (5s interval, holds only a Weak<Self>) refreshes the gauges off the admission path, so the hot path stays counter-only. Stacked on #1579.
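The Weak-based lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it swaps in `std::thread` for the tokio task so it runs without external crates, and `Scheduler`, `samples_taken`, and the body of `sample_metrics` are hypothetical stand-ins.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Weak};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the scheduler; it only counts sampling passes.
pub struct Scheduler {
    samples_taken: AtomicU64,
}

impl Scheduler {
    pub fn new() -> Arc<Self> {
        Arc::new(Self {
            samples_taken: AtomicU64::new(0),
        })
    }

    /// One sampling pass. The real pass would refresh the gauges here.
    fn sample_metrics(&self) {
        self.samples_taken.fetch_add(1, Ordering::Relaxed);
    }

    pub fn samples_taken(&self) -> u64 {
        self.samples_taken.load(Ordering::Relaxed)
    }

    /// Spawn a background sampler that holds only a `Weak<Self>`, so the
    /// sampler never keeps the scheduler alive on its own.
    pub fn spawn_sampler(self: &Arc<Self>, interval: Duration) -> JoinHandle<()> {
        let weak: Weak<Self> = Arc::downgrade(self);
        thread::spawn(move || loop {
            thread::sleep(interval);
            // Upgrade only for the duration of one pass; the strong
            // reference is dropped again at the end of the match arm.
            match weak.upgrade() {
                Some(sched) => sched.sample_metrics(),
                // Every strong reference is gone: exit the loop.
                None => break,
            }
        })
    }
}
```

Because the loop only ever upgrades the weak reference for a single pass, dropping the last `Arc<Scheduler>` is enough to make the sampler exit on its next tick; no explicit shutdown signal is needed in this sketch.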

Changes

  • spawn_sampler + sample_metrics + class_pressure on PriorityScheduler; sampler spawned from AdmissionMode::from_config alongside the dispatcher.
  • Gauges (design §9 capacity/autoscaling table):
    • smg_scheduler_inflight{class}, smg_scheduler_queue_depth{class}
    • smg_scheduler_utilization (Σ inflight / capacity)
    • smg_scheduler_queue_size_limit{class}
    • smg_scheduler_class_capacity_pressure{class}: max(depth/limit, max(0, inflight−reserved)/max(1, capacity−Σ_higher_reserved)), clamped to 0.0–1.0
  • ClassQueue::capacity() accessor for the queue-size-limit gauge.
  • Unit test pinning the class_pressure formula (queue-dominates, slot-dominates, clamp, div-by-zero guard).
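As a rough illustration of the pressure formula in the list above, here is a free-standing sketch. The signature is an assumption: the PR's `class_pressure` is a method on `PriorityScheduler` that looks up `reserved` and the higher-class reservations itself, whereas this version takes them as plain arguments. How the real code guards a zero queue limit is not shown in the PR; treating it as zero queue pressure is also an assumption.

```rust
/// Normalized 0.0..=1.0 pressure for one class (hypothetical free function).
///
/// pressure = max(depth / limit,
///                max(0, inflight - reserved) / max(1, capacity - higher_reserved))
pub fn class_pressure(
    inflight: u32,
    depth: u32,
    limit: u32,
    capacity: u32,
    reserved: u32,
    higher_reserved: u32,
) -> f64 {
    // Queue pressure: how full the class queue is. A zero limit is treated
    // as "no queue pressure" here (assumption) to avoid dividing by zero.
    let queue_pressure = if limit == 0 {
        0.0
    } else {
        f64::from(depth) / f64::from(limit)
    };

    // Slot pressure: inflight beyond the class's own reservation, relative
    // to the headroom not reserved by higher classes (at least 1).
    let headroom = capacity.saturating_sub(higher_reserved).max(1);
    let overflow = inflight.saturating_sub(reserved);
    let slot_pressure = f64::from(overflow) / f64::from(headroom);

    // Report the worse of the two signals, clamped to the gauge's range.
    queue_pressure.max(slot_pressure).clamp(0.0, 1.0)
}
```

With zero reservations this reproduces the four cases pinned by the unit test in this PR: 0.8 when the queue dominates, 0.5 when slots dominate, 1.0 when oversubscribed, and 0.0 for the zero-limit case.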

Deferred (documented in the deviations doc)

  • Per-tenant gauges (scheduler_tenant_inflight/queued): require threading a tenant field through every InflightHandle + Waiter + the admit/register_inflight signatures (~20 test call-site edits) for a secondary "noisy-tenant" metric. Deferred pending sign-off on the invasiveness.
  • scheduler_queue_wait_p95_seconds: derivable from the smg_scheduler_queue_wait_seconds histogram via histogram_quantile in PromQL; the standalone sliding-window gauge isn't worth a bespoke estimator.

Test Plan

  • cargo test -p smg --lib scheduler:: — 108 unit tests green (incl. the new class_pressure test).
  • Gauge scrape assertions land with the M6 integration tests (the sampler needs a running runtime).

Checklist

  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes

Summary by CodeRabbit

  • Chores
    • Added periodic background metrics sampling to the scheduler, collecting capacity, utilization, queue depth, inflight counts, and per-class pressure calculations for improved observability.

@CatherineSue CatherineSue requested a review from slin1237 as a code owner June 1, 2026 04:11

coderabbitai Bot commented Jun 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 89688844-ee3f-4415-86d3-4008fbff6570

📥 Commits

Reviewing files that changed from the base of the PR and between 4823d74 and 993a7a8.

📒 Files selected for processing (4)
  • model_gateway/src/middleware/scheduler/engine.rs
  • model_gateway/src/middleware/scheduler/metrics.rs
  • model_gateway/src/middleware/scheduler/queue.rs
  • model_gateway/src/middleware/scheduler/state.rs

📝 Walkthrough

This PR introduces a periodic metrics sampler to the PriorityScheduler that collects and reports capacity, inflight counts, queue depth, and normalized per-class pressure values to Prometheus gauges. The sampler runs as a background tokio::spawn task at 5-second intervals, holds only a weak reference to the scheduler, and exits automatically when the scheduler is dropped.

Changes

Scheduler Metrics Sampling

Layer / File(s): Summary

  • Metrics gauge definitions and setters (model_gateway/src/middleware/scheduler/metrics.rs): Prometheus gauge constants and public setter functions added for inflight count, queue depth, utilization, per-class queue size limit, and per-class capacity pressure. Module documentation updated to distinguish operational counters from capacity/autoscaling gauges, and describe() extends gauge registration.
  • Queue capacity API exposure (model_gateway/src/middleware/scheduler/queue.rs): ClassQueue trait and FifoClassQueue implementation extended with a capacity() method to expose the configured maximum queue depth for metrics sampling.
  • Sampler background loop and metric computation (model_gateway/src/middleware/scheduler/engine.rs): spawn_sampler() launches a periodic background task via tokio::spawn with a weak self-reference. sample_metrics() reads slot pool capacity and per-class inflight/queue/limit state, then updates all gauges. class_pressure() computes a normalized 0–1 per-class pressure from queue pressure and slot overflow pressure with saturating arithmetic, division-by-zero safeguards, and clamping to 1.0. A unit test validates pressure dominance, oversubscription clamping, and zero-limit edge cases.
  • Sampler initialization wiring (model_gateway/src/middleware/scheduler/state.rs): Imports Duration, defines the SAMPLER_INTERVAL constant (5 seconds), and wires scheduler.spawn_sampler(SAMPLER_INTERVAL) during priority scheduler initialization.
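To make the sampling pass summarized above more concrete, here is a hedged sketch of reading per-class queue state and computing utilization. Only `ClassQueue::capacity()` and `FifoClassQueue` are named in the PR; the `depth()` method, the `push` helper, the `Snapshot` struct, and the `sample` function are hypothetical, and the real code publishes to Prometheus gauges rather than returning a struct.

```rust
use std::collections::VecDeque;

/// Hypothetical minimal queue trait; only `capacity()` is named in the PR.
pub trait ClassQueue {
    fn depth(&self) -> usize;
    /// Configured maximum queue depth (feeds the queue-size-limit gauge).
    fn capacity(&self) -> usize;
}

pub struct FifoClassQueue {
    items: VecDeque<u64>,
    max_depth: usize,
}

impl FifoClassQueue {
    pub fn new(max_depth: usize) -> Self {
        Self { items: VecDeque::new(), max_depth }
    }

    /// Enqueue a request id; returns false when the queue is full.
    pub fn push(&mut self, id: u64) -> bool {
        if self.items.len() >= self.max_depth {
            return false;
        }
        self.items.push_back(id);
        true
    }
}

impl ClassQueue for FifoClassQueue {
    fn depth(&self) -> usize {
        self.items.len()
    }

    fn capacity(&self) -> usize {
        self.max_depth
    }
}

/// Point-in-time values that one sampling pass would publish.
#[derive(Debug, PartialEq)]
pub struct Snapshot {
    pub queue_depth: Vec<usize>,
    pub queue_size_limit: Vec<usize>,
    pub utilization: f64,
}

/// Read per-class state once and derive utilization = sum(inflight) / capacity.
pub fn sample<Q: ClassQueue>(queues: &[Q], inflight: &[u32], capacity: u32) -> Snapshot {
    let total: u32 = inflight.iter().sum();
    // Guard the division so a zero-capacity pool reports 0.0 (assumption).
    let utilization = if capacity == 0 {
        0.0
    } else {
        f64::from(total) / f64::from(capacity)
    };
    Snapshot {
        queue_depth: queues.iter().map(|q| q.depth()).collect(),
        queue_size_limit: queues.iter().map(|q| q.capacity()).collect(),
        utilization,
    }
}
```

Keeping the pass as a pure read over existing state is what lets the admission path stay counter-only: nothing here runs per request.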

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • lightseekorg/smg#1546: Directly overlaps on ClassQueue::capacity() and FifoClassQueue::capacity() trait definition and implementation.
  • lightseekorg/smg#1559: Modifies per-class queue limit semantics that affect the capacity values sampled and reported by this PR's new metrics.
  • lightseekorg/smg#1577: Concurrent modification of priority scheduler initialization wiring in state.rs for the same scheduler startup flow.

Suggested reviewers

  • slin1237
  • claude

Poem

🐰 A sampler hops through the scheduler night,
Gathering metrics with periodic might,
Queue depth and pressure, each class in view,
Weak references vanish when work is through!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name: Status. Explanation

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The PR title clearly and concisely describes the main change: adding autoscaling gauges via a metrics sampler to the scheduler. It directly matches the primary objective of introducing capacity/autoscaling metrics collection.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

@github-actions github-actions Bot added the model-gateway Model gateway crate changes label Jun 1, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a metrics sampler task to the priority scheduler that periodically refreshes point-in-time capacity and autoscaling gauges (such as in-flight requests, queue depth, utilization, queue size limits, and class capacity pressure) off the hot admission path. This ensures that the hot path only performs cheap counter increments. Additionally, corresponding Prometheus metrics, trait methods, and unit tests have been added. There are no review comments, so I have no feedback to provide.

@claude claude Bot left a comment

Clean implementation. The pressure formula math checks out (edge cases properly guarded), the Weak-based sampler lifecycle is correct, and the gauge plumbing is straightforward. No issues found.

@CatherineSue CatherineSue force-pushed the feat/scheduler-autoscaling-metrics branch from a82a6db to 8ff44b9 Compare June 1, 2026 04:46
@slin1237 slin1237 force-pushed the feat/scheduler-autoscaling-metrics branch from 8ff44b9 to 7c3e479 Compare June 1, 2026 06:19
@slin1237 slin1237 force-pushed the feat/scheduler-metrics branch from 22b1a2e to 7ae2c28 Compare June 1, 2026 06:19
Comment on lines 1463 to 1482
    }

    #[test]
    fn test_class_pressure_takes_worse_of_queue_and_slot() {
        // preempt_settings reserves 0 for every class, so higher_reserved=0
        // and slot pressure is inflight/capacity.
        let sched = PriorityScheduler::new(&preempt_settings(), 100).unwrap();
        // Queue pressure (8/10) dominates slot pressure (10/100).
        assert!((sched.class_pressure(Class::Default, 10, 8, 10, 100) - 0.8).abs() < 1e-9);
        // Slot pressure (50/100) dominates queue pressure (1/10).
        assert!((sched.class_pressure(Class::Default, 50, 1, 10, 100) - 0.5).abs() < 1e-9);
        // Clamped to 1.0 when oversubscribed.
        assert!((sched.class_pressure(Class::Default, 200, 100, 10, 100) - 1.0).abs() < 1e-9);
        // No queue limit and no inflight → zero, no div-by-zero.
        assert_eq!(sched.class_pressure(Class::Bulk, 0, 0, 0, 100), 0.0);
    }

    #[tokio::test(start_paused = true)]
    async fn test_preempt_declined_when_only_victim_past_ttft() {
        // The single lower-class inflight has already emitted its first

🟡 Nit: All four assertions use preempt_settings() which reserves 0 for every class, so higher_reserved is always 0 and the headroom calculation simplifies to capacity.max(1). A test case using settings with non-zero reservations (e.g. Interactive reserved=20, System reserved=10) would exercise the higher_reserved subtraction path in class_pressure — that's the most nuanced part of the formula.
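A test along the lines suggested in this nit might look like the sketch below. It is standalone and does not use the PR's types: the `Class` ordering (System highest, Bulk lowest), the `Reservations` struct, and its methods are all assumptions introduced for illustration, with the reservation values taken from the example in the comment (Interactive reserved=20, System reserved=10).

```rust
/// Classes listed from highest to lowest priority. This ordering is an
/// assumption made for the sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Class {
    System,
    Interactive,
    Default,
    Bulk,
}

impl Class {
    /// Position in the priority order; 0 is the highest class.
    fn rank(self) -> usize {
        self as usize
    }
}

/// Hypothetical per-class slot reservations, indexed by `Class::rank`.
pub struct Reservations {
    reserved: [u32; 4],
}

impl Reservations {
    pub fn new(system: u32, interactive: u32, default: u32, bulk: u32) -> Self {
        Self { reserved: [system, interactive, default, bulk] }
    }

    pub fn reserved(&self, class: Class) -> u32 {
        self.reserved[class.rank()]
    }

    /// Sum of reservations held by strictly higher-priority classes.
    pub fn higher_reserved(&self, class: Class) -> u32 {
        self.reserved[..class.rank()].iter().sum()
    }

    /// Slot half of the pressure formula:
    /// max(0, inflight - reserved) / max(1, capacity - higher_reserved).
    pub fn slot_pressure(&self, class: Class, inflight: u32, capacity: u32) -> f64 {
        let headroom = capacity.saturating_sub(self.higher_reserved(class)).max(1);
        let overflow = inflight.saturating_sub(self.reserved(class));
        (f64::from(overflow) / f64::from(headroom)).clamp(0.0, 1.0)
    }
}
```

With System=10 and Interactive=20 reserved out of a capacity of 100, Default sees a headroom of 70, so 35 inflight gives a slot pressure of 0.5 rather than the 0.35 that zero reservations would produce. That difference is the subtraction path the existing assertions do not reach.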

Base automatically changed from feat/scheduler-metrics to main June 1, 2026 14:09
Contributor

mergify Bot commented Jun 1, 2026

Hi @CatherineSue, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase PR has merge conflicts that need to be resolved label Jun 1, 2026
Per design §9 capacity/autoscaling table. A sampler task (spawn_sampler,
5s interval, holds only a Weak<Self>) refreshes the point-in-time gauges
off the admission path so the hot path stays counter-only:

- smg_scheduler_inflight{class}
- smg_scheduler_queue_depth{class}
- smg_scheduler_utilization (Σ inflight / capacity)
- smg_scheduler_queue_size_limit{class}
- smg_scheduler_class_capacity_pressure{class} — max of queue pressure
  (depth/limit) and slot pressure (overflow past reservation over the
  headroom not reserved by higher classes), clamped to 0.0-1.0

Adds ClassQueue::capacity() for the queue-size-limit gauge and a unit
test pinning the class_pressure formula.

Deferred (documented in the deviations doc): per-tenant gauges
(scheduler_tenant_inflight/queued) need tenant threaded onto every
inflight handle + waiter — disproportionate for a secondary metric — and
scheduler_queue_wait_p95_seconds is derivable from the queue-wait
histogram via histogram_quantile in PromQL.

Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
@slin1237 slin1237 force-pushed the feat/scheduler-autoscaling-metrics branch from 7c3e479 to 993a7a8 Compare June 1, 2026 14:32
@mergify mergify Bot removed the needs-rebase PR has merge conflicts that need to be resolved label Jun 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 993a7a85bc

Comment on lines +601 to +603
let overflow =
    u32::from(inflight).saturating_sub(u32::from(self.slot_pool.reserved(class)));
let slot_pressure = f64::from(overflow) / f64::from(headroom);

P2: Account for other classes when computing slot pressure

This slot-pressure calculation only uses this class's own in-flight count, so it can report near-zero pressure while the class has no admissible slots because sibling classes are consuming the shared pool. For example, with capacity 100, System reserved 20, and Default using the other 80 slots, slots_available_to(..., Class::Bulk) is 0, but Bulk gets overflow = 0 here and pressure is driven only by depth / limit; with a large queue limit the autoscaling gauge can stay very low even though Bulk is completely capacity-blocked. The sampler should mirror the admission headroom calculation (total used plus unused higher reservations) rather than considering only same-class inflight.
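The scenario in this comment can be worked through with a small sketch. The PR names `slots_available_to` but does not show its body, so the function below is only a guess based on the comment's description ("total used plus unused higher reservations"); `shared_slot_pressure` is one hypothetical way a gauge could mirror it, not a proposal taken from the PR. Classes are plain indices here, with 0 as the highest priority.

```rust
/// Slots a class could still claim: capacity minus everything in use, minus
/// reservations that strictly higher classes have not consumed yet.
/// `class` must be a valid index into both slices (0 = highest priority).
pub fn slots_available_to(
    inflight: &[u32],
    reserved: &[u32],
    capacity: u32,
    class: usize,
) -> u32 {
    let total_used: u32 = inflight.iter().sum();
    let unused_higher: u32 = inflight[..class]
        .iter()
        .zip(&reserved[..class])
        .map(|(used, res)| res.saturating_sub(*used))
        .sum();
    capacity
        .saturating_sub(total_used)
        .saturating_sub(unused_higher)
}

/// Hypothetical pool-aware pressure: the fraction of the pool this class
/// cannot claim, in 0.0..=1.0.
pub fn shared_slot_pressure(
    inflight: &[u32],
    reserved: &[u32],
    capacity: u32,
    class: usize,
) -> f64 {
    let available = slots_available_to(inflight, reserved, capacity, class);
    let denom = capacity.max(1);
    (1.0 - f64::from(available) / f64::from(denom)).clamp(0.0, 1.0)
}
```

Using the comment's numbers with classes ordered [System, Default, Bulk], inflight [0, 80, 0], reserved [20, 0, 0], and capacity 100, Bulk has 0 claimable slots, so this pool-aware value is 1.0, while the own-inflight overflow for Bulk is 0 and contributes no slot pressure at all.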

@slin1237 slin1237 merged commit 27021ab into main Jun 1, 2026
18 checks passed
@slin1237 slin1237 deleted the feat/scheduler-autoscaling-metrics branch June 1, 2026 14:39
Labels

model-gateway Model gateway crate changes

2 participants