docs(scheduler): document the priority scheduler by slin1237 · Pull Request #1583 · lightseekorg/smg

slin1237 · 2026-06-01T16:01:43Z

Description

Problem

The priority scheduler (M1–M6) shipped with no user-facing docs — no reference for the x-smg-priority header, response codes, or config knobs.

Solution

Add a concept page (how it works) and a reference page (exact contract), wired into the mkdocs nav.

Changes

docs/concepts/reliability/priority-scheduling.md (new) — classes, slots/reservations, per-class queues, preemption + TTFT protection, starvation promotion, fail-safe fallback.
docs/reference/priority-scheduler.md (new) — x-smg-priority + tenant clamp, response codes (503/429/408/499 + headers), CLI flags, YAML schema, built-in defaults, metrics.
Nav: both pages registered under Reliability / Reference.

Test Plan

Docs-only. Values cross-checked against config.rs / error.rs / metrics.rs. mkdocs build --strict runs in CI.

Checklist

cargo +nightly fmt passes (no code changes)
cargo clippy --all-targets --all-features -- -D warnings passes (no code changes)
Documentation updated

Summary by CodeRabbit

Documentation
- Added Priority Scheduling concept page explaining an opt-in, priority-aware admission layer with multiple classes, slot-based capacity, per-class queues, preemption behavior, starvation prevention, and fallback to legacy admission.
- Added Priority Scheduler reference detailing request outcomes, configuration options, startup/fail-safe behavior, tenant limits, and exposed metrics; navigation updated accordingly.

Concept page (how it works) + reference page (header, response codes, CLI/YAML config, defaults, metrics) for the priority-admission scheduler, wired into the mkdocs nav. Signed-off-by: Simo Lin <25425177+slin1237@users.noreply.github.com>

coderabbitai · 2026-06-01T16:01:56Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 82c497aa-6585-4c5e-88e2-20d984b29e7f

📥 Commits

Reviewing files that changed from the base of the PR and between 70ddb40 and 689411f.

📒 Files selected for processing (2)

docs/concepts/reliability/priority-scheduling.md
docs/reference/priority-scheduler.md

📝 Walkthrough

Walkthrough

Adds complete documentation for an opt-in priority-aware admission scheduler: a concept page describing classes, slot-based capacity, queuing, preemption with TTFT safety, starvation prevention, and fail-safe fallback; a technical reference with header/HTTP/CLI/YAML/metrics contracts; and site navigation updates.

Changes

Priority Scheduling Feature Documentation

Layer / File(s)	Summary
Priority Scheduling concept & design overview `docs/concepts/reliability/priority-scheduling.md`	Explains the opt-in priority scheduler as a replacement for legacy FIFO admission, defining four priority classes (`system`, `interactive`, `default`, `bulk`) with tenant-based clamping, slot-based capacity modeling with reserved slots, per-class FIFO queues with depth/timeout limits, preemption of pre-first-byte requests with TTFT safety boundaries, starvation prevention via threshold-based promotion, and fail-safe fallback when disabled or misconfigured. Includes a "busy restaurant" analogy and observability via Prometheus metrics.
Priority Scheduler technical reference `docs/reference/priority-scheduler.md`	Documents the complete scheduler contract: `x-smg-priority` header parsing and validation rules, HTTP response codes for preemption/queue-full/queue-timeout/client-cancel, CLI flags and YAML configuration schema (class reservations, queue depths, preemption/starvation knobs, per-tenant `max_class` overrides), startup validation and fail-safe fallback rules, tenant policy resolution, and full Prometheus metrics catalog.
Documentation navigation integration `docs/concepts/reliability/index.md`, `mkdocs.yml`	Adds "Priority Scheduling" link to the Reliability concepts section index and registers new concept and reference pages in site navigation under Reliability and Reference sections respectively.

Sequence Diagram(s)

sequenceDiagram
  participant ConceptPage
  participant ReferencePage
  participant MkDocsNav
  ConceptPage->>MkDocsNav: add concept entry (concepts/reliability/priority-scheduling.md)
  ReferencePage->>MkDocsNav: add reference entry (reference/priority-scheduler.md)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

lightseekorg/smg#1581: Adds docs specifying the priority-scheduler contract, header handling, and legacy fallback behavior.
lightseekorg/smg#1572: Implements priority scheduler preemption path and TTFT/preempt coordination in SchedulerGuardBody.
lightseekorg/smg#1579: Adds scheduler metrics and tracing instrumentation used by the priority admission middleware/engine.

Suggested reviewers

CatherineSue
claude

Suggested labels

documentation

"I hop through docs with a careful cart,
Four queues, gentle preemptions, a schematic heart,
Slots held like seats at a cozy cafe light,
Starvation gets rescued before it eats the night,
A rabbit cheers docs tidy and bright!" 🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'docs(scheduler): document the priority scheduler' directly and clearly describes the main change: adding documentation for the priority scheduler across concept and reference pages.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/priority-scheduler

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request adds conceptual and reference documentation for the new Priority Scheduling feature, updating the navigation and index files. The review feedback identifies two documentation issues: a discrepancy regarding queue draining behavior, which actually operates in strict priority order rather than round-robin, and a typo in the preemption warning note.

gemini-code-assist · 2026-06-01T16:03:22Z

+- If the request waits longer than the class's timeout, it is rejected (**408**).
+- If the client disconnects while waiting, the wait is abandoned (**499**).
+
+Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in priority order — `system` → `interactive` → `default` → `bulk` — admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another.


The documentation states that the dispatcher drains queues by 'admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another.' However, looking at the implementation of wake_next_waiter in model_gateway/src/middleware/scheduler/engine.rs, the loop always starts scanning from system to bulk on every single invocation. Since the dispatcher runs while scheduler.wake_next_waiter() {}, it will completely drain higher-priority queues before admitting any lower-priority waiters. Therefore, a continuous flood of higher-priority requests will fully starve lower-priority queues (until they exceed their starvation thresholds). The documentation should be updated to accurately reflect this strict priority behavior.

Suggested change

Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in priority order — `system` → `interactive` → `default` → `bulk` — admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another.

Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in strict priority order — system -> interactive -> default -> bulk — meaning a continuous flood of higher-priority requests can fully starve lower-priority queues until they exceed their starvation thresholds.

References

In documentation for technical values like metric labels, use the literal values (slugs) instead of human-readable prose names to ensure accuracy and make them copy-paste friendly for users.

gemini-code-assist · 2026-06-01T16:03:22Z

+A preempted request receives **503** with `Retry-After: 1` and an `X-SMG-Preempted: true` header, so clients and proxies can tell a preemption apart from an ordinary overload and retry promptly. If the victim's slot does not free within a short budget, the preemptor simply falls through to the queue — the cancel has already fired, so the slot frees shortly regardless.
+
+!!! warning "Preemption requires cancel-aware handlers"
+    A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, `can_preempt` is expected to stay off in practice even though the machinery is in place — a marked-but-unwound victim has its first data frame truncated rather than its upstream work stopped.


Typo: 'marked-but-unwound' should be 'marked-but-not-unwound'. Since 'unwound' refers to the state where the request handler has successfully unwound and stopped upstream work, a victim that fails to unwind is 'not unwound'.

Suggested change

A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, `can_preempt` is expected to stay off in practice even though the machinery is in place — a marked-but-unwound victim has its first data frame truncated rather than its upstream work stopped.

A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, can_preempt is expected to stay off in practice even though the machinery is in place — a marked-but-not-unwound victim has its first data frame truncated rather than its upstream work stopped.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70ddb405da

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T16:03:51Z

+
+- If the queue is already at its configured depth, the request is rejected immediately (**429**).
+- If the request waits longer than the class's timeout, it is rejected (**408**).
+- If the client disconnects while waiting, the wait is abandoned (**499**).


Do not promise queued disconnects are abandoned

In the current middleware, queued-client disconnect detection is explicitly not wired: priority_admission_middleware passes a fresh CancellationToken into admit, so a client that disconnects while waiting is not abandoned until the queue timeout or later admission path runs. This line tells operators that long queue timeouts are safe because disconnected waiters are cleaned up and reported as 499, but in this implementation they can continue occupying queue capacity until timeout.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T16:03:51Z

+| `--priority-scheduler-enabled` | `false` | Master switch. When unset, the legacy concurrency-limit middleware stays wired and no scheduler is constructed. |
+| `--priority-scheduler-default-max-class` | `default` | Maximum class for tenants not listed in the YAML (`system` \| `interactive` \| `default` \| `bulk`). Parsed with the same rules as the header — an unknown value falls back to `default`. |
+| `--priority-scheduler-config` | unset | Path to the optional priority-scheduler YAML (per-class overrides + per-tenant policy). Absent → built-in defaults and an empty tenant policy map. |
+| `--priority-scheduler-tenant-metric-top-n` | `32` | Cap on per-tenant scheduler metric label cardinality. Tenants past the top N are bucketed under `tenant="other"`. |


Do not document an unenforced tenant metric cap

I checked the scheduler metrics implementation and priority_scheduler_tenant_metric_top_n is only stored in SchedulerSettings; no scheduler metric path applies a top-N selection or emits tenant="other" (the tenant-labelled clamp/unknown counters intern the raw tenant directly). Documenting this flag as a working cardinality cap can lead operators to rely on it for Prometheus safety when it currently has no effect.

Useful? React with 👍 / 👎.

Address review (gemini/codex): queued-client disconnect isn't detected (499 not emitted, waiter held until timeout); dispatcher drains strict priority not round-robin (starvation guard is the anti-starvation mechanism); tenant-metric-top-n is stored but not yet enforced; fix marked-but-not-unwound typo. Signed-off-by: Simo Lin <25425177+slin1237@users.noreply.github.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 689411ff81

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T20:10:10Z

+
+| Status | Condition | `X-SMG-Error-Code` | Extra headers |
+|--------|-----------|--------------------|---------------|
+| **503** Service Unavailable | **Preempted** — admitted, then cancelled before its first byte to make room for a higher-priority request | `scheduler_preempted` | `X-SMG-Preempted: true`, `Retry-After: 1` |


Do not promise every preemption returns 503

This response contract only holds when a handler observes the scheduler cancel token before it has produced a response. I checked priority_admission_middleware, and it explicitly notes that if preemption fires after the handler has produced headers but before the first body byte, SchedulerGuardBody truncates the body and the request is recorded as admitted because the 503 marker/header cannot be added. For handlers that do not use PreemptionGuard or that flush headers early, clients will not receive 503, Retry-After, or X-SMG-Preempted, so documenting this as the exact response code/header contract will lead retry logic to miss real preemptions.

Useful? React with 👍 / 👎.

slin1237 · 2026-06-01T20:30:38Z

Addressed review feedback (gemini / codex):

Queued-client disconnect: corrected — it isn't detected today, so 499 isn't emitted and the waiter is held until the timeout (the cancel signal isn't wired to disconnect at this stage).
Drain order: corrected to strict priority (higher tiers fully drain first); the starvation guard is what prevents indefinite starvation — removed the inaccurate "one waiter per class per pass" claim.
tenant-metric-top-n: marked not yet enforced (stored but no top-N / tenant="other" bucketing is applied today).
Fixed the marked-but-not-unwound typo.

The CI pre-commit runs codespell repo-wide and flagged a pre-existing typo in the priority-scheduling concept page (from #1583); fixing it here so Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com> #1588's pre-commit passes.

slin1237 requested a review from CatherineSue as a code owner June 1, 2026 16:01

github-actions Bot added the documentation Improvements or additions to documentation label Jun 1, 2026

gemini-code-assist Bot reviewed Jun 1, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

coderabbitai Bot approved these changes Jun 1, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

slin1237 merged commit 0d2f51e into main Jun 1, 2026
10 checks passed

slin1237 deleted the docs/priority-scheduler branch June 1, 2026 21:04

	A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, `can_preempt` is expected to stay off in practice even though the machinery is in place — a marked-but-unwound victim has its first data frame truncated rather than its upstream work stopped.
	A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, can_preempt is expected to stay off in practice even though the machinery is in place — a marked-but-not-unwound victim has its first data frame truncated rather than its upstream work stopped.

Conversation

slin1237 commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Problem

Solution

Changes

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Suggested labels

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

slin1237 commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

slin1237 commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading