Skip to content

docs(scheduler): document the priority scheduler#1583

Merged
slin1237 merged 2 commits into
mainfrom
docs/priority-scheduler
Jun 1, 2026
Merged

docs(scheduler): document the priority scheduler#1583
slin1237 merged 2 commits into
mainfrom
docs/priority-scheduler

Conversation

@slin1237
Copy link
Copy Markdown
Collaborator

@slin1237 slin1237 commented Jun 1, 2026

Description

Problem

The priority scheduler (M1–M6) shipped with no user-facing docs — no reference for the x-smg-priority header, response codes, or config knobs.

Solution

Add a concept page (how it works) and a reference page (exact contract), wired into the mkdocs nav.

Changes

  • docs/concepts/reliability/priority-scheduling.md (new) — classes, slots/reservations, per-class queues, preemption + TTFT protection, starvation promotion, fail-safe fallback.
  • docs/reference/priority-scheduler.md (new) — x-smg-priority + tenant clamp, response codes (503/429/408/499 + headers), CLI flags, YAML schema, built-in defaults, metrics.
  • Nav: both pages registered under Reliability / Reference.

Test Plan

  • Docs-only. Values cross-checked against config.rs / error.rs / metrics.rs. mkdocs build --strict runs in CI.
Checklist
  • cargo +nightly fmt passes (no code changes)
  • cargo clippy --all-targets --all-features -- -D warnings passes (no code changes)
  • Documentation updated

Summary by CodeRabbit

  • Documentation
    • Added Priority Scheduling concept page explaining an opt-in, priority-aware admission layer with multiple classes, slot-based capacity, per-class queues, preemption behavior, starvation prevention, and fallback to legacy admission.
    • Added Priority Scheduler reference detailing request outcomes, configuration options, startup/fail-safe behavior, tenant limits, and exposed metrics; navigation updated accordingly.

Concept page (how it works) + reference page (header, response codes, CLI/YAML config, defaults, metrics) for the priority-admission scheduler, wired into the mkdocs nav.

Signed-off-by: Simo Lin <25425177+slin1237@users.noreply.github.com>
@slin1237 slin1237 requested a review from CatherineSue as a code owner June 1, 2026 16:01
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 82c497aa-6585-4c5e-88e2-20d984b29e7f

📥 Commits

Reviewing files that changed from the base of the PR and between 70ddb40 and 689411f.

📒 Files selected for processing (2)
  • docs/concepts/reliability/priority-scheduling.md
  • docs/reference/priority-scheduler.md

📝 Walkthrough

Walkthrough

Adds complete documentation for an opt-in priority-aware admission scheduler: a concept page describing classes, slot-based capacity, queuing, preemption with TTFT safety, starvation prevention, and fail-safe fallback; a technical reference with header/HTTP/CLI/YAML/metrics contracts; and site navigation updates.

Changes

Priority Scheduling Feature Documentation

Layer / File(s) Summary
Priority Scheduling concept & design overview
docs/concepts/reliability/priority-scheduling.md
Explains the opt-in priority scheduler as a replacement for legacy FIFO admission, defining four priority classes (system, interactive, default, bulk) with tenant-based clamping, slot-based capacity modeling with reserved slots, per-class FIFO queues with depth/timeout limits, preemption of pre-first-byte requests with TTFT safety boundaries, starvation prevention via threshold-based promotion, and fail-safe fallback when disabled or misconfigured. Includes a "busy restaurant" analogy and observability via Prometheus metrics.
Priority Scheduler technical reference
docs/reference/priority-scheduler.md
Documents the complete scheduler contract: x-smg-priority header parsing and validation rules, HTTP response codes for preemption/queue-full/queue-timeout/client-cancel, CLI flags and YAML configuration schema (class reservations, queue depths, preemption/starvation knobs, per-tenant max_class overrides), startup validation and fail-safe fallback rules, tenant policy resolution, and full Prometheus metrics catalog.
Documentation navigation integration
docs/concepts/reliability/index.md, mkdocs.yml
Adds "Priority Scheduling" link to the Reliability concepts section index and registers new concept and reference pages in site navigation under Reliability and Reference sections respectively.

Sequence Diagram(s)

sequenceDiagram
  participant ConceptPage
  participant ReferencePage
  participant MkDocsNav
  ConceptPage->>MkDocsNav: add concept entry (concepts/reliability/priority-scheduling.md)
  ReferencePage->>MkDocsNav: add reference entry (reference/priority-scheduler.md)
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • lightseekorg/smg#1581: Adds docs specifying the priority-scheduler contract, header handling, and legacy fallback behavior.
  • lightseekorg/smg#1572: Implements priority scheduler preemption path and TTFT/preempt coordination in SchedulerGuardBody.
  • lightseekorg/smg#1579: Adds scheduler metrics and tracing instrumentation used by the priority admission middleware/engine.

Suggested reviewers

  • CatherineSue
  • claude

Suggested labels

documentation

"I hop through docs with a careful cart,
Four queues, gentle preemptions, a schematic heart,
Slots held like seats at a cozy cafe light,
Starvation gets rescued before it eats the night,
A rabbit cheers docs tidy and bright!" 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'docs(scheduler): document the priority scheduler' directly and clearly describes the main change: adding documentation for the priority scheduler across concept and reference pages.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/priority-scheduler

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jun 1, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds conceptual and reference documentation for the new Priority Scheduling feature, updating the navigation and index files. The review feedback identifies two documentation issues: a discrepancy regarding queue draining behavior, which actually operates in strict priority order rather than round-robin, and a typo in the preemption warning note.

- If the request waits longer than the class's timeout, it is rejected (**408**).
- If the client disconnects while waiting, the wait is abandoned (**499**).

Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in priority order — `system` → `interactive` → `default` → `bulk` — admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The documentation states that the dispatcher drains queues by 'admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another.' However, looking at the implementation of wake_next_waiter in model_gateway/src/middleware/scheduler/engine.rs, the loop always starts scanning from system to bulk on every single invocation. Since the dispatcher runs while scheduler.wake_next_waiter() {}, it will completely drain higher-priority queues before admitting any lower-priority waiters. Therefore, a continuous flood of higher-priority requests will fully starve lower-priority queues (until they exceed their starvation thresholds). The documentation should be updated to accurately reflect this strict priority behavior.

Suggested change
Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in priority order — `system``interactive``default``bulk`admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another.
Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in strict priority order — system -> interactive -> default -> bulk — meaning a continuous flood of higher-priority requests can fully starve lower-priority queues until they exceed their starvation thresholds.
References
  1. In documentation for technical values like metric labels, use the literal values (slugs) instead of human-readable prose names to ensure accuracy and make them copy-paste friendly for users.

A preempted request receives **503** with `Retry-After: 1` and an `X-SMG-Preempted: true` header, so clients and proxies can tell a preemption apart from an ordinary overload and retry promptly. If the victim's slot does not free within a short budget, the preemptor simply falls through to the queue — the cancel has already fired, so the slot frees shortly regardless.

!!! warning "Preemption requires cancel-aware handlers"
A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, `can_preempt` is expected to stay off in practice even though the machinery is in place — a marked-but-unwound victim has its first data frame truncated rather than its upstream work stopped.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Typo: 'marked-but-unwound' should be 'marked-but-not-unwound'. Since 'unwound' refers to the state where the request handler has successfully unwound and stopped upstream work, a victim that fails to unwind is 'not unwound'.

Suggested change
A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, `can_preempt` is expected to stay off in practice even though the machinery is in place — a marked-but-unwound victim has its first data frame truncated rather than its upstream work stopped.
A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, can_preempt is expected to stay off in practice even though the machinery is in place — a marked-but-not-unwound victim has its first data frame truncated rather than its upstream work stopped.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70ddb405da

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


- If the queue is already at its configured depth, the request is rejected immediately (**429**).
- If the request waits longer than the class's timeout, it is rejected (**408**).
- If the client disconnects while waiting, the wait is abandoned (**499**).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not promise queued disconnects are abandoned

In the current middleware, queued-client disconnect detection is explicitly not wired: priority_admission_middleware passes a fresh CancellationToken into admit, so a client that disconnects while waiting is not abandoned until the queue timeout or later admission path runs. This line tells operators that long queue timeouts are safe because disconnected waiters are cleaned up and reported as 499, but in this implementation they can continue occupying queue capacity until timeout.

Useful? React with 👍 / 👎.

Comment thread docs/reference/priority-scheduler.md Outdated
| `--priority-scheduler-enabled` | `false` | Master switch. When unset, the legacy concurrency-limit middleware stays wired and no scheduler is constructed. |
| `--priority-scheduler-default-max-class` | `default` | Maximum class for tenants not listed in the YAML (`system` \| `interactive` \| `default` \| `bulk`). Parsed with the same rules as the header — an unknown value falls back to `default`. |
| `--priority-scheduler-config` | unset | Path to the optional priority-scheduler YAML (per-class overrides + per-tenant policy). Absent → built-in defaults and an empty tenant policy map. |
| `--priority-scheduler-tenant-metric-top-n` | `32` | Cap on per-tenant scheduler metric label cardinality. Tenants past the top N are bucketed under `tenant="other"`. |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not document an unenforced tenant metric cap

I checked the scheduler metrics implementation and priority_scheduler_tenant_metric_top_n is only stored in SchedulerSettings; no scheduler metric path applies a top-N selection or emits tenant="other" (the tenant-labelled clamp/unknown counters intern the raw tenant directly). Documenting this flag as a working cardinality cap can lead operators to rely on it for Prometheus safety when it currently has no effect.

Useful? React with 👍 / 👎.

Address review (gemini/codex): queued-client disconnect isn't detected (499 not emitted, waiter held until timeout); dispatcher drains strict priority not round-robin (starvation guard is the anti-starvation mechanism); tenant-metric-top-n is stored but not yet enforced; fix marked-but-not-unwound typo.

Signed-off-by: Simo Lin <25425177+slin1237@users.noreply.github.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 689411ff81

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


| Status | Condition | `X-SMG-Error-Code` | Extra headers |
|--------|-----------|--------------------|---------------|
| **503** Service Unavailable | **Preempted** — admitted, then cancelled before its first byte to make room for a higher-priority request | `scheduler_preempted` | `X-SMG-Preempted: true`, `Retry-After: 1` |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not promise every preemption returns 503

This response contract only holds when a handler observes the scheduler cancel token before it has produced a response. I checked priority_admission_middleware, and it explicitly notes that if preemption fires after the handler has produced headers but before the first body byte, SchedulerGuardBody truncates the body and the request is recorded as admitted because the 503 marker/header cannot be added. For handlers that do not use PreemptionGuard or that flush headers early, clients will not receive 503, Retry-After, or X-SMG-Preempted, so documenting this as the exact response code/header contract will lead retry logic to miss real preemptions.

Useful? React with 👍 / 👎.

@slin1237
Copy link
Copy Markdown
Collaborator Author

slin1237 commented Jun 1, 2026

Addressed review feedback (gemini / codex):

  • Queued-client disconnect: corrected — it isn't detected today, so 499 isn't emitted and the waiter is held until the timeout (the cancel signal isn't wired to disconnect at this stage).
  • Drain order: corrected to strict priority (higher tiers fully drain first); the starvation guard is what prevents indefinite starvation — removed the inaccurate "one waiter per class per pass" claim.
  • tenant-metric-top-n: marked not yet enforced (stored but no top-N / tenant="other" bucketing is applied today).
  • Fixed the marked-but-not-unwound typo.

@slin1237 slin1237 merged commit 0d2f51e into main Jun 1, 2026
10 checks passed
@slin1237 slin1237 deleted the docs/priority-scheduler branch June 1, 2026 21:04
CatherineSue added a commit that referenced this pull request Jun 1, 2026
The CI pre-commit runs codespell repo-wide and flagged a pre-existing typo
in the priority-scheduling concept page (from #1583); fixing it here so

Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
#1588's pre-commit passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant