docs(scheduler): document the priority scheduler#1583
Conversation
Concept page (how it works) + reference page (header, response codes, CLI/YAML config, defaults, metrics) for the priority-admission scheduler, wired into the mkdocs nav. Signed-off-by: Simo Lin <25425177+slin1237@users.noreply.github.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds complete documentation for an opt-in priority-aware admission scheduler: a concept page describing classes, slot-based capacity, queuing, preemption with TTFT safety, starvation prevention, and fail-safe fallback; a technical reference with header/HTTP/CLI/YAML/metrics contracts; and site navigation updates. ChangesPriority Scheduling Feature Documentation
Sequence Diagram(s)sequenceDiagram
participant ConceptPage
participant ReferencePage
participant MkDocsNav
ConceptPage->>MkDocsNav: add concept entry (concepts/reliability/priority-scheduling.md)
ReferencePage->>MkDocsNav: add reference entry (reference/priority-scheduler.md)
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request adds conceptual and reference documentation for the new Priority Scheduling feature, updating the navigation and index files. The review feedback identifies two documentation issues: a discrepancy regarding queue draining behavior, which actually operates in strict priority order rather than round-robin, and a typo in the preemption warning note.
| - If the request waits longer than the class's timeout, it is rejected (**408**). | ||
| - If the client disconnects while waiting, the wait is abandoned (**499**). | ||
|
|
||
| Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in priority order — `system` → `interactive` → `default` → `bulk` — admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another. |
There was a problem hiding this comment.
The documentation states that the dispatcher drains queues by 'admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another.' However, looking at the implementation of wake_next_waiter in model_gateway/src/middleware/scheduler/engine.rs, the loop always starts scanning from system to bulk on every single invocation. Since the dispatcher runs while scheduler.wake_next_waiter() {}, it will completely drain higher-priority queues before admitting any lower-priority waiters. Therefore, a continuous flood of higher-priority requests will fully starve lower-priority queues (until they exceed their starvation thresholds). The documentation should be updated to accurately reflect this strict priority behavior.
| Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in priority order — `system` → `interactive` → `default` → `bulk` — admitting one waiter per class per pass so a flood in one class cannot fully starve the drain of another. | |
| Higher classes have shorter queues and shorter timeouts (fail fast, the latency matters); lower classes have deeper queues and longer timeouts (wait patiently, throughput matters). The dispatcher drains queues in strict priority order — system -> interactive -> default -> bulk — meaning a continuous flood of higher-priority requests can fully starve lower-priority queues until they exceed their starvation thresholds. |
References
- In documentation for technical values like metric labels, use the literal values (slugs) instead of human-readable prose names to ensure accuracy and make them copy-paste friendly for users.
| A preempted request receives **503** with `Retry-After: 1` and an `X-SMG-Preempted: true` header, so clients and proxies can tell a preemption apart from an ordinary overload and retry promptly. If the victim's slot does not free within a short budget, the preemptor simply falls through to the queue — the cancel has already fired, so the slot frees shortly regardless. | ||
|
|
||
| !!! warning "Preemption requires cancel-aware handlers" | ||
| A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, `can_preempt` is expected to stay off in practice even though the machinery is in place — a marked-but-unwound victim has its first data frame truncated rather than its upstream work stopped. |
There was a problem hiding this comment.
Typo: 'marked-but-unwound' should be 'marked-but-not-unwound'. Since 'unwound' refers to the state where the request handler has successfully unwound and stopped upstream work, a victim that fails to unwind is 'not unwound'.
| A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, `can_preempt` is expected to stay off in practice even though the machinery is in place — a marked-but-unwound victim has its first data frame truncated rather than its upstream work stopped. | |
| A victim only actually unwinds if its request handler honors the scheduler's cancel signal. Until every long-running handler is wired to that signal, can_preempt is expected to stay off in practice even though the machinery is in place — a marked-but-not-unwound victim has its first data frame truncated rather than its upstream work stopped. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70ddb405da
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| - If the queue is already at its configured depth, the request is rejected immediately (**429**). | ||
| - If the request waits longer than the class's timeout, it is rejected (**408**). | ||
| - If the client disconnects while waiting, the wait is abandoned (**499**). |
There was a problem hiding this comment.
Do not promise queued disconnects are abandoned
In the current middleware, queued-client disconnect detection is explicitly not wired: priority_admission_middleware passes a fresh CancellationToken into admit, so a client that disconnects while waiting is not abandoned until the queue timeout or later admission path runs. This line tells operators that long queue timeouts are safe because disconnected waiters are cleaned up and reported as 499, but in this implementation they can continue occupying queue capacity until timeout.
Useful? React with 👍 / 👎.
| | `--priority-scheduler-enabled` | `false` | Master switch. When unset, the legacy concurrency-limit middleware stays wired and no scheduler is constructed. | | ||
| | `--priority-scheduler-default-max-class` | `default` | Maximum class for tenants not listed in the YAML (`system` \| `interactive` \| `default` \| `bulk`). Parsed with the same rules as the header — an unknown value falls back to `default`. | | ||
| | `--priority-scheduler-config` | unset | Path to the optional priority-scheduler YAML (per-class overrides + per-tenant policy). Absent → built-in defaults and an empty tenant policy map. | | ||
| | `--priority-scheduler-tenant-metric-top-n` | `32` | Cap on per-tenant scheduler metric label cardinality. Tenants past the top N are bucketed under `tenant="other"`. | |
There was a problem hiding this comment.
Do not document an unenforced tenant metric cap
I checked the scheduler metrics implementation and priority_scheduler_tenant_metric_top_n is only stored in SchedulerSettings; no scheduler metric path applies a top-N selection or emits tenant="other" (the tenant-labelled clamp/unknown counters intern the raw tenant directly). Documenting this flag as a working cardinality cap can lead operators to rely on it for Prometheus safety when it currently has no effect.
Useful? React with 👍 / 👎.
Address review (gemini/codex): queued-client disconnect isn't detected (499 not emitted, waiter held until timeout); dispatcher drains strict priority not round-robin (starvation guard is the anti-starvation mechanism); tenant-metric-top-n is stored but not yet enforced; fix marked-but-not-unwound typo. Signed-off-by: Simo Lin <25425177+slin1237@users.noreply.github.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 689411ff81
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| | Status | Condition | `X-SMG-Error-Code` | Extra headers | | ||
| |--------|-----------|--------------------|---------------| | ||
| | **503** Service Unavailable | **Preempted** — admitted, then cancelled before its first byte to make room for a higher-priority request | `scheduler_preempted` | `X-SMG-Preempted: true`, `Retry-After: 1` | |
There was a problem hiding this comment.
Do not promise every preemption returns 503
This response contract only holds when a handler observes the scheduler cancel token before it has produced a response. I checked priority_admission_middleware, and it explicitly notes that if preemption fires after the handler has produced headers but before the first body byte, SchedulerGuardBody truncates the body and the request is recorded as admitted because the 503 marker/header cannot be added. For handlers that do not use PreemptionGuard or that flush headers early, clients will not receive 503, Retry-After, or X-SMG-Preempted, so documenting this as the exact response code/header contract will lead retry logic to miss real preemptions.
Useful? React with 👍 / 👎.
|
Addressed review feedback (gemini / codex):
|
Description
Problem
The priority scheduler (M1–M6) shipped with no user-facing docs — no reference for the
x-smg-priorityheader, response codes, or config knobs.Solution
Add a concept page (how it works) and a reference page (exact contract), wired into the mkdocs nav.
Changes
docs/concepts/reliability/priority-scheduling.md(new) — classes, slots/reservations, per-class queues, preemption + TTFT protection, starvation promotion, fail-safe fallback.docs/reference/priority-scheduler.md(new) —x-smg-priority+ tenant clamp, response codes (503/429/408/499 + headers), CLI flags, YAML schema, built-in defaults, metrics.Test Plan
config.rs/error.rs/metrics.rs.mkdocs build --strictruns in CI.Checklist
cargo +nightly fmtpasses (no code changes)cargo clippy --all-targets --all-features -- -D warningspasses (no code changes)Summary by CodeRabbit