Add blog post on AKS Configurable Scheduler Profiles #5505
colinmixonn wants to merge 123 commits into master
This blog post introduces AKS Configurable Scheduler Profiles, highlighting their benefits for optimizing resource utilization and improving scheduling strategies for web-distributed and AI workloads. It covers configuration examples for GPU utilization, pod distribution across topology domains, and memory-optimized scheduling.
Added a new tag for Scheduler with relevant details.
Updated blog post on AKS Configurable Scheduler Profiles to improve clarity and correctness, including sections on GPU utilization, pod distribution, and memory-optimized scheduling.
Corrected typos and improved clarity in the blog post about AKS Configurable Scheduler Profiles.
Updated the blog to clarify the objectives of configuring AKS Configurable Scheduler Profiles, improved section titles, and ensured consistency in terminology.
Clarified the objectives and improved the wording in the blog post about AKS Configurable Scheduler Profiles.
website/blog/2025-12-16-aks-config-scheduler-profiles-preview/index.md
Pull request overview
This pull request adds a new blog post announcing the preview of AKS Configurable Scheduler Profiles, a feature that enables fine-grained control over pod scheduling strategies to optimize resource utilization and improve workload performance.
Key Changes
- Introduces a new "scheduler" tag to categorize blog posts related to pod placement and scheduling optimization
- Adds comprehensive blog post covering three main scheduling use cases: GPU bin-packing for AI workloads, pod distribution across topology domains for resilience, and memory-optimized scheduling with PVC-aware placement
- Provides YAML configuration examples and best practices for implementing custom scheduler profiles
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| website/blog/tags.yml | Adds new "scheduler" tag for categorizing posts about pod placement and scheduling techniques |
| website/blog/2025-12-16-aks-config-scheduler-profiles-preview/index.md | New blog post introducing AKS Configurable Scheduler Profiles with configuration examples for GPU utilization, topology distribution, and memory-optimized scheduling |
…index.md Co-authored-by: Diego Casati <diego.casati@gmail.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updated FAQ section to clarify interactions between Configurable Scheduler Profiles and autoscalers, including Node Auto Provisioning, Cluster Autoscaler, and Vertical Pod Autoscaler. Enhanced explanations for resource omission in scoring strategy.
website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updated the explanation of the Kubernetes scheduler's filtering and scoring phases, added a diagram, and clarified the benefits of the RequestedToCapacityRatio scoring strategy for CPU utilization.
Revised text for clarity and consistency regarding Kubernetes cluster CPU utilization and Configurable Scheduler Profiles. Updated YAML examples and added notes for resource adjustments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updated the blog post to clarify the benefits of Configurable Scheduler Profiles on AKS, including improvements in CPU and GPU utilization. Enhanced explanations of scheduling strategies and their impact on resource efficiency and cost optimization.
Clarify the explanation of the 'RequestedToCapacityRatio' and 'MostAllocated' scheduling strategies for AKS, emphasizing their impact on CPU and GPU utilization.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Ahmed Sabbour <103856+sabbour@users.noreply.github.com>
kaarthis left a comment
Good write-up, but I identified opportunities to make it better.
> tags: [ai, performance, scheduler, best-practices, cost]
> ---
>
> As reported by CastAI, on average Kubernetes clusters only reach [10% CPU utilization][cast-ai-k8s-cost-report] and Datadog finds most Kubernetes containers use less than [25% of their requested CPU][datadog-state-of-containers]. This data signals that underutilized resources are materially contributing to increased infrastructure cost. While there are many factors that impact node utilization, as a core component of the Kubernetes control plane, the kube-scheduler has a big influence on node utilization.
What year are these CastAI and Datadog reports from? Cite the year. Stale stats kill credibility with senior platform engineers. Also, the blog headlines GPU utilization but the hook is CPU-only. Add one GPU underutilization stat (NVIDIA publishes these, or use internal AKS telemetry) so the AI/ML reader is hooked from line 1.
> The Kubernetes scheduler operates in two cycles: a synchronous scheduling cycle and an asynchronous binding cycle. The scheduling cycle has two sub-phases, filtering and scoring, and only manages one pod at a time.
>
> 1. **Filtering** phase removes unsuitable nodes based on hard and soft constraints.
Good foundational content. But does the external customer reading a blog about configurable profiles need 20 lines on how the default scheduler works? Consider trimming to 5-6 lines + diagram + a link to upstream docs for "deep dive on scheduling cycles." Get to the value prop faster.
This section is part of the value prop, because the reader first needs to understand how the default scheduler works in order to understand the config scheduler.
The value prop is raised in the intro, and with the TOC links, readers who already know this material can skip ahead.
> multiPoint:
>   enabled:
>     - name: NodeResourcesFit
>   disabled:
The YAML disables PodTopologySpread. Why is this not explained? This is a significant decision. A customer who applies this profile loses zone-aware spreading. If they have topology spread constraints in their pod spec, those are silently ignored by this scheduler profile. This needs a prominent warning:
"We disable PodTopologySpread in this profile because bin-packing and zone-spreading are opposing goals. If you need both high utilization AND zone resilience, use separate named profiles for different workload classes (e.g., cpu-binpacking-scheduler for batch, default-scheduler for HA services)."
This directly connects to Wilson's NAP blog, which covers topology spread in depth. Cross-reference opportunity.
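A minimal sketch of the separate-profile approach suggested above, written against the upstream `kubescheduler.config.k8s.io/v1` schema. The profile name `cpu-binpacking-scheduler` comes from the comment; the `MostAllocated` scoring arguments and the CPU weight are illustrative assumptions, and the AKS resource that wraps this configuration is not shown because it is not quoted in this thread.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Default profile: no overrides, so the default plugins
  # (including PodTopologySpread) stay enabled for HA services.
  - schedulerName: default-scheduler

  # Bin-packing profile for batch workloads (name taken from the review comment).
  - schedulerName: cpu-binpacking-scheduler
    plugins:
      multiPoint:
        enabled:
          - name: NodeResourcesFit
        disabled:
          # Spreading is turned off for this profile only.
          - name: PodTopologySpread
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated   # assumption: strategy chosen for illustration
            resources:
              - name: cpu
                weight: 1         # assumption: illustrative weight
```

A pod would opt in to the bin-packing behavior by setting `spec.schedulerName: cpu-binpacking-scheduler`; pods that do not set a scheduler name are handled by `default-scheduler` and keep topology spreading.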
>     - name: NodeResourcesFit
>     - name: NodeResourcesBalancedAllocation
>   disabled:
>     - name: PodTopologySpread
Same concern as before: explain why this is disabled. This is even more critical for GPU workloads: if a customer has multi-zone GPU nodes and disables topology spread, all inference pods could land in one zone. One zone outage = total inference downtime. This needs a warning.
> 1. Which Bin packing strategy does AKS recommend to increase node utilization? AKS recommends using the scoring strategy `RequestedToCapacityRatio` because it provides a more granular scoring approach allowing users to define an ideal utilization curve for their respective nodes. For example, this bin packing strategy allows users to configure a target utilization of 85%.
> 2. How does Configurable Scheduler Profiles interact with autoscalers such as Node Auto Provisioning (NAP), Cluster Autoscaler (CAS), and Vertical Pod Autoscaler (VPA)? These components are complementary to each other. Configurable Scheduler Profiles influence how pods are placed on nodes, while autoscalers make scaling decisions based on resource utilization and pending pods.
>    - **Node Auto Provisioning (NAP)** is triggered when pods are unschedulable. If a suitable node already exists, that pod will be scheduled with the defined Configurable Scheduler Profile. If no suitable node exists, NAP provisions new capacity, after which the pod is scheduled according to the selected profile.
FAQ #2 repeats Wilson's content less thoroughly. The customer who reads both blogs will find redundancy.
Is this a blog consolidation opportunity? Please check with Wilson and Ahmed.
Wilson's blog will focus only on NAP. This section provides some answers to the immediate question of using any autoscaler with the config scheduler. Wilson and I are in conversation about cross-referencing.
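For FAQ item 1 quoted in this thread, a target utilization of 85% could be expressed as a scoring shape along these lines. This is a sketch against the upstream `kubescheduler.config.k8s.io/v1` schema; the profile name, the CPU-only resource list, and the exact shape points are assumptions for illustration, not the blog's configuration.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: cpu-binpacking-scheduler   # hypothetical profile name
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: RequestedToCapacityRatio
            resources:
              - name: cpu
                weight: 1
            requestedToCapacityRatio:
              # Shape points must be listed in increasing utilization order.
              # utilization is 0-100 (percent requested of capacity), score is 0-10.
              shape:
                - utilization: 0
                  score: 0
                - utilization: 85   # assumed target from the FAQ example
                  score: 10
                - utilization: 100
                  score: 0
```

The score is interpolated linearly between points, so it rises to its maximum at 85% requested-to-capacity and falls off above that. Nodes near the target are preferred over both emptier and fuller nodes. Scoring only ranks nodes that have already passed filtering.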
Missing section: Upstream Kubernetes Work & AKS Roadmap. You can point to KEPs and to work that is in progress or planned.
Revise blog content for clarity and conciseness regarding Configurable Scheduler Profiles on AKS, emphasizing benefits and operational details.
Corrected formatting and improved clarity of scheduler profile monitoring metrics.
> This blog explains how the default Kubernetes scheduler places pods, where the defaults fall short, and how to increase node utilization using Configurable Scheduler Profiles on AKS.
>
> 1. [How does kube-scheduler work?](#how-does-the-default-kubernetes-scheduler-place-pods)
> 2. [Use Configurable Scheduler Profiles to ncrease node utilization and operator control](#configurable-scheduler-profiles-on-aks)
Typo in the table of contents link text: "ncrease" should be "increase".
Suggested change:
> - 2. [Use Configurable Scheduler Profiles to ncrease node utilization and operator control](#configurable-scheduler-profiles-on-aks)
> + 2. [Use Configurable Scheduler Profiles to increase node utilization and operator control](#configurable-scheduler-profiles-on-aks)
> `MostAllocated` scores nodes based on their current resource utilization, favoring nodes that are already more heavily utilized. Unlike `RequestedToCapacityRatio`, it does not consider node capacity in node scoring, making it more suitable for an aggressive cost-optimization scheduling strategy. When paired with MostAllocated, `NodeResourcesBalancedAllocation` complements the behavior because it encourages pod placement on nodes with user-defined proportional utilization, helping reduce bottlenecks caused by asymmetric resource pressure.
>
> When combined, these plugins favor GPU‑bound nodes with balanced CPU and memory usage over nodes with large amounts of unused memory or fragmented resources. This results in more efficient GPU placement and fewer partially utilized nodes.[Configure node bin-packing][configure-most-allocated] using the MostAllocated strategy to improve utilization and reduce infrastructure costs.
Missing space after the period before the "Configure node bin-packing" link, which will read as a single word/sentence fragment when rendered.
Suggested change:
> - When combined, these plugins favor GPU‑bound nodes with balanced CPU and memory usage over nodes with large amounts of unused memory or fragmented resources. This results in more efficient GPU placement and fewer partially utilized nodes.[Configure node bin-packing][configure-most-allocated] using the MostAllocated strategy to improve utilization and reduce infrastructure costs.
> + When combined, these plugins favor GPU‑bound nodes with balanced CPU and memory usage over nodes with large amounts of unused memory or fragmented resources. This results in more efficient GPU placement and fewer partially utilized nodes. [Configure node bin-packing][configure-most-allocated] using the MostAllocated strategy to improve utilization and reduce infrastructure costs.
This blog post introduces AKS Configurable Scheduler Profiles, highlighting their benefits for optimizing resource utilization and improving scheduling strategies for web-distributed and AI workloads. It covers configuration examples for GPU and CPU utilization.