Add blog post on AKS Configurable Scheduler Profiles by colinmixonn · Pull Request #5505 · Azure/AKS

colinmixonn · 2025-12-10T23:33:55Z

This blog post introduces AKS Configurable Scheduler Profiles, highlighting their benefits for optimizing resource utilization and improving scheduling strategies for web-distributed and AI workloads. It covers configuration examples for GPU and CPU utilization.

This blog post introduces AKS Configurable Scheduler Profiles, highlighting their benefits for optimizing resource utilization and improving scheduling strategies for web-distributed and AI workloads. It covers configuration examples for GPU utilization, pod distribution across topology domains, and memory-optimized scheduling.

website/blog/2025-12-16-aks-config-scheduler-profiles-preview/index.md

Copilot

Pull request overview

This pull request adds a new blog post announcing the preview of AKS Configurable Scheduler Profiles, a feature that enables fine-grained control over pod scheduling strategies to optimize resource utilization and improve workload performance.

Key Changes

- Introduces a new "scheduler" tag to categorize blog posts related to pod placement and scheduling optimization
- Adds comprehensive blog post covering three main scheduling use cases: GPU bin-packing for AI workloads, pod distribution across topology domains for resilience, and memory-optimized scheduling with PVC-aware placement
- Provides YAML configuration examples and best practices for implementing custom scheduler profiles

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| website/blog/tags.yml | Adds new "scheduler" tag for categorizing posts about pod placement and scheduling techniques |
| website/blog/2025-12-16-aks-config-scheduler-profiles-preview/index.md | New blog post introducing AKS Configurable Scheduler Profiles with configuration examples for GPU utilization, topology distribution, and memory-optimized scheduling |

Updated FAQ section to clarify interactions between Configurable Scheduler Profiles and autoscalers, including Node Auto Provisioning, Cluster Autoscaler, and Vertical Pod Autoscaler. Enhanced explanations for resource omission in scoring strategy.

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 9 comments.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 5 comments.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 2 comments.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

Updated the blog post to clarify the benefits of Configurable Scheduler Profiles on AKS, including improvements in CPU and GPU utilization. Enhanced explanations of scheduling strategies and their impact on resource efficiency and cost optimization.

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 7 comments.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 5 comments.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated no new comments.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 1 comment.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

kaarthis

Good write-up, but I identified some opportunities to make it better.

kaarthis · 2026-03-26T23:49:18Z

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

+tags: [ai, performance, scheduler, best-practices, cost]
+---
+
+As reported by CastAI, on average Kubernetes clusters only reach [10% CPU utilization][cast-ai-k8s-cost-report] and Datadog finds most Kubernetes containers use less than [25% of their requested CPU][datadog-state-of-containers]. This data signals that underutilized resources are materially contributing to increased infrastructure cost. While there are many factors that impact node utilization, as a core component of the Kubernetes control plane, the kube-scheduler has a big influence on node utilization.


What year are these CastAI and Datadog reports from? Cite the year. Stale stats kill credibility with senior platform engineers. Also — the blog headlines GPU utilization but the hook is CPU-only. Add one GPU underutilization stat (NVIDIA publishes these, or use internal AKS telemetry) so the AI/ML reader is hooked from line 1.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

kaarthis · 2026-03-27T00:11:38Z

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

+
+The Kubernetes scheduler operates in two cycles: a synchronous scheduling cycle and an asynchronous binding cycle. The scheduling cycle has two sub-phases, filtering and scoring, and only manages one pod at a time.
+
+1. **Filtering** phase removes unsuitable nodes based on hard and soft constraints.


Good foundational content. But — does the external customer reading a blog about configurable profiles need 20 lines on how the default scheduler works? Consider trimming to 5-6 lines + diagram + a link to upstream docs for "deep dive on scheduling cycles." Get to the value prop faster.

This section is a part of the value prop because the reader first needs to understand how the default scheduler works to understand the config scheduler.

The value prop is raised in the intro, and the TOC links let readers who already know this material skip ahead.
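For readers following this thread, the filter-then-score flow quoted above can be sketched in a few lines of Python. This is a toy model, not kube-scheduler code: the `Node` class, the single CPU dimension, and the two scoring functions are illustrative assumptions, and the scoring formulas are simplified versions of the least-allocated and most-allocated ideas.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    cpu_capacity: int   # allocatable CPU in millicores (hypothetical field)
    cpu_requested: int  # CPU already requested by pods on the node


def filter_nodes(nodes: List[Node], pod_cpu: int) -> List[Node]:
    """Filtering: drop nodes that cannot fit the pod's CPU request."""
    return [n for n in nodes if n.cpu_capacity - n.cpu_requested >= pod_cpu]


def least_allocated(node: Node, pod_cpu: int) -> float:
    """Scoring that spreads load: more free CPU after placement scores higher."""
    free = node.cpu_capacity - node.cpu_requested - pod_cpu
    return 100.0 * free / node.cpu_capacity


def most_allocated(node: Node, pod_cpu: int) -> float:
    """Scoring that bin-packs: higher utilization after placement scores higher."""
    used = node.cpu_requested + pod_cpu
    return 100.0 * used / node.cpu_capacity


def schedule(
    nodes: List[Node], pod_cpu: int, score: Callable[[Node, int], float]
) -> Optional[str]:
    """Handle one pod: filter first, then pick the highest-scoring node."""
    feasible = filter_nodes(nodes, pod_cpu)
    if not feasible:
        return None  # the pod stays pending
    return max(feasible, key=lambda n: score(n, pod_cpu)).name


nodes = [Node("a", 4000, 3000), Node("b", 4000, 1000), Node("c", 4000, 3900)]
spread_choice = schedule(nodes, 500, least_allocated)
packed_choice = schedule(nodes, 500, most_allocated)
```

With the same three nodes and the same pod, swapping only the scoring function changes the chosen node, which is the kind of control a scheduler profile exposes.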

kaarthis · 2026-03-27T17:49:04Z

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

+          multiPoint:
+            enabled:
+              - name: NodeResourcesFit
+            disabled:


The YAML disables PodTopologySpread. Why is this not explained? This is a significant decision. A customer who applies this profile loses zone-aware spreading. If they have topology spread constraints in their pod spec, those are silently ignored by this scheduler profile. This needs a prominent warning:

"We disable PodTopologySpread in this profile because bin-packing and zone-spreading are opposing goals. If you need both high utilization AND zone resilience, use separate named profiles for different workload classes (e.g., cpu-binpacking-scheduler for batch, default-scheduler for HA services)." This directly connects to Wilson's NAP blog which covers topology spread in depth. Cross-reference opportunity.

kaarthis · 2026-03-27T17:50:28Z

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

+              - name: NodeResourcesFit
+              - name: NodeResourcesBalancedAllocation
+            disabled:
+              - name: PodTopologySpread


Same concern as before: explain why PodTopologySpread is disabled. This is even more critical for GPU workloads: if a customer has multi-zone GPU nodes and disables topology spread, all inference pods could land in one zone. One zone outage = total inference downtime. This needs a warning.

kaarthis · 2026-03-27T17:51:09Z

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

+
+1. Which Bin packing strategy does AKS recommend to increase node utilization? AKS recommends using the scoring strategy `RequestedToCapacityRatio` because it provides a more granular scoring approach allowing users to define an ideal utilization curve for their respective nodes. For example, this bin packing strategy allows users to configure a target utilization of 85%.
+2. How does Configurable Scheduler Profiles interact with autoscalers such as Node Auto Provisioning (NAP), Cluster Autoscaler (CAS), and Vertical Pod Autoscaler (VPA)? These components are complementary to each other. Configurable Scheduler Profiles influence how pods are placed on nodes, while autoscalers make scaling decisions based on resource utilization and pending pods.
+    - **Node Auto Provisioning (NAP)** is triggered when pods are unschedulable. If a suitable node already exists, that pod will be scheduled with the defined Configurable Scheduler Profile. If no suitable node exists, NAP provisions new capacity, after which the pod is scheduled according to the selected profile.


FAQ #2 repeats Wilson's content less thoroughly. A customer who reads both blogs will find redundancy.
Is this a blog consolidation opportunity? Please check with Wilson and Ahmed.

Wilson's blog will focus only on NAP. This section provides answers to the immediate question of using any autoscaler with Configurable Scheduler Profiles. Wilson and I are discussing cross-referencing.
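To make the "ideal utilization curve" from FAQ 1 concrete, here is a minimal Python sketch of piecewise-linear shape scoring. The shape points (peaking at 85% utilization) are a hypothetical example chosen to match the 85% target mentioned in the FAQ, and the function names are illustrative. The upstream plugin's exact scaling and rounding are not reproduced here.

```python
from typing import List, Tuple

# Hypothetical shape: (utilization %, score). The score rises until the
# 85% target and then falls, so nodes near the target are preferred.
SHAPE: List[Tuple[float, float]] = [(0, 0), (85, 10), (100, 0)]


def shape_score(utilization: float, shape: List[Tuple[float, float]] = SHAPE) -> float:
    """Linearly interpolate a score between the configured shape points."""
    if utilization <= shape[0][0]:
        return float(shape[0][1])
    if utilization >= shape[-1][0]:
        return float(shape[-1][1])
    for (u0, s0), (u1, s1) in zip(shape, shape[1:]):
        if u0 <= utilization <= u1:
            return s0 + (s1 - s0) * (utilization - u0) / (u1 - u0)
    raise ValueError("shape points must be sorted by utilization")


def utilization_after(requested: float, pod_request: float, capacity: float) -> float:
    """Node utilization (in %) if the pod were placed on the node."""
    return 100.0 * (requested + pod_request) / capacity


# A node with 4000m CPU and 2900m already requested receives a 500m pod.
candidate_score = shape_score(utilization_after(2900, 500, 4000))
```

In this sketch, a placement that lands a node at exactly 85% utilization gets the top score, while placements that overshoot toward 100% are penalized as steeply as the shape dictates.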

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

kaarthis · 2026-03-27T17:56:06Z

Missing section: Upstream Kubernetes Work & AKS Roadmap. You can point to KEPs and to work that is in progress or planned.

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 8 comments.

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

Copilot

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 2 comments.

Copilot · 2026-03-27T21:14:53Z

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

+This blog explains how the default Kubernetes scheduler places pods, where the defaults fall short, and how to increase node utilization using Configurable Scheduler Profiles on AKS.
+
+1. [How does kube-scheduler work?](#how-does-the-default-kubernetes-scheduler-place-pods)
+2. [Use Configurable Scheduler Profiles to ncrease node utilization and operator control](#configurable-scheduler-profiles-on-aks)


Typo in the table of contents link text: "ncrease" should be "increase".

Suggested change

-2. [Use Configurable Scheduler Profiles to ncrease node utilization and operator control](#configurable-scheduler-profiles-on-aks)
+2. [Use Configurable Scheduler Profiles to increase node utilization and operator control](#configurable-scheduler-profiles-on-aks)

Copilot · 2026-03-27T21:14:53Z

website/blog/2026-03-31-aks-config-scheduler-profiles-preview/index.md

+
+`MostAllocated` scores nodes based on their current resource utilization, favoring nodes that are already more heavily utilized. Unlike `RequestedToCapacityRatio`, it does not consider node capacity in node scoring, making it more suitable for an aggressive cost-optimization scheduling strategy. When paired with MostAllocated, `NodeResourcesBalancedAllocation` complements the behavior because it encourages pod placement on nodes with user-defined proportional utilization, helping reduce bottlenecks caused by asymmetric resource pressure.
+
+When combined, these plugins favor GPU‑bound nodes with balanced CPU and memory usage over nodes with large amounts of unused memory or fragmented resources. This results in more efficient GPU placement and fewer partially utilized nodes.[Configure node bin-packing][configure-most-allocated] using the MostAllocated strategy to improve utilization and reduce infrastructure costs.


Missing space after the period before the "Configure node bin-packing" link, which will read as a single word/sentence fragment when rendered.

Suggested change

-When combined, these plugins favor GPU‑bound nodes with balanced CPU and memory usage over nodes with large amounts of unused memory or fragmented resources. This results in more efficient GPU placement and fewer partially utilized nodes.[Configure node bin-packing][configure-most-allocated] using the MostAllocated strategy to improve utilization and reduce infrastructure costs.
+When combined, these plugins favor GPU‑bound nodes with balanced CPU and memory usage over nodes with large amounts of unused memory or fragmented resources. This results in more efficient GPU placement and fewer partially utilized nodes. [Configure node bin-packing][configure-most-allocated] using the MostAllocated strategy to improve utilization and reduce infrastructure costs.

colinmixonn added 9 commits December 10, 2025 15:33

Add Scheduler tag to blog tags configuration

8103dda

Added a new tag for Scheduler with relevant details.

Update index.md

6c035be

Fix typos in AKS configuration blog post

4d7a3cc

Revise AKS Configurable Scheduler Profiles blog post

0b3b6b9

Updated blog post on AKS Configurable Scheduler Profiles to improve clarity and correctness, including sections on GPU utilization, pod distribution, and memory-optimized scheduling.

Fix typos and enhance clarity in AKS blog post

4fd375d

Corrected typos and improved clarity in the blog post about AKS Configurable Scheduler Profiles.

Fix links and typos in AKS Configurable Scheduler blog

5d34e50

Clarify objectives and improve section titles in blog

0760340

Updated the blog to clarify the objectives of configuring AKS Configurable Scheduler Profiles, improved section titles, and ensured consistency in terminology.

Enhance clarity in AKS Configurable Scheduler blog

92ff663

Clarified the objectives and improved the wording in the blog post about AKS Configurable Scheduler Profiles.

colinmixonn marked this pull request as ready for review December 11, 2025 17:52

colinmixonn requested review from a team, circy9, Copilot, qpetraroia and seanmck and removed request for Copilot December 11, 2025 17:52

colinmixonn requested a review from palma21 as a code owner December 11, 2025 17:52

Update index.md

ef5c000

Copilot AI review requested due to automatic review settings December 11, 2025 18:00

Copilot started reviewing on behalf of colinmixonn December 11, 2025 18:01

dcasati reviewed Dec 11, 2025

Copilot AI reviewed Dec 11, 2025

Fei-Guo reviewed Dec 11, 2025

colinmixonn and others added 6 commits December 11, 2025 11:14

Update website/blog/2025-12-16-aks-config-scheduler-profiles-preview/…

a488c12

…index.md Co-authored-by: Diego Casati <diego.casati@gmail.com>

Update website/blog/2025-12-16-aks-config-scheduler-profiles-preview/…

3999b23

…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update website/blog/2025-12-16-aks-config-scheduler-profiles-preview/…

37ced5b

…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update website/blog/2025-12-16-aks-config-scheduler-profiles-preview/…

ef30f6e

…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update website/blog/2025-12-16-aks-config-scheduler-profiles-preview/…

4dd4ee0

…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update website/blog/tags.yml

0cf3a81

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI reviewed Mar 23, 2026

colinmixonn and others added 2 commits March 23, 2026 14:54

Update website/blog/2026-03-31-aks-config-scheduler-profiles-preview/…

6f31bb6

…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestions from code review

21490f9

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI reviewed Mar 23, 2026

colinmixonn added 2 commits March 23, 2026 15:35

Revise scheduler profiles and scoring strategy details

274c45c

Updated the explanation of the Kubernetes scheduler's filtering and scoring phases, added a diagram, and clarified the benefits of the RequestedToCapacityRatio scoring strategy for CPU utilization.

Refine blog content on AKS Configurable Scheduler Profiles

af2fb78

Revised text for clarity and consistency regarding Kubernetes cluster CPU utilization and Configurable Scheduler Profiles. Updated YAML examples and added notes for resource adjustments.

Copilot AI reviewed Mar 23, 2026

Apply suggestions from code review

e747ad8

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI reviewed Mar 23, 2026

Copilot AI reviewed Mar 24, 2026

colinmixonn and others added 2 commits March 23, 2026 17:46

Enhance clarity on AKS scheduling strategies

06f870b

Clarify the explanation of the 'RequestedToCapacityRatio' and 'MostAllocated' scheduling strategies for AKS, emphasizing their impact on CPU and GPU utilization.

Apply suggestions from code review

3ff5fda

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI reviewed Mar 24, 2026

Apply suggestions from code review

d4863ac

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI reviewed Mar 24, 2026

sabbour reviewed Mar 24, 2026

sabbour reviewed Mar 24, 2026

sabbour approved these changes Mar 24, 2026

Apply suggestions from code review

440a7ea

Co-authored-by: Ahmed Sabbour <103856+sabbour@users.noreply.github.com>

Copilot AI reviewed Mar 25, 2026

kaarthis reviewed Mar 27, 2026

colinmixonn added 2 commits March 27, 2026 12:10

Update blog on Configurable Scheduler Profiles for AKS

d3674ee

Revise blog content for clarity and conciseness regarding Configurable Scheduler Profiles on AKS, emphasizing benefits and operational details.

Fix formatting and enhance scheduler profile metrics section

8a9ecb8

Corrected formatting and improved clarity of scheduler profile monitoring metrics.

Copilot AI reviewed Mar 27, 2026

colinmixonn added 2 commits March 27, 2026 13:34

Fix links and improve clarity in blog post

a5d0ca3

Update blog post links for clarity and accuracy

f1320ab

Copilot AI reviewed Mar 27, 2026
