Nap provisioning best practice blog #5665

Open
wdarko1 wants to merge 25 commits into Azure:master from wdarko1:nap-provisioning-best-practice-blog

wdarko1 commented Mar 18, 2026

Adds a best-practices blog post on managing node selection and availability.

wdarko1 added 3 commits March 17, 2026 22:50
Add a new blog post on controlling node provisioning outcomes in AKS, covering PDBs, affinity, and topology spread constraints.
Updated language for clarity and precision in Kubernetes provisioning guidance. Enhanced explanations of key concepts and best practices for AKS Node Auto-Provisioning.

Copilot AI left a comment

Pull request overview

Adds a new AKS blog post describing best practices for influencing node provisioning outcomes (PDBs, affinity/anti-affinity, and topology spread constraints) and how Node Auto-Provisioning (NAP) interprets those signals.

Changes:

  • Adds a new blog post markdown file for NAP/node provisioning best practices.
  • Includes examples and guidance for topology spread constraints, affinity, and Pod Disruption Budgets.
  • Describes NAP behavior for node selection, disruption, and topology spread.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 19, 2026 03:04

Copilot AI left a comment

Pull request overview

Adds a new AKS blog post describing best practices for controlling node provisioning outcomes (PDBs, affinity/anti-affinity, topology spread constraints) and how AKS Node Auto-Provisioning (NAP) interprets those signals.

Changes:

  • Introduces a new best-practices post covering scheduling intent primitives (topology spread, affinity, PDBs).
  • Adds guidance on how NAP uses these constraints for node selection, scaling, and disruption/consolidation behaviors.

wdarko1 added 2 commits March 24, 2026 08:16
Updated article to refine title and description, adjust publication date, and enhance clarity on Node Auto-Provisioning (NAP) concepts and best practices.
Expanded on the benefits of Node Auto-Provisioning for compute efficiency and added a section on next steps for users to get started.
Copilot AI review requested due to automatic review settings March 25, 2026 18:52

Copilot AI left a comment

Pull request overview

Adds a new AKS blog post describing best practices for shaping Node Auto-Provisioning (NAP) outcomes via PDBs, affinity/anti-affinity, and topology spread constraints.

Changes:

  • Adds a long-form best-practices article covering scheduling intent vs. node policy boundaries for AKS NAP.
  • Provides example manifests and operational guidance for zonal spreading, node affinity, and disruption controls.

Copilot AI review requested due to automatic review settings March 25, 2026 19:12
wdarko1 requested a review from colinmixonn March 25, 2026 19:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 5 comments.

wdarko1 and others added 2 commits March 25, 2026 12:52
Added a new tag for 'Scheduler' with relevant details.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 25, 2026 20:00

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

wdarko1 and others added 2 commits March 25, 2026 13:08
Updated text for clarity and consistency throughout the document, including image references and examples.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 25, 2026 20:10

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

wdarko1 and others added 2 commits March 25, 2026 13:19
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 25, 2026 20:20

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

Comment on lines +166 to +176
```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["Standard_D16ds_v5"]
```

Copilot AI Mar 25, 2026

The nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution YAML example isn’t valid as written: the list item (- weight: 100) should be indented under preferredDuringSchedulingIgnoredDuringExecution, and the matchExpressions list should also be properly indented. Please fix indentation so readers can copy/paste it.

wdarko1 and others added 2 commits March 25, 2026 13:26
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 25, 2026 20:28

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

---
title: "Controlling Node Provisioning Outcomes on AKS: PDBs, Affinity, and Topology Spread"
description: "Learn AKS best practices for Node Auto-Provisioning, using PDBs, affinity, and topology spread constraints to achieve predictable, resilient pod scheduling."
date: 2026-03-30

Copilot AI Mar 25, 2026

The front matter date: 2026-03-30 doesn't match the post folder prefix 2026-03-20-..., but the repo authoring guide requires them to match (controls chronology). Please update either the folder name or the front matter date so they align. Also note the date is in the future relative to today (2026-03-25); if the intent is to prevent early publishing, consider using draft: true/unlisted: true or keeping the post off the main branch until the publish date (depending on site config).

Suggested change

Before:

    date: 2026-03-30

After:

    date: 2026-03-20


kaarthis left a comment

Left a few comments, but the critical recommendation: PDB is about pod criticality with respect to disruption/drain, not scheduling intent. It doesn't belong with nodeSelector, taints, and topology.

Recommendation: Split the blog. Keep Parts 1-3 (Topology, Affinity, Scheduling Intent) as the scheduling blog. Move Part 4 (PDB + NAP disruption) to a separate "Controlling Disruption with NAP: PDBs, Consolidation Policies, and Node Disruption Budgets" blog. Cross-link between them.

- How do I express node preferences without accidentally blocking scheduling?
- If I’m using Node Auto-Provisioning (NAP), how does it interpret the rules I set?

This post will connect NAP with the three most important workload-level tools for shaping predictable node provisioning outcomes on AKS:

What data backs "three most important"? Support cases? Community signal? Or judgment call?

- If I’m using Node Auto-Provisioning (NAP), how does it interpret the rules I set?

This post will connect NAP with the three most important workload-level tools for shaping predictable node provisioning outcomes on AKS:


Is this blog about scheduling or lifecycle? Title = provisioning, Part 4 = eviction. Split?

Contributor Author

Updated the blog post to focus on scheduling, and split out the disruption topic into a separate blog post

matchExpressions:
- key: node.kubernetes.io/instance-type
  operator: In
  values: ["Standard_D16ds_v5"]

weight: 100 is unexplained — what's the scale, how does it interact with multiple preferences?
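
For reference on the question above: in the Kubernetes API, `weight` is an integer from 1 to 100. For each candidate node, the scheduler adds up the weights of every preferred term the node matches and folds that sum into the node's overall score. A minimal sketch with two preferences follows; the second instance type and both weight values are illustrative assumptions, not taken from the post.

```yaml
# Sketch only: the second SKU and both weights are assumed for illustration.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Stronger preference: a node matching this term adds 80 to its preference sum.
      - weight: 80
        preference:
          matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["Standard_D16ds_v5"]
      # Weaker fallback preference: a matching node adds 20 to its preference sum.
      - weight: 20
        preference:
          matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["Standard_D8ds_v5"]
```

Neither term is required, so a pod can still be scheduled on a node that matches neither of them.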


**Practical guidance:**

- For critical workloads that you do not want to be disrupted at all, strictness of "zero eviction" may be intentional — but be deliberate. When you're ready to allow disruption to these workloads, you may have to change the PDBs in the workload deployment file.

maxUnavailable: 0 — this blocks security patching. Reframe as anti-pattern. We should never advocate for Zero eviction anywhere in doc.
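
For contrast with a zero-eviction budget, a PDB that still protects availability while leaving room for node drains might look like the sketch below. The name `web-pdb` and the use of the `app: web` label as the selector are assumptions chosen for illustration.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb          # hypothetical name
spec:
  maxUnavailable: 1      # permit one voluntary eviction at a time
  selector:
    matchLabels:
      app: web           # assumed label; match your workload's pod labels
```

With this shape, drains for upgrades or patching can proceed one pod at a time instead of being blocked outright.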

- How do I express node preferences without accidentally blocking scheduling?
- If I’m using Node Auto-Provisioning (NAP), how does it interpret the rules I set?

This post will connect NAP with the three most important workload-level tools for shaping predictable node provisioning outcomes on AKS:

Why are taints/tolerations not in the top 3 for NAP? They're arguably more scheduling-relevant than PDBs.


Kubernetes describes minAvailable / maxUnavailable as the two key availability knobs, and notes you can only specify one per PDB.

### How NAP handles disruption

How does NAP's consolidation engine interact with PDBs specifically? One line isn't enough.

  labels:
    app: web
spec:
  topologySpreadConstraints:

What happens when topology spread and affinity conflict? (I don't see this combination in the doc.)

Contributor Author

Added a matrix describing this behavior with recommendations for users

What these fields mean (in plain language):

- topologyKey: topology.kubernetes.io/zone → spread across zones (not just nodes).
- maxSkew: 1 → keep zone counts close (difference between most/least loaded domains can’t exceed 1 when DoNotSchedule).

Link "A good default" section to NAP NodePool zone configuration.
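
As a concrete reference for the fields discussed in this thread, a zone-spread constraint on a Deployment could be sketched as follows. The Deployment name, replica count, and container image are assumptions; `maxSkew`, `topologyKey`, `whenUnsatisfiable: DoNotSchedule`, and the `app: web` label come from the quoted post.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                       # hypothetical name
spec:
  replicas: 3                     # assumed replica count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                # zone pod counts may differ by at most 1
          topologyKey: topology.kubernetes.io/zone  # spread across zones
          whenUnsatisfiable: DoNotSchedule          # hard constraint: pod stays Pending if violated
          labelSelector:
            matchLabels:
              app: web                              # which pods are counted per zone
      containers:
        - name: web
          image: nginx:1.27       # placeholder image
```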


## Background

AKS users want to ensure their workloads schedule, scale, and are disrupted only when (or where) desired. The problem here is Kubernetes can feel complex, and it's easy to be unclear what settings to use to accomplish this. Node Auto-Provisioning allows amazing benefits for compute efficiency, but to best utilize it, users need to make sure certain best practices are followed for predictable behavior.

Replace "amazing benefits" with specific efficiency claims.

Behavior: NAP consolidates too often or voluntarily disrupts too many nodes at once
Cause: User has not set any guardrails on node disruption behavior.

- Fix: Add PDBs that regulate disruption pace

PDB is about pod criticality with respect to disruption/drain, not scheduling intent. It doesn't belong with nodeSelector, taints, and topology.

Recommendation: Split the blog. Keep Parts 1-3 (Topology, Affinity, Scheduling Intent) as the scheduling blog. Move Part 4 (PDB + NAP disruption) to a separate "Controlling Disruption with NAP: PDBs, Consolidation Policies, and Node Disruption Budgets" blog. Cross-link between them.

Updated the blog post to replace references to Pod Disruption Budgets (PDBs) with taints and tolerations, providing a more accurate description of their role in controlling node provisioning outcomes. The disruption topics (PDBs, consolidation) will be moved to another blog post.
Copilot AI review requested due to automatic review settings March 27, 2026 20:45

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.

Comment on lines +211 to +213
taints:
  - key: test.com/custom-taint
    effect: NoSchedule

Copilot AI Mar 27, 2026

The NodePool taints example doesn't match the text above it (which references spec.template.spec.taints). As written, the YAML is missing the spec: under template: and the indentation suggests taints is directly under template, which is not the correct NodePool schema path. Please update the example to use the correct field path and structure.

Suggested change

Before:

    taints:
      - key: test.com/custom-taint
        effect: NoSchedule

After:

    spec:
      taints:
        - key: test.com/custom-taint
          effect: NoSchedule

effect: NoSchedule
```

> ![NOTE] Taints can prevent pods from being scheduled to these nodes if they are not tolerated by the pods. A proper toleration must be added to your specific pods to allow them to be scheduled to nodes that are based on this NodePool CRD.

Copilot AI Mar 27, 2026

This line uses Markdown image syntax (![NOTE]) without a URL, which will render as a broken image rather than a note. Use a Docusaurus admonition (for example :::note) or a blockquote note format instead.

Suggested change

Before:

    > ![NOTE] Taints can prevent pods from being scheduled to these nodes if they are not tolerated by the pods. A proper toleration must be added to your specific pods to allow them to be scheduled to nodes that are based on this NodePool CRD.

After:

    > **Note**: Taints can prevent pods from being scheduled to these nodes if they are not tolerated by the pods. A proper toleration must be added to your specific pods to allow them to be scheduled to nodes that are based on this NodePool CRD.

Tolerations are a field you place in your workload deployment file to flag what types of tainted nodes these pods can be scheduled to. There are two general behaviors for tolerations:

- `NoSchedule` - strict toleration. Only pods with the proper toleration can be scheduled to the node with a specific taint.
- `PreferNoSchedule` - less strict toleration. AKS will _try_ to avoid placing pods that don't tolerate this node's taint, but it's not gauranteed.

Copilot AI Mar 27, 2026

Spelling: "gauranteed" should be "guaranteed".

Suggested change

Before:

    - `PreferNoSchedule` - less strict toleration. AKS will _try_ to avoid placing pods that don't tolerate this node's taint, but it's not gauranteed.

After:

    - `PreferNoSchedule` - less strict toleration. AKS will _try_ to avoid placing pods that don't tolerate this node's taint, but it's not guaranteed.

---
title: "Controlling Node Provisioning Outcomes on AKS: PDBs, Affinity, and Topology Spread"
description: "Learn AKS best practices for Node Auto-Provisioning, using taints and tolerations, affinity, and topology spread constraints to achieve predictable, resilient pod scheduling."
date: 2026-03-30

Copilot AI Mar 27, 2026

The post folder is 2026-03-20-... but the front matter date: is 2026-03-30. The website authoring guide requires the folder date prefix to match the front matter date; please align them by renaming the folder or updating the date: value (and keep the displayed publish date consistent with what you want readers to see).

Suggested change

Before:

    date: 2026-03-30

After:

    date: 2026-03-20

Comment on lines +46 to +47
Node auto-provisioning provisions, scales, and manages nodes. NAP senses pending pod pressure, chooses/provisions nodes that satisfy workload specs and NodePool allowed options — and then schedules pods onto those nodes.


Copilot AI Mar 27, 2026

This description of NAP suggests it "schedules pods onto those nodes". NAP provisions/manages nodes in response to pending pods, but the Kubernetes scheduler is still what schedules pods; consider rewording to avoid implying NAP is the scheduler.

wdarko1 added 2 commits March 27, 2026 20:54
Updated the content for clarity and added links to relevant documentation. Improved explanations of Node Auto-Provisioning, Topology Spread Constraints, and Taints and Tolerations.
Updated content for clarity and added FAQ section.
Copilot AI review requested due to automatic review settings March 28, 2026 04:20

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 9 comments.

scheduler:
  label: Scheduler
  permalink: /scheduler
  description: Scheduling workloads and managing scheduling constraints on AKS.

Copilot AI Mar 28, 2026

scheduler tag entry is missing a blank line separator before the next top-level tag (security:), which breaks the consistent spacing pattern used throughout tags.yml and makes the file harder to scan. Add a blank line after the scheduler tag block.

Suggested change: add a blank line after the following line.

    description: Scheduling workloads and managing scheduling constraints on AKS.

Comment on lines +2 to +30
title: "Controlling Node Provisioning Outcomes on AKS: PDBs, Affinity, and Topology Spread"
description: "Learn AKS best practices for Node Auto-Provisioning, using taints and tolerations, affinity, and topology spread constraints to achieve predictable, resilient pod scheduling."
date: 2026-03-30
authors: ["wilson-darko"]
tags:
- node-auto-provisioning
- scheduler
---

## Background

AKS users want to ensure their workloads schedule, scale, and are disrupted only when (or where) desired. The problem here is Kubernetes can feel complex, and it's easy to be unclear what settings to use to accomplish this. Node Auto-Provisioning optimizes bin-packing of your compute, but to best utilize it, users need to follow certain best practices for predictable behavior.

When adopting Kubernetes at scale, the hardest operational questions often aren’t “How do I scale nodes (or VMs)?” — they’re:

- Where will my workload replicas land (zones / nodes)?
- How do I express node preferences without accidentally blocking scheduling?
- If I’m using Node Auto-Provisioning (NAP), how does it interpret the rules I set?

This post will connect NAP with the three most important workload-level tools for shaping predictable node provisioning outcomes on AKS:

1. **Taints and Tolerations** – control which pods can go to which nodes
2. **Affinity/Anti-Affinity** – control where workloads can (or should not) run
3. **Topology Spread Constraints** – control replica distribution across failure domains

Then we’ll connect the dots to explain what AKS Node Auto-Provisioning (NAP) does with those signals to manage your workloads.

If you’re new to these Kubernetes features, this post will give you “good defaults” as a starting point. If you’re already deep into scheduling, treat it as a checklist for the behaviors AKS users most commonly ask about.


Copilot AI Mar 28, 2026

Title references “PDBs”, but the article body doesn't discuss PodDisruptionBudgets anywhere. Either add a section explaining how PDBs interact with NAP/scheduling/disruptions, or remove PDBs from the title to avoid misleading readers.


## Part 2 — Topology Spread Constraints: tool for zone-aware replicas

**Topology Spread Constraints** let you tell the scheduler: “Keep these replicas balanced across domains like zones or nodes.” The Kubernetes documentation describe it as a way to spread pods across failure domains such as regions, zones, nodes, and custom topology keys.

Copilot AI Mar 28, 2026

Grammar issue: “The Kubernetes documentation describe it” should be “The Kubernetes documentation describes it”.

Suggested change

Before:

    **Topology Spread Constraints** let you tell the scheduler: “Keep these replicas balanced across domains like zones or nodes.” The Kubernetes documentation describe it as a way to spread pods across failure domains such as regions, zones, nodes, and custom topology keys.

After:

    **Topology Spread Constraints** let you tell the scheduler: “Keep these replicas balanced across domains like zones or nodes.” The Kubernetes documentation describes them as a way to spread pods across failure domains such as regions, zones, nodes, and custom topology keys.

effect: NoSchedule
```

> ![NOTE] Taints can prevent pods from being scheduled to these nodes if they are not tolerated by the pods. A proper toleration must be added to your specific pods to allow them to be scheduled to nodes that are based on this NodePool CRD.

Copilot AI Mar 28, 2026

The callout uses > ![NOTE], which isn't a supported admonition/alert syntax in this repo (other posts use :::note or > [!NOTE]). Update this block to a supported format so it renders correctly.

Suggested change

Before:

    > ![NOTE] Taints can prevent pods from being scheduled to these nodes if they are not tolerated by the pods. A proper toleration must be added to your specific pods to allow them to be scheduled to nodes that are based on this NodePool CRD.

After:

    > [!NOTE]
    > Taints can prevent pods from being scheduled to these nodes if they are not tolerated by the pods. A proper toleration must be added to your specific pods to allow them to be scheduled to nodes that are based on this NodePool CRD.

Comment on lines +299 to +301
You can consider enabling features such as [Artifact Stream](https://learn.microsoft.com/en-us/azure/aks/artifact-streaming) which can decrease pod readiness time.

For more visit our documentation on [performance and scaling best practices](https://learn.microsoft.com/en-us/azure/aks/best-practices-performance-scale).

Copilot AI Mar 28, 2026

These Learn links use locale-specific URLs (/en-us/). Repo guidance prefers generic Learn links without locale segments to avoid redirects and keep links consistent. Update them to https://learn.microsoft.com/azure/....

Suggested change

Before:

    You can consider enabling features such as [Artifact Stream](https://learn.microsoft.com/en-us/azure/aks/artifact-streaming) which can decrease pod readiness time.

    For more visit our documentation on [performance and scaling best practices](https://learn.microsoft.com/en-us/azure/aks/best-practices-performance-scale).

After:

    You can consider enabling features such as [Artifact Stream](https://learn.microsoft.com/azure/aks/artifact-streaming) which can decrease pod readiness time.

    For more visit our documentation on [performance and scaling best practices](https://learn.microsoft.com/azure/aks/best-practices-performance-scale).

Comment on lines +168 to +172
preference:
  matchExpressions:
  - key: node.kubernetes.io/instance-type
    operator: In
    values: ["Standard_D16ds_v5"]

Copilot AI Mar 28, 2026

The preferredDuringSchedulingIgnoredDuringExecution node affinity example is not valid Kubernetes syntax: this field must be a list of terms and each term requires a weight and a preference. As written, readers copying this will get schema/validation errors.

Suggested change

Before:

    preference:
      matchExpressions:
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["Standard_D16ds_v5"]

After:

    - weight: 100
      preference:
        matchExpressions:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
          - Standard_D16ds_v5

Comment on lines +206 to +213
| Topology Spread Configuration | Affinity Configuration | Observed Scheduling Behavior | Recommendation |
|------------------------------|------------------------|------------------------------|----------------|
| **Hard** (`whenUnsatisfiable: DoNotSchedule`) | **Hard Node Affinity** (`requiredDuringSchedulingIgnoredDuringExecution`) | Pod remains **Pending** if no node satisfies *both* constraints. The scheduler filters out all nodes that violate either rule. | Use only when you are certain the constraints are always compatible (for example, multi‑zone node affinity plus multi‑zone spread). Avoid mixing single‑zone affinity with multi‑zone spread. |
| **Soft** (`whenUnsatisfiable: ScheduleAnyway`) | **Hard Node Affinity** (`requiredDuringSchedulingIgnoredDuringExecution`) | Pod schedules only on nodes matching affinity. Topology spread is applied as **best‑effort**, and distribution may be uneven. | ✅ **Recommended default** for most workloads. Enforce strict placement requirements while keeping high availability best‑effort. |
| **Hard** (`whenUnsatisfiable: DoNotSchedule`) | **Soft Node Affinity** (`preferredDuringSchedulingIgnoredDuringExecution`) | Pod schedules only if topology spread constraints are met. Affinity acts only as a preference among valid nodes. | Use when even distribution across zones or nodes is more important than node‑level preferences. |
| **Soft** (`whenUnsatisfiable: ScheduleAnyway`) | **Soft Node Affinity** | Pod always schedules. Both constraints only influence scoring; placement is flexible and may be imbalanced. | Suitable for dev/test, batch, or low‑criticality workloads. |
| **Hard multi‑zone spread** (`whenUnsatisfiable: DoNotSchedule` and `minDomains` >= 2) | **Single‑zone hard affinity** | Pod enters a permanent **Pending** state due to a logical contradiction between constraints. | Align affinity and spread to the same topology domains, or relax one of the constraints. |


Copilot AI Mar 28, 2026

The markdown table under “The following table lists…” is written with leading || on each row, which creates an empty first column and typically renders incorrectly. Rewrite it using standard markdown table syntax with a single leading | per row.
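
As an illustration of the row the table marks as the recommended default (soft topology spread combined with hard node affinity), a pod template fragment might be sketched as below. The `kubernetes.io/arch` requirement is an assumed example of a strict placement rule, and `app: web` is the label used elsewhere in the quoted post.

```yaml
# Pod template fragment (sketch): soft zone spread plus hard node affinity.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway   # best-effort spread; never blocks scheduling
      labelSelector:
        matchLabels:
          app: web
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard rule; filters nodes
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch   # illustrative requirement, not from the post
                operator: In
                values: ["amd64"]
```

Because only the affinity rule filters nodes, the two constraints cannot contradict each other; the spread rule only influences scoring among nodes that already satisfy the affinity.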

Comment on lines +241 to +253
The following example shows a taint called `test.com/custom-taint` that is added in the `spec.template.spec.taints` field in a [NodePool CRD](https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    taints:
      - key: test.com/custom-taint
        effect: NoSchedule
```

Copilot AI Mar 28, 2026

The NodePool taints YAML snippet doesn't match the field path described in the text (spec.template.spec.taints). The example is missing the spec: level under template and the indentation suggests taints is directly under template, which is inconsistent and likely to confuse readers.
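
A sketch of the snippet with the field path the text describes (`spec.template.spec.taints`) could look like the following. It is a fragment rather than a complete manifest: other fields that a working NodePool needs are left out, and their exact shape is not confirmed by this thread.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # Other fields a real NodePool needs (for example nodeClassRef and
      # requirements) are omitted from this sketch.
      taints:
        - key: test.com/custom-taint
          effect: NoSchedule
```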

Tolerations are a field you place in your workload deployment file to flag what types of tainted nodes these pods can be scheduled to. There are two general behaviors for tolerations:

- `NoSchedule` - strict toleration. Only pods with the proper toleration can be scheduled to the node with a specific taint.
- `PreferNoSchedule` - less strict toleration. AKS will _try_ to avoid placing pods that don't tolerate this node's taint, but it's not gauranteed.

Copilot AI Mar 28, 2026

Spelling: “gauranteed” should be “guaranteed”.

Suggested change

Before:

    - `PreferNoSchedule` - less strict toleration. AKS will _try_ to avoid placing pods that don't tolerate this node's taint, but it's not gauranteed.

After:

    - `PreferNoSchedule` - less strict toleration. AKS will _try_ to avoid placing pods that don't tolerate this node's taint, but it's not guaranteed.
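
For reference, a pod opts in to nodes carrying the `test.com/custom-taint` taint by declaring a matching toleration in its spec. In the Kubernetes API, `NoSchedule` and `PreferNoSchedule` are effects set on the taint (the node side); a toleration names the taint key and effect it matches. A minimal sketch of a pod spec fragment, assuming the taint from the NodePool example:

```yaml
# Pod spec fragment (sketch): tolerates the custom taint from the NodePool example.
spec:
  tolerations:
    - key: test.com/custom-taint
      operator: Exists        # match the key regardless of the taint's value
      effect: NoSchedule      # must match the effect declared on the taint
```

A toleration only permits scheduling onto the tainted nodes; it does not by itself steer the pod toward them.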
