Add support to Admission Fair Sharing API#1685
Add support to Admission Fair Sharing API#1685MaysaMacedo wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdded an optional admissionFairSharing configuration to CRD schemas and manifests, introduced corresponding Go types and deepcopy methods, and wired parsing/validation of string resourceWeights into configmap generation with an added buildAdmissionFairSharing helper and updated tests. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MaysaMacedo The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
can you run I still haven't got it working in the CI yet. |
06724e0 to
27756c7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/configmap/configmap.go (1)
228-228: Remove unnecessary type conversion.The key
kis already of typecorev1.ResourceNamefrom iterating the map, so the conversion is redundant.♻️ Proposed fix
- resourceWeights[corev1.ResourceName(k)] = f + resourceWeights[k] = f🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/configmap/configmap.go` at line 228, Remove the redundant conversion of k to corev1.ResourceName when assigning into resourceWeights: in the loop where k is already a corev1.ResourceName, replace the expression resourceWeights[corev1.ResourceName(k)] = f with a direct assignment resourceWeights[k] = f so the key is used without unnecessary type casting (refer to the resourceWeights map and the loop variable k).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@upstream/kueue/e2e-test-ocp.sh`:
- Around line 55-63: The case statement has an unreachable branch and an extra
terminator: move the customconfigs) case to before the catch-all (*) pattern so
it can be matched, and remove the duplicate trailing esac so there is only one
closing esac for the case block; i.e., reorder the patterns so customconfigs)
appears above *) and delete the extra esac.
---
Nitpick comments:
In `@pkg/configmap/configmap.go`:
- Line 228: Remove the redundant conversion of k to corev1.ResourceName when
assigning into resourceWeights: in the loop where k is already a
corev1.ResourceName, replace the expression
resourceWeights[corev1.ResourceName(k)] = f with a direct assignment
resourceWeights[k] = f so the key is used without unnecessary type casting
(refer to the resourceWeights map and the loop variable k).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 61ff351f-f69f-45fe-ae77-4728b6cc0194
⛔ Files ignored due to path filters (3)
pkg/generated/applyconfiguration/kueueoperator/v1/admissionfairsharing.gois excluded by!**/generated/**pkg/generated/applyconfiguration/kueueoperator/v1/kueueconfiguration.gois excluded by!**/generated/**pkg/generated/applyconfiguration/utils.gois excluded by!**/generated/**
📒 Files selected for processing (7)
deploy/crd/kueue-operator.crd.yamlmanifests/kueue.openshift.io_kueues.yamlpkg/apis/kueueoperator/v1/types.gopkg/apis/kueueoperator/v1/zz_generated.deepcopy.gopkg/configmap/configmap.gopkg/configmap/configmap_test.goupstream/kueue/e2e-test-ocp.sh
58a91d4 to
09df846
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/apis/kueueoperator/v1/types.go`:
- Around line 153-158: Add a kubebuilder validation on the ResourceWeights map
so the CRD only accepts finite numeric strings: annotate the ResourceWeights
field in pkg/apis/kueueoperator/v1/types.go (the ResourceWeights
map[corev1.ResourceName]string) with kubebuilder tags to enforce string type and
a regex pattern that matches finite decimals/integers (e.g.
Pattern="^[-+]?(?:\\d+(?:\\.\\d+)?|\\.\\d+)$"), keep +optional, and regenerate
the CRD so values like "foo", "NaN" or "Inf" are rejected at the API boundary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fb6bd3ff-d8d8-47f8-a50d-8c3db203e381
⛔ Files ignored due to path filters (3)
pkg/generated/applyconfiguration/kueueoperator/v1/admissionfairsharing.gois excluded by!**/generated/**pkg/generated/applyconfiguration/kueueoperator/v1/kueueconfiguration.gois excluded by!**/generated/**pkg/generated/applyconfiguration/utils.gois excluded by!**/generated/**
📒 Files selected for processing (8)
Makefiledeploy/crd/kueue-operator.crd.yamlmanifests/kueue.openshift.io_kueues.yamlpkg/apis/kueueoperator/v1/types.gopkg/apis/kueueoperator/v1/zz_generated.deepcopy.gopkg/configmap/configmap.gopkg/configmap/configmap_test.goupstream/kueue/e2e-test-ocp.sh
✅ Files skipped from review due to trivial changes (1)
- manifests/kueue.openshift.io_kueues.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/apis/kueueoperator/v1/zz_generated.deepcopy.go
- pkg/configmap/configmap_test.go
|
It would be good to get api-review on these changes. |
pkg/configmap/configmap.go
Outdated
| for k, v := range admissionFairSharing.ResourceWeights { | ||
| f, err := strconv.ParseFloat(v, 64) | ||
| if err != nil { | ||
| return nil, err |
There was a problem hiding this comment.
custom error here might be useful, and aggregating a slice of errors would be smart as well.
There was a problem hiding this comment.
errors.Join() is handy function to help with this.
09df846 to
b2a9e2a
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/apis/kueueoperator/v1/types.go (1)
153-158:⚠️ Potential issue | 🟠 MajorStill missing API validation for
resourceWeightsvalues.
ResourceWeightsstill accepts arbitrary strings, so typos are only caught later during reconciliation, andNaN/Infcan get pastParseFloatuntil config-map generation. Add a CEL rule here so the CRD only accepts finite, non-negative numeric strings, then regenerate the CRDs.Proposed fix
// resourceWeights assigns weights to resources which then are used to calculate LocalQueue's // resource usage and order Workloads. // when omitted, kueue will decide the default, which is subject to change over time. // The current default is 1 for any resource. // +optional + // +kubebuilder:validation:XValidation:rule="self.all(k, self[k].matches(r'^[+]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?$'))",message="resourceWeights values must be non-negative numeric strings (for example: \"1\", \"0.5\", \"1e-3\")" ResourceWeights map[corev1.ResourceName]string `json:"resourceWeights,omitempty"`Expected result: the current tree should show no validation annotation next to
ResourceWeights, and the generated CRD should still exposeresourceWeightsas plaintype: stringvalues.#!/bin/bash set -euo pipefail echo "Operator API field:" rg -n -C4 'ResourceWeights map\[corev1\.ResourceName\]string' pkg/apis/kueueoperator/v1/types.go echo echo "Generated CRD block:" rg -n -C4 'resourceWeights:' deploy/crd/kueue-operator.crd.yaml🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/apis/kueueoperator/v1/types.go` around lines 153 - 158, Add CEL validation for the ResourceWeights field so the CRD only accepts finite, non-negative numeric strings: update the struct tag on ResourceWeights in pkg/apis/kueueoperator/v1/types.go (the ResourceWeights map[corev1.ResourceName]string field) to include a validation annotation that enforces each value parses to a finite float and is >= 0 (e.g., a CEL rule referencing the map values), then regenerate the CRDs so deploy/crd/kueue-operator.crd.yaml contains the new validation constraint while still representing each map value as type: string; ensure the CEL expression rejects NaN/Inf and negative numbers and run the CRD generation step to commit the updated YAML.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/apis/kueueoperator/v1/types.go`:
- Around line 153-158: Add CEL validation for the ResourceWeights field so the
CRD only accepts finite, non-negative numeric strings: update the struct tag on
ResourceWeights in pkg/apis/kueueoperator/v1/types.go (the ResourceWeights
map[corev1.ResourceName]string field) to include a validation annotation that
enforces each value parses to a finite float and is >= 0 (e.g., a CEL rule
referencing the map values), then regenerate the CRDs so
deploy/crd/kueue-operator.crd.yaml contains the new validation constraint while
still representing each map value as type: string; ensure the CEL expression
rejects NaN/Inf and negative numbers and run the CRD generation step to commit
the updated YAML.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dc5268f2-642c-4abc-a289-6db261b09a8b
⛔ Files ignored due to path filters (3)
pkg/generated/applyconfiguration/kueueoperator/v1/admissionfairsharing.gois excluded by!**/generated/**pkg/generated/applyconfiguration/kueueoperator/v1/kueueconfiguration.gois excluded by!**/generated/**pkg/generated/applyconfiguration/utils.gois excluded by!**/generated/**
📒 Files selected for processing (6)
deploy/crd/kueue-operator.crd.yamlmanifests/kueue.openshift.io_kueues.yamlpkg/apis/kueueoperator/v1/types.gopkg/apis/kueueoperator/v1/zz_generated.deepcopy.gopkg/configmap/configmap.gopkg/configmap/configmap_test.go
✅ Files skipped from review due to trivial changes (1)
- manifests/kueue.openshift.io_kueues.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/apis/kueueoperator/v1/zz_generated.deepcopy.go
- pkg/configmap/configmap_test.go
b2a9e2a to
df92841
Compare
deploy/crd/kueue-operator.crd.yaml
Outdated
| description: |- | ||
| ResourceWeight is a string representation of a non-negative float64 value | ||
| used as a weight for a resource in fair sharing calculations (e.g., "1.0", "0.5"). | ||
| maxLength: 32 |
There was a problem hiding this comment.
Note for reviewers:
I chose this value based on Claude's recommendation:
A float64 has at most ~17 significant digits, so a decimal string like "12345678901234567.12345678901234567" fits well within 32 characters.
deploy/crd/kueue-operator.crd.yaml
Outdated
| resource usage and order Workloads. | ||
| when omitted, kueue will decide the default, which is subject to change over time. | ||
| The current default is 1 for any resource. | ||
| maxProperties: 16 |
There was a problem hiding this comment.
Note for reviewers:
Picked this value based on Claude's recommendation:
Even a cluster with a very diverse hardware fleet would rarely exceed 10-12 distinct resource types. A user would only add weights for resources they actually want to influence fair sharing ordering — typically just 2-3 (cpu and memory, maybe a GPU).
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/apis/kueueoperator/v1/types.go`:
- Around line 140-151: The two optional duration fields UsageHalfLifeTime and
UsageSamplingInterval in the FairSharing types should be changed from
value-typed metav1.Duration to pointer types (*metav1.Duration) and their json
tags must include ",omitempty" so that omitted vs explicit zero is preserved;
update the struct field declarations for UsageHalfLifeTime and
UsageSamplingInterval in pkg/apis/kueueoperator/v1/types.go accordingly and then
audit downstream usage (notably pkg/configmap/configmap.go) to handle nil
pointers when assigning/reading these fields.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9d1992ee-e5fd-4033-8caf-016682ae663c
⛔ Files ignored due to path filters (3)
pkg/generated/applyconfiguration/kueueoperator/v1/admissionfairsharing.gois excluded by!**/generated/**pkg/generated/applyconfiguration/kueueoperator/v1/kueueconfiguration.gois excluded by!**/generated/**pkg/generated/applyconfiguration/utils.gois excluded by!**/generated/**
📒 Files selected for processing (6)
deploy/crd/kueue-operator.crd.yamlmanifests/kueue.openshift.io_kueues.yamlpkg/apis/kueueoperator/v1/types.gopkg/apis/kueueoperator/v1/zz_generated.deepcopy.gopkg/configmap/configmap.gopkg/configmap/configmap_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/configmap/configmap_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/configmap/configmap.go
df92841 to
ff868bb
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/configmap/configmap.go`:
- Around line 231-233: The current construction of AdmissionFairSharing in
configmap.go converts nil operator fields into zero durations for
UsageHalfLifeTime (and similarly UsageSamplingInterval), which changes
semantics; change the builder so that UsageHalfLifeTime is left nil/omitted when
the source pointer is nil instead of using time.Duration(0) — e.g., check
ptr.Deref(admissionFairSharing.UsageHalfLifeTimeSeconds, nil) and only set
AdmissionFairSharing.UsageHalfLifeTime when a non-nil value exists (do the same
conditionally for UsageSamplingInterval if you want to preserve downstream
defaults), leaving ResourceWeights assignment as-is; ensure the created config
object does not contain zero-valued Duration for UsageHalfLifeTime so Kueue
defaulting can apply.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cff334bf-9493-4173-bddf-e0cf8a45605d
⛔ Files ignored due to path filters (4)
pkg/generated/applyconfiguration/kueueoperator/v1/admissionfairsharing.gois excluded by!**/generated/**pkg/generated/applyconfiguration/kueueoperator/v1/kueueconfiguration.gois excluded by!**/generated/**pkg/generated/applyconfiguration/kueueoperator/v1/resourceweight.gois excluded by!**/generated/**pkg/generated/applyconfiguration/utils.gois excluded by!**/generated/**
📒 Files selected for processing (6)
deploy/crd/kueue-operator.crd.yamlmanifests/kueue.openshift.io_kueues.yamlpkg/apis/kueueoperator/v1/types.gopkg/apis/kueueoperator/v1/zz_generated.deepcopy.gopkg/configmap/configmap.gopkg/configmap/configmap_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/configmap/configmap_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- deploy/crd/kueue-operator.crd.yaml
- manifests/kueue.openshift.io_kueues.yaml
- pkg/apis/kueueoperator/v1/types.go
6a3c1e7 to
85fc45a
Compare
| // so that queues which have consumed fewer resources are admitted first. | ||
| // This section configures the usage decay rate, the sampling frequency, and per-resource weights. | ||
| // +optional | ||
| AdmissionFairSharing *AdmissionFairSharing `json:"admissionFairSharing,omitempty"` |
There was a problem hiding this comment.
The underlying type has no required fields, making admissionFairSharing: {} currently an accepted input.
What does it mean to set admissionFairSharing: {} as opposed to omitting the field altogether?
There was a problem hiding this comment.
It means the default values would be enforced. For example, when I set the admissionFairSharing: {} in the kueue configuration I get the following enforced in Kueue: "admissionFairSharing:\n usageHalfLifeTime: 0s\n usageSamplingInterval: 5m0s
pkg/apis/kueueoperator/v1/types.go
Outdated
| type AdmissionFairSharing struct { | ||
| // usageHalfLifeTimeSeconds indicates the time in seconds after which the current usage will decrease by a half. | ||
| // If set to 0, usage will be reset to 0 immediately. | ||
| // +kubebuilder:validation:Minimum=0 |
There was a problem hiding this comment.
Note: Generated documentation doesn't include these markers. We recommend adding explicit text to the GoDoc describing the constraints enforced by the markers so they show up in the generated documentation that end users see when doing a kubectl explain ....
pkg/apis/kueueoperator/v1/types.go
Outdated
| // If set to 0, usage will be reset to 0 immediately. | ||
| // +kubebuilder:validation:Minimum=0 | ||
| // +optional | ||
| UsageHalfLifeTimeSeconds *int32 `json:"usageHalfLifeTimeSeconds,omitempty"` |
There was a problem hiding this comment.
Is there a reasonable upper bound you can enforce here?
Leaving this wide open means that someone can set this to something like usageHalfLifeTimeSeconds: 2147483647, which is equivalent to ~68 years. Is that a reasonable value to set?
There was a problem hiding this comment.
I did hear something at KubeCon who was interested in setting really large Admission Fair sharing as a way to mimic budgets running out.
| // usageHalfLifeTimeSeconds indicates the time in seconds after which the current usage will decrease by a half. | ||
| // If set to 0, usage will be reset to 0 immediately. | ||
| // +kubebuilder:validation:Minimum=0 | ||
| // +optional |
There was a problem hiding this comment.
What happens if I don't set this field?
There was a problem hiding this comment.
The Kueue controller would load it as usageHalfLifeTime: 0s. This probably happens because the type is of Duration https://github.com/kubernetes-sigs/kueue/blob/main/apis/config/v1beta2/configuration_types.go#L601
pkg/apis/kueueoperator/v1/types.go
Outdated
| UsageHalfLifeTimeSeconds *int32 `json:"usageHalfLifeTimeSeconds,omitempty"` | ||
|
|
||
| // usageSamplingIntervalSeconds is the frequency in seconds that Kueue updates consumedResources in FairSharingStatus. | ||
| // When omitted or set to 0, kueue will decide the default, which is subject to change over time. |
There was a problem hiding this comment.
If omission and explicitly set to 0 mean the same thing, make 0 an invalid value. We strongly encourage only having a single way for a user to "say" the same thing to prevent confusion over what is the correct way to configure something.
By doing this, this field also no longer needs to be a pointer.
There was a problem hiding this comment.
ack, I also modified UsageHalfLifeTimeSeconds to follow the same pattern.
pkg/apis/kueueoperator/v1/types.go
Outdated
| // The current default is 300 (5 minutes). | ||
| // +kubebuilder:validation:Minimum=0 | ||
| // +optional | ||
| UsageSamplingIntervalSeconds *int32 `json:"usageSamplingIntervalSeconds,omitempty"` |
There was a problem hiding this comment.
Similar comment regarding a reasonable upper bounds.
| // +listMapKey=name | ||
| // +kubebuilder:validation:MaxItems=16 | ||
| // +optional | ||
| ResourceWeights []ResourceWeight `json:"resourceWeights,omitempty"` |
There was a problem hiding this comment.
Add a minimum items constraint of 1 to prevent unstructured clients from being able to set resourceWeights: [].
| // +listType=map | ||
| // +listMapKey=name | ||
| // +kubebuilder:validation:MaxItems=16 |
There was a problem hiding this comment.
Document the constraints enforced by these markers explicitly in the GoDoc please.
| // name is the Kubernetes resource name (e.g., "cpu", "memory", "nvidia.com/gpu"). | ||
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=253 | ||
| // +required |
There was a problem hiding this comment.
If this is a Kubernetes resource name, we should probably be matching the validation of https://github.com/kubernetes/kubernetes/blob/0275382260bc0d5279b703d21ec7ad4db35b8d78/staging/src/k8s.io/apimachinery/pkg/api/validate/strfmt.go#L278-L285
IIRC this is the same validation that is performed on Pod resource request names.
There was a problem hiding this comment.
This field would correspond to this one https://github.com/kubernetes-sigs/kueue/blob/e278cc9285c8caae4931430031547215c6b1252a/vendor/k8s.io/api/core/v1/types.go#L6934
I also see this on the docs https://github.com/kubernetes-sigs/kueue/blob/e278cc9285c8caae4931430031547215c6b1252a/vendor/k8s.io/api/core/v1/types.go#L6936-L6940
so I wonder which one to follow, the one you mentioned or the this one I'm linking?
| // weight is a string representation of a non-negative float64 value | ||
| // used as a weight for a resource in fair sharing calculations (e.g., "1.0", "0.5"). | ||
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=32 |
There was a problem hiding this comment.
32 feels a bit high to me - would setting something like weight: "0.00000000000000000000000000001" be reasonable here?
There was a problem hiding this comment.
I'm not certain on this. I need to investigate more.
This commit adds the admission fair sharing configuration options to the Kueue Spec to allow users to configure it.
85fc45a to
bd559f9
Compare
|
@MaysaMacedo: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This PR adds the support for the
AdmissionFairSharingconfigs in the kueue-operator API Spec.The type of ResourceWeights was modified from
map[corev1.ResourceName]float64tomap[corev1.ResourceName]stringto address the following warning:Internally that type is converted back to
float64to be accepted by the Kueue API.Special notes for your reviewer:
Assisted with Claude