Skip to content

Set cluster Ready status only when all pods are ready#583

Open
andmat900 wants to merge 4 commits intoeiffel-community:mainfrom
andmat900:20260331_cluster_ready_status_check
Open

Set cluster Ready status only when all pods are ready#583
andmat900 wants to merge 4 commits intoeiffel-community:mainfrom
andmat900:20260331_cluster_ready_status_check

Conversation

@andmat900
Copy link
Copy Markdown
Contributor

Applicable Issues

Fixes #406

Description of the Change

Add a readiness check after resource reconciliation that verifies all owned Deployments and StatefulSets have their desired ready replicas. A new Reconciling condition is set while pods are starting up, and Ready=True is only set once all pods are confirmed ready.

Alternate Designs

Considered label-based filtering instead of owner references, but owner references are already set by the controller and more reliable.

Possible Drawbacks

Sign-off

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

Signed-off-by: Andrei Matveyeu, andrei.matveyeu@axis.com

@andmat900 andmat900 requested a review from a team as a code owner April 1, 2026 10:25
@andmat900 andmat900 requested review from fredjn and t-persson April 1, 2026 10:25
const (
StatusAvailable = "Available"
StatusReady = "Ready"
StatusReconciling = "Reconciling"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add a new status for this? Can't we just set Ready=False with Reason=ReasonPending?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was suggested in the original issue: #406

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with myself from 1 year ago. I also do believe that it was wrong of me to give a suggested solution

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@andmat900 andmat900 force-pushed the 20260331_cluster_ready_status_check branch from 000d5b1 to 516a652 Compare April 1, 2026 12:27
Change-Id: I786ce44952436ee245b71403bf20c5b13d690c95
Signed-off-by: Andrei Matveyeu <andreimu@axis.com>
@andmat900 andmat900 force-pushed the 20260331_cluster_ready_status_check branch from 516a652 to 337446d Compare April 1, 2026 12:30
Change-Id: I1a89fb69f012d6bfc8099787016f79df449d27ee
Signed-off-by: Andrei Matveyeu <andrei.matveyeu@axis.com>
@andmat900 andmat900 requested a review from t-persson April 2, 2026 09:24
return err
}
return nil
return readiness.CheckDeployment(ctx, r.Client, namespacedName)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just do these checks in, for example, reconcileDeployment? We already get the deployment there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Change-Id: I1024b7aec42f3541bd2330955408b0159457b2d4
Signed-off-by: Andrei Matveyeu <andrei.matveyeu@axis.com>
Change-Id: I0f9abf8984b6d9f77445b82da3eb539466da1ea1
Signed-off-by: Andrei Matveyeu <andreimu@axis.com>
@andmat900 andmat900 requested a review from t-persson April 2, 2026 11:36
Comment on lines +81 to +85
if !readiness.IsNotReadyError(err) {
logger.Error(err, "Failed to reconcile the deployment for the ETOS API")
return err
}
notReadyErr = err
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just return the error instead of moving all the way down the function and then return the error.

Is this because the deployment requires the other resources deployed, then we can just move the deployment reconciliation after those.

return target, err
}
return target, nil
return target, readiness.DeploymentReady(target)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an explicit NotReady return, since we have just created the deployment.

return target, err
}
return target, r.Patch(ctx, target, client.StrategicMergeFrom(deployment))
return target, readiness.DeploymentReady(target)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an explicit NotReady return, since we have just patched the deployment.


// DeploymentReady checks whether a Deployment has its desired number of ready replicas.
// Returns nil if ready, or a NotReadyError if not yet ready.
func DeploymentReady(dep *appsv1.Deployment) error {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of dealing with counting replicas we should just use this condition and reason:

type: Progressing
status: "True"
reason: NewReplicaSetAvailable

https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#complete-deployment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cluster status ready should only be set to True if the cluster is ready to use

2 participants