Avoid using lb-ext.kubeconfig for seedgen cleanup#5940
Conversation
|
/lgtm |
|
/approve |
|
/retest |
# Background During cleanup before seed generation, we want to delete all pods in the Succeeded or Failed phase. # Issue The current implementation uses `oc delete pod --all-namespaces --field-selector=status.phase==...` with the node `lb-ext.kubeconfig` as a kubeconfig. This is in contrast to the rest of the controller, which uses the in-cluster config via a client loaded with ctrl.GetConfigOrDie(). We've had users report TLS errors when the controller tries to use the lb-ext.kubeconfig, which seems expected since that kubeconfig is meant for external use and may not have the correct CA or credentials for in-cluster API access. Before they hit that error, the client has already been used, so it proves that the in-cluster config works, and this is specifically an issue with using the lb-ext.kubeconfig for this cleanup step. # Solution The solution is to avoid using the lb-ext.kubeconfig for this cleanup step, and instead use the same in-cluster client that the rest of the controller uses. # Caveat As the existing TODO states: ```go // TODO: Can this be done cleanly via client? The client.DeleteAllOf seems to require a specified namespace, so maybe loop over the namespaces ``` The Kubernetes API does not support DeleteCollection without a namespace for namespaced resources, so we cannot simply use DeleteAllOf with no namespace. Instead, we copy the behavior of kubectl when using `delete --all-namespaces`, which is to list the resources with the field selector, and then delete them one by one. This is less efficient than a single DeleteCollection call, but it's already what kubectl was doing when we were calling it, so we know it works and is acceptable for this use case. # Implementation Details We cannot use `client.MatchingFields` because we do not have an index set up and it seems overkill to do so. # Agent Interaction Agent assisted. The conversation went something like this: - Where does r.Client get its kubeconfig? → ctrl.GetConfigOrDie() → in-cluster SA - Where does the oc call get its kubeconfig? → lb-ext.kubeconfig on the host via chroot executor - Can we generate a kubeconfig from rest.Config? → no built-in utility, hand-rolled versions kept being incomplete - Just use DeleteAllOf with no namespace? → tried it, in a toy project using `sigs.k8s.io/controller-runtime/pkg/envtest`, API rejects it (no cluster-scoped DeleteCollection for namespaced resources) - Inspect kubectl source - what does it do when delete --all-namespaces? → list + delete one by one, sequentially - Did that, modified toy project to prove it works, updated the controller, added comment explaining the reasoning
|
/override ci/prow/ipc-e2e-flow |
|
@jc-rh: Overrode contexts on behalf of jc-rh: ci/prow/ipc-e2e-flow DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/lgtm |
|
/override ci/prow/ipc-e2e-flow |
|
@jc-rh: Overrode contexts on behalf of jc-rh: ci/prow/ipc-e2e-flow DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: javipolo, jc-rh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/cherrypick release-4.21 release-4.20 |
|
@jc-rh: new pull request created: #6003 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
See commit message