fix(helm): replace retina.io with retina.sh in hubble ClusterRole templates#2100

Merged
nddq merged 1 commit into main from copilot/fix-helm-agent-launch-issue
Mar 20, 2026

Copilot AI commented Mar 5, 2026

Description

Hubble Helm ClusterRole templates still reference the old retina.io API group, while CRDs moved to retina.sh in PR #26. This causes the agent and operator to lack permissions for their own custom resources, crashing on launch.

Replaces all 8 occurrences of retina.io with retina.sh:

  • templates/agent/clusterrole.yaml (5 occurrences): retinaendpoints (2 rules), metricsconfigurations, retinaendpoints/finalizers, retinaendpoints/status
  • templates/operator/clusterrole.yaml (3 occurrences): captures, captures/finalizers, captures/status

The standard (non-hubble) variant already uses the correct API group.

Related Issue

Hubble-deployed agent crashes because retina-agent service account cannot list metricsconfigurations.retina.sh or captures.retina.sh — the ClusterRoles grant access to the nonexistent retina.io group instead.
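
To illustrate why a rule written for retina.io grants nothing on retina.sh, here is a minimal Python sketch of RBAC rule matching. It is a deliberately simplified model (exact string match plus the `*` wildcard), not the Kubernetes authorizer itself, and the helper names `rule_allows` and `role_allows` are hypothetical. The verbs are the ones listed for metricsconfigurations in the linked issue.

```python
from typing import Iterable, Mapping, Sequence

Rule = Mapping[str, Sequence[str]]


def rule_allows(rule: Rule, group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC check: each field must match exactly or via "*"."""

    def matches(allowed: Sequence[str], wanted: str) -> bool:
        return "*" in allowed or wanted in allowed

    return (
        matches(rule.get("apiGroups", []), group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
    )


def role_allows(rules: Iterable[Rule], group: str, resource: str, verb: str) -> bool:
    """A role allows a request if any one of its rules allows it."""
    return any(rule_allows(rule, group, resource, verb) for rule in rules)


VERBS = ["create", "delete", "get", "list", "patch", "update", "watch"]

# Rule shape before the fix: the API group is the old retina.io.
old_rules = [
    {"apiGroups": ["retina.io"], "resources": ["metricsconfigurations"], "verbs": VERBS}
]

# Rule shape after the fix: same resources and verbs, API group retina.sh.
new_rules = [
    {"apiGroups": ["retina.sh"], "resources": ["metricsconfigurations"], "verbs": VERBS}
]

# The agent lists metricsconfigurations in the retina.sh group.
print(role_allows(old_rules, "retina.sh", "metricsconfigurations", "list"))  # False
print(role_allows(new_rules, "retina.sh", "metricsconfigurations", "list"))  # True
```

Because the API group is compared as a plain string, the old rule never matches a request for retina.sh, regardless of which verbs it lists.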

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Verified no retina.io references remain anywhere under deploy/. Diff is a pure string replacement with no structural changes.
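
A check like the one described above could be scripted roughly as follows. This is a sketch only, not the command that was actually run for this PR; the function name `find_references` is hypothetical, and it assumes the templates are UTF-8 text files.

```python
from pathlib import Path


def find_references(root: str, needle: str = "retina.io") -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_references("deploy")` from the repository root returns an empty list when no references to the old API group remain.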

Additional Notes

None.


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

Original prompt

This section details the original issue you should resolve

<issue_title>Helm-deployed agent crashing on launch: Helm-deployed ClusterRoles for operator, agent etc still use retina.io instead of retina.sh</issue_title>
<issue_description>Describe the bug

The retina-agent service account in kube-system seemingly has no permission to list Retina's own CRs, which are defined in the retina.sh API group.

EDIT Below, retina-agent-init is actually running the operator image instead of the init. My bad! The apigroups are, nonetheless, wrong.

Here's an excerpt from retina-agent-init container of the retina-agent-* pod (it never reaches the retina-agent container):

[EDIT] This is from the retina-operator image, by accident. The bug still applies.

E1118 23:39:21.428138       1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: Failed to watch *v1alpha1.MetricsConfiguration: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"metricsconfigurations\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1118 23:39:22.001828       1 reflector.go:569] pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "captures" in API group "retina.sh" at the cluster scope
E1118 23:39:22.001924       1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Capture: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"captures\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1118 23:39:24.764710       1 reflector.go:569] pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "jobs" in API group "batch" at the cluster scope
E1118 23:39:24.765201       1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: Failed to watch *v1.Job: failed to list *v1.Job: jobs.batch is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"jobs\" in API group \"batch\" at the cluster scope" logger="UnhandledError"
W1118 23:39:26.796528       1 reflector.go:569] pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "metricsconfigurations" in API group "retina.sh" at the cluster scope
E1118 23:39:26.796687       1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: Failed to watch *v1alpha1.MetricsConfiguration: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"metricsconfigurations\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1118 23:39:26.887540       1 reflector.go:569] pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "captures" in API group "retina.sh" at the cluster scope
E1118 23:39:26.888200       1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.32.4/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Capture: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"captures\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
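
The repeated "forbidden" lines above can be reduced to the distinct set of missing permissions. The following Python sketch assumes the message wording shown in this log, handles both the plain and the backslash-escaped quoting, and uses the hypothetical names `FORBIDDEN` and `missing_permissions`.

```python
import re

# Matches: User "<user>" cannot <verb> resource "<resource>" in API group "<group>"
# The optional backslashes cover the escaped form inside err="..." fields.
FORBIDDEN = re.compile(
    r'User \\?"(?P<user>[^"\\]+)\\?" cannot (?P<verb>\w+) resource '
    r'\\?"(?P<resource>[^"\\]+)\\?" in API group \\?"(?P<group>[^"\\]*)\\?"'
)


def missing_permissions(log_text: str) -> set[tuple[str, str, str, str]]:
    """Return the distinct (user, verb, resource, API group) tuples found in the log."""
    return {
        (m["user"], m["verb"], m["resource"], m["group"])
        for m in FORBIDDEN.finditer(log_text)
    }
```

Run over the excerpt above, this would leave three distinct entries for the retina-agent service account: list on metricsconfigurations and captures in retina.sh, and list on jobs in batch.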

This error is similar to #1122, but actually unrelated.

Examining the relevant ClusterRole object, retina-cluster-reader, I can see that it still references retina.io in multiple places. That API group has been gone since pull request #26, which replaced retina.io with retina.sh in most places:

- apiGroups:
  - retina.io
  resources:
  - retinaendpoints
  verbs:
  - get
  - list
  - watch

- apiGroups:
  - retina.io
  resources:
  - retinaendpoints
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - retina.io
  resources:
  - metricsconfigurations
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - retina.io
  resources:
  - retinaendpoints/finalizers
  verbs:
  - update
- apiGroups:
  - retina.io
  resources:
  - retinaendpoints/status
  verbs:
  - get
  - patch
  - update

This is also happening in the operator ClusterRole object, retina-operator-role:

https://github.com/microsoft/retina/blob/f2da04b8bbe1a3b232c2af2987b2f87051392a34/deploy/hubble/manifests/controller/helm/retina/templat...



Copilot AI changed the title [WIP] Fix Helm-deployed agent crashing on launch due to incorrect ClusterRoles Fix retina.io → retina.sh in hubble Helm ClusterRole templates Mar 5, 2026

github-actions bot commented Mar 6, 2026

Retina Code Coverage Report

Total coverage no change

Increased diff

Impacted Files Coverage
pkg/controllers/daemon/namespace/namespace_controller.go 76.24% ... 78.46% (2.22%) ⬆️

@matmerr matmerr marked this pull request as ready for review March 18, 2026 22:34
@matmerr matmerr requested a review from a team as a code owner March 18, 2026 22:34
Replace all occurrences of the old `retina.io` API group with `retina.sh`
in the hubble variant's agent and operator ClusterRole templates to match
the actual CRD API group defined in groupversion_info.go.

Co-authored-by: Quang Nguyen <nguyenquang@microsoft.com>
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
@nddq nddq force-pushed the copilot/fix-helm-agent-launch-issue branch from 4dd76dd to d2336ad Compare March 20, 2026 19:44
@nddq nddq changed the title Fix retina.io → retina.sh in hubble Helm ClusterRole templates fix(helm): replace retina.io with retina.sh in hubble ClusterRole templates Mar 20, 2026
@nddq nddq enabled auto-merge March 20, 2026 19:50
@nddq nddq added this pull request to the merge queue Mar 20, 2026
Merged via the queue into main with commit 7a5852d Mar 20, 2026
29 of 30 checks passed
@nddq nddq deleted the copilot/fix-helm-agent-launch-issue branch March 20, 2026 20:39

Development

Successfully merging this pull request may close these issues.

Helm-deployed agent crashing on launch: Helm-deployed ClusterRoles for operator, agent etc still use retina.io instead of retina.sh

4 participants