Skip to content

fix: replace kube-rbac-proxy with built-in metrics authn/authz#382

Merged
dejanzele merged 3 commits into
mainfrom
fix/kube-rbac-proxy-builtin-metrics
Jun 25, 2026
Merged

fix: replace kube-rbac-proxy with built-in metrics authn/authz#382
dejanzele merged 3 commits into
mainfrom
fix/kube-rbac-proxy-builtin-metrics

Conversation

@dejanzele

@dejanzele dejanzele commented Jun 24, 2026

Copy link
Copy Markdown
Member

Problem

The operator's metrics endpoint was protected by a kube-rbac-proxy sidecar that pulled docker.io/kubebuilder/kube-rbac-proxy:v0.16.0 — a deprecated, amd64-only Docker Hub mirror. It fails to pull on non-amd64 nodes, and the upstream gcr.io/docker.io kubebuilder mirrors are being retired. kubebuilder itself discontinued kube-rbac-proxy as of scaffold v4.1.0, replacing it with controller-runtime's built-in metrics authentication/authorization.

Change

Migrate to the modern controller-runtime pattern and remove the sidecar entirely:

  • cmd/main.go — serve metrics over HTTPS on :8443 with SecureServing and FilterProvider: filters.WithAuthenticationAndAuthorization; add --metrics-secure/--enable-http2 flags and disable HTTP/2 by default.
  • Helm chart — remove the kube-rbac-proxy container and its image/values; expose the metrics port on the manager container; rename proxy-rbac.yamlmetrics-auth-rbac.yaml.
  • kustomize (config/) — drop manager_auth_proxy_patch.yaml; add manager_metrics_patch.yaml + metrics_service.yaml; replace the auth_proxy_* RBAC with metrics_auth/metrics_reader roles (same TokenReview/SubjectAccessReview permissions, now granted to the manager ServiceAccount).
  • go.mod/go.sum — add k8s.io/apiserver (the authn/authz that previously ran in the sidecar now runs in-process).

This removes the dependency on the broken external image entirely.

Deployment note

Metrics move from plain HTTP :8080 (proxied) to authenticated HTTPS on :8443. Prometheus scrapers now need a bearer token and insecure_skip_verify (or a cert); the metrics-reader ClusterRole grants /metrics access — same security model as before, just in-process.

The kube-rbac-proxy sidecar pulled docker.io/kubebuilder/kube-rbac-proxy:v0.16.0,
a deprecated, amd64-only Docker Hub mirror that fails to pull on other
architectures. kubebuilder discontinued kube-rbac-proxy as of v4.1.0 in favour
of controller-runtime's built-in metrics authn/authz.

Migrate to the modern pattern:
- Serve metrics over HTTPS on :8443 with SecureServing and the
  WithAuthenticationAndAuthorization filter provider in cmd/main.go.
- Remove the sidecar container and its image from the Helm chart and kustomize
  config; expose the metrics port on the manager container instead.
- Replace the auth_proxy_* RBAC with metrics_auth and metrics_reader roles
  (same TokenReview/SubjectAccessReview permissions, now used by the manager SA).

This removes the dependency on the broken external image entirely.

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
@datadog-armadaproject

Copy link
Copy Markdown

Pipelines

⚠️ Warnings

🚦 1 Pipeline job failed

CI | test / Generated Helm Chart Up To Date   View in Datadog   GitHub Actions

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: eb66ca6 | Docs | Give us feedback!

Run 'make generate-helm-chart' so the committed chart matches the helmify
output (drops the manually-added component labels and redundant
--metrics-secure arg), fixing the 'Generated Helm Chart Up To Date' CI check.

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the operator’s metrics endpoint protection from a kube-rbac-proxy sidecar to controller-runtime’s built-in metrics authn/authz, eliminating the deprecated/amd64-only proxy image dependency.

Changes:

  • Switch metrics serving to controller-runtime SecureServing with filters.WithAuthenticationAndAuthorization and add flags to control TLS and HTTP/2 behavior.
  • Update kustomize manifests to remove the proxy sidecar/patches and introduce in-process metrics service + RBAC.
  • Update Helm chart to remove the sidecar/image values and expose metrics directly from the manager container.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
go.mod Adds k8s.io/apiserver and related indirect deps required for in-process authn/authz.
go.sum Records checksums for newly introduced dependencies.
cmd/main.go Configures secure metrics serving + authn/authz filter; adds --metrics-secure and --enable-http2.
config/default/kustomization.yaml Enables the metrics Service and applies the new metrics args patch; removes auth-proxy patch usage.
config/default/metrics_service.yaml Updates metrics Service labels and targets manager port 8443 directly.
config/default/manager_metrics_patch.yaml Adds a deployment patch to set --metrics-bind-address=:8443.
config/default/manager_auth_proxy_patch.yaml Removes the kube-rbac-proxy sidecar injection patch.
config/rbac/kustomization.yaml Replaces auth-proxy RBAC resources with metrics authn/authz + reader RBAC resources.
config/rbac/metrics_auth_role.yaml Adds ClusterRole for TokenReview/SubjectAccessReview needed by in-process authn/authz.
config/rbac/metrics_auth_role_binding.yaml Binds the metrics-auth ClusterRole to the controller manager ServiceAccount.
config/rbac/metrics_reader_role.yaml Adds ClusterRole allowing GET on non-resource URL /metrics.
config/rbac/auth_proxy_role.yaml Removes old kube-rbac-proxy TokenReview/SAR ClusterRole.
config/rbac/auth_proxy_role_binding.yaml Removes old kube-rbac-proxy ClusterRoleBinding.
config/rbac/auth_proxy_client_clusterrole.yaml Removes old kube-rbac-proxy metrics-reader ClusterRole.
charts/armada-operator/values.yaml Removes kube-rbac-proxy values and updates manager args to bind metrics on :8443.
charts/armada-operator/templates/deployment.yaml Removes kube-rbac-proxy container and associated affinity.
charts/armada-operator/templates/metrics-service.yaml Updates metrics Service component label to metrics.
charts/armada-operator/templates/metrics-reader-rbac.yaml Renames/adjusts metrics reader RBAC and removes proxy-specific labels.
charts/armada-operator/templates/metrics-auth-rbac.yaml Renames proxy RBAC to metrics-auth RBAC and removes proxy-specific labels.
Comments suppressed due to low confidence (2)

charts/armada-operator/templates/metrics-auth-rbac.yaml:6

  • Other RBAC templates in this chart consistently include app.kubernetes.io/component, app.kubernetes.io/created-by, and app.kubernetes.io/part-of labels (e.g. templates/manager-rbac.yaml). These were removed here, which makes labels inconsistent across resources and can make selecting/grouping them harder.
    charts/armada-operator/templates/metrics-auth-rbac.yaml:26
  • Same label-consistency issue as above for the ClusterRoleBinding in this file: the chart generally includes component/created-by/part-of labels on RBAC resources, but this binding now only has the shared Helm labels.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmd/main.go
Comment thread config/default/manager_metrics_patch.yaml Outdated
Comment thread charts/armada-operator/templates/metrics-reader-rbac.yaml
- Document that --metrics-secure=false also disables the authn/authz filter
  (not just TLS), and log a startup warning when metrics are served insecurely.
- Correct the manager_metrics_patch.yaml comment: it only sets the metrics
  bind address; HTTPS/authz is governed by the --metrics-secure flag.

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
@dejanzele

Copy link
Copy Markdown
Member Author

Fixed the metrics-secure wording (now calls out that =false also drops authn/authz, plus a startup warning) and the patch comment in 93db33e.

Skipped the RBAC label comments though. The chart labels are already mixed: leader-election-rbac has component/created-by/part-of, but serviceaccount and manager-role don't, and these new files match serviceaccount.yaml + the kubebuilder v4.1 scaffold. Adding them by hand would just get stripped on the next make generate-helm-chart anyway.

@richscott richscott left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 A nice improvement - I look forward to trying this on my Macbook!

@mergify

mergify Bot commented Jun 24, 2026

Copy link
Copy Markdown

Tick the box to add this pull request to the merge queue (same as @mergifyio queue).

  • Queue this pull request

@dejanzele dejanzele merged commit e50a903 into main Jun 25, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants