27 commits
a4ee752 Remove v3io from MLRUN_MODEL_ENDPOINT_MONITORING__STORE_PREFIXES__USE… (royischoss, Feb 24, 2026)
ce7601e fix formating (royischoss, Feb 25, 2026)
061e2a4 fix version and remove deprecated dsn (royischoss, Feb 25, 2026)
0b2243f adding otel to ce (royischoss, Mar 25, 2026)
08c0be1 Merge remote-tracking branch 'origin/development' into ceml-641 (royischoss, Mar 25, 2026)
40baa80 fix requirements.lock (royischoss, Mar 25, 2026)
ff914ac fixes (royischoss, Mar 26, 2026)
f121089 Merge remote-tracking branch 'origin/development' into ceml-641 (royischoss, Mar 29, 2026)
78b175a works (royischoss, Mar 30, 2026)
73ba287 Otel with collector works well (royischoss, Apr 5, 2026)
d71d893 bump chart version (royischoss, Apr 5, 2026)
6889bb2 Merge remote-tracking branch 'origin/development' into ceml-641 (royischoss, Apr 5, 2026)
8b0be84 bump chart version (royischoss, Apr 5, 2026)
aedbbcf fix lint (royischoss, Apr 5, 2026)
3cc4d46 documentation fixes (royischoss, Apr 9, 2026)
f7cca6a fixes (royischoss, Apr 9, 2026)
6f6193c fixes (royischoss, Apr 9, 2026)
4f56e6d change method to push to prometheus (royischoss, Apr 12, 2026)
3aa7420 change method to push to prometheus (royischoss, Apr 14, 2026)
36c4b3f remove labeling s3 and TimescaleDB fix jupyter bug. update documentat… (royischoss, Apr 15, 2026)
a5e71ef another jupyter timing fix (royischoss, Apr 15, 2026)
b15ee61 remove redundant loop for crds check (royischoss, Apr 15, 2026)
a76e23e Merge remote-tracking branch 'origin/development' into ceml-641 (royischoss, Apr 15, 2026)
4f2bda3 fix requirements.lock (royischoss, Apr 15, 2026)
657cab3 fix rc version (royischoss, Apr 15, 2026)
73c6aa1 fix pin kubectl version in jobs, fix documentation for crds readiness… (royischoss, Apr 15, 2026)
28cf7ba fix pin kubectl version in jobs, fix documentation for crds readiness… (royischoss, Apr 15, 2026)
1 change: 1 addition & 0 deletions .github/workflows/release.yml
@@ -46,6 +46,7 @@ jobs:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add strimzi https://strimzi.io/charts/
helm repo add seaweedfs https://seaweedfs.github.io/seaweedfs/helm
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

- name: Run chart-releaser
uses: helm/chart-releaser-action@cae68fefc6b5f367a0275617c9f83181ba54714f
4 changes: 4 additions & 0 deletions .gitignore
@@ -4,3 +4,7 @@ charts/mlrun-ce/charts/*
**/.DS_Store
*.DS_Store
**/__pycache__
# Packaged chart tarballs (generated by make package)
charts/mlrun-ce/mlrun-ce-*.tgz
# MLRun project directories created by test scripts
otlp-pro/
2 changes: 1 addition & 1 deletion charts/mlrun-ce/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v1
name: mlrun-ce
-version: 0.11.0-rc.31
+version: 0.11.0-rc.32
description: MLRun Open Source Stack
home: https://iguazio.com
icon: https://www.iguazio.com/wp-content/uploads/2019/10/Iguazio-Logo.png
53 changes: 50 additions & 3 deletions charts/mlrun-ce/README.md
@@ -14,6 +14,7 @@ The Open source MLRun ce chart includes the following stack:
* Spark Operator - https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
* Pipelines - https://github.com/kubeflow/pipelines
* Prometheus stack - https://github.com/prometheus-community/helm-charts
* OpenTelemetry Operator - https://github.com/open-telemetry/opentelemetry-operator (observability)

## Prerequisites

@@ -64,6 +65,33 @@ helm --namespace mlrun \
mlrun/mlrun-ce
```

### Installing with OpenTelemetry Enabled

> **Note:** OpenTelemetry is **disabled by default**. Follow the standard [Installing the Chart](#installing-the-chart) steps, adding the OTel flags below.

To install with OpenTelemetry enabled, append the following flags to the helm install command:

```bash
helm --namespace mlrun \
install my-mlrun \
--wait \
--set global.registry.url=<registry URL e.g. index.docker.io/iguazio> \
--set global.registry.secretName=registry-credentials \
--set opentelemetry-operator.enabled=true \
--set opentelemetry.namespaceLabel.enabled=true \
--set opentelemetry.collector.enabled=true \
--set opentelemetry.instrumentation.enabled=true \
mlrun/mlrun-ce
```

To verify the OpenTelemetry resources were created:

```bash
kubectl -n mlrun get opentelemetrycollectors
kubectl -n mlrun get instrumentations
kubectl -n mlrun get pods | grep opentelemetry
```

### Installing MLRun-ce on minikube

The Open source MLRun ce uses node ports for simplicity. If your kubernetes cluster is running inside a VM,
@@ -89,6 +117,25 @@ following values:
Additional configurable values are documented in the `values.yaml`, and the `values.yaml` of all sub charts.
Override those [in the normal methods](https://helm.sh/docs/chart_template_guide/values_files/).

### Configuring OpenTelemetry (Observability)

MLRun CE includes the OpenTelemetry Operator for collecting metrics and traces. When enabled, it deploys a single collector per namespace (deployment mode). Instrumented pods push OTLP data to the collector, which forwards metrics to Prometheus through Prometheus's OTLP endpoint. The operator's admission webhook applies Python auto-instrumentation namespace-wide. Jupyter and Nuclio function pods also receive the `mlrun.io/otel: "true"` label. This label marks them for metric enrichment and triggers OTel injection when they restart.

For a fresh install with OTel, see [Installing with OpenTelemetry Enabled](#installing-with-opentelemetry-enabled).

To enable OTel on an existing installation:

```bash
helm --namespace mlrun upgrade my-mlrun \
--set opentelemetry-operator.enabled=true \
--set opentelemetry.namespaceLabel.enabled=true \
--set opentelemetry.collector.enabled=true \
--set opentelemetry.instrumentation.enabled=true \
mlrun/mlrun-ce
```

> **Note:** The above assumes a single-namespace installation. For multi-namespace (admin/non-admin) deployments, refer to the MLRun documentation.

### Working with ECR

To work with ECR, you must create a secret with your AWS credentials and a secret with ECR Token while providing both secret names to the helm install command.
@@ -282,6 +329,6 @@ Refer to the [**Kubeflow documentation**](https://www.kubeflow.org/docs/started/

This table shows the versions of the main components in the MLRun CE chart:

-| MLRun CE | MLRun | Nuclio | Jupyter | MPI Operator | SeaweedFS | Spark Operator | Pipelines | Kube-Prometheus-Stack |
-|------------|--------|--------|---------|--------------|-----------|----------------|-----------|-----------------------|
-| **0.11.0** | 1.11.0 | 1.15.9 | 4.5.0 | 0.2.3 | 4.17.0 | 2.1.0 | 2.15.0 | 72.1.1 |
+| MLRun CE | MLRun | Nuclio | Jupyter | MPI Operator | SeaweedFS | Spark Operator | Pipelines | Kube-Prometheus-Stack | OpenTelemetry Operator |
+|------------|--------|--------|---------|--------------|-----------|----------------|-----------|-----------------------|------------------------|
+| **0.11.0** | 1.11.0 | 1.15.9 | 4.5.0 | 0.2.3 | 4.17.0 | 2.1.0 | 2.15.0 | 72.1.1 | 0.78.1 |
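The install and upgrade commands in the README use four `--set` flags. Each flag is a dotted path into the chart's nested values. The helper below is a hypothetical illustration only: `to_set_flags` is not part of Helm or this chart. It flattens a nested mapping into those flags. The same mapping, written as YAML, is what you would pass to `helm install` or `helm upgrade` with `-f`.

```python
from typing import Any, Mapping


def to_set_flags(values: Mapping[str, Any], prefix: str = "") -> list[str]:
    """Flatten a nested values mapping into Helm ``--set key.path=value`` arguments.

    Hypothetical helper for illustration. Booleans are rendered in lowercase
    so Helm parses them as YAML booleans.
    """
    flags: list[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, Mapping):
            # Recurse into nested sections, extending the dotted path.
            flags.extend(to_set_flags(value, path))
        else:
            rendered = str(value).lower() if isinstance(value, bool) else str(value)
            flags.extend(["--set", f"{path}={rendered}"])
    return flags


# The same toggles as the README's install/upgrade commands.
otel_values = {
    "opentelemetry-operator": {"enabled": True},
    "opentelemetry": {
        "namespaceLabel": {"enabled": True},
        "collector": {"enabled": True},
        "instrumentation": {"enabled": True},
    },
}

if __name__ == "__main__":
    print(" ".join(to_set_flags(otel_values)))
```

Dotted `--set` paths work well for a handful of toggles. For larger overrides, a values file is easier to review. Helm requires keys that contain dots to be escaped with a backslash in `--set`, and this sketch does not handle that case.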
25 changes: 25 additions & 0 deletions charts/mlrun-ce/admin_installation_values.yaml
@@ -77,3 +77,28 @@ strimzi-kafka-operator:

kafka:
  enabled: false

# OpenTelemetry Operator - enabled for CRD installation at cluster level
opentelemetry-operator:
  enabled: true
  admissionWebhooks:
    certManager:
      enabled: false
    autoGenerateCert:
      enabled: true
    # Only apply webhooks to namespaces with the opentelemetry label
    namespaceSelector:
      matchLabels:
        opentelemetry.io/inject: "enabled"

# OpenTelemetry CRs - disabled at admin level, enabled in user namespaces
# Note: Controller namespace does NOT need the opentelemetry label since
# no workloads are instrumented here - only the operator runs here
opentelemetry:
  namespaceLabel:
    enabled: false
  collector:
    enabled: false
  instrumentation:
    enabled: false
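The `namespaceSelector.matchLabels` block above limits the operator's admission webhooks to namespaces that carry `opentelemetry.io/inject: "enabled"`. Below is a simplified Python model of how a `matchLabels` selector is evaluated. Every listed key/value pair must be present on the namespace, and `matchExpressions` are not modelled. The namespace names in the example are hypothetical.

```python
from typing import Mapping

# Selector copied from admin_installation_values.yaml above.
WEBHOOK_MATCH_LABELS = {"opentelemetry.io/inject": "enabled"}


def matches_labels(match_labels: Mapping[str, str], labels: Mapping[str, str]) -> bool:
    """Simplified model of a Kubernetes ``matchLabels`` selector.

    Each key/value pair in the selector must appear on the object (the pairs
    are ANDed together). An empty selector therefore matches every object.
    ``matchExpressions`` are intentionally not modelled.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


if __name__ == "__main__":
    # Hypothetical namespaces: a user namespace labeled by the chart and an
    # unlabeled controller namespace.
    namespaces = {
        "mlrun": {"opentelemetry.io/inject": "enabled"},
        "mlrun-controller": {},
    }
    for name, labels in namespaces.items():
        print(name, matches_labels(WEBHOOK_MATCH_LABELS, labels))
```

In this model, only the labeled namespace is selected. This is why the non-admin values files below turn on `opentelemetry.namespaceLabel`.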

16 changes: 16 additions & 0 deletions charts/mlrun-ce/non_admin_cluster_ip_installation_values.yaml
@@ -96,3 +96,19 @@ kafka:

kube-prometheus-stack:
  enabled: false

# OpenTelemetry Operator - disabled, CRDs installed at controller level
opentelemetry-operator:
  enabled: false

# OpenTelemetry CRs - enabled for user namespace
# The namespace will be labeled with opentelemetry.io/inject=enabled
# so the operator's webhook can inject auto-instrumentation into pods
opentelemetry:
  namespaceLabel:
    enabled: true
  collector:
    enabled: true
  instrumentation:
    enabled: true

16 changes: 16 additions & 0 deletions charts/mlrun-ce/non_admin_installation_values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -82,3 +82,19 @@ kafka:

kube-prometheus-stack:
  enabled: false

# OpenTelemetry Operator - disabled, CRDs installed at controller level
opentelemetry-operator:
  enabled: false

# OpenTelemetry CRs - enabled for user namespace
# The namespace will be labeled and annotated for OTel deployment-mode collection
# and namespace-wide Python auto-instrumentation.
opentelemetry:
  namespaceLabel:
    enabled: true
  collector:
    enabled: true
  instrumentation:
    enabled: true

7 changes: 5 additions & 2 deletions charts/mlrun-ce/requirements.lock
@@ -20,5 +20,8 @@ dependencies:
 - name: strimzi-kafka-operator
   repository: https://strimzi.io/charts/
   version: 0.48.0
-digest: sha256:e2b2d1b7531c4829aa25c8ce8d95506642ab59d0cb692a343d2e508a71525374
-generated: "2026-03-31T17:13:31.403112322Z"
+- name: opentelemetry-operator
+  repository: https://open-telemetry.github.io/opentelemetry-helm-charts
+  version: 0.78.1
+digest: sha256:50ed77fd11e450e243c05eadac99857b4b0aae92ae73ca9a6c00fc1cdc726f70
+generated: "2026-04-15T11:23:19.249332+03:00"
4 changes: 4 additions & 0 deletions charts/mlrun-ce/requirements.yaml
@@ -25,3 +25,7 @@ dependencies:
  repository: "https://strimzi.io/charts/"
  version: "0.48.0"
  condition: strimzi-kafka-operator.enabled
- name: opentelemetry-operator
  repository: "https://open-telemetry.github.io/opentelemetry-helm-charts"
  version: "0.78.1"
  condition: opentelemetry-operator.enabled
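The `condition: opentelemetry-operator.enabled` line lets each values file switch the operator subchart on or off. Helm documents the rule as follows. `condition` holds one or more comma-separated YAML paths, which are looked up in the top parent chart's values. The first path that exists and resolves to a boolean decides. If no path does, the condition has no effect and the dependency stays enabled.

Below is a simplified Python model of that rule. It ignores `tags`, and the function names are illustrative, not Helm APIs.

```python
from typing import Any, Mapping, Optional


def resolve_path(values: Mapping[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as ``opentelemetry-operator.enabled``; None if absent."""
    node: Any = values
    for part in path.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def dependency_enabled(values: Mapping[str, Any], condition: str) -> bool:
    """Simplified model of Helm's ``condition`` field on a chart dependency."""
    for path in (p.strip() for p in condition.split(",")):
        value = resolve_path(values, path)
        # Only the first path that resolves to a boolean is used.
        if isinstance(value, bool):
            return value
    # No path resolved to a boolean, so the condition has no effect.
    return True


if __name__ == "__main__":
    condition = "opentelemetry-operator.enabled"
    print(dependency_enabled({"opentelemetry-operator": {"enabled": False}}, condition))
    print(dependency_enabled({"opentelemetry-operator": {"enabled": True}}, condition))
```

Under this rule, the admin values file (`opentelemetry-operator.enabled: true`) pulls in the operator subchart. The non-admin values files (`false`) leave it out.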
39 changes: 39 additions & 0 deletions charts/mlrun-ce/templates/NOTES.txt
@@ -127,5 +127,44 @@ TimescaleDB is available at:
{{- end }}
{{- end }}

{{- if index .Values "opentelemetry-operator" "enabled" }}
{{- "\n" }}
OpenTelemetry Operator is enabled!
- Operator manages OpenTelemetryCollector and Instrumentation CRs
- Namespace selector: opentelemetry.io/inject=enabled
{{- if .Values.opentelemetry.collector.enabled }}
{{- "\n" }}
OpenTelemetry Collector (deployment mode):
- Collector CR: {{ include "mlrun-ce.otel.collector.fullname" . }}
- OTLP gRPC endpoint: {{ include "mlrun-ce.otel.collector.fullname" . }}-collector:{{ .Values.opentelemetry.collector.otlp.grpcPort }}
- OTLP HTTP endpoint: {{ include "mlrun-ce.otel.collector.fullname" . }}-collector:{{ .Values.opentelemetry.collector.otlp.httpPort }}
- Metrics export: collector pushes via OTLP to Prometheus at /api/v1/otlp/v1/metrics
{{- end }}
{{- if .Values.opentelemetry.instrumentation.enabled }}
{{- "\n" }}
OpenTelemetry Auto-Instrumentation:
- Instrumentation CR: {{ include "mlrun-ce.otel.instrumentation.fullname" . }}
{{- if .Values.opentelemetry.instrumentation.python.enabled }}
- Python auto-instrumentation: enabled (namespace-wide via namespace annotation)
{{- end }}
{{- if .Values.opentelemetry.instrumentation.java.enabled }}
- Java auto-instrumentation: enabled
{{- end }}
{{- end }}
{{- if .Values.opentelemetry.namespaceLabel.enabled }}
{{- "\n" }}
Namespace OTel configuration:
- Label: {{ .Values.opentelemetry.namespaceLabel.key }}={{ .Values.opentelemetry.namespaceLabel.value }}
{{- if .Values.opentelemetry.instrumentation.enabled }}
- Python instrumentation annotation applied to all pods in namespace {{ .Release.Namespace }}
{{- end }}
{{- end }}
{{- if or .Values.opentelemetry.collector.enabled .Values.opentelemetry.instrumentation.enabled }}
{{- "\n" }}
Pods labeled with mlrun.io/otel=true: Jupyter and Nuclio function pods (via functionDefaults).
These Python-based pods receive OTel auto-instrumentation (runtime metrics, traces, HTTP metrics for Nuclio functions).
{{- end }}
{{- end }}

Happy MLOPSing!!! :]
{{- end }}
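The NOTES above print a base OTLP HTTP endpoint for the collector and a Prometheus path ending in `/api/v1/otlp/v1/metrics`. Both follow the OTLP/HTTP exporter convention. When only a base endpoint is configured, the exporter appends `v1/traces`, `v1/metrics`, or `v1/logs` for each signal.

Below is a minimal Python sketch of that rule. The service names and port `4318` in the example are assumptions. `4318` is the conventional OTLP/HTTP port, but the chart's `httpPort` default is not shown here.

```python
_SIGNALS = {"traces", "metrics", "logs"}


def otlp_http_signal_url(base_endpoint: str, signal: str) -> str:
    """Build the per-signal URL an OTLP/HTTP exporter derives from a base endpoint."""
    if signal not in _SIGNALS:
        raise ValueError(f"unknown OTLP signal: {signal!r}")
    # Append "v1/<signal>" to the base path, avoiding a doubled slash.
    return f"{base_endpoint.rstrip('/')}/v1/{signal}"


if __name__ == "__main__":
    # Hypothetical in-cluster service names; adjust to your release.
    collector = "http://my-otel-collector:4318"
    prometheus = "http://prometheus:9090/api/v1/otlp"
    print(otlp_http_signal_url(collector, "traces"))
    print(otlp_http_signal_url(prometheus, "metrics"))
```

Per-signal settings such as `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` are used as-is, with no suffix appended. Prometheus accepts pushes on this path only when its OTLP receiver is enabled.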