16 changes: 16 additions & 0 deletions deploy/helm/spur-cloud/.helmignore
@@ -0,0 +1,16 @@
# Patterns to ignore when building Helm packages.
.DS_Store
.git/
.gitignore
.bzr/
.hg/
.svn/
*.swp
*.bak
*.tmp
*.orig
*~
.project
.idea/
*.tmproj
.vscode/
18 changes: 18 additions & 0 deletions deploy/helm/spur-cloud/Chart.yaml
@@ -0,0 +1,18 @@
apiVersion: v2
name: spur-cloud
description: GPU as a Service platform built on Spur. Deploys the API, frontend, optional in-cluster Postgres, RBAC, and ingress.
type: application
version: 0.1.0

Review comment (Member), suggested change:
- version: 0.1.0
+ version: 0.0.1-dev

appVersion: "0.1.0"
kubeVersion: ">=1.24.0-0"
home: https://github.com/ROCm/spur-cloud
sources:
  - https://github.com/ROCm/spur-cloud
maintainers:
  - name: ROCm
    url: https://github.com/ROCm
keywords:
  - gpu
  - hpc
  - spur
  - scheduler
97 changes: 97 additions & 0 deletions deploy/helm/spur-cloud/README.md
@@ -0,0 +1,97 @@
# spur-cloud Helm chart

Deploys the Spur Cloud control plane (API + frontend), optional in-cluster
Postgres, RBAC, and ingress. Does **not** deploy `spurctld` or the
`spur-k8s` operator — those live in [ROCm/spur](https://github.com/ROCm/spur).

## TL;DR

```bash
# 1. Generate a JWT signing key + DB password
JWT=$(openssl rand -hex 32)
DBPW=$(openssl rand -hex 16)

# 2. Install
helm install spur-cloud ./deploy/helm/spur-cloud \
  --namespace spur-cloud --create-namespace \
  --set secrets.jwtSecret="$JWT" \
  --set secrets.dbPassword="$DBPW" \
  --set ingress.host=gpu.example.com \
  --set config.publicUrl=https://gpu.example.com
```

## What gets installed

| Resource | Default | Toggle |
|----------|---------|--------|
| `Deployment` spur-cloud-api (2 replicas) | on | `api.enabled` |
| `Deployment` spur-cloud-frontend (2 replicas) | on | `frontend.enabled` |
| `Ingress` (host + `/` → frontend, `/api` → api) | on | `ingress.enabled` |
| `Secret` (spur-cloud.toml + db-password) | on | `secrets.create` |
| `ServiceAccount` + `ClusterRole`/`Binding` | on | `serviceAccount.create`, `rbac.create` |
| `StatefulSet` postgres (1 replica) | on | `postgres.enabled` |
| `Namespace` for session pods | on | `createSessionNamespace` |
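
Each toggle in the table is an ordinary values key. As a minimal sketch, assuming you route traffic yourself and create the session namespace out of band, a values override could look like the following. The key names come from the table above; whether the chart renders cleanly with every combination of toggles has not been verified here.

```yaml
# Hypothetical override file: disable the chart-managed Ingress and
# session Namespace, keep everything else at its default.
ingress:
  enabled: false
createSessionNamespace: false
```

With `ingress.enabled=false`, the chart's install notes suggest reaching the frontend through `kubectl port-forward` instead.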

## Required secrets

Rendering fails unless these values are set, or unless you set
`secrets.create=false` and supply `secrets.existingSecret`:

- `secrets.jwtSecret` — JWT signing key (`openssl rand -hex 32`)
- `secrets.dbPassword` — when `postgres.enabled=true`
- `secrets.githubClientSecret` — when `config.auth.github.enabled=true`
- `secrets.oktaClientSecret` — when `config.auth.okta.enabled=true`
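
Passing secrets with `--set` leaves them in shell history. One alternative, sketched here under the assumption that a local values file is acceptable in your workflow, is to put them in a file and pass it with Helm's `-f` flag. The file name `my-secrets.yaml` and the placeholder values are illustrative only.

```yaml
# my-secrets.yaml (hypothetical file name); keep it out of version control.
secrets:
  jwtSecret: "<output of: openssl rand -hex 32>"
  dbPassword: "<output of: openssl rand -hex 16>"   # only when postgres.enabled=true
  # githubClientSecret: "<secret>"   # only when config.auth.github.enabled=true
  # oktaClientSecret: "<secret>"     # only when config.auth.okta.enabled=true
```

Install with `helm install spur-cloud ./deploy/helm/spur-cloud -f my-secrets.yaml` plus the remaining flags from the TL;DR.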

## External Postgres

```yaml
postgres:
  enabled: false
database:
  url: "postgresql://user:pass@rds.example.com:5432/spur_cloud"
```

To use a Secret managed by External Secrets or Sealed Secrets, disable secret
creation and reference the Secret by name:

```yaml
secrets:
  create: false
  existingSecret: my-existing-secret
```

The existing secret must contain key `spur-cloud.toml` (full rendered
config) and, if using in-cluster Postgres, key `db-password`.
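
For reference, this is a sketch of the shape the resulting Secret is expected to have, based only on the two key names stated above. The namespace and all values are placeholders, and the contents of `spur-cloud.toml` are not reproduced because its schema is not described in this README. With External Secrets or Sealed Secrets you would normally have the controller produce this object rather than apply it by hand.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-existing-secret      # must match secrets.existingSecret
  namespace: spur-cloud         # assumed: the release namespace
type: Opaque
stringData:
  spur-cloud.toml: |
    # placeholder: paste the full rendered spur-cloud.toml here
  db-password: "<db password>"  # only needed with in-cluster Postgres
```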

## Image references

Defaults point at `ghcr.io/rocm/spur-cloud-{api,frontend}`; when no tag is set,
the image tag falls back to the chart's `appVersion`. Override:

```yaml
api:
  image:
    repository: my-registry.example.com/spur-cloud-api
    tag: v0.2.0
frontend:
  image:
    repository: my-registry.example.com/spur-cloud-frontend
    tag: v0.2.0
image:
  pullSecrets:
    - name: my-registry-creds
```

## Verify

```bash
helm lint ./deploy/helm/spur-cloud \
  --set secrets.jwtSecret=test --set secrets.dbPassword=test
helm template spur-cloud ./deploy/helm/spur-cloud \
  --set secrets.jwtSecret=test --set secrets.dbPassword=test
```

## Limitations / TODO

- No HPA, NetworkPolicy, or PodDisruptionBudget yet.
- Postgres is single-replica with no backup. Production should use managed
  Postgres or a dedicated operator (CNPG, Zalando).
- No HA story for spurctld here; see the spur chart.
31 changes: 31 additions & 0 deletions deploy/helm/spur-cloud/templates/NOTES.txt
@@ -0,0 +1,31 @@
Spur Cloud is installed as release "{{ .Release.Name }}" in namespace "{{ .Release.Namespace }}".

1. Reach the UI:
{{- if .Values.ingress.enabled }}
https://{{ .Values.ingress.host }}
Make sure DNS for {{ .Values.ingress.host }} points at your ingress controller{{- if .Values.ingress.tls.enabled }} and that TLS secret "{{ .Values.ingress.tls.secretName }}" exists in this namespace{{- end }}.
{{- else }}
Ingress is disabled. Port-forward instead:
kubectl -n {{ .Release.Namespace }} port-forward svc/{{ include "spur-cloud.frontend.fullname" . }} 8080:{{ .Values.frontend.service.port }}
{{- end }}

2. Check pods:
kubectl -n {{ .Release.Namespace }} get pods -l app.kubernetes.io/instance={{ .Release.Name }}

3. Session pods land in namespace "{{ .Values.sessionNamespace }}" (created by this chart: {{ .Values.createSessionNamespace }}).

{{- if not .Values.postgres.enabled }}

NOTE: in-cluster Postgres is disabled. The API is configured to use:
{{ .Values.database.url }}
{{- end }}

{{- if .Values.secrets.existingSecret }}

NOTE: using existing secret "{{ .Values.secrets.existingSecret }}" — it must contain key "spur-cloud.toml"{{- if .Values.postgres.enabled }} and key "db-password"{{- end }}.
{{- end }}

Prerequisites (NOT installed by this chart):
- Spur controller (spurctld) reachable at: {{ .Values.config.spur.controllerAddr }}
- spur-k8s operator watching the {{ .Values.sessionNamespace }} namespace
- GPU nodes labeled spur.ai/managed=true and spur.ai/gpu-type=<type>
89 changes: 89 additions & 0 deletions deploy/helm/spur-cloud/templates/_helpers.tpl
@@ -0,0 +1,89 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "spur-cloud.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{/*
Fully qualified app name.
*/}}
{{- define "spur-cloud.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}

{{- define "spur-cloud.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{- define "spur-cloud.labels" -}}
helm.sh/chart: {{ include "spur-cloud.chart" . }}
app.kubernetes.io/name: {{ include "spur-cloud.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}

{{- define "spur-cloud.api.fullname" -}}
{{ include "spur-cloud.fullname" . }}-api
{{- end -}}

{{- define "spur-cloud.frontend.fullname" -}}
{{ include "spur-cloud.fullname" . }}-frontend
{{- end -}}

{{- define "spur-cloud.postgres.fullname" -}}
{{ include "spur-cloud.fullname" . }}-postgres
{{- end -}}

{{- define "spur-cloud.serviceAccountName" -}}
{{- if .Values.serviceAccount.create -}}
{{- default (include "spur-cloud.fullname" .) .Values.serviceAccount.name -}}
{{- else -}}
{{- default "default" .Values.serviceAccount.name -}}
{{- end -}}
{{- end -}}

{{- define "spur-cloud.secretName" -}}
{{- if .Values.secrets.existingSecret -}}
{{- .Values.secrets.existingSecret -}}
{{- else -}}
{{ include "spur-cloud.fullname" . }}-secrets
{{- end -}}
{{- end -}}

{{- define "spur-cloud.api.image" -}}
{{- $tag := default .Chart.AppVersion .Values.api.image.tag -}}
{{- printf "%s:%s" .Values.api.image.repository $tag -}}
{{- end -}}

{{- define "spur-cloud.frontend.image" -}}
{{- $tag := default .Chart.AppVersion .Values.frontend.image.tag -}}
{{- printf "%s:%s" .Values.frontend.image.repository $tag -}}
{{- end -}}

{{/*
Resolve the database URL: explicit override wins, otherwise build from
in-cluster Postgres service if enabled.
*/}}
{{- define "spur-cloud.databaseUrl" -}}
{{- if .Values.database.url -}}
{{- .Values.database.url -}}
{{- else if .Values.postgres.enabled -}}
{{- if not .Values.secrets.dbPassword -}}
{{- fail "secrets.dbPassword must be set when postgres.enabled is true and secrets.existingSecret is unused" -}}
{{- end -}}
{{- printf "postgresql://%s:%s@%s:5432/%s" .Values.postgres.user .Values.secrets.dbPassword (include "spur-cloud.postgres.fullname" .) .Values.postgres.database -}}
{{- else -}}
{{- fail "Either postgres.enabled must be true or database.url must be set" -}}
{{- end -}}
{{- end -}}
101 changes: 101 additions & 0 deletions deploy/helm/spur-cloud/templates/api.yaml
@@ -0,0 +1,101 @@
{{- if .Values.api.enabled -}}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "spur-cloud.api.fullname" . }}
  labels:
    {{- include "spur-cloud.labels" . | nindent 4 }}
    app.kubernetes.io/component: api
spec:
  type: {{ .Values.api.service.type }}
  selector:
    app.kubernetes.io/name: {{ include "spur-cloud.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: api
  ports:
    - name: http
      port: {{ .Values.api.service.port }}
      targetPort: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "spur-cloud.api.fullname" . }}
  labels:
    {{- include "spur-cloud.labels" . | nindent 4 }}
    app.kubernetes.io/component: api
spec:
  replicas: {{ .Values.api.replicas }}
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "spur-cloud.name" . }}
      app.kubernetes.io/instance: {{ .Release.Name }}
      app.kubernetes.io/component: api
  template:
    metadata:
      labels:
        {{- include "spur-cloud.labels" . | nindent 8 }}
        app.kubernetes.io/component: api
      annotations:
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        {{- with .Values.api.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
    spec:
      serviceAccountName: {{ include "spur-cloud.serviceAccountName" . }}
      {{- with .Values.image.pullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: api
          image: {{ include "spur-cloud.api.image" . }}
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          args:
            - --config=/etc/spur-cloud/spur-cloud.toml
          ports:
            - name: http
              containerPort: 8080
          env:
            {{- range $k, $v := .Values.api.env }}
            - name: {{ $k }}
              value: {{ $v | quote }}
            {{- end }}
          volumeMounts:
            - name: config
              mountPath: /etc/spur-cloud
              readOnly: true
          readinessProbe:
            httpGet:
              path: /readyz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 10
            periodSeconds: 20
          resources:
            {{- toYaml .Values.api.resources | nindent 12 }}
      volumes:
        - name: config
          secret:
            secretName: {{ include "spur-cloud.secretName" . }}
            items:
              - key: spur-cloud.toml
                path: spur-cloud.toml
      {{- with .Values.api.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.api.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.api.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}