Merged
58 commits
a87246a
init 1.0.0
jacobecox Apr 21, 2026
092f951
replicas successfully connecting
jacobecox Apr 21, 2026
c9a9e6c
init script now queries for sentinel for master, updated components a…
jacobecox Apr 22, 2026
7e7e0d1
fixed probes
jacobecox Apr 23, 2026
4e80a3d
added tagging names, cpln-common tags
jacobecox Apr 23, 2026
c99d21c
added post stop hook to handle shut down
jacobecox Apr 24, 2026
d702dd6
added multi location support
jacobecox Apr 24, 2026
96be166
improved ha proxy health check and patroni resiliancy with dcs (#245)
jacobecox Apr 24, 2026
115f099
lowered proxy rise to 1 (#246)
jacobecox Apr 25, 2026
94fd4e6
added secondary index value, added publishNotReadyAddresses tag, move…
jacobecox Apr 27, 2026
4317f64
lowered proxy rise to 1, added liveness probe, readiness probe and pr…
jacobecox Apr 27, 2026
a56e381
readded to template (#249)
jacobecox Apr 27, 2026
f03aa37
redis replica init now queries sentinel for master on startup (#250)
jacobecox May 1, 2026
1059473
updated readme and releases files (#251)
jacobecox May 1, 2026
d060eba
updated public access sentinel querying logic (#252)
jacobecox May 1, 2026
03860b7
added 1password connect provider (#253)
jacobecox May 5, 2026
cb2b7c7
added 1password connect (#254)
jacobecox May 5, 2026
7691b8f
redis v3.2.0
May 5, 2026
9360a71
updated template config
jacobecox Apr 29, 2026
c2184a1
updated values file
jacobecox Apr 29, 2026
944c13a
updated startup script
jacobecox Apr 29, 2026
ac868ae
updated script so all replicas are seeds
jacobecox Apr 29, 2026
7774307
reverted to single location
jacobecox Apr 30, 2026
426d7b4
updated node communication port, reverted to multi location setup
jacobecox May 6, 2026
b54c509
updated listen address and broadcast address
jacobecox May 6, 2026
df0651b
debezium + cdc pipeline templates
May 5, 2026
fd15c8e
redis: apply replication setting tweaks from 3.2.0 to 3.3.0.
May 5, 2026
b087a9d
redis v3.2.0: scalingPolicy: Parallel
May 6, 2026
486fc60
ESS version 1.3.5
May 6, 2026
5cd45ae
cdc-pipeline: add icon
May 6, 2026
982e270
redis: liveness probe must be a TCP port check only
May 6, 2026
f346eed
postgres-ha: multi-dc support.
May 5, 2026
02e39e5
kafka: v4.0.0
May 5, 2026
5b7540b
redis updates
May 7, 2026
d4dd4fe
cdc-pipeline: use kafka v4.0.0
May 7, 2026
75eef36
kafka: fix pipeline
May 7, 2026
335ea68
ESS: gcp sync all secrets
May 11, 2026
191fbef
shorten cdc desc
hakan-controlplane May 12, 2026
a5845d1
added pgbouncer and probes, fixed init bugs, added single location su…
jacobecox May 12, 2026
bbbceb2
added firewall for pgbouncer settings
jacobecox May 13, 2026
9e7c56a
added cockroach to its firewall list when pgbouncer is enabled (#256)
jacobecox May 13, 2026
2783615
updated headers patch to not touch other settings (#257)
jacobecox May 13, 2026
25f6106
reverted to single location
jacobecox May 14, 2026
5857aff
updated hostname in script
jacobecox May 14, 2026
902e261
added auth + database credentials and replication factor
jacobecox May 14, 2026
9438723
moved wait for repair system_auth
jacobecox May 14, 2026
16f289b
reset default values
jacobecox May 14, 2026
301c560
moved validation to helpers file, added repair cron job
jacobecox May 14, 2026
051292b
updated cron job
jacobecox May 14, 2026
5ee560a
added multizone
jacobecox May 15, 2026
b4e92bc
updated backup image and functionality with pgbouncer, added multizon…
jacobecox May 15, 2026
4b2f46a
init backup
jacobecox May 15, 2026
8354125
logical backup working state
jacobecox May 15, 2026
2d9d3da
physical backup added
jacobecox May 15, 2026
d17f81f
opened firewall for backup and restore on cassandra workload
jacobecox May 16, 2026
8e4cd97
added readme
jacobecox May 16, 2026
aa11a34
updated readme
jacobecox May 18, 2026
4328d15
changed createGvc to false
jacobecox May 18, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
.DS_STORE
Chart.lock
charts/
.claude
Binary file added cassandra/icon.png
17 changes: 17 additions & 0 deletions cassandra/versions/1.0.0/Chart.yaml
@@ -0,0 +1,17 @@
apiVersion: v2
name: cassandra
description: Cassandra cluster for Control Plane
type: application
version: 1.0.0
appVersion: "5.0"

annotations:
  created: "2026-05-18"
  lastModified: "2026-05-18"
  category: "database"
  createsGvc: false

dependencies:
  - name: cpln-common
    version: 1.0.0
    repository: "oci://ghcr.io/controlplane-com/templates"
224 changes: 224 additions & 0 deletions cassandra/versions/1.0.0/README.md
@@ -0,0 +1,224 @@
# Cassandra

This app deploys a Cassandra 5.0 cluster in a single location. Each node runs as a stateful replica with its own persistent volume, forming a peer-to-peer cluster that distributes and replicates data across nodes according to the configured replication factor. The template includes optional scheduled backups (logical or physical) and periodic anti-entropy repair.

## Architecture

- **Cassandra cluster**: Multi-node cluster deployed in a single location where each node owns a slice of the token ring and replicates data to peers
- **Per-node volumes**: Each node gets its own persistent volume so SSTable data survives restarts
- **Repair** (optional): Scheduled cron job that runs `nodetool repair` across all nodes to keep data consistent
- **Backup** (optional): Logical (`cqlsh COPY TO`) or physical (`nodetool snapshot`) backup to S3 or GCS

## Configuration

### Core Settings

```yaml
replicas: 3 # Number of Cassandra nodes
replicationFactor: 3 # Copies of each partition stored across the cluster
# Must not exceed replicas

superuserPassword: supersecretpassword # Built-in cassandra superuser password
username: username # Application user
password: password # Application user password
keyspaceName: mydatabase # Keyspace created on startup

image: cassandra:5.0
cpu: 1
memory: 4Gi
jvmHeapSize: 2G # Set to ~50% of memory — Cassandra needs the rest for off-heap cache
clusterName: my-cassandra
```
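As a rough illustration of the ~50% rule for `jvmHeapSize`, the sketch below derives a heap size from a `memory` value. `heap_from_memory` is a hypothetical helper, not part of the chart, and it only understands the `Mi` and `Gi` suffixes used in this README.

```python
import re

# Mebibytes per unit for the memory suffixes used in this README.
_UNIT_MIB = {"Mi": 1, "Gi": 1024}


def heap_from_memory(memory: str) -> str:
    """Return roughly 50% of a memory string such as "4Gi" as a JVM size string.

    Hypothetical helper, not part of the chart.
    """
    match = re.fullmatch(r"(\d+)(Mi|Gi)", memory)
    if match is None:
        raise ValueError(f"unsupported memory value: {memory!r}")
    total_mib = int(match.group(1)) * _UNIT_MIB[match.group(2)]
    half_mib = total_mib // 2
    if half_mib < 1:
        raise ValueError(f"memory value too small: {memory!r}")
    # Emit whole gigabytes when the half divides evenly, otherwise megabytes.
    if half_mib % 1024 == 0:
        return f"{half_mib // 1024}G"
    return f"{half_mib}M"


if __name__ == "__main__":
    print(heap_from_memory("4Gi"))  # 2G, matching the default values above
    print(heap_from_memory("3Gi"))  # 1536M
```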

**Volume** — set the initial storage capacity and optionally enable autoscaling:

```yaml
volumes:
  data:
    initialCapacity: 10 # GiB
    autoscaling:
      maxCapacity: 100
      minFreePercentage: 20
      scalingFactor: 1.5
```
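The autoscaling settings are easier to reason about with numbers. The sketch below assumes Control Plane expands the volume by multiplying its capacity by `scalingFactor` once free space falls below `minFreePercentage`, capped at `maxCapacity`; rounding each step up to a whole GiB is also an assumption, and both function names are hypothetical.

```python
import math


def needs_expansion(used_gib: float, capacity_gib: float, min_free_percentage: float) -> bool:
    """True when free space has dropped below the configured minimum percentage."""
    free_percentage = 100 * (capacity_gib - used_gib) / capacity_gib
    return free_percentage < min_free_percentage


def growth_steps(initial_gib: int, scaling_factor: float, max_capacity_gib: int) -> list:
    """Capacities the volume would pass through, assuming ceil-rounded multiplicative growth."""
    if scaling_factor <= 1 or initial_gib <= 0:
        raise ValueError("scaling_factor must be > 1 and initial_gib must be positive")
    steps = [initial_gib]
    while steps[-1] < max_capacity_gib:
        # Each expansion multiplies capacity, rounds up, and never exceeds the cap.
        steps.append(min(math.ceil(steps[-1] * scaling_factor), max_capacity_gib))
    return steps


if __name__ == "__main__":
    print(needs_expansion(used_gib=9, capacity_gib=10, min_free_percentage=20))  # True
    print(growth_steps(10, 1.5, 100))  # [10, 15, 23, 35, 53, 80, 100]
```

Under these assumptions, the example values above reach the 100 GiB cap after six expansions.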

Configure which workloads can reach Cassandra:

```yaml
internal_access:
  type: same-gvc # Options: same-gvc, same-org, workload-list
  workloads:
    # Uncomment and specify workloads if using workload-list
    #- //gvc/GVC_NAME/workload/WORKLOAD_NAME
```

- `same-gvc`: Allow access from all workloads in the same GVC
- `same-org`: Allow access from all workloads in the org
- `workload-list`: Allow access only from specified workloads
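To make the three modes concrete, here is a small model of which caller workload links each `type` would admit. It only restates the descriptions above; it is not how Control Plane enforces the firewall, `is_allowed` is a hypothetical name, and callers are assumed to belong to the same org.

```python
def is_allowed(access_type: str, caller_link: str, own_gvc: str, allowed_workloads=()) -> bool:
    """Model of which callers each internal_access type admits.

    caller_link uses the //gvc/GVC_NAME/workload/WORKLOAD_NAME form shown above.
    Callers are assumed to be in the same org as Cassandra.
    """
    parts = caller_link.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "gvc" or parts[2] != "workload":
        raise ValueError(f"not a workload link: {caller_link!r}")
    caller_gvc = parts[1]
    if access_type == "same-org":
        return True  # any workload in the org
    if access_type == "same-gvc":
        return caller_gvc == own_gvc
    if access_type == "workload-list":
        return caller_link in allowed_workloads
    raise ValueError(f"unknown access type: {access_type!r}")


if __name__ == "__main__":
    print(is_allowed("same-gvc", "//gvc/prod/workload/api", own_gvc="prod"))     # True
    print(is_allowed("same-gvc", "//gvc/staging/workload/api", own_gvc="prod"))  # False
```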

## Connecting

Each Cassandra replica is reachable via its own DNS name:

```
Host: {release-name}-cassandra-{n}.{gvc}.cpln.local
Port: 9042 (CQL, native transport)
Username: {username}
Password: {password}
Keyspace: {keyspaceName}
```

Provide multiple node hostnames as contact points in your application so it can discover the full cluster topology.
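A small sketch of building that contact-point list from the DNS pattern above. The release name `myapp` and GVC `prod` are placeholders, `contact_points` is a hypothetical helper, and replica indexes starting at 0 is an assumption.

```python
def contact_points(release_name: str, gvc: str, replicas: int, limit: int = 3) -> list:
    """Build up to `limit` replica hostnames using the DNS pattern above.

    Assumes replica indexes start at 0.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        f"{release_name}-cassandra-{n}.{gvc}.cpln.local"
        for n in range(min(replicas, limit))
    ]


if __name__ == "__main__":
    print(contact_points("myapp", "prod", replicas=5))
```

The resulting list can be handed to a driver as its contact points. With the DataStax Python driver, for example, that would be `Cluster(contact_points=hosts, port=9042, auth_provider=PlainTextAuthProvider(username=..., password=...))`; the driver then discovers the remaining nodes from the ones it reaches.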

## Replicas vs Replication Factor

These are two separate settings that work together:

- **`replicas`**: how many Cassandra nodes are deployed. More nodes means more capacity and better throughput, as the token ring is split across more nodes.
- **`replicationFactor`**: how many copies of each partition are stored across the cluster. A replication factor of 3 means every row exists on 3 different nodes, so with `QUORUM` consistency (2 of 3 replicas) the cluster keeps serving reads and writes while 1 node is down.

`replicationFactor` must not exceed `replicas` — you cannot store 3 copies of data across only 2 nodes.
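The arithmetic behind these two settings: a quorum is `floor(RF / 2) + 1` replicas. The sketch below (hypothetical helper name) computes it, enforces the same `replicationFactor <= replicas` rule the chart's validation helper applies, and reports how many replicas of a partition can be down while `QUORUM` or `ONE` requests still succeed.

```python
def replica_math(replicas: int, replication_factor: int) -> dict:
    """Failure tolerance for a single-datacenter keyspace with the given settings."""
    # Same constraint the chart's validation helper enforces.
    if replication_factor > replicas:
        raise ValueError(
            f"replicationFactor ({replication_factor}) cannot exceed replicas ({replicas})"
        )
    if replication_factor < 1:
        raise ValueError("replicationFactor must be at least 1")
    quorum = replication_factor // 2 + 1
    return {
        "quorum": quorum,
        # Replicas of a partition that can be down while QUORUM requests still succeed.
        "max_down_for_quorum": replication_factor - quorum,
        # Replicas of a partition that can be down while ONE requests still succeed.
        "max_down_for_one": replication_factor - 1,
    }


if __name__ == "__main__":
    print(replica_math(replicas=3, replication_factor=3))
```

For the defaults (3 replicas, RF 3) this gives a quorum of 2, so `QUORUM` tolerates 1 node down; raising RF to 5 on 5 nodes would tolerate 2.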

## Multi-Zone

When `multiZone.enabled: true`, Control Plane spreads replicas across availability zones within the location:

```yaml
multiZone:
  enabled: true
```

With a replication factor of 3 across 3 zones, each zone holds one copy of every partition. The cluster survives a complete zone outage with no data loss, provided your client uses `LOCAL_QUORUM` consistency (reads and writes succeed with responses from the surviving 2 zones).

Verify your selected location supports multi-zone before enabling this option.
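The zone-outage claim above can be checked for other combinations with a short sketch. It assumes copies of each partition are spread as evenly as possible across zones (as in the one-copy-per-zone case described above); this README does not state how the template achieves that placement, and `survives_zone_outage` is a hypothetical helper.

```python
import math


def survives_zone_outage(replication_factor: int, zones: int) -> bool:
    """True if LOCAL_QUORUM still succeeds after losing the zone holding the most copies.

    Assumes copies of each partition are spread as evenly as possible across zones.
    """
    if replication_factor < 1 or zones < 1:
        raise ValueError("replication_factor and zones must be positive")
    copies_in_worst_zone = math.ceil(replication_factor / zones)
    remaining = replication_factor - copies_in_worst_zone
    local_quorum = replication_factor // 2 + 1
    return remaining >= local_quorum


if __name__ == "__main__":
    print(survives_zone_outage(3, 3))  # True: 2 copies remain, LOCAL_QUORUM needs 2
    print(survives_zone_outage(3, 2))  # False: one zone holds 2 of the 3 copies
```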

## Repair

Cassandra uses eventual consistency — when nodes miss writes during downtime, data can drift out of sync. `nodetool repair` runs an anti-entropy process that compares and reconciles data across all replicas. Repair must complete across all nodes at least once within `gc_grace_seconds` (default: 10 days) to prevent deleted data from reappearing.

The template includes a scheduled repair cron job:

```yaml
repair:
  enabled: true
  schedule: "0 2 * * 0" # Weekly, Sunday at 2am UTC
```

The default weekly schedule satisfies the 10-day `gc_grace_seconds` requirement with margin. Do not disable repair in production or increase the interval beyond 10 days.

Repair can be resource-intensive on large datasets. If it impacts query performance, consider running it during low-traffic windows or increasing node resources.
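If you change the schedule, a conservative way to check it against `gc_grace_seconds` is to add the expected repair duration to the interval: a range repaired near the start of one run may not be repaired again until near the end of the next. The sketch below applies that bound; `repair_schedule_ok` is a hypothetical helper and `repair_hours` is your own estimate of how long a full run takes.

```python
# Cassandra's default gc_grace_seconds: 864000 seconds, i.e. 10 days.
DEFAULT_GC_GRACE_SECONDS = 864_000


def repair_schedule_ok(interval_days: float, repair_hours: float,
                       gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Conservative check that every range is repaired within gc_grace_seconds.

    Worst case: a range repaired at the start of one run is only repaired again
    at the end of the next run, so the gap is the interval plus one run's duration.
    """
    worst_gap_seconds = interval_days * 86_400 + repair_hours * 3_600
    return worst_gap_seconds <= gc_grace_seconds


if __name__ == "__main__":
    print(repair_schedule_ok(interval_days=7, repair_hours=6))   # True: weekly schedule
    print(repair_schedule_ok(interval_days=10, repair_hours=6))  # False
```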

## Backing Up

Two backup modes are available:

- **Logical** — exports keyspace tables as CSVs using `cqlsh COPY TO`, then uploads to cloud storage. Runs as a standalone cron workload on schedule. Suitable for smaller datasets or when portability matters.
- **Physical** — creates SSTable snapshots using `nodetool snapshot` and syncs them to cloud storage. Runs as a sidecar container on each Cassandra replica. Faster and more space-efficient for large datasets, but backups are per-node and must be restored node-by-node.

Set `backup.enabled: true`, choose a `type`, set `backup.provider`, and fill in the corresponding cloud block:

```yaml
backup:
  enabled: true
  type: logical # logical or physical
  image: ghcr.io/controlplane-com/backup-images/cassandra-backup:5.0
  schedule: "0 2 * * *" # daily at 2am UTC

  resources:
    cpu: 250m
    memory: 256Mi

  provider: aws # aws or gcp

  aws:
    bucket: my-backup-bucket
    region: us-east-1
    cloudAccountName: my-backup-cloudaccount
    policyName: my-s3-policy
    prefix: cassandra/backups

  gcp:
    bucket: my-backup-bucket
    cloudAccountName: my-cloud-account
    prefix: cassandra/backups
```
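The chart rejects incomplete backup settings at render time through the `cassandra.validate` helper in `_helpers.tpl`. The sketch below restates those backup checks in Python so the required fields per provider are easy to see. Unlike the Helm helper, which stops at the first `fail`, it collects every problem; `validate_backup` is a hypothetical name.

```python
# Required cloud fields per provider, matching the checks in cassandra.validate.
REQUIRED_FIELDS = {
    "aws": ("cloudAccountName", "policyName", "bucket"),
    "gcp": ("cloudAccountName", "bucket"),
}


def validate_backup(backup: dict) -> list:
    """Return a message for each backup condition cassandra.validate checks that fails."""
    errors = []
    if not backup.get("enabled"):
        return errors  # nothing is checked when backups are disabled
    backup_type = backup.get("type")
    provider = backup.get("provider")
    if backup_type not in ("logical", "physical"):
        errors.append(f"backup.type must be 'logical' or 'physical', got: {backup_type}")
    if provider not in REQUIRED_FIELDS:
        errors.append(f"backup.provider must be 'aws' or 'gcp', got: {provider}")
        return errors
    cloud = backup.get(provider) or {}
    for field in REQUIRED_FIELDS[provider]:
        if not cloud.get(field):
            errors.append(
                f"backup.{provider}.{field} is required when backup.provider is {provider}"
            )
    return errors


if __name__ == "__main__":
    print(validate_backup({"enabled": True, "type": "logical", "provider": "aws",
                           "aws": {"bucket": "my-backup-bucket"}}))
```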

### AWS S3

1. Create your S3 bucket. Set `aws.bucket` and `aws.region` to match.

2. If you do not have a Cloud Account set up, refer to the docs to [Create a Cloud Account](https://docs.controlplane.com/guides/create-cloud-account). Set `aws.cloudAccountName` to match.

3. Create an AWS IAM policy with the following JSON (replace `YOUR_BUCKET_NAME`):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetObjectVersion",
        "s3:DeleteObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
      ]
    }
  ]
}
```

4. Set `aws.policyName` to the name of the policy created in step 3.
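If you generate the policy from a script rather than editing it by hand, the sketch below builds the same document for a given bucket. Both `Resource` entries are needed because `s3:ListBucket` applies to the bucket ARN, while the object-level actions apply to the `/*` object ARN. `backup_policy` is a hypothetical helper; how you submit the JSON to IAM is up to you.

```python
import json

# Actions copied from the IAM policy above.
BACKUP_ACTIONS = (
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:ListBucket",
    "s3:GetObjectVersion",
    "s3:DeleteObjectVersion",
)


def backup_policy(bucket: str) -> dict:
    """Return the backup IAM policy document scoped to `bucket`."""
    if not bucket:
        raise ValueError("bucket name is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(BACKUP_ACTIONS),
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level: s3:ListBucket
                    f"arn:aws:s3:::{bucket}/*",  # object-level: get/put/delete actions
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(backup_policy("my-backup-bucket"), indent=2))
```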

### GCS

1. Create your GCS bucket. Set `gcp.bucket` to match.

2. If you do not have a Cloud Account set up, refer to the docs to [Create a Cloud Account](https://docs.controlplane.com/guides/create-cloud-account). Set `gcp.cloudAccountName` to match.

**Important**: Add the `Storage Admin` role to the GCP service account created for the Cloud Account.

## Restoring a Backup

### Logical Restore

Exec into the backup cron workload and run `restore.sh` with the timestamp of the backup you want to restore:

```bash
RESTORE_TIMESTAMP=2026-05-15T02-00-00Z /usr/local/bin/restore.sh
```

The timestamp matches the backup folder name in your bucket (e.g. `cassandra/backups/2026-05-15T02-00-00Z/`).

The script downloads the CSVs for the configured keyspace and replays them into Cassandra using `cqlsh COPY FROM`. Existing rows with matching primary keys are overwritten; rows not in the backup are left in place.
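To choose a `RESTORE_TIMESTAMP`, you can list the backup prefixes in your bucket and pick one. The sketch below reproduces the timestamp format from the example above (`%Y-%m-%dT%H-%M-%SZ`, in UTC) and finds the newest backup in a list of object keys. It assumes keys look like `{prefix}/{timestamp}/...`; listing the bucket is left to your cloud SDK, and both function names are hypothetical.

```python
from datetime import datetime, timezone

# Format of the backup folder names, e.g. 2026-05-15T02-00-00Z.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H-%M-%SZ"


def backup_timestamp(when: datetime) -> str:
    """Format a timezone-aware datetime as a backup folder name (in UTC)."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return when.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def latest_timestamp(keys, prefix: str):
    """Return the newest timestamp folder under `prefix`, or None if there is none."""
    base = prefix.strip("/") + "/"
    stamps = set()
    for key in keys:
        if not key.startswith(base):
            continue
        stamp = key[len(base):].split("/", 1)[0]
        try:
            datetime.strptime(stamp, TIMESTAMP_FORMAT)  # skip non-timestamp folders
        except ValueError:
            continue
        stamps.add(stamp)
    # The zero-padded format sorts chronologically as plain strings.
    return max(stamps) if stamps else None


if __name__ == "__main__":
    keys = [
        "cassandra/backups/2026-05-15T02-00-00Z/mykeyspace/mytable.csv",
        "cassandra/backups/2026-05-16T02-00-00Z/mykeyspace/mytable.csv",
    ]
    print(latest_timestamp(keys, "cassandra/backups"))  # 2026-05-16T02-00-00Z
```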

### Physical Restore

Physical backups are per-node — each replica backed up its own SSTable slice. To restore, exec into the **backup sidecar container** (not the cassandra container) on each replica that needs to be restored and run:

```bash
RESTORE_TIMESTAMP=2026-05-15T02-00-00Z /usr/local/bin/restore.sh
```

The script downloads the snapshot files for that replica from `{prefix}/{timestamp}/{hostname}/`, writes them to the shared volume, then calls `nodetool import` to load the SSTables into the live Cassandra instance without a restart.

**Important**: Repeat this on every replica. Because each node owns a different token range, restoring only one replica leaves the cluster with incomplete data.
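Because the restore has to be repeated on every replica, it can help to confirm up front that each replica has snapshot data for the chosen timestamp. The sketch below checks object keys against the `{prefix}/{timestamp}/{hostname}/` layout described above. The hostnames and file paths are placeholders, and `restore_plan` is a hypothetical helper.

```python
def restore_plan(prefix: str, timestamp: str, hostnames, keys) -> dict:
    """Map each replica hostname to whether the bucket holds snapshot files for it."""
    base = prefix.strip("/")
    plan = {}
    for host in hostnames:
        host_prefix = f"{base}/{timestamp}/{host}/"
        plan[host] = any(key.startswith(host_prefix) for key in keys)
    return plan


if __name__ == "__main__":
    keys = [
        "cassandra/backups/2026-05-15T02-00-00Z/myapp-cassandra-0/mykeyspace/mytable/file1",
        "cassandra/backups/2026-05-15T02-00-00Z/myapp-cassandra-1/mykeyspace/mytable/file1",
    ]
    hosts = ["myapp-cassandra-0", "myapp-cassandra-1", "myapp-cassandra-2"]
    plan = restore_plan("cassandra/backups", "2026-05-15T02-00-00Z", hosts, keys)
    print([host for host, present in plan.items() if not present])  # ['myapp-cassandra-2']
```

Any replica reported as missing would leave its token ranges without restored data, so resolve that before starting the node-by-node restore.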

## Important Notes

- **Minimum replicas for production**: Use at least 3 replicas with a replication factor of 3 so the cluster can survive a node failure while still achieving quorum
- **JVM heap**: Set `jvmHeapSize` to approximately 50% of `memory` — Cassandra relies heavily on off-heap memory for bloom filters, row cache, and OS page cache
- **gc_grace_seconds**: The default is 10 days. Ensure repair runs at least once within this window on all nodes, or deleted data may reappear after a node recovers from downtime
- **Scaling up**: Adding replicas after initial deployment does not automatically rebalance data. Run `nodetool rebuild` on new nodes and then `nodetool cleanup` on existing nodes after scaling
- **Multi-zone**: Verify your selected location supports multi-zone before enabling

## Supported External Services

- [Cassandra Documentation](https://cassandra.apache.org/doc/latest/)
- [Cassandra Driver Documentation](https://docs.datastax.com/en/developer/driver-matrix/doc/common/driverMatrix.html)
80 changes: 80 additions & 0 deletions cassandra/versions/1.0.0/templates/_helpers.tpl
@@ -0,0 +1,80 @@
{{/* Resource Naming */}}

{{- define "cassandra.workload.name" -}}
{{- printf "%s-cassandra" .Release.Name }}
{{- end }}

{{- define "cassandra.secret.init.name" -}}
{{- printf "%s-cassandra-init" .Release.Name }}
{{- end }}

{{- define "cassandra.secret.config.name" -}}
{{- printf "%s-cassandra-config" .Release.Name }}
{{- end }}

{{- define "cassandra.identity.name" -}}
{{- printf "%s-cassandra-identity" .Release.Name }}
{{- end }}

{{- define "cassandra.policy.name" -}}
{{- printf "%s-cassandra-policy" .Release.Name }}
{{- end }}

{{- define "cassandra.volumeset.name" -}}
{{- printf "%s-cassandra-data" .Release.Name }}
{{- end }}

{{- define "cassandra.secret.credentials.name" -}}
{{- printf "%s-cassandra-credentials" .Release.Name }}
{{- end }}

{{- define "cassandra.workload.repair.name" -}}
{{- printf "%s-cassandra-repair" .Release.Name }}
{{- end }}

{{- define "cassandra.workload.backup.name" -}}
{{- printf "%s-cassandra-backup" .Release.Name }}
{{- end }}


{{/* Validation */}}

{{- define "cassandra.validate" -}}
{{- if gt (.Values.replicationFactor | int) (.Values.replicas | int) }}
{{- fail (printf "replicationFactor (%d) cannot exceed replicas (%d)" (.Values.replicationFactor | int) (.Values.replicas | int)) }}
{{- end }}
{{- if .Values.backup.enabled }}
{{- if not (or (eq .Values.backup.type "logical") (eq .Values.backup.type "physical")) }}
{{- fail (printf "backup.type must be 'logical' or 'physical', got: %s" .Values.backup.type) }}
{{- end }}
{{- if not (or (eq .Values.backup.provider "aws") (eq .Values.backup.provider "gcp")) }}
{{- fail (printf "backup.provider must be 'aws' or 'gcp', got: %s" .Values.backup.provider) }}
{{- end }}
{{- if eq .Values.backup.provider "aws" }}
{{- if not .Values.backup.aws.cloudAccountName }}
{{- fail "backup.aws.cloudAccountName is required when backup.provider is aws" }}
{{- end }}
{{- if not .Values.backup.aws.policyName }}
{{- fail "backup.aws.policyName is required when backup.provider is aws" }}
{{- end }}
{{- if not .Values.backup.aws.bucket }}
{{- fail "backup.aws.bucket is required when backup.provider is aws" }}
{{- end }}
{{- end }}
{{- if eq .Values.backup.provider "gcp" }}
{{- if not .Values.backup.gcp.cloudAccountName }}
{{- fail "backup.gcp.cloudAccountName is required when backup.provider is gcp" }}
{{- end }}
{{- if not .Values.backup.gcp.bucket }}
{{- fail "backup.gcp.bucket is required when backup.provider is gcp" }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}


{{/* Labeling */}}

{{- define "cassandra.tags" -}}
{{- include "cpln-common.tags" . }}
{{- end }}
22 changes: 22 additions & 0 deletions cassandra/versions/1.0.0/templates/identity.yaml
@@ -0,0 +1,22 @@
kind: identity
name: {{ include "cassandra.identity.name" . }}
description: {{ include "cassandra.workload.name" . }} identity
tags: {{- include "cassandra.tags" . | nindent 2 }}
{{- if and .Values.backup.enabled (eq .Values.backup.provider "aws") }}
aws:
  cloudAccountLink: //cloudaccount/{{ .Values.backup.aws.cloudAccountName }}
  policyRefs:
    - cpln-connector
    - aws::ReadOnlyAccess
    - {{ .Values.backup.aws.policyName | quote }}
{{- end }}
{{- if and .Values.backup.enabled (eq .Values.backup.provider "gcp") }}
gcp:
  bindings:
    - resource: //storage.googleapis.com/projects/_/buckets/{{ .Values.backup.gcp.bucket }}
      roles:
        - roles/storage.objectAdmin
  cloudAccountLink: //cloudaccount/{{ .Values.backup.gcp.cloudAccountName }}
  scopes:
    - https://www.googleapis.com/auth/cloud-platform
{{- end }}
13 changes: 13 additions & 0 deletions cassandra/versions/1.0.0/templates/policy.yaml
@@ -0,0 +1,13 @@
kind: policy
name: {{ include "cassandra.policy.name" . }}
origin: default
bindings:
  - permissions:
      - reveal
    principalLinks:
      - //gvc/{{ .Values.global.cpln.gvc }}/identity/{{ include "cassandra.identity.name" . }}
targetKind: secret
targetLinks:
  - //secret/{{ include "cassandra.secret.init.name" . }}
  - //secret/{{ include "cassandra.secret.config.name" . }}
  - //secret/{{ include "cassandra.secret.credentials.name" . }}