Problem
The stock `varnish:7.x` Docker image entrypoint appends its own unnamed `-s malloc,...` to varnishd's argv unless the user supplies one. Unnamed `-s` flags are auto-named by varnishd as `s0`, `s1`, ... in order.
When a `VinylCache` user configures `spec.storage[].name: s0` — the obvious choice, and the one our README/how-to examples in cdk8s-plone suggested — the operator emits:
```
-s s0=malloc,
```
Combined with the entrypoint's unnamed default (also auto-named `s0`), varnishd fails at startup:
```
Error: (-s s0=malloc,1500000000) 's0' is already defined
(-? gives usage)
```
The pod goes `CrashLoopBackOff` on the next StatefulSet rollout. Because `OrderedReady` blocks replacement, the previous pod (which pre-dates the storage change and still runs with the old argv) stays up, so the degradation can go unnoticed — VCL appears synced, traffic still serves from one replica, but HA is gone.
Impact in prod
Hit a VinylCache in aaf-6 prod yesterday: pod-1 has been `CrashLoopBackOff` for 30h+, pod-0 kept serving from a stale revision. `kubectl get vinylcache` shows `Ready=False` + `Progressing` + `WaitingForReplicas` but status.conditions[VCLSynced]=True, which is misleading — the VCL IS synced to the one old-revision pod that's still up.
Proposal
Reject reserved varnishd storage names in the admission webhook:
```go
var reservedStorageNames = map[string]struct{}{
"s0": {}, // entrypoint's implicit default
"Transient": {}, // varnishd built-in transient storage
}
```
Ideally reject on `spec.storage[].name` at `Create` and `Update`. Reason string e.g.:
storage name %q is reserved: varnishd uses it for implicit default storage; choose another name (e.g. "cache", "main")
Alternatively (heavier) — the operator could detect and strip the entrypoint's default by overriding `command` with `["/usr/sbin/varnishd"]` instead of letting the entrypoint run, removing the implicit `-s` source. That's invasive and breaks BYO-image compatibility, so webhook validation is the pragmatic option.
Docs angle
Orthogonal but useful: add a one-line guardrail in the CRD field doc for `spec.storage[].name` warning against `s0` / `Transient`. Even with the webhook check, the hint appears in `kubectl explain vinylcache.spec.storage.name` and in generated docs.
Test plan
- Unit: webhook rejects `{name: "s0"}` and `{name: "Transient"}` at Create and Update.
- Unit: webhook accepts `{name: "cache"}` / `{name: "mem"}` / `{name: "s1"}`.
- E2E (optional): deploy a VinylCache with `storage: [{name: s0, ...}]` against a real cluster and assert `kubectl apply` fails at the admission step rather than `CrashLoopBackOff`-ing later.
Context
cdk8s-plone's existing How-To and docstrings use `s0` as the example name (bluedynamics/cdk8s-plone#148). I'll update those to use `cache` once this is fixed upstream, so the two repos stay in sync.
Problem
The stock `varnish:7.x` Docker image entrypoint appends its own unnamed `-s malloc,...` to varnishd's argv unless the user supplies one. Unnamed `-s` flags are auto-named by varnishd as `s0`, `s1`, ... in order.
When a `VinylCache` user configures `spec.storage[].name: s0` — the obvious choice, and the one our README/how-to examples in cdk8s-plone suggested — the operator emits:
```
-s s0=malloc,
```
Combined with the entrypoint's unnamed default (also auto-named `s0`), varnishd fails at startup:
```
Error: (-s s0=malloc,1500000000) 's0' is already defined
(-? gives usage)
```
The pod goes `CrashLoopBackOff` on the next StatefulSet rollout. Because `OrderedReady` blocks replacement, the previous pod (which pre-dates the storage change and still runs with the old argv) stays up, so the degradation can go unnoticed — VCL appears synced, traffic still serves from one replica, but HA is gone.
Impact in prod
Hit a VinylCache in aaf-6 prod yesterday: pod-1 has been `CrashLoopBackOff` for 30h+, pod-0 kept serving from a stale revision. `kubectl get vinylcache` shows `Ready=False` + `Progressing` + `WaitingForReplicas` but status.conditions[VCLSynced]=True, which is misleading — the VCL IS synced to the one old-revision pod that's still up.
Proposal
Reject reserved varnishd storage names in the admission webhook:
```go
var reservedStorageNames = map[string]struct{}{
"s0": {}, // entrypoint's implicit default
"Transient": {}, // varnishd built-in transient storage
}
```
Ideally reject on `spec.storage[].name` at `Create` and `Update`. Reason string e.g.:
Alternatively (heavier) — the operator could detect and strip the entrypoint's default by overriding `command` with `["/usr/sbin/varnishd"]` instead of letting the entrypoint run, removing the implicit `-s` source. That's invasive and breaks BYO-image compatibility, so webhook validation is the pragmatic option.
Docs angle
Orthogonal but useful: add a one-line guardrail in the CRD field doc for `spec.storage[].name` warning against `s0` / `Transient`. Even with the webhook check, the hint appears in `kubectl explain vinylcache.spec.storage.name` and in generated docs.
Test plan
Context
cdk8s-plone's existing How-To and docstrings use `s0` as the example name (bluedynamics/cdk8s-plone#148). I'll update those to use `cache` once this is fixed upstream, so the two repos stay in sync.