Skip to content

Reject storage names that collide with varnishd's implicit default (s0) #44

@jensens

Description

@jensens

Problem

The stock `varnish:7.x` Docker image entrypoint appends its own unnamed `-s malloc,...` to varnishd's argv unless the user supplies one. Unnamed `-s` flags are auto-named by varnishd as `s0`, `s1`, ... in order.

When a `VinylCache` user configures `spec.storage[].name: s0` — the obvious choice, and the one our README/how-to examples in cdk8s-plone suggested — the operator emits:

```
-s s0=malloc,
```

Combined with the entrypoint's unnamed default (also auto-named `s0`), varnishd fails at startup:

```
Error: (-s s0=malloc,1500000000) 's0' is already defined
(-? gives usage)
```

The pod goes `CrashLoopBackOff` on the next StatefulSet rollout. Because `OrderedReady` blocks replacement, the previous pod (which pre-dates the storage change and still runs with the old argv) stays up, so the degradation can go unnoticed — VCL appears synced, traffic still serves from one replica, but HA is gone.

Impact in prod

Hit a VinylCache in aaf-6 prod yesterday: pod-1 has been `CrashLoopBackOff` for 30h+, pod-0 kept serving from a stale revision. `kubectl get vinylcache` shows `Ready=False` + `Progressing` + `WaitingForReplicas` but status.conditions[VCLSynced]=True, which is misleading — the VCL IS synced to the one old-revision pod that's still up.

Proposal

Reject reserved varnishd storage names in the admission webhook:

```go
var reservedStorageNames = map[string]struct{}{
"s0": {}, // entrypoint's implicit default
"Transient": {}, // varnishd built-in transient storage
}
```

Ideally reject on `spec.storage[].name` at `Create` and `Update`. Reason string e.g.:

storage name %q is reserved: varnishd uses it for implicit default storage; choose another name (e.g. "cache", "main")

Alternatively (heavier) — the operator could detect and strip the entrypoint's default by overriding `command` with `["/usr/sbin/varnishd"]` instead of letting the entrypoint run, removing the implicit `-s` source. That's invasive and breaks BYO-image compatibility, so webhook validation is the pragmatic option.

Docs angle

Orthogonal but useful: add a one-line guardrail in the CRD field doc for `spec.storage[].name` warning against `s0` / `Transient`. Even with the webhook check, the hint appears in `kubectl explain vinylcache.spec.storage.name` and in generated docs.

Test plan

  • Unit: webhook rejects `{name: "s0"}` and `{name: "Transient"}` at Create and Update.
  • Unit: webhook accepts `{name: "cache"}` / `{name: "mem"}` / `{name: "s1"}`.
  • E2E (optional): deploy a VinylCache with `storage: [{name: s0, ...}]` against a real cluster and assert `kubectl apply` fails at the admission step rather than `CrashLoopBackOff`-ing later.

Context

cdk8s-plone's existing How-To and docstrings use `s0` as the example name (bluedynamics/cdk8s-plone#148). I'll update those to use `cache` once this is fixed upstream, so the two repos stay in sync.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions