Kai Burjack edited this page Feb 22, 2026 · 11 revisions

What

This is the repository of k8s-httpcache, our replacement for the amazing kube-httpcache.

Why

Over the years, kube-httpcache has served us very well. It has long been the only option for deploying a Kubernetes-aware Varnish cluster that implements self-routed sharding/clustering.

However, we wanted fixes and changes in a few areas:

  1. support for zero-downtime rolling updates by using readiness/startup probes (see also: https://github.com/mittwald/kube-httpcache/issues/222)
  2. different template delimiters for Helm templating and VCL templating (with identical delimiters, the curly braces in the VCL template have to be "escaped" so that they survive Helm templating and are still available to kube-httpcache)
  3. support for multiple backend services
  4. a nicer CLI, where varnishd arguments can be specified directly, without dedicated -varnish-storage, -varnish-transient-storage and -varnish-additional-parameters "wrapper" arguments (these arguments are meant only for varnishd, so the controller should stay out of the way and allow them to be specified as is)
  5. support for Sprig template functions in VCL templates
  6. support for including values from ConfigMaps or YAML files (watched or simply volume-mounted) in VCL templating
  7. connection draining during shutdown

The first point is the most pressing. kube-httpcache never fully supported clustering/sharding in combination with startup/readiness probes. If you want clustering/sharding to increase the cache hit rate and reduce the number of backend requests, and you also want zero downtime, with no 5xx errors or network connection errors experienced by clients during rolling updates, HPA scaling or node drain events, then you need both: sharding/clustering and startup/readiness probes.
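
To illustrate the delimiter point (item 2 in the list above): with separate delimiters, a Helm chart can template the ConfigMap that carries the VCL template without any escaping. The following is a minimal sketch and is not taken from an actual k8s-httpcache chart. The file path and the value name `.Values.vcl.firstByteTimeout` are hypothetical. Helm only evaluates `{{ ... }}`, so the `<< ... >>` actions pass through untouched and are left for the controller to evaluate at runtime.

```yaml
# templates/configmap.yaml in a hypothetical Helm chart
apiVersion: v1
kind: ConfigMap
metadata:
  name: k8s-httpcache-vcl
data:
  vcl.tmpl: |
    vcl 4.1;

    <<- range $name, $eps := .Backends >>
    <<- range $eps >>
    backend << .Name >>_<< $name >> {
      .host = "<< .IP >>";
      .port = "<< .Port >>";
      # Filled in by Helm at install time (hypothetical value name).
      .first_byte_timeout = {{ .Values.vcl.firstByteTimeout }};
    }
    <<- end >>
    <<- end >>
```

If both tools used `{{ ... }}`, each VCL action would instead have to be escaped for Helm, for example as `{{ "{{" }} .IP {{ "}}" }}`.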

How

The following VCL template gives an overview of what a VCL template for clustering/sharding could look like:

vcl 4.1;

import directors;
import std;

<<- range .Frontends >>
backend << .Name >> {
  .host = "<< .IP >>";
  .port = "<< .Port >>";
}
<<- end >>

<<- range $name, $eps := .Backends >>
<<- range $eps >>
backend << .Name >>_<< $name >> {
  .host = "<< .IP >>";
  .port = "<< .Port >>";
}
<<- end >>
<<- end >>

sub vcl_init {
  # Declare frontends (the varnish pods themselves) if we have any.
  # The only situation where we may have none is when we are the only/first
  # pod starting up. In that case, we do not need sharding either.
  <<- if .Frontends >>
  new cluster = directors.shard();
  <<- range .Frontends >>
  cluster.add_backend(<< .Name >>);
  <<- end >>
  cluster.reconfigure();
  <<- end >>

  <<- range $name, $eps := .Backends >>
  new backend_<< $name >> = directors.round_robin();
  <<- range $eps >>
  backend_<< $name >>.add_backend(<< .Name >>_<< $name >>);
  <<- end >>
  <<- end >>
}

sub vcl_recv {
  # Add sharding if we have frontends.
  <<- if .Frontends >>
  # Protect against infinite routing loops by checking a custom
  # "X-Shard-Routed" header.
  # This is necessary when varnish pods shut down or come up,
  # e.g. during a rolling update, HPA scaling or a node drain, because
  # the controllers in the varnish pods do not all have the same
  # view of the frontends at every moment. Each one individually
  # watches the Kubernetes API for changes and eventually loads new
  # VCL with the new frontends, but this is not atomic across all
  # controllers. So Varnish A, with a different current set of
  # ready frontends, can select Varnish B using the shard director,
  # and Varnish B can in turn select A.
  if (!req.http.X-Shard-Routed) {
    set req.backend_hint = cluster.backend(by=URL);
    set req.http.x-shard = req.backend_hint;
    if (req.http.x-shard != server.identity) {
      set req.http.X-Shard-Routed = "true";
      return(pass);
    }
  }
  <<- end >>

  # Example of routing to different backends:
  <<- range $name, $_ := .Backends >>
  if (req.url ~ "^/<< $name >>/") {
    set req.backend_hint = backend_<< $name >>.backend();
  }
  <<- end >>
}

This can then be used in a simple Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-httpcache
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: k8s-httpcache
  template:
    metadata:
      labels:
        app: k8s-httpcache
    spec:
      serviceAccountName: k8s-httpcache
      terminationGracePeriodSeconds: 90
      containers:
      - name: k8s-httpcache
        image: k8s-httpcache:test
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          runAsUser: 1000 # <-- varnish user uses uid=1000 also in the container image
          runAsGroup: 1000 # <-- varnish user uses gid=1000 also in the container image
          runAsNonRoot: true
          readOnlyRootFilesystem: true
          capabilities:
            drop:
              - ALL
        args:
        - --service-name=k8s-httpcache
        - --namespace=$(NAMESPACE)
        - --vcl-template=/etc/k8s-httpcache/vcl.tmpl
        - --backend=service-a:service-a
        - --backend=service-b:service-b
        - --drain
        - --drain-delay=15s
        - --drain-timeout=30s
        - --
        - -s
        - default,100M
        - -t
        - 5s
        - -p
        - default_grace=0s
        - -p
        - default_keep=0s
        - -p
        - timeout_idle=75s
        - -p
        - backend_idle_timeout=5s
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        startupProbe:
          httpGet:
            path: /ready
            port: http
          failureThreshold: 30
          periodSeconds: 1
        volumeMounts:
        - name: vcl-template
          mountPath: /etc/k8s-httpcache
          readOnly: true
        - name: tmp
          mountPath: /tmp
        - name: varlibvarnish
          mountPath: /var/lib/varnish
        resources:
          requests:
            cpu: 10m
            memory: 1Gi
          limits:
            cpu: "1"
            memory: 1Gi
      volumes:
      - name: tmp
        emptyDir:
          medium: Memory
      - name: varlibvarnish
        emptyDir:
          medium: Memory
      - name: vcl-template
        configMap:
          name: k8s-httpcache-vcl
