Zero-packet-drop live migration for Kata Containers.
```shell
make   # builds all katamaran binaries into bin/
```

```shell
# Destination node (run first)
sudo ./bin/katamaran --mode dest --qmp /run/vc/vm/<id>/extra-monitor.sock --tap tap0_kata

# Source node
sudo ./bin/katamaran --mode source --qmp /run/vc/vm/<id>/extra-monitor.sock \
  --dest-ip <dest-node-ip> --vm-ip <pod-ip>
```

Three-phase migration: storage (NBD drive-mirror) → compute (RAM pre-copy) → network (IPIP/GRE tunnel + sch_plug qdisc). Packets arriving during the VM pause are buffered and flushed on resume — zero drops. Add `--shared-storage` with Ceph/NFS to skip the storage phase entirely.
Supports both local storage (NBD drive-mirror) and shared storage (Ceph, NFS — skip mirroring with --shared-storage).
Traditional QEMU live migration assumes shared storage. In Kubernetes with Kata Containers, pods typically use local virtio-blk disks — meaning the entire block device must be migrated alongside RAM and network state. katamaran orchestrates all three phases in the correct order while guaranteeing zero in-flight packet drops during the cutover.
Like a catamaran glides between two hulls, katamaran glides your VM between two nodes — smoothly, with nothing lost overboard.
- Getting Started
- Architecture Overview
- Prerequisites
- Project Structure
- Usage
- Why Sequential Pre-Copy?
- Kubernetes Integration
- Dashboard
- Testing
See also: Installation Guide · Usage Guide · Testing Guide · User Stories · Dashboard
This section walks you through building katamaran, setting up a two-node cluster with Kata Containers, and running your first live migration — step by step.
In addition to the runtime prerequisites (QEMU 6.2+, Kata 3.x, iproute2, Go 1.26+), the tutorial requires:
- Linux host with KVM (`/dev/kvm` must exist)
- `minikube`, `kubectl`, `helm` installed
- `podman` (or Docker)
- ~30 GB free disk, ~20 GB free RAM (for two KVM nodes)
Verify KVM and nested virtualization:
```shell
ls /dev/kvm                                  # must exist
cat /sys/module/kvm_intel/parameters/nested  # Y or 1 (Intel)
cat /sys/module/kvm_amd/parameters/nested    # 1 (AMD)
```

```shell
git clone https://github.com/maci0/katamaran.git
cd katamaran
make
./bin/katamaran --help
```

Run the smoke tests (no VMs required):
```shell
make smoke   # validates compilation, CLI behavior, project structure
```

```shell
minikube start -p katamaran-demo \
  --nodes 2 \
  --driver=kvm2 \
  --memory=8192 \
  --cpus=4 \
  --container-runtime=containerd \
  --cni=calico

# Wait for both nodes to be ready
kubectl wait --for=condition=Ready node --all --timeout=120s
```

```shell
helm upgrade --install kata-deploy \
  oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy \
  --version 3.27.0 \
  --namespace kube-system \
  --create-namespace \
  --set shims.disableAll=true \
  --set shims.qemu.enabled=true \
  --wait=false

# Wait for kata-deploy to finish installing on both nodes
kubectl -n kube-system rollout status daemonset/kata-deploy --timeout=600s

# Verify the kata-qemu RuntimeClass exists
kubectl get runtimeclass kata-qemu
```

Build the container image and deploy via DaemonSet. With the Kata 3.27 layout shown above, this installs the katamaran binary, enables the Kata QMP extra-monitor socket, and loads the required kernel modules (ipip, ip6_tunnel, ip_gre, ip6_gre, sch_plug) on both nodes:
```shell
make image
minikube -p katamaran-demo image load katamaran.tar
kubectl apply -f deploy/daemonset.yaml
kubectl -n kube-system rollout status daemonset/katamaran-deploy --timeout=120s
```

Instead of relying on an automated black-box script, let's deploy a Kata VM pod manually using the provided demo manifest:

```shell
kubectl apply -f demo/nginx-kata.yaml
```

Wait for the pod to become ready:

```shell
kubectl wait --for=condition=Ready pod/nginx-kata --timeout=60s
```

The manifest includes a NodePort service on port 30081. You can test reaching the NGINX container directly from your host machine:
```shell
# Get the IP of the node
NODE_IP=$(kubectl get node katamaran-demo -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
curl http://$NODE_IP:30081
```

(Note: if you are using Docker Desktop on macOS/Windows, or if your hypervisor NAT doesn't bridge NodePorts directly, you can use `minikube service nginx-service --url` to create a localhost tunnel instead.)
To orchestrate the migration, katamaran uses two Kubernetes Jobs — one on the destination node and one on the source node. You apply these manifests manually, passing the required state through environment variables:
```shell
export VM_IP=$(kubectl get pod nginx-kata -o jsonpath='{.status.podIP}')
export DEST_IP=$(kubectl get node katamaran-demo-m02 -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
export IMAGE="localhost/katamaran:dev"
export EXTRA_ARGS=""

# Find the QMP sockets using the container runtime (crictl works for containerd/cri-o):
# SRC_ID=$(minikube ssh -p katamaran-demo -n katamaran-demo -- sudo crictl pods --name nginx-kata -q)
# DEST_ID=$(minikube ssh -p katamaran-demo -n katamaran-demo-m02 -- sudo crictl pods --name nginx-kata-dest -q)

# Apply the destination job first (it waits for incoming migration)
export NODE_NAME="katamaran-demo-m02"
export QMP_SOCKET="/run/vc/vm/<dest-id>/extra-monitor.sock"
envsubst < deploy/job-dest.yaml | kubectl apply -f -

# Then apply the source job to initiate the migration
export NODE_NAME="katamaran-demo"
export QMP_SOCKET="/run/vc/vm/<src-id>/extra-monitor.sock"
envsubst < deploy/job-source.yaml | kubectl apply -f -
```

**Warning**
This manual kubectl apply will fail out-of-the-box.
Live migrating a Kata Container requires a destination QEMU process waiting with the exact same UUID, vsock, and hardware state as the source, started with the `-incoming` flag.
In a production cluster, a dedicated Kubernetes Operator automates this complex state-matching phase before applying the katamaran jobs.
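To make the state-matching requirement concrete, here is a heavily trimmed, hypothetical sketch of what the waiting destination process looks like. This is a dry-run that only echoes the command line; the real Kata-generated invocation is far longer, the UUID placeholder must be copied from the source VM, and the listen port is illustrative:

```shell
# Dry-run sketch — echoes the command line instead of launching QEMU.
SRC_UUID="<uuid-of-source-vm>"   # placeholder: must match the source exactly
dest_cmd="qemu-system-x86_64 -uuid $SRC_UUID -incoming tcp:0.0.0.0:4444"
echo "$dest_cmd"
```

The `-incoming` flag is what makes the destination QEMU pause and wait for the source's migration stream instead of booting the guest itself.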
To see the full end-to-end flow working in your local terminal (including the automated state-matching wrapper), use the provided E2E script:
```shell
./scripts/e2e.sh --provider minikube --ping-proof
```

Tip: for a faster setup (~30s instead of ~5min), use Kind + Podman instead of minikube:

```shell
./scripts/e2e.sh --provider kind --ping-proof
./scripts/e2e.sh --teardown --provider kind
```
Migration proceeds in three sequential phases:
```mermaid
sequenceDiagram
    participant S as Source Node
    participant D as Destination Node
    rect rgb(59, 130, 246, 0.1)
        Note over S,D: Phase 1 — Storage Mirroring
        S->>D: NBD drive-mirror (background sync)
        D-->>S: Block device ready
        Note over S: VM keeps running
    end
    rect rgb(16, 185, 129, 0.1)
        Note over S,D: Phase 2 — Compute Migration
        S->>D: RAM pre-copy (TCP)
        Note over S: VM pauses (STOP)
        Note over D: VM resumes (RESUME)
    end
    rect rgb(245, 158, 11, 0.1)
        Note over S,D: Phase 3 — Network Cutover
        S->>D: IPIP/GRE tunnel (redirect traffic)
        Note over D: sch_plug unplug (flush buffered pkts)
    end
```
The destination QEMU starts an NBD server exporting the target block device. The source issues a drive-mirror QMP command that copies every block to the remote NBD target in the background while the VM keeps running. Dirty blocks are re-synced continuously until the mirror reports ready (fully synchronized).
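On the wire, this phase is driven by a handful of QMP commands. A simplified sketch follows; the host, port, and device names here are illustrative (the real node names and device IDs come from the Kata-generated QEMU configuration, and newer QEMU versions expose the NBD export via `block-export-add` instead of `nbd-server-add`):

```json
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "<dest-node-ip>", "port": "10809"}}}}
{"execute": "nbd-server-add", "arguments": {"device": "drive0", "writable": true}}
{"execute": "drive-mirror", "arguments": {"device": "drive0", "target": "nbd://<dest-node-ip>:10809/drive0", "sync": "full"}}
```

The first two commands run on the destination; the `drive-mirror` command runs on the source and emits a `BLOCK_JOB_READY` event once the mirror is fully synchronized.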
Once storage is synchronized, the source starts standard QEMU RAM pre-copy migration (migrate QMP command) with auto-converge enabled. QEMU iteratively copies dirty RAM pages while the VM continues to run.
To make the cutover imperceptible, katamaran configures QEMU with a 25 ms default downtime limit and raises the migration bandwidth cap to 10 GB/s. The downtime limit is configurable with `--downtime` or can be derived from RTT with `--auto-downtime`; QEMU keeps iterating until the remaining dirty RAM can be transferred within that budget.
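In QMP terms, this tuning corresponds roughly to the following (values are the defaults described above; the exact URI and capability set depend on the QEMU version):

```json
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "auto-converge", "state": true}]}}
{"execute": "migrate-set-parameters", "arguments": {"downtime-limit": 25, "max-bandwidth": 10000000000}}
{"execute": "migrate", "arguments": {"uri": "tcp:<dest-node-ip>:4444"}}
```

`downtime-limit` is in milliseconds and `max-bandwidth` in bytes per second, so these values match the 25 ms / 10 GB/s defaults above.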
Once the remaining dirty RAM set is small enough to transfer within the configured downtime budget, the VM pauses (emitting the STOP event). At this point, QEMU performs a final incremental copy of the remaining dirty RAM pages and device state. Only after this final copy completes does the destination VM resume (emitting the RESUME event).
The critical downtime window — between STOP on the source and RESUME on the destination — is where packets would normally be lost. katamaran eliminates this:
- **Source side:** Immediately after STOP, an IP tunnel is created pointing at the destination node. The tunnel encapsulation is selected by `--tunnel-mode`: with the default `ipip`, an IPIP tunnel is used for IPv4 (mode ipip) and an ip6tnl tunnel for IPv6 (mode ip6ip6); with `gre`, a GRE tunnel is used for IPv4 (mode gre) and an ip6gre tunnel for IPv6. GRE is recommended on cloud VPCs (AWS, GCP, Azure) where IPIP (IP protocol 4/41) is often blocked by security groups, while GRE (IP protocol 47) is widely permitted. A host route for the VM IP is added through the tunnel, forwarding any packets that arrive at the (now stale) source to the destination.
- **Destination side:** A `tc` sch_plug qdisc on the destination tap interface buffers all arriving packets (including those forwarded through the tunnel). The qdisc is installed in pass-through mode (`release_indefinite`) and switched to buffering (`block`) before waiting for RESUME. When the VM resumes, the queue is unplugged with `release_indefinite`, flushing all buffered packets into the now-running VM in order. QEMU's `announce-self` QMP command then broadcasts Gratuitous ARP using the guest's actual MAC address, ensuring switches learn the correct port binding immediately.
The result: packets that arrive during the switchover are queued, not dropped. After the CNI control plane converges (seconds later), new traffic flows directly to the destination and the tunnel is torn down.
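A dry-run sketch of the iproute2 commands behind this cutover (the `run` wrapper only echoes; interface names and IPs are placeholders, and katamaran issues the real equivalents through its exec helpers rather than a shell):

```shell
run() { echo "+ $*"; }   # dry-run wrapper; replace with sudo to execute for real

# Source side, immediately after STOP: tunnel + host route toward the destination
run ip tunnel add mig0 mode ipip local "<src-node-ip>" remote "<dest-node-ip>"
run ip link set mig0 up
run ip route add "<pod-ip>/32" dev mig0

# Destination side: buffer packets on the tap until the VM resumes
run tc qdisc add dev tap0_kata root handle 1: plug                       # install, pass-through
run tc qdisc change dev tap0_kata root handle 1: plug block              # start buffering
# ... wait for the RESUME event ...
run tc qdisc change dev tap0_kata root handle 1: plug release_indefinite # flush queue
```

The ordering matters: the qdisc must already be buffering before the source's tunnel starts forwarding, otherwise packets would reach a tap with no VM behind it.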
To ensure absolute safety during orchestration, katamaran implements strict concurrency constraints designed to avoid race conditions and resource leaks:
- **Context Cancellation Trade-offs:** When the main context cancels (e.g. from SIGINT or a timeout), katamaran does not immediately close the QMP connection. Instead, it uses `context.AfterFunc` to shorten the socket deadline, cleanly interrupting any blocking reads without causing a data race. This keeps the QMP connection alive just long enough to execute deferred cleanup commands (like `migrate-cancel` or `block-job-cancel`) before exit.
- **Cancellation-Detached Cleanup:** Operations running in `defer` blocks use `context.WithoutCancel`. This detaches the cleanup step from the main cancellation tree (so it isn't instantly aborted) but preserves critical values like logging traces or metrics attached to the original context.
- **Sequential Polling vs Concurrent Races:** Instead of spawning background goroutines that listen for asynchronous QMP events while concurrently polling status endpoints, katamaran executes a unified sequential polling loop. This explicitly eliminates the risk of concurrent state access issues across all migration phases, avoiding missed STOP events or silent QEMU failures.
| Component | Minimum Version | Notes |
|---|---|---|
| QEMU | 6.2+ | Must support drive-mirror, nbd-server-start, announce-self, QMP |
| Kata Containers | 3.x | QMP socket must be accessible |
| iproute2 | any | tc (sch_plug qdisc) + ip tunnel (IPIP/GRE/ip6tnl/ip6gre) |
| Go | 1.26+ | Install system-wide |
For CNI compatibility details (OVN-Kubernetes, Cilium, Calico, Flannel, and others), see Networking: CNI Compatibility under Kubernetes Integration.
```
go.mod                       # Go module (github.com/maci0/katamaran)
Makefile                     # Build, test, fuzz, image targets
Dockerfile                   # Multi-stage container image build
Dockerfile.dashboard         # Dashboard container image build
.dockerignore                # Build context exclusions
.github/
  dependabot.yml             # Dependabot config (GitHub Actions + Go modules, weekly)
  workflows/
    ci.yml                   # GitHub Actions CI (lint, test, fuzz seeds, build, Docker)
cmd/
  katamaran/
    main.go                  # CLI entry point — flag parsing and dispatch
    main_test.go             # CLI validation and flag behavior tests
  dashboard/
    main.go                  # Dashboard CLI wrapper
    index.html               # Dashboard frontend (dark theme, Chart.js)
    README.md                # Dashboard usage guide
internal/
  dashboard/
    server.go                # Dashboard web server and route table
    server_test.go           # Dashboard endpoint, middleware, and security tests
    loadgen.go               # Ping and HTTP load generator handlers
    metrics.go               # expvar counters and duration buckets
    middleware.go            # HTTP middleware (logging, recovery, CSRF, security headers)
    migrate.go               # Migration orchestration handler
    types.go                 # Dashboard state and API response types
    validate.go              # Input validation and SSRF prevention
  logging/
    logging.go               # Logging setup helpers (SetupLogger)
    logging_test.go          # Logging tests
  migration/
    config.go                # Config types, constants, and helpers (validators, formatQEMUHost)
    config_test.go           # Config unit tests and FuzzFormatQEMUHost
    dest.go                  # Destination-side migration logic
    dest_test.go             # Destination unit tests
    exec.go                  # External command execution (runCmd, runCmdInNetns)
    exec_test.go             # Exec unit tests
    qmp_recording_test.go    # QMP command recording helpers for migration tests
    source.go                # Source-side migration logic and polling
    source_test.go           # Source unit tests
    tunnel.go                # IP tunnel setup/teardown (IPIP/GRE/ip6ip6/ip6gre)
    tunnel_test.go           # Tunnel unit tests
  qmp/
    client.go                # QMP client (connect, execute, wait for events)
    client_test.go           # QMP client unit tests
    fuzz_test.go             # Fuzz tests for QMP protocol parsing (6 targets)
    types.go                 # QMP protocol types and command argument structs
    qmptest/
      qmptest.go             # Shared test helpers for faking a QMP server
  buildinfo/
    buildinfo.go             # Build version variable (overridden via ldflags)
deploy/
  dashboard.yaml             # Dashboard Kubernetes Deployment + ClusterIP Service
  daemonset.yaml             # DaemonSet for node setup (binary, kernel modules, QMP config when present)
  job-dest.yaml              # Job template for destination-side migration
  job-source.yaml            # Job template for source-side migration
  migrate.sh                 # Orchestration wrapper for Job-based migration
docs/
  INSTALL.md                 # Installation guide (binary, container, DaemonSet)
  USAGE.md                   # Usage guide (CLI and Kubernetes Jobs)
  TESTING.md                 # Test environment guide
  STORIES.md                 # User stories
  logo.png                   # Project logo
demo/
  nginx-kata.yaml            # Example Kata Containers pod with NGINX + NodePort
scripts/                     # Test and operational scripts
  test.sh                    # Smoke tests (no VMs required)
  cleanup.sh                 # Cluster cleanup helper
  minikube-test.sh           # Single-node Kata QMP smoke test (requires KVM)
  e2e.sh                     # Unified E2E live migration test harness
  sweep.sh                   # Parameter sweep tool for migration tuning
  lib.sh                     # Shared utility functions for scripts
  build-minikube-iso.sh      # Custom minikube ISO builder
  build-minikube-modules.sh  # Kernel module builder for minikube
  manifests/                 # E2E test manifests and templates
    kata-pod.yaml            # Kata Containers pod template
    kind-config.yaml         # Kind cluster configuration
    kind-config-nocni.yaml   # Kind cluster configuration (CNI disabled for Cilium/Flannel)
    nfs-pv.yaml              # NFS PersistentVolume template
    nfs-server.yaml          # NFS server pod template
    pod-src.yaml             # Source pod manifest for E2E tests
    pod-dest.yaml            # Destination pod manifest for E2E tests
```
katamaran provides two modes (source and dest) to coordinate migration.
For full details on CLI flags, direct usage, shared storage mode, IPv6, Cloud VPC configuration, and Kubernetes Job-based orchestration, please see the Usage Guide.
A natural question: why not mirror storage and RAM in parallel?
The drive-mirror operation generates substantial network I/O — it copies the entire block device (often tens of GB) over the wire. Running RAM pre-copy simultaneously would cause two problems:
- Buffer overflow on the network path. Both streams compete for bandwidth. RAM pre-copy is latency-sensitive (dirty pages must be re-sent each round). When storage mirroring saturates the link, RAM rounds take longer, more pages get re-dirtied, and convergence stalls — or the migration fails entirely.
- Wasted bandwidth from redundant RAM retransmission. While storage is still syncing, the VM keeps running and dirtying RAM. Each pre-copy round re-sends those dirty pages. If storage sync takes 5 minutes, that's 5 minutes of RAM rounds that will largely be invalidated. By waiting for storage to reach `ready`, we start RAM pre-copy on a quiet network with the shortest possible convergence path.
The sequential approach — storage first, then RAM — minimizes total migration time and keeps the final downtime window (the STOP→RESUME gap) as short as possible.
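A back-of-envelope illustration of the bandwidth argument (the disk size and link speed are hypothetical; a real parallel run would not split the link exactly in half, but the direction of the effect is the same):

```shell
disk_gb=40 link_gbps=10
# Sequential: the mirror gets the whole link to itself.
mirror_s=$(( disk_gb * 8 / link_gbps ))
# Parallel: if RAM pre-copy took roughly half the link, the mirror alone
# would take twice as long — and every extra second of mirroring is another
# second of RAM pages being dirtied and re-sent.
parallel_mirror_s=$(( disk_gb * 8 / (link_gbps / 2) ))
echo "sequential mirror: ~${mirror_s}s, contended mirror: ~${parallel_mirror_s}s"
```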
katamaran is designed as a low-level migration primitive. In a production Kubernetes cluster, it would be invoked by a higher-level controller (e.g., a CRD operator) that orchestrates the full lifecycle: selecting a target node, preparing the destination VM, invoking katamaran on both sides, and updating Kubernetes state afterward.
This section explores which storage and networking stacks are compatible, what the ideal setup looks like, and the open integration points.
```mermaid
flowchart LR
    A[Build image localhost/katamaran:dev] --> B[Load image into cluster]
    B --> C[Apply deploy/daemonset.yaml]
    C --> D["katamaran binary present on Kata nodes"]
    D --> E[Create destination VM pod]
    E --> F[Detect QMP socket + tap iface]
    F --> G[Run deploy/migrate.sh]
    G --> H[Render job-dest.yaml via envsubst]
    H --> I[Wait dest job ready]
    I --> J[Render job-source.yaml via envsubst]
    J --> K[Wait source job complete]
    K --> L[Collect logs + migration result]
```
The storage strategy depends on whether the cluster uses shared storage (both nodes see the same block device) or local storage (each node has its own disk).
| CSI Driver | Storage Type | katamaran Mode | Notes |
|---|---|---|---|
| Ceph RBD (`rbd.csi.ceph.com`) | Shared block | `--shared-storage` | Ideal. Both nodes mount the same RBD image. No data transfer needed — only RAM + network state migrate. Requires ReadWriteMany or controlled handoff (unmap on source, map on dest). |
| CephFS (`cephfs.csi.ceph.com`) | Shared filesystem | `--shared-storage` | Works if the VM's rootfs is on a CephFS-backed virtio-fs mount. Less common for block-level VM disks. |
| NFS (`nfs.csi.k8s.io`) | Shared filesystem | `--shared-storage` | Simple but slower. NFS latency can affect VM disk I/O during and after migration. Acceptable for low-IOPS workloads. |
| Longhorn (`driver.longhorn.io`) | Replicated local | NBD drive-mirror | Longhorn volumes are node-local with network replication. katamaran mirrors the block device via NBD, then the Longhorn controller can adopt the replica on the destination. |
| OpenEBS Mayastor (`io.openebs.csi-mayastor`) | Replicated local | NBD drive-mirror or `--shared-storage` | Mayastor NVMe-oF targets can be re-exported to the destination node, potentially allowing shared-storage mode. Otherwise, NBD drive-mirror works. |
| TopoLVM (`topolvm.io`) | Strict local | NBD drive-mirror | Purely local LVM. The entire block device must be mirrored. Best for small disks or infrequent migrations. |
| Local Path Provisioner | Strict local | NBD drive-mirror | No replication. Full block copy required. Suitable for dev/test. |
With Ceph RBD, migration skips the most time-consuming phase entirely. The flow becomes:
```mermaid
sequenceDiagram
    participant S as Source Node
    participant RBD as Ceph RBD
    participant D as Destination Node
    D->>RBD: Open same RBD image (read-only)
    Note over D: katamaran --mode dest --shared-storage
    D->>D: Install qdisc, wait for RESUME
    Note over S: katamaran --mode source --shared-storage
    S->>D: RAM pre-copy only (no storage mirror)
    Note over S: VM pauses (STOP)
    Note over D: VM resumes (RESUME)
    S-->>RBD: Unmap image
    D->>RBD: Promote to read-write
    S->>D: IPIP tunnel + sch_plug flush
```
Total migration time is dominated by RAM pre-copy convergence — typically seconds for a 4 GB VM with moderate dirty page rate.
With Longhorn or local disks, all three phases run:
```mermaid
sequenceDiagram
    participant S as Source Node
    participant D as Destination Node
    rect rgb(59, 130, 246, 0.1)
        Note over S,D: Phase 1 — minutes to hours (scales with disk size)
        S->>D: NBD drive-mirror (entire block device)
        Note over S: VM keeps running
    end
    rect rgb(16, 185, 129, 0.1)
        Note over S,D: Phase 2 — seconds (after storage sync)
        S->>D: RAM pre-copy
    end
    rect rgb(245, 158, 11, 0.1)
        Note over S,D: Phase 3 — milliseconds
        S->>D: Network cutover (tunnel + qdisc flush)
    end
```
The NBD mirror runs in the background while the VM stays live, but total wall-clock time scales with disk size and write rate.
The network cutover (Phase 3) must work with the cluster's CNI plugin. The key requirement is that the VM's pod IP must remain reachable during the gap between source STOP and destination RESUME, plus the time for the CNI to update its routing/forwarding tables.
| CNI | Compatibility | How It Works | Convergence Time |
|---|---|---|---|
| OVN-Kubernetes | ★★★ Excellent | OVN's southbound DB updates the port-chassis binding. The logical switch port moves to the destination node automatically. GARP + OVN's own MAC learning provide near-instant convergence. Tested via `e2e.sh --cni ovn`. | < 1s |
| Kube-OVN | ★★★ Excellent | Separate OVN-based CNI (by Alauda). Same port-chassis rebinding via OVN southbound DB. Additional features like subnets and VPCs. Not tested but expected to work identically. | < 1s |
| Cilium | ★★★ Excellent | eBPF datapath. After migration, the destination node's Cilium agent detects the new endpoint and installs eBPF maps. The IPIP tunnel covers the gap. Cilium's IPAM can be configured to preserve pod IPs across nodes with `cluster-pool` mode. | 1–3s |
| Calico | ★★☆ Good | BGP route propagation. The destination node advertises the pod IP via BGP. The IPIP tunnel bridges the gap until all peers converge. Calico IPAM must allow the pod IP to exist on the destination node (use `--ipam=host-local` with a shared pool, not per-node blocks). | 2–5s |
| Flannel | ★★☆ Good | VXLAN FDB entries. The destination node must update the VXLAN forwarding database. GARP handles L2, but Flannel's flanneld may take a few seconds to update FDB entries on all nodes. The IPIP tunnel covers the gap. | 2–5s |
| Antrea | ★★☆ Good | OVS-based. Similar to OVN-Kubernetes but with its own controller. Port migration requires the Antrea agent to update OVS flows on the destination. GARP + IP tunnel cover the gap. | 1–3s |
| Multus (meta-CNI) | Depends | Multus delegates to underlying CNIs. Compatibility depends on the primary and secondary CNI plugins. Each interface may need its own migration strategy. | Varies |
**Important**
The most critical requirement for network routing is that the VM's pod IP must survive migration.
This means the IPAM must allow the same IP to be assigned on the destination node. Per-node IP blocks (the default for several CNIs) are problematic because the pod IP belongs to the source node's CIDR and is invalid on the destination.
| CNI | Default IPAM | Migration-Safe Config |
|---|---|---|
| OVN-Kubernetes | Cluster-wide | Works out of the box — IPs are not tied to nodes |
| Kube-OVN | Cluster-wide | Works out of the box |
| Cilium | `cluster-pool` | Works out of the box with default `cluster-pool` IPAM mode. Avoid `kubernetes` IPAM mode (per-node CIDRs) |
| Calico | Per-node blocks | Requires config change. Use `ipipMode: Always` with a shared IPPool and disable blockSize-based per-node allocation, or use Calico's IPAM with `nat-outgoing: false` and a flat pool |
| Flannel | Per-node `/24` | Problematic. Flannel allocates a `/24` per node from the pod CIDR. No built-in mechanism to preserve IPs across nodes. Requires external IPAM or static IP annotation |
| Antrea | Per-node | Requires config change. Configure NodeIPAM with a shared pool or use externalIPAM |
| kindnet | Per-node | Not suitable for live migration (no cross-node IP support) |
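As an illustration, a hypothetical Calico IPPool for the migration-safe configuration might look like this (the CIDR is a placeholder and must match your cluster's pod CIDR; verify the fields against the Calico documentation before applying):

```yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: migration-pool
spec:
  cidr: 10.244.0.0/16   # one flat pool shared by all nodes (placeholder CIDR)
  ipipMode: Always      # keep IPIP encap so the migrated IP stays routable
  natOutgoing: true
  nodeSelector: all()
```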
For production live migration with minimal downtime and operational complexity:
```
┌──────────────┬──────────────────────────────────────────────┐
│              │ Ideal Stack                                  │
├──────────────┼──────────────────────────────────────────────┤
│ Runtime      │ Kata Containers 3.x + QEMU 8+                │
│ Storage CSI  │ Ceph RBD (rbd.csi.ceph.com)                  │
│ Storage Mode │ --shared-storage (skip NBD mirror)           │
│ CNI          │ OVN-Kubernetes or Cilium                     │
│ IPAM         │ Cluster-wide pool (not per-node)             │
│ Kernel       │ 5.15+ (sch_plug, IPIP, KVM)                  │
│ Network      │ 25 Gbps+ node-to-node (for RAM pre-copy)     │
│ Orchestrator │ CRD operator (manages lifecycle)             │
└──────────────┴──────────────────────────────────────────────┘
```
Why this stack:
- Ceph RBD eliminates the storage mirroring phase entirely. Migration becomes RAM-only, completing in seconds instead of minutes.
- OVN-Kubernetes or Cilium provide the fastest network convergence. OVN's centralized southbound DB updates port bindings atomically. Cilium's eBPF datapath reconverges without waiting for BGP propagation.
- Cluster-wide IPAM ensures the pod IP is valid on any node, avoiding the per-node CIDR problem.
- 25 Gbps+ networking helps the final dirty page flush (the actual downtime-causing transfer) complete within the configured downtime budget.
A Kubernetes operator would manage migration as a CRD:
```yaml
apiVersion: katamaran.io/v1alpha1
kind: VMMigration
metadata:
  name: migrate-nginx-pod
spec:
  podName: nginx-kata-7b4f8c
  targetNode: worker-02
  strategy: live
  sharedStorage: true
  timeout: 300s
```

The operator's reconciliation loop:
```mermaid
flowchart TD
    A[VMMigration CR created] --> B{Validate target node}
    B -->|capacity, runtime, kernel| C[Prepare destination pod]
    C --> D[Get QMP socket paths]
    D --> E["katamaran --mode dest (target node)"]
    E --> F["katamaran --mode source (source node)"]
    F --> G{Migration result}
    G -->|Success| H[Patch pod nodeName]
    H --> I[Update endpoint slices]
    I --> J[Clean up source]
    G -->|Failure| K[migrate-cancel]
    K --> L[Resume source VM]
    L --> M[Clean up destination]
```
- **Pod checkpoint/restore:** Should the operator snapshot the pod spec and container state for rollback?
- **Multi-disk VMs:** katamaran currently mirrors a single drive. Multi-disk setups would need parallel NBD mirrors or sequential mirroring.
- **Live migration scheduling:** Which node to pick? Factors: resource headroom, storage locality, network topology, anti-affinity rules.
- **Preemption:** Can a migration be preempted mid-flight if the destination node runs out of resources? This requires `migrate-cancel` QMP support (already available in QEMU).
- **Encryption:** NBD traffic and RAM migration traffic are currently unencrypted. For cross-rack or cross-AZ migration, WireGuard or IPsec tunnels should wrap the migration streams.
- **Observability:** Exposing migration progress (storage sync %, RAM dirty rate, downtime duration) as Prometheus metrics via the operator.
A web UI for orchestrating migrations, visualizing ping latency (zero-drop proof), and running HTTP load generators during cutover. Includes a pod-picker that auto-discovers kata-qemu pods + nodes and a cmdline-replay mode that spawns the destination QEMU itself (no kata pod required on the dest node). See cmd/dashboard/README.md for the full UI flow + screenshots.
```shell
# 1. Build the image (from repository root)
make dashboard

# 2. Load it into your cluster (if using minikube/kind)
minikube image load dashboard.tar

# 3. Deploy the manifests
kubectl apply -f deploy/dashboard.yaml
```

Once deployed, the dashboard is exposed via a ClusterIP service on port 8080.
- **Access the UI:** Run `kubectl port-forward -n kube-system svc/katamaran-dashboard 8080:8080` and open `http://localhost:8080`.
- **Pick a Source Pod and Dest Node** from the dropdowns (auto-populated from `/api/pods` and `/api/nodes`). The hidden `vm_ip` and `dest_ip` form fields auto-fill from the selection. For full zero-config dest spawning, also enable `replay_cmdline=true` (recommended — see scripted example in the dashboard README).
- **Start Load Generation:** Click ICMP Ping or HTTP Load. A live Chart.js graph plots latency.
- **Migrate:** Click Start Migration. The real-time log viewer streams the orchestrator's progress.
- **Observe Zero-Drop:** As the migration crosses the configured downtime window (25 ms by default), you will see a latency spike on the chart (representing the buffered packets) but zero dropped packets.
For manual override of any auto-derived value, expand the Advanced (override auto-discovery) disclosure in the form.
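The same discovery endpoints the UI reads can also be scripted. A dry-run sketch (the `run` wrapper echoes instead of calling `curl`; it assumes the port-forward from the first step is active):

```shell
BASE="http://localhost:8080"
run() { echo "+ $*"; }           # dry-run wrapper; drop it to actually call curl
run curl -s "$BASE/api/pods"     # kata-qemu pods shown in the source picker
run curl -s "$BASE/api/nodes"    # candidate destination nodes
```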
For GitOps / Argo / declarative workflows that prefer kubectl apply over a UI, katamaran ships a Migration Custom Resource and a small in-cluster controller (katamaran-mgr) that reconciles it through the same Native orchestrator the dashboard uses. Behaviour is identical — the only difference is the entry point.
```shell
# Build + load the controller image
make mgr
minikube image load mgr.tar

# Install the CRD + controller (one-time)
kubectl apply -f config/crd/migration.yaml
kubectl apply -f config/crd/manager.yaml

# Submit a migration
kubectl apply -f deploy/migration-example.yaml

# Watch the phase column
kubectl get migration -w
# NAME     SOURCE      DEST            PHASE          AGE
# demo-1   kata-demo   kata-worker-b   transferring   12s
# demo-1   kata-demo   kata-worker-b   succeeded      38s
```

The CR's `.status` carries the same `migrationID`, `phase`, `startedAt`, `completedAt`, and `error` fields that the dashboard surfaces — so external systems can wait on a Migration the same way they wait on a Job.
katamaran includes a comprehensive test suite ranging from native Go fuzzing to multi-node live migration tests proving zero packet drops across various CNIs (OVN-Kubernetes, Cilium, Calico, Flannel) and storage backends.
For instructions on running the test suite, verifying zero-drop behavior, and fuzzing the QMP protocol, please see the Testing Guide.

