diff --git a/docs/guides/documentdb/_index.md b/docs/guides/documentdb/_index.md
new file mode 100644
index 000000000..1be3aeb0d
--- /dev/null
+++ b/docs/guides/documentdb/_index.md
@@ -0,0 +1,10 @@
+---
+title: DocumentDB
+menu:
+  docs_{{ .version }}:
+    identifier: dm-documentdb-guides
+    name: DocumentDB
+    parent: guides
+    weight: 10
+menu_name: docs_{{ .version }}
+---
diff --git a/docs/guides/documentdb/dr/_index.md b/docs/guides/documentdb/dr/_index.md
new file mode 100644
index 000000000..db521550a
--- /dev/null
+++ b/docs/guides/documentdb/dr/_index.md
@@ -0,0 +1,10 @@
+---
+title: Disaster Recovery
+menu:
+  docs_{{ .version }}:
+    identifier: guides-documentdb-dr
+    name: DR
+    parent: dm-documentdb-guides
+    weight: 36
+menu_name: docs_{{ .version }}
+---
diff --git a/docs/guides/documentdb/dr/guide/index.md b/docs/guides/documentdb/dr/guide/index.md
new file mode 100644
index 000000000..39555d83b
--- /dev/null
+++ b/docs/guides/documentdb/dr/guide/index.md
@@ -0,0 +1,379 @@
+---
+title: DC-DR User Guide
+menu:
+  docs_{{ .version }}:
+    identifier: guides-documentdb-dr-guide
+    name: User Guide
+    parent: guides-documentdb-dr
+    weight: 20
+menu_name: docs_{{ .version }}
+section_menu_id: guides
+---
+
+# Running DocumentDB in DC-DR Mode: User Guide
+
+This guide covers every aspect of operating a distributed DocumentDB in cross data
+center disaster recovery (DC-DR) mode: the components, the naming contract,
+deployment, connecting, monitoring, replication and lag, timing and tuning, quorum
+and roles, switchover and failback, scaling, day-2 operations, backup, and deletion.
+
+KubeDB `DocumentDB` is Microsoft DocumentDB (the `pg_documentdb` extension) on
+PostgreSQL under the hood, so DC-DR reuses the PostgreSQL WAL streaming, the per-DC
+`documentdb-coordinator` raft, and `pg_rewind` failback.
+
+Read the [DC-DR Overview](/docs/guides/documentdb/dr/overview/index.md)
+first for the architecture, and the
+[DC-DR Runbook](/docs/guides/documentdb/dr/runbook/index.md) for
+scenario-by-scenario procedures.
+
+> **New to KubeDB?** Please start [here](/docs/README.md).
+
+## Components and where they run
+
+| Component | Runs in | Responsibility |
+| --- | --- | --- |
+| **`dr-controlplane`** + 3-site etcd quorum | across the data centers (an OCM control plane) | Publishes one `coordination.k8s.io` **Lease** per failover scope. The Lease holder is the active (writable) DC. This is the single cross-DC failover authority. |
+| **`dr-controlplane` agent** | each spoke (DC) | Contends for the primary-DC Lease on behalf of its DC and projects the Lease decision into the local spoke as a marker `ConfigMap`. |
+| **KubeDB DocumentDB operator (hub)** | the OCM hub | Expands the `DocumentDB` CR into per-DC groups, watches the Lease, drives failover/switchover, and writes `status.disasterRecovery`. |
+| **`documentdb-coordinator`** | every DocumentDB pod | Runs the per-DC raft, reads the local marker, and fences its leader read-only when its DC is not active. |
+| **KubeSlice** | each spoke | Provides the cross-DC pod network so a standby DC's leader can stream from the active DC's leader. |
+
+The marker `ConfigMap` is the contract between the agent (producer) and the
+coordinator (consumer):
+
+```
+ConfigMap primary-dc  (namespace: dc-failover, on each spoke)
+  data.activeDC  = the DC the quorum currently trusts as primary
+  data.renewTime = RFC3339, the observed primary-DC Lease renewTime
+  data.quiesce   = the DC asked to hold read-only for a planned switchover (else empty)
+```
+
+The coordinator trusts the marker for 30s (the fence TTL). A marker that is absent,
+stale, unparseable, or names another DC means *not active*, and the leader stays
+read-only. This is the fail-closed fence.
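+
+The fail-closed rule can be sketched as a small shell function. This is only an
+illustration, not the coordinator's implementation: `marker_is_active` and its
+arguments are hypothetical names, the inclusive TTL boundary is an assumption, and
+the timestamp parsing assumes GNU `date -d`.
+
+```bash
+# Hypothetical sketch of the fail-closed marker check (not the coordinator's code).
+# Usage: marker_is_active MY_DC ACTIVE_DC RENEW_TIME QUIESCE NOW_EPOCH [TTL_SECONDS]
+# Returns 0 only when the marker names this DC, is fresh, and does not quiesce it.
+marker_is_active() {
+  local my_dc="$1" active_dc="$2" renew_time="$3" quiesce="$4" now_epoch="$5" ttl="${6:-30}"
+  local renew_epoch
+  # Absent fields: not active.
+  [ -n "$active_dc" ] && [ -n "$renew_time" ] || return 1
+  # Marker names another DC: not active.
+  [ "$active_dc" = "$my_dc" ] || return 1
+  # A quiesce request for this DC holds it read-only.
+  [ "$quiesce" != "$my_dc" ] || return 1
+  # Unparseable renewTime: not active.
+  renew_epoch=$(date -u -d "$renew_time" +%s 2>/dev/null) || return 1
+  # Stale marker (older than the fence TTL): not active.
+  [ $(( now_epoch - renew_epoch )) -le "$ttl" ]
+}
+```
+
+Every branch other than the final freshness test returns non-zero, so any missing or
+unexpected input leaves the caller on the read-only path.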
+
+## The DC-name contract
+
+One string identifies a data center everywhere. **Keep these identical:**
+
+- the OCM spoke cluster name
+- the agent `--dc-name`
+- the primary-DC Lease `holderIdentity`
+- the marker `data.activeDC`
+- the pod label `open-cluster-management.io/cluster-name`
+- the `PlacementPolicy` `distributionRule.clusterName`
+
+## Operator configuration
+
+Start the DocumentDB operator with:
+
+```
+--dc-dr-enabled
+--dc-dr-coord-kubeconfig=<kubeconfig of the coordination control plane>
+--dc-dr-local-dc=<the data center this operator instance runs in>
+```
+
+The per-DC pod coordinators automatically receive `DC_DR_ENABLED`, `DC_NAME`,
+`DC_DR_NAMESPACE` (default `dc-failover`), and `DC_DR_MARKER` (default `primary-dc`)
+through their PetSet template, so the fence works without extra wiring.
+
+## Deploying
+
+### PlacementPolicy
+
+Map the global pod ordinals to data centers and tag each DC with its role:
+
+```yaml
+apiVersion: apps.k8s.appscode.com/v1
+kind: PlacementPolicy
+metadata:
+  name: docdb-dcdr
+spec:
+  clusterSpreadConstraint:
+    slice:
+      projectNamespace: kubeslice-demo
+      sliceName: demo-slice
+    failoverPolicy:
+      trigger:
+        scope: Global       # one cluster-wide failover scope (or Group + a group name)
+      mode: TwoDC           # TwoDC: 2 Members + a tie-breaker; ThreeDC: 3 Members
+    distributionRules:
+    - clusterName: dc-east
+      role: Member
+      replicaIndices: [0, 1, 2]
+    - clusterName: dc-west
+      role: Member
+      replicaIndices: [3, 4, 5]
+    - clusterName: dc-arbiter
+      role: Arbiter
+```
+
+- A data-bearing **Member** rule carries `replicaIndices`; the **Arbiter** witness DC
+  (vote only, no DocumentDB) carries none. (The petset `Witness` role, a data-bearing
+  witness, is for engines like MongoDB and is not used by DocumentDB.)
+- `mode: TwoDC` expects exactly two Member DCs plus the Arbiter witness DC;
+  `ThreeDC` expects at least three Member DCs.
+
+### DocumentDB
+
+```yaml
+apiVersion: kubedb.com/v1alpha2
+kind: DocumentDB
+metadata:
+  name: docdb-dcdr
+  namespace: demo
+  annotations:
+    dr.kubedb.com/enabled: "true"          # opt into per-DC DC-DR expansion
+    # dr.kubedb.com/failover-group: payments  # optional: a Group failover scope
+    # dr.kubedb.com/switchover-max-lag-bytes: "16777216"  # optional lag budget override
+spec:
+  version: "pg17-0.109.0"
+  replicas: 6
+  distributed: true
+  storageType: Durable
+  podTemplate:
+    spec:
+      podPlacementPolicy:
+        name: docdb-dcdr
+  storage:
+    accessModes: [ReadWriteOnce]
+    resources:
+      requests:
+        storage: 1Gi
+  deletionPolicy: WipeOut
+```
+
+The cross-DC standby behavior follows the CR's `spec.standbyMode` (`Hot` or `Warm`)
+and `spec.streamingMode` (`Synchronous` or `Asynchronous`); the cross-DC links are
+asynchronous by design regardless. WAL retention and force-failover budgets come from
+`spec.replication` (`DocumentDBReplication`: `walLimitPolicy`, `walKeepSize`,
+`forceFailoverAcceptingDataLossAfter`), and per-DC raft elections from
+`spec.leaderElection` (`DocumentDBLeaderElectionConfig`).
+
+### What the operator creates
+
+Per data-bearing DC `<dc>`:
+
+- a per-DC `PetSet` `<db>-<dc>` (e.g. `docdb-dcdr-dc-east`) with its own intra-DC raft;
+- a DC-local headless governing `Service` so the DC's pods discover only each other;
+- a cluster-scoped per-DC `PlacementPolicy` `<base>-<dc>` pinning that group to the DC;
+- a per-DC arbiter `PetSet` `<db>-<dc>-arbiter` when that DC's local node count is even.
+
+The witness DC (`role: Arbiter`) runs no DocumentDB pods. All per-DC pods carry the offshoot selectors
+plus the `open-cluster-management.io/cluster-name` label, so the global primary/standby
+Services and the single `AppBinding` keep working.
+
+## Connecting
+
+A DC-DR DocumentDB exposes the same single endpoint as any KubeDB DocumentDB:
+
+- the **primary Service** `<db>` resolves to the active DC's writable leader (only
+  that leader is labeled `kubedb.com/role: primary`);
+- the **standby Service** `<db>-standby` resolves to the read-only leaders;
+- one **`AppBinding`** `<db>` for applications and KubeDB integrations.
+
+Because only the active DC's leader carries the `primary` label, the endpoint follows
+failover automatically: applications keep using `<db>` and reconnect after a
+failover, landing on the new active DC.
+
+## Monitoring and observability
+
+### status.disasterRecovery
+
+The single CR carries the whole cross-DC view:
+
+```bash
+$ kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{.status.disasterRecovery}' | jq
+```
+
+| Field | Meaning |
+| --- | --- |
+| `activeDC` | The DC that holds the Lease and runs the writable primary. |
+| `phase` | `Steady`, `FailingOver`, `FailingBack`, or `Degraded`. |
+| `lastTransitionTime` | When `activeDC` last changed. |
+| `dataCenters[].clusterName` | The data center, by its OCM managed cluster name. |
+| `dataCenters[].role` | `primary` for the active DC's leader, else `standby`. |
+| `dataCenters[].leader` | That DC's local raft leader pod. |
+| `dataCenters[].writable` | True only for the active DC. |
+| `dataCenters[].lagBytes` | The DC's cross-DC replication lag behind the active primary. |
+| `dataCenters[].healthy` | Whether the DC has a ready pod. |
+
+### Useful checks
+
+```bash
+# Which DC is active (from the coordination plane):
+$ kubectl --kubeconfig <coord> -n dc-failover get lease primary-dc \
+    -o jsonpath='{.spec.holderIdentity}'
+
+# The marker each spoke reads (run against a spoke):
+$ kubectl -n dc-failover get configmap primary-dc -o yaml
+
+# Per-DC leaders and roles:
+$ kubectl get pods -n demo -l app.kubernetes.io/instance=docdb-dcdr \
+    -L kubedb.com/role,open-cluster-management.io/cluster-name
+
+# A standby DC leader stamps its lag here:
+$ kubectl get pod -n demo <leader-pod> -o jsonpath='{.metadata.annotations.kubedb\.com/dc-lag-bytes}'
+```
+
+## Replication, lag, and RPO
+
+- Cross-DC replication is **asynchronous** leader-to-leader WAL streaming. Within a
+  standby DC, the local followers **cascade** from their DC's leader, so each standby
+  DC opens exactly one cross-DC link.
+- `lagBytes` is how far a DC's leader is behind the active primary, computed by that
+  DC's coordinator (the hub never opens cross-cluster SQL). It is the basis for the
+  RPO of an unplanned failover.
+- A **planned switchover loses no committed rows** (zero RPO) because writes are
+  frozen and the target fully catches up before the handoff. An **unplanned failover**
+  may lose the last unreplicated bytes (bounded by the standby's lag at the moment the
+  active DC died).
+
+## Timing and tuning (RTO vs safety)
+
+DC-DR has one timing invariant that must hold for correctness:
+
+> **fence TTL + cross-DC clock skew < primary-DC Lease duration**
+
+The marker `renewTime` tracks the Lease's renewTime. A partitioned active DC
+self-fences at `lastRenew + fence TTL`; a survivor can only acquire the expired Lease
+at `lastRenew + LeaseDuration`. Keeping the fence TTL inside the Lease duration
+guarantees the old active DC goes read-only **before** any new DC becomes writable,
+so there is no split-brain window.
+
+Default values:
+
+| Parameter | Where | Default |
+| --- | --- | --- |
+| Fence TTL | documentdb-coordinator | 30s |
+| Marker refresh interval | dr-controlplane agent | 5s |
+| Primary-DC Lease duration | dr-controlplane agent (`--election-lease-duration`) | 45s |
+| Lease renew deadline | dr-controlplane agent (`--election-renew-deadline`) | 30s |
+| Lease retry period | dr-controlplane agent (`--election-retry-period`) | 2s |
+
+The failover **RTO floor** is roughly the Lease duration (the time a survivor waits to
+acquire). To lower RTO, lower the Lease duration **and** the fence TTL together,
+always preserving `fence TTL + skew < LeaseDuration`. The retry period must stay well
+under the fence TTL so the holder restamps `renewTime` and the marker reads fresh in
+normal operation.
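+
+When tuning these values, the invariant can be checked with plain shell arithmetic. The
+sketch below is illustrative: `fence_invariant_ok` is a hypothetical helper, and the 5s
+and 15s clock-skew figures are assumed example values, not measured or documented
+defaults.
+
+```bash
+# Hypothetical helper: checks fence TTL + clock skew < Lease duration.
+# Usage: fence_invariant_ok FENCE_TTL SKEW LEASE_DURATION   (whole seconds)
+fence_invariant_ok() {
+  local fence_ttl="$1" skew="$2" lease_duration="$3"
+  [ $(( fence_ttl + skew )) -lt "$lease_duration" ]
+}
+
+# Defaults (30s fence TTL, 45s Lease duration) with an assumed 5s skew: 35 < 45.
+fence_invariant_ok 30 5 45 && echo "30s TTL + 5s skew < 45s lease: ok"
+# The same defaults with an assumed 15s skew: 45 is not < 45, so the invariant fails.
+fence_invariant_ok 30 15 45 || echo "30s TTL + 15s skew < 45s lease: violated"
+```
+
+The comparison is strict, matching the `<` in the invariant: a fence TTL plus skew that
+merely equals the Lease duration is treated as unsafe.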
+
+## Quorum, roles, and arbiters
+
+- Each DC's raft needs its own quorum. A DC with an **even** local node count gets its
+  own in-DC arbiter (`<db>-<dc>-arbiter`) so intra-DC failover keeps quorum; an odd
+  count needs none.
+- The witness DC (`role: Arbiter`) holds only the `dr-controlplane` vote, never DocumentDB data.
+- Scaling a DC re-evaluates its parity automatically: the arbiter is created or removed
+  (and de-registered from the DC raft) as the local count crosses even/odd.
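+
+The parity rule is simple enough to state as a one-line check. `needs_arbiter` below is
+a hypothetical helper used only to illustrate the even/odd rule; it is not part of
+KubeDB.
+
+```bash
+# Hypothetical helper: does a DC with N local DocumentDB nodes get an in-DC arbiter?
+# Even counts (>= 1 node) get one; odd counts do not.
+needs_arbiter() {
+  [ "$1" -ge 1 ] && [ $(( $1 % 2 )) -eq 0 ]
+}
+
+for n in 3 4 5; do
+  if needs_arbiter "$n"; then
+    echo "$n nodes: arbiter"
+  else
+    echo "$n nodes: no arbiter"
+  fi
+done
+```
+
+Under this rule a DC scaled from 3 to 4 nodes gains an arbiter, and scaling it on to 5
+removes the arbiter again.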
+
+Separately from a DC's *intra-DC* quorum, the **cross-DC** failover quorum needs a
+majority of three voting sites. For how to lay this out across two or three data
+centers (and why a third witness site is preferred), see
+[Deployment topologies](/docs/guides/documentdb/dr/overview/index.md#deployment-topologies-2-dcs-vs-3-dcs).
+
+## Planned switchover (zero-RPO)
+
+Move the active DC on purpose by annotating the DocumentDB:
+
+```bash
+$ kubectl annotate documentdb -n demo docdb-dcdr dr.kubedb.com/switchover-to=dc-west
+```
+
+The hub then:
+
+1. checks the target is a known, healthy DC within the lag budget
+   (`dr.kubedb.com/switchover-max-lag-bytes`, default 16 MiB);
+2. sets `phase: FailingOver` and asks the active DC to **quiesce** (hold its primary
+   read-only) via the primary-DC Lease, freezing the active write position;
+3. waits until the target has replayed to within one WAL page of that frozen position;
+4. hands the Lease to the target, which is promoted; the old DC resumes as a standby.
+
+The annotation is cleared automatically once the target is active. Watch
+`status.disasterRecovery` for `phase` returning to `Steady` with the new `activeDC`.
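+
+Before annotating, the step 1 pre-check can be approximated by hand from the target's
+`dataCenters[].healthy` and `dataCenters[].lagBytes` values. The function below is a
+hypothetical sketch of that check, not the hub's code; the inclusive comparison
+against the budget is an assumption.
+
+```bash
+# Hypothetical pre-check mirroring step 1: target healthy and within the lag budget.
+# Usage: switchover_allowed HEALTHY LAG_BYTES [MAX_LAG_BYTES]
+switchover_allowed() {
+  local healthy="$1" lag_bytes="$2" max_lag="${3:-16777216}"  # default 16 MiB
+  [ "$healthy" = "true" ] && [ "$lag_bytes" -le "$max_lag" ]
+}
+
+# A healthy target 4096 bytes behind is inside the default budget.
+switchover_allowed true 4096 && echo "dc-west: eligible"
+# A healthy target 32 MiB behind is outside it.
+switchover_allowed true 33554432 || echo "dc-west: lag exceeds budget"
+```
+
+Passing a third argument models the `dr.kubedb.com/switchover-max-lag-bytes` override.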
+
+## Failback
+
+Failback is just a switchover back to the original DC once it is healthy again:
+
+```bash
+$ kubectl annotate documentdb -n demo docdb-dcdr dr.kubedb.com/switchover-to=dc-east
+```
+
+A DC that lost the Lease and rejoins automatically rewinds any divergent WAL tail
+(`pg_rewind`, with a base-backup reseed fallback) and resumes streaming from the
+current active primary before it is eligible.
+
+## Per-DC horizontal scaling
+
+Each DC has its own raft, so scale a specific DC with a `DocumentDBOpsRequest`:
+
+```yaml
+apiVersion: ops.kubedb.com/v1alpha1
+kind: DocumentDBOpsRequest
+metadata:
+  name: docdb-dcdr-scale-west
+  namespace: demo
+spec:
+  type: HorizontalScaling
+  databaseRef:
+    name: docdb-dcdr
+  horizontalScaling:
+    dataCenters:
+    - clusterName: dc-west
+      replicas: 5
+```
+
+- Each entry sets that DC's local node count; DCs not listed are unchanged.
+- Nodes are added or removed one at a time over the DC-local network; the DC's arbiter
+  is created/removed as parity changes; on scale-down the removed node's replication
+  slot is dropped.
+- The base `PlacementPolicy` is renumbered so the declarative topology matches.
+- Scaling a Member DC to `1` makes it a single-node DC (no in-DC HA, still part of
+  cross-DC DR). Scaling to `0` is rejected: removing a whole DC is a topology change,
+  not horizontal scaling.
+
+## Day-2 operations
+
+The standard `DocumentDBOpsRequest` operations apply to every per-DC group on a DC-DR
+cluster; issue them exactly as for a non-distributed DocumentDB:
+
+| Operation | DC-DR behavior |
+| --- | --- |
+| **Vertical scaling** | Patches every per-DC PetSet and per-DC arbiter, restarts per-DC pods. |
+| **Volume expansion** (online/offline) | Expands every per-DC data PVC and per-DC arbiter PVC, waits on all per-DC PetSets. |
+| **Version update** | Updates every per-DC PetSet. |
+| **Storage migration** | Orphan-deletes and waits on every per-DC PetSet. |
+| **Reconfigure / Restart / Rotate-Auth** | Apply across the per-DC pods. |
+
+The DC-DR control verbs `ForceFailOver`, `ReconnectStandby`, and `SetRaftKeyPair` are
+driven by the hub (or issued as a `DocumentDBOpsRequest`) to promote a survivor,
+re-point a standby's cross-DC stream, and rotate the raft key material, respectively.
+
+## Backup
+
+Back up a DC-DR DocumentDB the same way as any KubeDB DocumentDB (KubeStash / the
+DocumentDB archiver). Logical and base backups run against the writable endpoint, so
+they read from the active DC; the AppBinding follows failover, so a scheduled backup
+continues against the new active DC after a failover. Point-in-time recovery works as
+usual.
+
+## Deletion and cleanup
+
+```bash
+$ kubectl delete documentdb -n demo docdb-dcdr
+```
+
+Per `deletionPolicy`, the operator removes the per-DC PetSets, governing Services, and
+the cluster-scoped per-DC `PlacementPolicies` it generated (these carry no owner
+reference, so the operator deletes them explicitly). The user-provided base
+`PlacementPolicy` is left for you to delete.
+
+## Limitations
+
+- **Adding or removing a whole data center** is a topology change (a new group, raft,
+  and cross-DC seed), distinct from horizontal scaling, and is performed by editing the
+  `PlacementPolicy` topology, not a `HorizontalScaling` request.
+- Cross-DC replication is asynchronous; an unplanned failover has a non-zero RPO
+  bounded by the standby lag. Use a **planned switchover** for zero-RPO moves.
+- All correctness depends on the timing invariant above; do not set a fence TTL that
+  meets or exceeds the Lease duration.
diff --git a/docs/guides/documentdb/dr/overview/index.md b/docs/guides/documentdb/dr/overview/index.md
new file mode 100644
index 000000000..d39bdea68
--- /dev/null
+++ b/docs/guides/documentdb/dr/overview/index.md
@@ -0,0 +1,384 @@
+---
+title: DC-DR Overview
+menu:
+  docs_{{ .version }}:
+    identifier: guides-documentdb-dr-overview
+    name: Overview
+    parent: guides-documentdb-dr
+    weight: 10
+menu_name: docs_{{ .version }}
+section_menu_id: guides
+---
+
+# Cross Data Center Disaster Recovery (DC-DR) for DocumentDB
+
+KubeDB can run a single distributed `DocumentDB` across multiple data centers so the
+database survives the loss of an entire data center (DC). Exactly one DC is writable
+at any instant; the others are warm, read-only standbys that stream from it across
+the DCs. When the active DC is lost, KubeDB promotes a surviving DC, and the single
+connection endpoint follows the new writable DC.
+
+KubeDB `DocumentDB` is Microsoft DocumentDB (the `pg_documentdb` extension) running on
+PostgreSQL under the hood, so DC-DR reuses the proven PostgreSQL machinery: WAL
+streaming replication between data centers, the per-DC `documentdb-coordinator` raft,
+and `pg_rewind` for failback. This guide builds on the same distributed substrate
+(one CR, Open Cluster Management, KubeSlice, and a `PlacementPolicy`) and adds the
+cross-DC failover machinery on top.
+
+This page is the conceptual overview and a quick start. See also:
+
+- [DC-DR User Guide](/docs/guides/documentdb/dr/guide/index.md): every
+  aspect of running in DC-DR mode (components, monitoring, timing, scaling, day-2 ops).
+- [DC-DR Runbook](/docs/guides/documentdb/dr/runbook/index.md): what to
+  do in each operational scenario.
+
+> **New to KubeDB?** Please start [here](/docs/README.md).
+
+## How it works
+
+DC-DR is built on one rule: **the DocumentDB raft never stretches across data centers.**
+
+- **Each data center is a self-contained DocumentDB group.** The operator expands the
+  single `DocumentDB` CR into one group per data-bearing DC, each with its own
+  `documentdb-coordinator` raft electing a **local** leader, its own local replicas,
+  and (when its local replica count is even) its own local arbiter. The raft quorum
+  never crosses the DC boundary, so cross-DC latency or a partition can never flap an
+  election.
+- **One cross-DC authority decides who is writable.** A small control plane
+  (`dr-controlplane`), backed by a three-site etcd quorum, publishes one
+  `coordination.k8s.io` **Lease** per failover scope. The DC that holds the Lease is
+  the **active** (writable) DC. This is the single cross-DC failover decision.
+- **Cross-DC replication is leader-to-leader WAL streaming.** The standby DC's local
+  leader runs as an asynchronous streaming standby of the active DC's leader; that
+  standby DC's own replicas cascade from its local leader. So a standby DC opens
+  exactly one cross-DC replication link. Whether standbys stay Hot or Warm and whether
+  streaming is Synchronous or Asynchronous follow the CR's `spec.standbyMode`
+  (`Hot`/`Warm`) and `spec.streamingMode` (`Synchronous`/`Asynchronous`); cross-DC
+  links are asynchronous by design.
+- **Writability is fenced locally and fails closed.** A per-DC `dr-controlplane`
+  agent projects the Lease holder onto its own spoke cluster as a small marker
+  `ConfigMap`. The `documentdb-coordinator` reads only that local marker: if it cannot
+  confirm its DC holds the Lease (the DC lost it, or is partitioned from the
+  coordination plane), it demotes its leader to read-only. Because the fence lives in
+  the DC and fails closed, a cut-off old-active DC stops accepting writes on its own,
+  before the hub even reacts. This local fence plus the etcd majority (only one DC can
+  hold the Lease) is the split-brain guarantee.
+- **Only the active DC's leader is labeled `primary`.** Each DC's coordinator leads
+  its own raft, but a non-active DC's leader is labeled `kubedb.com/role: standby`, so
+  the single primary `Service` and the `AppBinding` always resolve to the active DC's
+  writable leader.
+
+### Data center roles
+
+Each DC plays one role, set on the `PlacementPolicy` `distributionRule.role`:
+
+| Role | Holds DocumentDB data | Primary eligible | Purpose |
+| --- | --- | --- | --- |
+| **Member** | yes | yes | A full DocumentDB group; a candidate for the active DC. |
+| **Arbiter** | no | no | Vote only: acts as the `dr-controlplane` etcd tie-breaker and runs no DocumentDB. **This is the role a DocumentDB witness DC uses.** |
+| **Witness** | yes | no | Data-bearing but never primary, for engines whose witness must carry data (e.g. MongoDB). **Not used by DocumentDB.** |
+
+> For DocumentDB the third "witness" data center is **vote-only** (it holds only the
+> `dr-controlplane` etcd member, no DocumentDB), so it is declared with `role: Arbiter`
+> and empty `replicaIndices`. The petset `Witness` role is reserved for engines whose
+> witness must carry data; DocumentDB does not use it.
+
+A typical layout is two Member DCs plus one vote-only witness DC (`role: Arbiter`):
+the three-site etcd quorum lives across all three, but DocumentDB data lives only in
+the two Member DCs.
+
+## Deployment topologies (2 DCs vs 3 DCs)
+
+The DR feature needs two things, in different quantities:
+
+- **DocumentDB data** lives in the **Member** data centers. You need at least **two**
+  Member DCs for cross-DC redundancy (one active, one warm standby).
+- **The failover decision** is made by the `dr-controlplane` etcd **quorum**. A quorum
+  makes progress only while a **majority of its three voting sites** is reachable. For
+  single-fault tolerance *and* split-brain safety, those three votes should sit in
+  **three independent failure domains**. The third domain can be a tiny vote-only
+  **witness** (`role: Arbiter`) that holds no DocumentDB data.
+
+So "how many data centers" has two answers: how many hold **data** (two or three), and
+how many hold a **quorum vote** (always three for automatic, split-brain-free
+failover). The `failoverPolicy.mode` selects the data layout:
+
+### A. Two Member DCs + a witness, `mode: TwoDC` (recommended)
+
+Three sites; two hold DocumentDB data, the third is a vote-only witness DC
+(`role: Arbiter`, no DocumentDB):
+
+```yaml
+failoverPolicy:
+  mode: TwoDC
+distributionRules:
+- { clusterName: dc-east, role: Member, replicaIndices: [0, 1, 2] }
+- { clusterName: dc-west, role: Member, replicaIndices: [3, 4, 5] }
+- { clusterName: dc-witness, role: Arbiter }    # etcd vote only, no DocumentDB
+```
+
+Any single site can be lost:
+
+- **Lose a Member DC** → the surviving Member plus the witness form a 2/3 majority, so
+  the survivor acquires the Lease and is promoted automatically; the lost DC, if alive
+  but partitioned, self-fences read-only.
+- **Lose the witness** → the two Members are still a 2/3 majority, so writes continue
+  uninterrupted.
+
+Because the witness runs no DocumentDB, it is small and cheap. **Run it in a third
+public cloud or region**; this is the lowest-cost way to get correct, automatic
+failover, and it is the recommended topology whenever a third location is available.
+
+### B. Three Member DCs, `mode: ThreeDC`
+
+Three sites, all data-bearing and primary-eligible:
+
+```yaml
+failoverPolicy:
+  mode: ThreeDC
+distributionRules:
+- { clusterName: dc-east,  role: Member, replicaIndices: [0, 1, 2] }
+- { clusterName: dc-west,  role: Member, replicaIndices: [3, 4, 5] }
+- { clusterName: dc-south, role: Member, replicaIndices: [6, 7, 8] }
+```
+
+More data copies and read capacity, and any DC can become primary. Tolerates the loss
+of any single Member DC. The cost is three full DocumentDB groups instead of two; use
+it when you want a data copy and primary capability in all three locations.
+
+### C. Two sites only, reduced resiliency
+
+If you genuinely have only two locations, you still need a third quorum vote, so you
+**place it inside one of the two DCs** (run the third `dr-controlplane` etcd member
+there). There is no separate witness site, so that DC now holds **two of the three
+votes**:
+
+- **Lose the other DC** (the one with one vote) → the two-vote DC keeps the majority →
+  failover/continuity works automatically.
+- **Lose the two-vote DC** → the survivor holds only one of three votes, cannot form a
+  quorum, and therefore cannot safely become writable on its own. **Automatic failover
+  does not happen**; recovery is a manual, operator-confirmed step, and you must be
+  certain the failed DC is truly down to avoid split-brain.
+
+This protects against losing one specific DC, not both symmetrically. Prefer adding a
+cheap third witness site (topology A) whenever possible.
+
+### At a glance
+
+| Topology | Sites | Data DCs | Tolerates | Automatic failover |
+| --- | --- | --- | --- | --- |
+| Two Member + Arbiter witness (`TwoDC`) | 3 | 2 | any 1 site | yes |
+| Three Member (`ThreeDC`) | 3 | 3 | any 1 site | yes |
+| Two sites, co-located quorum | 2 | 2 | only the one-vote DC | only when the one-vote DC is lost |
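+
+The "Automatic failover" column follows from simple majority arithmetic over the three
+quorum votes. The sketch below uses a hypothetical `has_majority` helper to reproduce
+that reasoning; it only counts votes and does not model etcd itself.
+
+```bash
+# Hypothetical helper: is a strict majority of the quorum votes still reachable?
+# Usage: has_majority REACHABLE_VOTES [TOTAL_VOTES]
+has_majority() {
+  local reachable="$1" total="${2:-3}"
+  [ $(( reachable * 2 )) -gt "$total" ]
+}
+
+# Topologies A and B: losing any one site leaves 2 of 3 votes.
+has_majority 2 && echo "2 of 3 votes: quorum holds, failover is automatic"
+# Topology C: losing the two-vote DC leaves 1 of 3 votes.
+has_majority 1 || echo "1 of 3 votes: no quorum, manual recovery required"
+```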
+
+## Prerequisites
+
+- A working distributed DocumentDB substrate: Open Cluster Management (OCM) hub and
+  spoke clusters, KubeSlice connecting the spokes, and a storage class on each spoke.
+  DocumentDB reuses the same substrate as
+  [Distributed Postgres](/docs/guides/postgres/distributed/overview/index.md), since it
+  runs on PostgreSQL under the hood.
+- The `dr-controlplane` service and its three-site etcd quorum installed across the
+  data centers, with a `dr-controlplane` agent running in each spoke (DC).
+- The KubeDB DocumentDB operator started with the DC-DR flags:
+
+  ```
+  --dc-dr-enabled
+  --dc-dr-coord-kubeconfig=<path to the coordination control plane kubeconfig>
+  --dc-dr-local-dc=<this operator's data center name>
+  ```
+
+- One consistent **DC name** per data center, used everywhere: the OCM spoke cluster
+  name, the agent `--dc-name`, the Lease `holderIdentity`, the marker `activeDC`, the
+  pod label `open-cluster-management.io/cluster-name`, and the `PlacementPolicy`
+  `distributionRule.clusterName`. Keep them identical.
+
+## Deploy a DC-DR DocumentDB
+
+A DC-DR DocumentDB is a distributed `DocumentDB` whose `PlacementPolicy` carries a
+`failoverPolicy` and per-DC roles. The user creates and edits a **single** `DocumentDB`
+object and gets one `AppBinding` and one connection endpoint; the operator expands it
+into the per-DC groups.
+
+### 1. PlacementPolicy
+
+Assign the global pod ordinals to data centers and tag each DC with its role. Here two
+Member DCs (`dc-east`, `dc-west`) each get three DocumentDB pods, and `dc-arbiter` is
+the tie-breaking witness:
+
+```yaml
+apiVersion: apps.k8s.appscode.com/v1
+kind: PlacementPolicy
+metadata:
+  name: docdb-dcdr
+spec:
+  clusterSpreadConstraint:
+    slice:
+      projectNamespace: kubeslice-demo
+      sliceName: demo-slice
+    failoverPolicy:
+      trigger:
+        scope: Global
+      mode: TwoDC
+    distributionRules:
+    - clusterName: dc-east
+      role: Member
+      replicaIndices: [0, 1, 2]
+    - clusterName: dc-west
+      role: Member
+      replicaIndices: [3, 4, 5]
+    - clusterName: dc-arbiter
+      role: Arbiter
+```
+
+- A data-bearing **Member** rule carries `replicaIndices`; the **Arbiter** witness DC
+  (vote only, no DocumentDB) carries none.
+- `failoverPolicy.trigger.scope: Global` makes this one cluster-wide failover scope.
+  Use `Group` with a group name to put a database in its own scope.
+
+### 2. DocumentDB
+
+Reference the `PlacementPolicy` and opt the DocumentDB into DC-DR expansion:
+
+```yaml
+apiVersion: kubedb.com/v1alpha2
+kind: DocumentDB
+metadata:
+  name: docdb-dcdr
+  namespace: demo
+  annotations:
+    # Opt this distributed DocumentDB into per-DC DC-DR expansion.
+    dr.kubedb.com/enabled: "true"
+spec:
+  version: "pg17-0.109.0"
+  replicas: 6
+  distributed: true
+  storageType: Durable
+  podTemplate:
+    spec:
+      podPlacementPolicy:
+        name: docdb-dcdr
+  storage:
+    accessModes: [ReadWriteOnce]
+    resources:
+      requests:
+        storage: 1Gi
+  deletionPolicy: WipeOut
+```
+
+The operator then creates, per data-bearing DC:
+
+- a per-DC `PetSet` named `<db>-<dc>` (for example `docdb-dcdr-dc-east`) with its own
+  intra-DC raft and DC-local governing `Service`;
+- a per-DC arbiter `PetSet` `<db>-<dc>-arbiter` when that DC's local node count is
+  even.
+
+The witness DC (`role: Arbiter`) runs no DocumentDB pods.
+
+## Observe the DC-DR state
+
+The single `DocumentDB` object's `status.disasterRecovery` carries the whole cross-DC
+view:
+
+```bash
+$ kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{.status.disasterRecovery}' | jq
+```
+
+```json
+{
+  "activeDC": "dc-east",
+  "phase": "Steady",
+  "lastTransitionTime": "2026-06-30T10:00:00Z",
+  "dataCenters": [
+    { "clusterName": "dc-east", "role": "primary", "leader": "docdb-dcdr-dc-east-0", "writable": true,  "healthy": true },
+    { "clusterName": "dc-west", "role": "standby", "leader": "docdb-dcdr-dc-west-0", "writable": false, "healthy": true, "lagBytes": 4096 }
+  ]
+}
+```
+
+- `activeDC` is the DC that currently holds the Lease and runs the writable primary.
+- `phase` is `Steady`, `FailingOver`, `FailingBack`, or `Degraded`.
+- Each `dataCenters` entry reports that DC's local leader, whether it is the writable
+  primary, its health, and its cross-DC replication `lagBytes` (the in-DC coordinator
+  computes this and surfaces it; the hub never opens cross-cluster SQL).
+
+## Unplanned failover
+
+When the active DC is lost, its agent stops renewing the primary-DC Lease. After the
+Lease duration a surviving Member DC's agent acquires it; that DC becomes `activeDC`.
+The hub observes the change and drives a bounded-loss promotion of the survivor
+through a `ForceFailOver` `DocumentDBOpsRequest`, while the old DC self-fences
+read-only locally (before the hub reacts, even under a partition). The primary
+`Service` and `AppBinding` then resolve to the new writable DC.
+
+You do not trigger this; it is automatic. `status.disasterRecovery.phase` moves to
+`FailingOver` during the transition and back to `Steady` once the survivor is primary.
+
+## Planned switchover (zero-RPO)
+
+To move the active DC on purpose (maintenance, rebalancing) without losing committed
+rows, annotate the DocumentDB with the target DC:
+
+```bash
+$ kubectl annotate documentdb -n demo docdb-dcdr dr.kubedb.com/switchover-to=dc-west
+```
+
+The switchover is coordinated for zero RPO:
+
+1. The target must be a known, healthy DC within the lag budget.
+2. The hub asks the active DC to **quiesce** (hold its primary read-only) via the
+   primary-DC Lease, so the active primary's write position freezes.
+3. The hub waits until the target has replayed to within one WAL page of the frozen
+   position.
+4. The Lease hands off to the target; it is promoted and the old active DC resumes as
+   a standby. The annotation is cleared automatically.
+
+Because writes are frozen and the target fully catches up before the handoff, a
+planned switchover loses no committed rows.
+
+## Scale a data center
+
+Each DC has its own intra-DC raft, so a single `spec.replicas` value cannot describe a
+per-DC resize. Scale a specific DC with a `DocumentDBOpsRequest` that lists per-DC targets:
+
+```yaml
+apiVersion: ops.kubedb.com/v1alpha1
+kind: DocumentDBOpsRequest
+metadata:
+  name: docdb-dcdr-scale
+  namespace: demo
+spec:
+  type: HorizontalScaling
+  databaseRef:
+    name: docdb-dcdr
+  horizontalScaling:
+    dataCenters:
+    - clusterName: dc-west
+      replicas: 5
+```
+
+Each entry sets that data center's local node count; DCs not listed are unchanged.
+The request resizes only `dc-west`'s raft (adding or removing nodes one at a time over
+the DC-local network, managing that DC's arbiter parity), then updates the
+`PlacementPolicy` so the declarative topology matches. No other DC's raft and no
+cross-DC writability is touched. Scaling a DC to `1` makes it a single-node DC (no
+in-DC HA, but still part of cross-DC DR); removing a whole DC is a topology change, not
+a scale.
+
+## Day-2 operations
+
+The standard `DocumentDBOpsRequest` operations work on a DC-DR cluster and act on
+every per-DC group: vertical scaling, volume expansion (online and offline),
+version update, and storage migration each apply to all per-DC `PetSet`s and per-DC
+arbiters. You issue them exactly as for a non-distributed DocumentDB.
+
+## Cleanup
+
+```bash
+$ kubectl delete documentdb -n demo docdb-dcdr
+$ kubectl delete placementpolicy docdb-dcdr
+```
+
+Deleting the `DocumentDB` removes the per-DC `PetSet`s, governing `Service`s, and the
+cluster-scoped per-DC `PlacementPolicies` the operator generated. The user-provided
+base `PlacementPolicy` is left for you to delete.
diff --git a/docs/guides/documentdb/dr/runbook/index.md b/docs/guides/documentdb/dr/runbook/index.md
new file mode 100644
index 000000000..24ec17620
--- /dev/null
+++ b/docs/guides/documentdb/dr/runbook/index.md
@@ -0,0 +1,356 @@
+---
+title: DC-DR Runbook
+menu:
+  docs_{{ .version }}:
+    identifier: guides-documentdb-dr-runbook
+    name: Runbook
+    parent: guides-documentdb-dr
+    weight: 30
+menu_name: docs_{{ .version }}
+section_menu_id: guides
+---
+
+# DocumentDB DC-DR Runbook
+
+Scenario-by-scenario procedures for operating a DocumentDB cluster in cross data
+center disaster recovery (DC-DR) mode. Each scenario lists the **symptoms**, what
+KubeDB does **automatically**, how to **verify**, and the **action** to take.
+
+Read the [User Guide](/docs/guides/documentdb/dr/guide/index.md) for the
+concepts and commands referenced here. Throughout, `<coord>` is the coordination
+control plane kubeconfig, and `docdb-dcdr`/`demo` are the example database and namespace.
+
+## Quick reference
+
+```bash
+# Active DC, phase, and per-DC view:
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{.status.disasterRecovery}' | jq
+
+# Lease holder (the source of truth for "who is active"):
+kubectl --kubeconfig <coord> -n dc-failover get lease primary-dc -o jsonpath='{.spec.holderIdentity}'
+
+# Per-DC leaders, roles, and DCs:
+kubectl get pods -n demo -l app.kubernetes.io/instance=docdb-dcdr -L kubedb.com/role,open-cluster-management.io/cluster-name
+
+# A spoke's marker (what its coordinators read):
+kubectl -n dc-failover get configmap primary-dc -o jsonpath='{.data}'
+```
+
+Golden rules:
+
+- **The Lease decides who is writable.** Never make a pod writable by hand.
+- **The fence fails closed.** A DC that cannot confirm it holds the Lease is read-only
+  by design; that is correct behavior, not a bug.
+- **Exactly one DC is `writable: true`** in `status.disasterRecovery` at any instant.
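+
+The third rule can be checked mechanically. The following is a minimal sketch, not
+part of KubeDB: `count_writable` and `exactly_one_writable` are hypothetical helper
+names, and the input is assumed to be a space-separated `dc=true|false` summary such
+as the one the scenario 2 verify command prints.
+
+```bash
+# Hypothetical helper (not part of KubeDB): count the DCs that report
+# writable=true in a "dc=bool dc=bool" summary string.
+count_writable() (
+  n=0
+  for entry in $1; do
+    case "$entry" in
+      *=true) n=$((n + 1)) ;;
+    esac
+  done
+  echo "$n"
+)
+
+# Succeeds only when exactly one DC is writable.
+exactly_one_writable() {
+  [ "$(count_writable "$1")" -eq 1 ]
+}
+```
+
+For example, `exactly_one_writable 'dc-east=true dc-west=false'` succeeds, while a
+summary with zero or two `=true` entries fails. A count of zero usually points at
+scenario 6, and a count of two at scenario 14.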
+
+---
+
+## 1. Active DC lost (zone/cluster failure)
+
+**Symptoms:** the active DC's pods are gone/unreachable; writes fail briefly.
+
+**Automatic:** the lost DC's agents stop renewing the Lease. After the Lease duration
+(~45s) a surviving Member DC's agent acquires it and becomes `activeDC`. The hub drives
+a bounded-loss promotion (`ForceFailOver` `DocumentDBOpsRequest`) of the survivor; the
+old DC, if partially alive, self-fences read-only. The primary `Service` and
+`AppBinding` follow to the new DC. `phase` moves `FailingOver` → `Steady`.
+
+**Verify:**
+
+```bash
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{.status.disasterRecovery.activeDC}'   # the survivor
+kubectl get documentdbopsrequest -n demo -l app.kubernetes.io/managed-by=kubedb-dcdr  # the failover ops
+```
+
+**Action:** none required for availability. Note the RPO: writes not yet replicated
+when the DC died are lost. When the failed DC returns, see scenario 11 (re-add a DC).
+
+---
+
+## 2. Network partition between data centers
+
+**Symptoms:** DCs are up but cannot reach each other or the coordination plane.
+
+**Automatic:** the side that loses the coordination plane stops getting Lease updates;
+its marker `renewTime` freezes and, after the 30s fence TTL, its coordinator demotes
+its leader to read-only. This happens **before** the Lease duration (~45s) lets the
+other side acquire the Lease (this is the timing invariant). The side that keeps the
+etcd majority holds/acquires the Lease and stays (or becomes) writable. There is no
+split-brain.
+
+**Verify there is exactly one writable DC:**
+
+```bash
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{range .status.disasterRecovery.dataCenters[*]}{.clusterName}={.writable} {end}'
+```
+
+**Action:** heal the network. The fenced side rejoins, rewinds any divergent tail, and
+resumes streaming automatically. If both sides show `writable: false`, see scenario 6
+(coordination plane down).
+
+---
+
+## 3. Planned switchover (maintenance on the active DC)
+
+**Action:**
+
+```bash
+kubectl annotate documentdb -n demo docdb-dcdr dr.kubedb.com/switchover-to=dc-west
+```
+
+**Automatic:** the hub gates on the target's health and lag, quiesces the active DC
+(holds its primary read-only via the Lease), waits until the target catches up to
+within one WAL page, then hands off. Zero committed rows are lost. The annotation is
+cleared on completion.
+
+**Verify:**
+
+```bash
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{.status.disasterRecovery.activeDC}'  # dc-west
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{.status.disasterRecovery.phase}'     # Steady
+```
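+
+When the switchover runs from a script, the two checks above can be polled instead of
+run once. This is a minimal sketch: `wait_for_value` is a hypothetical helper, not a
+KubeDB or kubectl feature, and the attempt count and delay are values you choose.
+
+```bash
+# Hypothetical helper: run a command repeatedly until its output equals the
+# expected value, or give up after a fixed number of attempts.
+# Usage: wait_for_value <expected> <attempts> <delay-seconds> <command...>
+wait_for_value() (
+  expected=$1 attempts=$2 delay=$3
+  shift 3
+  i=0
+  while [ "$i" -lt "$attempts" ]; do
+    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
+      exit 0
+    fi
+    i=$((i + 1))
+    sleep "$delay"
+  done
+  exit 1
+)
+```
+
+For example, `wait_for_value dc-west 60 5` followed by the first `kubectl get` command
+above waits up to about five minutes for `activeDC` to change. Repeat it with `Steady`
+and the second command to confirm the phase settled. A non-zero exit is the cue to go
+to scenario 8.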
+
+**If it does not complete:** see scenario 8 (switchover stuck).
+
+---
+
+## 4. Planned failback to the original DC
+
+After the original DC is healthy and caught up:
+
+```bash
+kubectl annotate documentdb -n demo docdb-dcdr dr.kubedb.com/switchover-to=dc-east
+```
+
+Same zero-RPO flow as scenario 3. A DC that previously lost the Lease rewinds its
+divergent tail (`pg_rewind`, base-backup reseed fallback) before it is eligible, so
+failback is safe even after an unplanned failover.
+
+---
+
+## 5. A standby DC is lost
+
+**Symptoms:** a non-active DC's pods are gone; that DC shows `healthy: false`.
+
+**Impact:** none on writes; the active DC is unaffected. You lose that DC's
+redundancy and its standby read capacity until it returns.
+
+**Verify the active DC is still writable:**
+
+```bash
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{.status.disasterRecovery.dataCenters[?(@.writable==true)].clusterName}'
+```
+
+**Action:** recover the DC's nodes; the per-DC group reschedules and re-seeds from the
+active primary automatically.
+
+---
+
+## 6. Coordination plane (dr-controlplane / etcd) unavailable
+
+**Symptoms:** the Lease cannot be read/renewed; markers go stale across all spokes;
+every DC eventually fences read-only.
+
+**Automatic:** the fence fails closed. With no trustworthy Lease, **no** DC is allowed
+to be writable; the database goes read-only globally rather than risk split-brain.
+
+**Verify:**
+
+```bash
+kubectl --kubeconfig <coord> -n dc-failover get lease primary-dc        # error / stale renewTime
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{range .status.disasterRecovery.dataCenters[*]}{.clusterName}={.writable} {end}'  # all false
+```
+
+**Action:** restore the `dr-controlplane` etcd quorum. Once the Lease is renewable, the
+holder's marker refreshes, its coordinator un-fences, and writes resume. Do **not**
+force a pod writable to work around this.
+
+---
+
+## 7. Failover not promoting a survivor
+
+**Symptoms:** the active DC is gone but `activeDC` does not move, or no writable DC
+appears.
+
+**Diagnose:**
+
+```bash
+# Did the Lease move?
+kubectl --kubeconfig <coord> -n dc-failover get lease primary-dc -o jsonpath='{.spec.holderIdentity}'
+# Are there candidate pods in the survivor DC?
+kubectl get pods -n demo -l app.kubernetes.io/instance=docdb-dcdr -L kubedb.com/role,open-cluster-management.io/cluster-name
+# Did the hub create a failover ops request, and what is its phase?
+kubectl get documentdbopsrequest -n demo -l app.kubernetes.io/managed-by=kubedb-dcdr -o wide
+```
+
+**Common causes & action:**
+
+- **Lease did not move:** only Member DCs are eligible; confirm the survivor is a
+  `Member` in the `PlacementPolicy` and its agent can reach the coordination plane.
+- **No candidates:** the survivor DC has no ready data pods; recover its pods.
+- **Ops request failed:** inspect its conditions. The hub does not create a duplicate
+  while one is open, so resolve or delete the stuck request and let reconcile retry.
+
+---
+
+## 8. Planned switchover stuck (target not catching up)
+
+**Symptoms:** after annotating `switchover-to`, `phase` stays `FailingOver` and the
+Lease does not hand off.
+
+**Diagnose:**
+
+```bash
+# Target lag and health:
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{range .status.disasterRecovery.dataCenters[*]}{.clusterName} lag={.lagBytes} healthy={.healthy}{"\n"}{end}'
+```
+
+**Causes & action:**
+
+- **Target lag not converging:** the active DC must be quiesced for the target to
+  reach the frozen LSN. Confirm the active DC's coordinator honored the quiesce (its
+  primary should be read-only), and check that the marker's `data.quiesce` names the
+  active DC.
+- **Target unhealthy / no lag report:** the switchover refuses a target with no
+  `lagBytes` yet; ensure the target DC's leader is up and publishing lag.
+- **Target legitimately too far behind:** raise the budget only if you accept the
+  catch-up time: `kubectl annotate documentdb -n demo docdb-dcdr dr.kubedb.com/switchover-max-lag-bytes=<bytes>`.
+- **Abort:** remove the annotation to cancel:
+  `kubectl annotate documentdb -n demo docdb-dcdr dr.kubedb.com/switchover-to-`.
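+
+The gate described above (a healthy target, a published `lagBytes`, lag within budget)
+can be approximated on the client before you annotate. This is a minimal sketch, not
+the hub's actual logic: `switchover_ready` is a hypothetical helper, and the budget you
+pass is your own assumption, since this runbook does not state the default.
+
+```bash
+# Hypothetical pre-check (not the hub's actual gate): succeed only if the
+# target is healthy, has published a lag value, and is within max_lag bytes.
+# Usage: switchover_ready <healthy> <lagBytes> <max-lag-bytes>
+switchover_ready() (
+  healthy=$1 lag=$2 max_lag=$3
+  [ "$healthy" = "true" ] || exit 1
+  case "$lag" in
+    ''|*[!0-9]*) exit 1 ;;   # no lagBytes reported yet, or not a number
+  esac
+  [ "$lag" -le "$max_lag" ]
+)
+```
+
+Feed it the `healthy` and `lagBytes` values from the diagnose command, for example
+`switchover_ready true 4096 1048576`. A failure tells you which of the causes above to
+look at first; it does not replace the hub's own check.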
+
+---
+
+## 9. Lag growing on a standby DC
+
+**Symptoms:** a DC's `lagBytes` climbs steadily.
+
+**Diagnose:** cross-DC network throughput/latency, write volume on the active primary,
+and replication health on the standby DC's leader.
+
+```bash
+kubectl get pod -n demo <standby-dc-leader> -o jsonpath='{.metadata.annotations.kubedb\.com/dc-lag-bytes}'
+# On the active primary, check the cross-DC replica:
+#   SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;
+```
+
+**Action:** relieve the bottleneck (network, primary load). High lag widens the RPO of
+an unplanned failover and can block a planned switchover until it drains.
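+
+"Climbs steadily" can be made concrete by sampling the lag a few times. This is a
+minimal sketch under a simple assumption: lag counts as climbing when every sample is
+larger than the one before it. `lag_is_climbing` is a hypothetical helper, and the
+sampling interval is up to you.
+
+```bash
+# Hypothetical helper: succeed only if the given lagBytes samples (oldest
+# first) are strictly increasing. Needs at least two samples.
+lag_is_climbing() (
+  [ "$#" -ge 2 ] || exit 1
+  prev=""
+  for sample in "$@"; do
+    if [ -n "$prev" ] && [ "$sample" -le "$prev" ]; then
+      exit 1
+    fi
+    prev=$sample
+  done
+  exit 0
+)
+```
+
+Collect the samples with the annotation query above, spaced by your chosen interval,
+then call `lag_is_climbing 4096 8192 16384`. Lag that rises and then drains fails the
+check, which usually indicates a write burst rather than a sustained bottleneck.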
+
+---
+
+## 10. A DC is unexpectedly read-only (fence tripped)
+
+**Symptoms:** a DC you expect to be active is read-only; its leader is labeled
+`standby`.
+
+**Diagnose the fence chain:**
+
+```bash
+# Does this spoke's marker name this DC and is renewTime fresh?
+kubectl -n dc-failover get configmap primary-dc -o jsonpath='{.data}'
+# Is the dr-controlplane agent running in this DC and renewing?
+kubectl get pods -n <agent-namespace> -l app=dr-controlplane-agent
+```
+
+**Causes & action:**
+
+- **Marker stale** (`renewTime` old): the agent cannot reach the coordination plane,
+  or the projector is failing; restore agent connectivity.
+- **Marker names another DC:** this DC simply is not the active one (correct).
+- **Clock skew:** large cross-DC clock skew can trip the TTL early; verify NTP. The
+  timing budget assumes skew is well under (LeaseDuration − fence TTL).
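+
+The skew bound can be sanity-checked with plain arithmetic. This is a minimal sketch:
+`fence_before_lease` is a hypothetical helper, not a KubeDB command, and the skew you
+pass in is your own measurement. With the values quoted in this runbook (30s fence TTL,
+~45s Lease duration) the margin is about 15 seconds.
+
+```bash
+# Hypothetical check (not a KubeDB command): succeed only if the fence TTL is
+# shorter than the Lease duration and the measured clock skew is below the
+# difference between them. All values are in whole seconds.
+# Usage: fence_before_lease <fence-ttl> <lease-duration> <skew>
+fence_before_lease() (
+  fence_ttl=$1 lease_duration=$2 skew=$3
+  margin=$((lease_duration - fence_ttl))
+  [ "$margin" -gt 0 ] && [ "$skew" -lt "$margin" ]
+)
+```
+
+`fence_before_lease 30 45 2` succeeds; a skew of 15 seconds or more, or a fence TTL that
+is not shorter than the Lease duration, fails. The check is only a strict bound, so
+keep real skew far below the margin.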
+
+Never patch `kubedb.com/role` by hand to force writability: the next reconcile and the
+fence will revert it, and you risk split-brain.
+
+---
+
+## 11. Re-add / recover a previously lost data center
+
+After a DC returns from a failure:
+
+**Automatic:** its per-DC group reschedules, and each pod seeds from the active DC
+primary (a node that was previously active rewinds its divergent tail first). Once
+caught up, the DC's leader becomes a healthy read-only standby and its `lagBytes`
+appears in status.
+
+**Verify:**
+
+```bash
+kubectl get documentdb -n demo docdb-dcdr -o jsonpath='{range .status.disasterRecovery.dataCenters[*]}{.clusterName} healthy={.healthy} lag={.lagBytes}{"\n"}{end}'
+```
+
+**Action:** to make it active again, perform a planned failback (scenario 4) once its
+lag is small.
+
+---
+
+## 12. Scale a data center up or down
+
+```bash
+# Scale dc-west to 5 nodes:
+kubectl apply -f - <<'YAML'
+apiVersion: ops.kubedb.com/v1alpha1
+kind: DocumentDBOpsRequest
+metadata: { name: docdb-dcdr-scale-west, namespace: demo }
+spec:
+  type: HorizontalScaling
+  databaseRef: { name: docdb-dcdr }
+  horizontalScaling:
+    dataCenters:
+    - { clusterName: dc-west, replicas: 5 }
+YAML
+```
+
+**Verify:** the per-DC PetSet reaches the new size, the arbiter appears/disappears with
+parity, and only that DC changed.
+
+```bash
+kubectl get petset -n demo -l app.kubernetes.io/instance=docdb-dcdr
+kubectl get documentdbopsrequest -n demo docdb-dcdr-scale-west -o jsonpath='{.status.phase}'
+```
+
+**Notes:** scaling to `1` is allowed (single-node DC, no in-DC HA); scaling to `0` is
+rejected because removing a DC is a topology change, not a scale.
+
+---
+
+## 13. Version upgrade in DC-DR
+
+Issue a normal `UpdateVersion` `DocumentDBOpsRequest`; the operator updates every
+per-DC PetSet. Plan it during a low-traffic window and confirm each DC returns healthy
+in `status.disasterRecovery` before relying on failover again.
+
+---
+
+## 14. Suspected split-brain (two writable DCs)
+
+This should be impossible by design (etcd majority + fail-closed fence + the timing
+invariant). If `status.disasterRecovery` ever shows two `writable: true` DCs, or two
+pods labeled `kubedb.com/role: primary`:
+
+**Diagnose immediately:**
+
+```bash
+kubectl get pods -n demo -l app.kubernetes.io/instance=docdb-dcdr -L kubedb.com/role,open-cluster-management.io/cluster-name
+kubectl --kubeconfig <coord> -n dc-failover get lease primary-dc -o jsonpath='{.spec.holderIdentity}'
+```
+
+**Action:** the Lease holder is the true active DC. The other DC's fence should trip
+within the TTL; if it does not, its marker is wrong (check its agent/projector) or the
+timing invariant is misconfigured (verify fence TTL < Lease duration). Stop writes to
+the non-Lease-holder DC at the application layer until the fence reasserts, then
+reconcile (the non-holder rewinds and rejoins as a standby).
+
+---
+
+## Escalation checklist
+
+When unsure, collect:
+
+```bash
+kubectl get documentdb -n demo docdb-dcdr -o yaml
+kubectl get documentdbopsrequest -n demo -l app.kubernetes.io/managed-by=kubedb-dcdr -o yaml
+kubectl --kubeconfig <coord> -n dc-failover get lease -o yaml
+kubectl -n dc-failover get configmap primary-dc -o yaml   # on each spoke
+kubectl get pods -n demo -l app.kubernetes.io/instance=docdb-dcdr -L kubedb.com/role,open-cluster-management.io/cluster-name -o wide
+kubectl logs -n demo <leader-pod> -c documentdb-coordinator
+```