Add Redis cross-DC disaster recovery (DC-DR) docs#943
Conversation
Document DC-DR for Redis/Valkey under docs/guides/redis/dr/: an overview (concepts, 2-DC vs 3-DC topologies, deploy, observe status.disasterRecovery, unplanned failover, planned near-zero-RPO switchover, failback, scaling, cleanup) and a scenario runbook. Native scope is Sentinel and Standalone; the cross-DC link is a plain async REPLICAOF, the writable DC is chosen by the dr-controlplane primary-DC Lease, and the rd-coordinator fence holds a non-active DC's master standby, fail closed. Signed-off-by: Tamal Saha <tamal@appscode.com>
📝 WalkthroughWalkthroughThis PR adds three new documentation pages under docs/guides/redis/dr: a guide index with menu configuration, an overview page describing the Redis Cross-DC Disaster Recovery (DC-DR) architecture and setup, and a runbook detailing operational procedures and failure-scenario responses. ChangesRedis DC-DR Documentation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
docs/guides/redis/dr/overview/index.md (2)
163-167: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a language specifier to the fenced code block.
The shell flags block is missing a language tag, so it won't render with syntax highlighting.
📝 Proposed fix
- bash
--dc-dr-enabled
--dc-dr-coord-kubeconfig=
--dc-dr-local-dc=<this operator's data center name></details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@docs/guides/redis/dr/overview/index.mdaround lines 163 - 167, Add a
language tag to the fenced shell-flags block in the docs snippet so it renders
with syntax highlighting; update the markdown code fence around the dc-dr flag
example to use a bash specifier, keeping the existing flag lines unchanged.</details> <!-- cr-comment:v1:7663c761b384e10a3aa3c0f5 --> --- `40-40`: _📐 Maintainability & Code Quality_ | _🔵 Trivial_ | _⚡ Quick win_ **Use descriptive link text for accessibility.** Replace the non-descriptive "here" link text with something that describes the destination, e.g., "the KubeDB quick start guide". <details> <summary>📝 Proposed fix</summary> ```diff - > **New to KubeDB?** Please start [here](/docs/README.md). + > **New to KubeDB?** Please start [the KubeDB quick start guide](/docs/README.md).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/guides/redis/dr/overview/index.md` at line 40, The link text in the “New to KubeDB?” sentence is non-descriptive and should be updated for accessibility. In the Markdown content, replace the generic “here” anchor in the introductory sentence with descriptive text that identifies the destination, using the nearby “New to KubeDB?” phrasing and the README link target as guidance. Keep the sentence clear and self-explanatory without relying on the link destination alone.docs/guides/redis/dr/runbook/index.md (1)
166-166: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueSimplify wordy phrasing.
"a majority of its three sites" → "2 of 3 sites" or "a majority of the three etcd members".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/guides/redis/dr/runbook/index.md` at line 166, The phrasing in the DR runbook is too wordy and should be simplified. Update the sentence containing “restore the dr-controlplane etcd quorum” to replace “a majority of its three sites” with a clearer equivalent such as “2 of 3 sites” or “a majority of the three etcd members.” Keep the wording concise and consistent with the surrounding runbook language.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/guides/redis/dr/runbook/index.md`:
- Line 45: The runbook’s golden rule in the redis DR guide is too strict and
conflicts with the quorum-loss scenario. Update the statement referenced by the
disasterRecovery section to say “at most one DC is writable: true” and
explicitly allow zero writable DCs when the coordination plane quorum is lost.
Keep the wording aligned with Scenario 6 so the guidance is consistent across
the document.
---
Nitpick comments:
In `@docs/guides/redis/dr/overview/index.md`:
- Around line 163-167: Add a language tag to the fenced shell-flags block in the
docs snippet so it renders with syntax highlighting; update the markdown code
fence around the dc-dr flag example to use a bash specifier, keeping the
existing flag lines unchanged.
- Line 40: The link text in the “New to KubeDB?” sentence is non-descriptive and
should be updated for accessibility. In the Markdown content, replace the
generic “here” anchor in the introductory sentence with descriptive text that
identifies the destination, using the nearby “New to KubeDB?” phrasing and the
README link target as guidance. Keep the sentence clear and self-explanatory
without relying on the link destination alone.
In `@docs/guides/redis/dr/runbook/index.md`:
- Line 166: The phrasing in the DR runbook is too wordy and should be
simplified. Update the sentence containing “restore the dr-controlplane etcd
quorum” to replace “a majority of its three sites” with a clearer equivalent
such as “2 of 3 sites” or “a majority of the three etcd members.” Keep the
wording concise and consistent with the surrounding runbook language.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 00edbb60-18e9-42bb-966d-6be36276ade1
📒 Files selected for processing (3)
docs/guides/redis/dr/_index.mddocs/guides/redis/dr/overview/index.mddocs/guides/redis/dr/runbook/index.md
| `SENTINEL FAILOVER` across DCs by hand. | ||
| - **The fence fails closed.** A DC that cannot confirm it holds the Lease keeps its master | ||
| labeled `standby` and read-only by design; that is correct, not a bug. | ||
| - **Exactly one DC is `writable: true`** in `status.disasterRecovery` at any instant. |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Contradiction: "Exactly one DC is writable: true" fails when coordination plane is down.
Scenario 6 (lines 152-168) correctly states that when the etcd quorum is lost, all DCs fail closed and no DC is writable. The golden rule should qualify this to avoid operator panic during a coordination-plane outage.
Suggested rephrase: "At most one DC is writable: true in status.disasterRecovery at any instant; zero is expected when the coordination plane quorum is lost."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/guides/redis/dr/runbook/index.md` at line 45, The runbook’s golden rule
in the redis DR guide is too strict and conflicts with the quorum-loss scenario.
Update the statement referenced by the disasterRecovery section to say “at most
one DC is writable: true” and explicitly allow zero writable DCs when the
coordination plane quorum is lost. Keep the wording aligned with Scenario 6 so
the guidance is consistent across the document.
What
Adds documentation for cross data center disaster recovery (DC-DR) of KubeDB Redis / Valkey, under
docs/guides/redis/dr/, mirroring the structure of the other engines' DC-DR docs.dr/overview/index.md— concepts (gossip/Sentinel quorum stays intra-DC, thedr-controlplaneLease is the single cross-DC authority, plain asyncREPLICAOFis the cross-DC link, the fail-closedrd-coordinatorfence), 2-DC vs 3-DC topologies, prerequisites, a deploy walkthrough (PlacementPolicy+Redis), observingstatus.disasterRecovery, unplanned failover, planned near-zero-RPO switchover, failback via resync, per-DC scaling, and cleanup.dr/runbook/index.md— scenario-by-scenario procedures (active DC lost, partition, planned switchover/failback, standby lost, coordination plane down, switchover stuck, stale marker, re-adding a DC, verifying the split-brain guarantee).Scope
Native DC-DR scope is Sentinel and Standalone (they have a native cross-cluster
REPLICAOFprimitive). Cluster mode has no native cross-cluster replication and is called out as deferred to a later external-sync phase.Related
Documents the feature implemented in kubedb/redis#658 (operator), kubedb/redis-coordinator#164 (fence), and kubedb/apimachinery#1793 (
status.disasterRecoverytypes). All commands, annotations (dr.kubedb.com/enabled,dr.kubedb.com/switchover-to,dr.kubedb.com/switchover-max-lag-bytes), env, status fields, and flags match that implementation.Summary by CodeRabbit