Add Cassandra DC-DR (cross data center disaster recovery) guide #946
tamalsaha wants to merge 3 commits into
Signed-off-by: Tamal Saha <tamal@appscode.com>
Walkthrough

This PR adds Cassandra Cross-DC Disaster Recovery documentation: a new navigation index, an overview and quick-start page, a detailed user guide, and an operational runbook, plus a link from the main Cassandra README.

Changes: Cassandra DC-DR Documentation
Estimated code review effort: 2 (Simple), ~15 minutes.
Suggested labels: documentation.
Pre-merge checks: 5 passed.
Nitpick comments (5)
docs/guides/cassandra/README.md (1)
Lines 63-65 (Maintainability & Code Quality, Trivial): Use descriptive link text.
"Follow here" is non-descriptive link text. Replace it with something like "See the Cassandra DC-DR overview" or "Follow the DC-DR overview guide."
Suggested change:
- Follow [here](/docs/guides/cassandra/dr/overview/index.md).
+ Follow the [Cassandra DC-DR overview guide](/docs/guides/cassandra/dr/overview/index.md).
Prompt for AI Agents:
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/guides/cassandra/README.md` around lines 63 - 65, The link text in the Cassandra DC-DR section is too generic, so update the markdown in the README to use descriptive anchor text instead of “here.” Keep the same target URL, but change the sentence so the link text clearly identifies the destination, using the surrounding section title “Cross-DC Disaster Recovery (DC-DR)” as the reference point.
Source: Linters/SAST tools
docs/guides/cassandra/dr/runbook/index.md (2)
Lines 200-201 (Maintainability & Code Quality, Trivial): Clarify the annotation deletion syntax.
The abort command uses `dr.kubedb.com/switchover-to-` (trailing hyphen) to delete the annotation, which is correct kubectl syntax but may confuse readers. Add a comment explaining this.
Suggested change:
 - **Abort** remove the annotation to cancel:
-  `kubectl annotate cassandra -n demo cas-dcdr dr.kubedb.com/switchover-to-`.
+  `kubectl annotate cassandra -n demo cas-dcdr dr.kubedb.com/switchover-to-` # trailing hyphen deletes the annotation.
Prompt for AI Agents:
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/guides/cassandra/dr/runbook/index.md` around lines 200 - 201, Clarify the abort step in the Cassandra DR runbook by adding a brief note near the kubectl annotate command that explains the trailing hyphen in dr.kubedb.com/switchover-to- is the kubectl syntax for deleting the annotation. Update the surrounding text in the runbook section that describes “Abort” so readers understand this is an annotation removal, not part of the key name.
Lines 55-58 (Maintainability & Code Quality, Trivial): Minor: "not safety" → "not a safety mechanism" for clarity.
Suggested change:
- **The Lease is routing, not safety.** If the Lease is stale, Cassandra keeps running on
+ **The Lease is routing, not a safety mechanism.** If the Lease is stale, Cassandra keeps running on
Prompt for AI Agents:
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/guides/cassandra/dr/runbook/index.md` around lines 55 - 58, Update the wording in the runbook section to use clearer phrasing: change the Lease description in the documented guidance from “not safety” to “not a safety mechanism.” Keep the rest of the Cassandra DR guidance unchanged, and make the edit in the prose near the Lease bullet so it reads naturally and unambiguously.

docs/guides/cassandra/dr/guide/index.md (2)
Lines 355-359 (Maintainability & Code Quality, Trivial): Fix blank line inside blockquote.
The blank line at 355 breaks the blockquote continuity. The warning note starting at 356 will render as a separate blockquote (or not as a blockquote at all, depending on the renderer). Merge into a single blockquote or separate completely.
Suggested change:
 > After changing a DC's node count, run `nodetool cleanup` on the DC (removes data no
 > longer owned after token ranges move) and a `nodetool repair` if you also changed a
 > keyspace's replication factor.
-
+>
 > **Note:** the distributed Cassandra substrate and the DC-DR layer are net-new for
 > Cassandra (KubeDB models Cassandra as a single logical DC today). Treat the field names
 > and flows in this guide as the intended user experience; confirm availability in your
 > release before relying on them in production.
Prompt for AI Agents:
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/guides/cassandra/dr/guide/index.md` around lines 355 - 359, The blockquote in the guide is split by a blank line, which breaks the note rendering. Update the note text so the entire advisory stays in a single continuous blockquote in the markdown content, or remove the blockquote style entirely if that is the intended formatting. Locate the issue in the Cassandra DR guide note paragraph and keep the “Note” text and the following lines together without an empty line between them.
Source: Linters/SAST tools
Lines 367-370 (Data Integrity & Integration, Trivial): Add owner references to generated PlacementPolicies or document why not.
The guide states that the generated per-DC `PlacementPolicies` "carry no owner reference, so the operator deletes them explicitly." This is a design gap that could lead to orphaned resources if the operator fails during deletion. Consider adding an `ownerReference` to link them to the `Cassandra` CR, or document why this isn't possible.
Suggested change:
- Per `deletionPolicy`, the operator removes the per-DC datacenters and the cluster-scoped
- per-DC `PlacementPolicies` it generated (these carry no owner reference, so the operator
- deletes them explicitly). The user-provided base `PlacementPolicy` is left for you to
- delete.
+ Per `deletionPolicy`, the operator removes the per-DC datacenters and the cluster-scoped
+ per-DC `PlacementPolicies` it generated. The user-provided base `PlacementPolicy` is left for you to
+ delete.
(Or add a TODO to add an ownerReference if technically feasible.)
Prompt for AI Agents:
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/guides/cassandra/dr/guide/index.md` around lines 367 - 370, The guide’s deletionPolicy section is describing generated per-DC PlacementPolicies without owner references, which leaves the design ambiguous. Update the docs around the per-DC PlacementPolicies and deletion flow to either state why ownerReference cannot be used for the generated PlacementPolicy objects, or note that adding owner references to the Cassandra CR is a TODO if feasible. Keep the explanation aligned with the deletionPolicy behavior and the operator’s explicit cleanup of generated resources.
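One likely reason the generated objects carry no owner reference is Kubernetes' own ownership rule: a cluster-scoped dependent can only name cluster-scoped owners, and a namespaced owner must live in the same namespace as its dependent. The minimal Go sketch below encodes that rule. It assumes, as the guide text says, that the generated `PlacementPolicy` objects are cluster-scoped, and that the `Cassandra` CR is namespaced (`demo` is the namespace used in the runbook). The `Object` type and the `ownerRefAllowed` helper are hypothetical illustrations, not KubeDB or client-go APIs.

```go
package main

import "fmt"

// Object is a HYPOTHETICAL minimal description of a Kubernetes object:
// only what the ownership rule needs (is the kind namespaced, and where).
type Object struct {
	Kind       string
	Namespaced bool
	Namespace  string
}

// ownerRefAllowed encodes the documented ownerReference rules:
//   - a cluster-scoped dependent may only have cluster-scoped owners;
//   - a namespaced dependent may have a cluster-scoped owner, or a
//     namespaced owner in the same namespace.
func ownerRefAllowed(owner, dependent Object) bool {
	if !dependent.Namespaced {
		return !owner.Namespaced
	}
	if !owner.Namespaced {
		return true
	}
	return owner.Namespace == dependent.Namespace
}

func main() {
	// Assumed scopes: namespaced Cassandra CR, cluster-scoped PlacementPolicy.
	cas := Object{Kind: "Cassandra", Namespaced: true, Namespace: "demo"}
	pp := Object{Kind: "PlacementPolicy", Namespaced: false}
	fmt.Println(ownerRefAllowed(cas, pp)) // false
}
```

Under these assumptions the check returns `false`. That would make explicit deletion by the operator the expected design rather than a gap.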
Files selected for processing (5):
- docs/guides/cassandra/README.md
- docs/guides/cassandra/dr/_index.md
- docs/guides/cassandra/dr/guide/index.md
- docs/guides/cassandra/dr/overview/index.md
- docs/guides/cassandra/dr/runbook/index.md
Align the docs with the shipped apimachinery Cassandra DR types (kubedb/apimachinery#1800):

- upNormalNodes -> upNodes, pendingHints -> hintBacklogBytes, repairBacklog -> pendingRanges (the real status field names).
- CassandraOpsRequest horizontalScaling.dataCenters uses node, not replicas.
- pendingRanges is streaming/pending ranges (nodetool netstats), not repair staleness; healthy is health-Lease freshness; add the replicationFactor field.

Signed-off-by: Tamal Saha <tamal@appscode.com>
The DC-DR PlacementPolicy reference is spec.podPlacementPolicy (a field on CassandraSpec that the operator reads to spread the per-DC datacenters), not spec.podTemplate.spec.podPlacementPolicy (the unrelated generic intra-cluster pod-spreading field). Move it to the top level in the overview and guide manifests.

Signed-off-by: Tamal Saha <tamal@appscode.com>
Adds the Cassandra cross data center disaster recovery (DC-DR) documentation, mirroring the ClickHouse DR docs but rewritten for Cassandra's masterless, geo-native model.
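The masterless, geo-native model rests on per-DC quorum arithmetic. The minimal Go sketch below assumes a hypothetical three-DC layout with replication factor 3 per DC. `LOCAL_QUORUM` needs floor(RF/2)+1 live replicas in the coordinator's own DC, and an `EACH_QUORUM` write needs that in every DC. The hypothetical `majoritySurvives` helper assumes the Arbiter DC acts as an extra voter in a DC-level majority. Under that assumption, it shows why a tie-breaker only helps when the number of data DCs is even. All function names are illustrative, not KubeDB or driver APIs.

```go
package main

import "fmt"

// quorum returns the Cassandra quorum size for a replica count: floor(rf/2) + 1.
func quorum(rf int) int { return rf/2 + 1 }

// localQuorumOK: a LOCAL_QUORUM request only counts live replicas in the
// coordinator's own DC.
func localQuorumOK(localRF, localLive int) bool {
	return localLive >= quorum(localRF)
}

// eachQuorumOK: an EACH_QUORUM write needs a quorum of replicas in every DC.
// rf and reachable map DC name -> replication factor / replicas reachable
// from the coordinator.
func eachQuorumOK(rf, reachable map[string]int) bool {
	for dc, n := range rf {
		if reachable[dc] < quorum(n) {
			return false
		}
	}
	return true
}

// majoritySurvives reports whether the voters left after losing `lost` of
// `total` still form a strict majority (hypothetical model: voters are the
// data DCs plus an optional arbiter).
func majoritySurvives(total, lost int) bool {
	return total-lost > total/2
}

func main() {
	rf := map[string]int{"dc1": 3, "dc2": 3, "dc3": 3} // hypothetical layout, RF 3 per DC

	// dc1 is partitioned away: its coordinators reach only dc1 replicas.
	fromDC1 := map[string]int{"dc1": 3, "dc2": 0, "dc3": 0}

	fmt.Println(localQuorumOK(rf["dc1"], fromDC1["dc1"])) // true: dc1 keeps serving LOCAL_QUORUM
	fmt.Println(eachQuorumOK(rf, fromDC1))                 // false: dc2/dc3 cannot acknowledge
	fmt.Println(majoritySurvives(2, 1))                    // false: 2 data DCs, one lost
	fmt.Println(majoritySurvives(3, 1))                    // true: 2 data DCs + arbiter, one lost
}
```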
Files added
- `docs/guides/cassandra/dr/_index.md`: the DR section menu entry (identifier `cas-dr-cassandra`, parent `cas-cassandra-guides`, weight 55).
- `docs/guides/cassandra/dr/overview/index.md`: concepts and architecture plus a quick start. One masterless ring across DCs, each Member DC a full Cassandra datacenter, `NetworkTopologyStrategy` continuous replication, the Lease routes and observes only (no promotion, no fence), `LOCAL_QUORUM` vs `EACH_QUORUM` as the correctness tradeoff, an engine-free Arbiter DC only for an even data-DC count, and a 3-DC odd layout example.
- `docs/guides/cassandra/dr/guide/index.md`: full user guide (components, DC-name contract, deployment, keyspaces with `NetworkTopologyStrategy`, connecting through the single endpoint, consistency levels, `status.disasterRecovery` with per-DC UN node counts and hint/repair backlog as the lag proxy, switchover as a routing move, repair-based failback with no rewind, per-DC day-2 ops).
- `docs/guides/cassandra/dr/runbook/index.md`: scenario-by-scenario operational procedures (active DC loss, partition, switchover, failback, standby loss, hint backlog, unexpected write rejection, coordination plane loss, post-partition divergence).

Also wires a DC-DR link into `docs/guides/cassandra/README.md`, matching the ClickHouse README pattern.

Cassandra-specific adaptation
These docs reflect Cassandra's actual semantics, not ClickHouse's: a masterless, leaderless ring with gossip membership; the Lease routes and observes only (active-active writes are legitimate); consistency level (`LOCAL_QUORUM`, `EACH_QUORUM`) is the correctness knob rather than a fence; a partitioned DC keeps serving `LOCAL_QUORUM` (AP); and failback reconciles via hinted handoff plus `nodetool repair` with no rewind (last-write-wins by timestamp). ClickHouse Keeper / ReplicatedMergeTree / shard language is replaced with Cassandra ring / gossip / NetworkTopologyStrategy / racks / nodetool language.

Front-matter follows existing Cassandra docs conventions (`cas-` identifier prefix, `docs_{{ .version }}` menu, `cas-cassandra-guides` parent).

Summary by CodeRabbit
New Features
Documentation
status.disasterRecovery, and operational troubleshooting (including switchover/failback, scaling, and cleanup).
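The failback claim in the Cassandra-specific adaptation notes (hinted handoff plus `nodetool repair`, no rewind, last-write-wins by timestamp) comes down to a per-cell timestamp comparison. This simplified Go sketch uses hypothetical row keys and timestamps. It models only the reconciliation rule, not how repair finds and streams differing ranges. Its tie-break by value is a simplification of Cassandra's real rules, which also account for tombstones.

```go
package main

import "fmt"

// Cell is one column value as a replica stores it, with its write timestamp.
type Cell struct {
	Value     string
	Timestamp int64
}

// reconcile applies last-write-wins: the higher timestamp wins. Ties are broken
// by comparing values so every replica converges on the same answer
// (a simplification; real tie-breaking also considers tombstones).
func reconcile(a, b Cell) Cell {
	if a.Timestamp != b.Timestamp {
		if a.Timestamp > b.Timestamp {
			return a
		}
		return b
	}
	if a.Value >= b.Value {
		return a
	}
	return b
}

// mergeReplicas reconciles two DCs' views key by key, as happens once hints or
// repair have delivered both versions to the same replica. Nothing is rewound:
// each key simply converges on its newest write.
func mergeReplicas(x, y map[string]Cell) map[string]Cell {
	out := make(map[string]Cell, len(x)+len(y))
	for k, c := range x {
		out[k] = c
	}
	for k, c := range y {
		if cur, ok := out[k]; ok {
			out[k] = reconcile(cur, c)
		} else {
			out[k] = c
		}
	}
	return out
}

func main() {
	// Hypothetical writes accepted on both sides of a partition.
	dc1 := map[string]Cell{
		"user:1": {Value: "alice@old", Timestamp: 100},
		"user:2": {Value: "bob", Timestamp: 150},
	}
	dc2 := map[string]Cell{
		"user:1": {Value: "alice@new", Timestamp: 200},
		"user:3": {Value: "carol", Timestamp: 120},
	}

	merged := mergeReplicas(dc1, dc2)
	fmt.Println(merged["user:1"].Value, merged["user:2"].Value, merged["user:3"].Value)
	// alice@new bob carol
}
```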