Add ClickHouse DC-DR guide#945

Open
tamalsaha wants to merge 2 commits into master from clickhouse-dc-dr-docs

Conversation

@tamalsaha

@tamalsaha tamalsaha commented Jul 1, 2026

Member

Adds the cross-data-center disaster recovery (DC-DR) documentation for KubeDB ClickHouse, mirroring the structure of the MongoDB DC-DR docs. ClickHouse is the multi-master, Keeper-quorum analog of MongoDB: ReplicatedMergeTree replicas in each DC replicate natively and asynchronously over port 9009, coordinated by a shared ClickHouse Keeper (Raft) ensemble. There is no second replication link to build and no promotion step.

New pages under docs/guides/clickhouse/dr/:

  • Overview (overview/index.md): why ClickHouse DC-DR is the MongoDB analog, the four core rules (Keeper spread 3-site as the failover authority, Keeper quorum as the split-brain guarantee, the Lease as write-endpoint routing only, local reads), the three Keeper placement topologies (3-site spread as the documented automatic path, two-cluster per-region Keeper often better for write-heavy ingest, single-DC Keeper for lowest latency with manual failover), data center roles, the single-CR single-endpoint model, prerequisites, a deploy walkthrough with a realistic PlacementPolicy (two Member DCs + one Arbiter DC holding a data-less Keeper voter and the dr-controlplane etcd member), and status.disasterRecovery.
  • User Guide (guide/index.md): components, the DC-name contract, deployment, connecting through the single write endpoint, the Keeper-quorum write contract, local reads, monitoring via system.replicas (absolute_delay, queue_size, log_pointer vs log_max_index), lag and RPO, Keeper placement and the arbiter, planned switchover via the dr.kubedb.com/switchover-to annotation, native failback (no rewind), and day-2 ops including per-DC HorizontalScaling.
  • Runbook (runbook/index.md): twelve scenarios (active-DC loss, partition, planned switchover, failback, arbiter-DC loss, stuck switchover, standby loss, re-add a DC, Keeper quorum lost, unexpected read-only, coordination-plane loss, suspected split-brain) each with symptoms, automatic behavior, verification, and action.
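
As an illustration of the `system.replicas` monitoring summarized for the User Guide, here is a minimal Python sketch that classifies one replica row. The column names come from the list above and from `is_readonly` as used in the runbook. The thresholds, the `ReplicaRow` type, and the `assess` helper are hypothetical and are not part of KubeDB or the docs in this PR. It assumes `log_pointer` is one past the last log entry copied to the queue, so a caught-up replica has `log_pointer == log_max_index + 1`.

```python
from dataclasses import dataclass


@dataclass
class ReplicaRow:
    """Subset of system.replicas columns referenced in the guide summary."""
    table: str
    is_readonly: int      # 1 when the replica cannot reach Keeper
    absolute_delay: int   # replication lag in seconds
    queue_size: int       # operations waiting in the replication queue
    log_pointer: int
    log_max_index: int


def assess(row, max_delay_s=30, max_queue=100, max_log_backlog=10):
    """Return a list of findings for one replica row.

    All thresholds are illustrative assumptions, not documented defaults.
    """
    findings = []
    if row.is_readonly:
        findings.append("read-only")
    if row.absolute_delay > max_delay_s:
        findings.append("lagging")
    if row.queue_size > max_queue:
        findings.append("queue-backlog")
    # Entries in the shared log not yet copied to this replica's queue.
    backlog = max(row.log_max_index + 1 - row.log_pointer, 0)
    if backlog > max_log_backlog:
        findings.append("log-backlog")
    return findings
```

A row with no findings suggests the replica is keeping up. Any finding is only a prompt to look at the matching runbook scenario, not a verdict.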

Key correctness points: the ClickHouse Keeper Raft quorum is the data-plane safety (a partitioned minority DC loses quorum, cannot register parts, and its inserts fail); the dr-controlplane Lease is only routing, policy, and observability (it steers the single write endpoint, it does not promote anything); failback is native and clean with no rewind. Also adds a DC-DR link to the ClickHouse guides README.
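
The quorum argument above can be sketched in a few lines of Python. This is only an illustration of the Raft majority rule. The DC names, the one-voter-per-DC layout, and the `has_quorum` helper are assumptions made for the example, not the operator's code.

```python
def has_quorum(voters_per_dc, reachable_dcs):
    """Return True if the reachable DCs hold a strict majority of Keeper voters."""
    total = sum(voters_per_dc.values())
    reachable = sum(n for dc, n in voters_per_dc.items() if dc in reachable_dcs)
    return reachable > total // 2


# Assumed 3-site spread: two Member DCs plus one Arbiter DC, one voter each.
layout = {"dc-a": 1, "dc-b": 1, "arbiter": 1}

# dc-a is partitioned away on its own: it is a minority, so its inserts fail.
isolated_side = has_quorum(layout, {"dc-a"})

# dc-b and the arbiter can still reach each other: they keep quorum.
surviving_side = has_quorum(layout, {"dc-b", "arbiter"})
```

With this layout no single DC can form a majority alone, which is why a partition can leave at most one side able to register parts.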

Summary by CodeRabbit

  • New Features
    • Added Cross Data Center Disaster Recovery (DC-DR) documentation for ClickHouse, including an end-to-end setup overview and a dedicated operational runbook.
  • Documentation
    • Documented multi-DC deployment concepts, Keeper quorum split-brain prevention, Lease-routed single write endpoint behavior, local read guidance, and operational verification via disaster recovery status.
    • Added procedures for planned switchover, failback, recovery from DC loss, troubleshooting scenarios, and cleanup expectations.

Signed-off-by: Tamal Saha <tamal@appscode.com>
kodiak-appscode[bot]
kodiak-appscode Bot previously approved these changes Jul 1, 2026
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 59d5d3f5-00b0-4fc0-8e2c-1b7d075765c2

📥 Commits

Reviewing files that changed from the base of the PR and between 9857a3e and 1bcd619.

📒 Files selected for processing (1)
  • docs/guides/clickhouse/dr/overview/index.md
✅ Files skipped from review due to trivial changes (1)
  • docs/guides/clickhouse/dr/overview/index.md

📝 Walkthrough

This PR adds ClickHouse DC-DR documentation across the README, overview, user guide, and runbook, covering deployment structure, routing and quorum behavior, operational procedures, and troubleshooting scenarios.

Changes

DC-DR Documentation

| Layer | File(s) | Summary |
|---|---|---|
| README entry and DR index page | `docs/guides/clickhouse/README.md`, `docs/guides/clickhouse/dr/_index.md` | Adds a Cross-DC Disaster Recovery section to the ClickHouse README and adds the DR menu index page front matter. |
| DC-DR overview guide | `docs/guides/clickhouse/dr/overview/index.md` | Adds the ClickHouse DC-DR overview page covering the multi-DC model, Keeper quorum rules, Lease routing, deployment walkthrough, observed status, failover and switchover behavior, and cleanup. |
| User guide: contract and deployment configuration | `docs/guides/clickhouse/dr/guide/index.md` | Adds the DC-DR user guide front matter, DC-name consistency contract, PlacementPolicy topology mapping, sample ClickHouse CR, and operator-created resource descriptions. |
| User guide: connectivity, observability, replication, and switchover | `docs/guides/clickhouse/dr/guide/index.md` | Documents write endpoint and AppBinding connectivity, local read guidance, status.disasterRecovery observability, replication and RPO behavior, Keeper placement rules, planned switchover and failback, scaling, cleanup, and limitations. |
| Runbook introduction and quick reference | `docs/guides/clickhouse/dr/runbook/index.md` | Adds runbook front matter, terminology, quick-reference commands, and golden rules. |
| Runbook: failure and partition scenarios | `docs/guides/clickhouse/dr/runbook/index.md` | Defines procedures for active DC loss, network partition, arbiter loss, standby loss, DC recovery, and Keeper quorum loss. |
| Runbook: switchover and diagnostics | `docs/guides/clickhouse/dr/runbook/index.md` | Defines planned switchover and failback, stuck-switchover diagnosis and abort, unexpected read-only behavior, coordination plane issues, split-brain handling, and the escalation checklist. |

Estimated code review effort: 2 (Simple) | ~15 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding ClickHouse DC-DR documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (3)
docs/guides/clickhouse/dr/runbook/index.md (1)

64-65: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add "from" for clarity: "moves from FailingOver to Steady".

Current phrasing "phase moves FailingOver to Steady" is slightly telegraphic. Suggest: "moves from FailingOver to Steady".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/runbook/index.md` around lines 64 - 65, The wording
in the runbook sentence is missing the transition preposition, making the state
change hard to read. Update the text around the “phase” description so it says
it moves from `FailingOver` to `Steady`, preserving the same meaning but
improving clarity in the DR runbook section.
docs/guides/clickhouse/dr/guide/index.md (2)

15-18: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Fix hyphenation: "cross data center" should be "cross-data-center".

The phrase "cross data center disaster recovery" is missing hyphens. It should read "cross-data-center disaster recovery" to be grammatically correct as a compound modifier.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/guide/index.md` around lines 15 - 18, Update the
opening description in the ClickHouse DR guide so the compound modifier is
hyphenated correctly; in the document text for the guide intro, change the
phrase used in the overview to “cross-data-center disaster recovery” so the
wording is grammatically consistent.

Source: Linters/SAST tools


130-132: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider formal phrasing: "keep working" → "continue to work".

For more formal documentation tone, consider: "so the single write endpoint and the single AppBinding continue to work as the active DC moves."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/guide/index.md` around lines 130 - 132, The
phrasing in the ClickHouse DR guide is too informal in the sentence about the
single write endpoint and single AppBinding. Update the wording in this doc
section to use a more formal tone by replacing “keep working” with “continue to
work,” so the sentence reads naturally as the active DC moves.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/guides/clickhouse/dr/overview/index.md`:
- Around line 188-191: The overview’s marker reference is inconsistent with the
User Guide’s DC-name contract: it currently says “activeDC” while the contract
uses “data.activeDC.” Update the wording in the overview to explicitly refer to
the marker as the ConfigMap/Secret data path `data.activeDC`, and keep the
terminology aligned with the existing DC-name contract in the guide. Use the
marker mention near the DC-name list as the anchor for the edit so both
documents describe the same field consistently.
- Around line 317-318: Update the wording in the DR overview text to replace the
vague “even layout” phrase with the same topology terminology used elsewhere,
specifically “two-Member-plus-Arbiter layout” or “2+1 layout.” Make the change
in the section describing the surviving DCs and Keeper quorum so the terminology
is consistent with the earlier topology description.
- Line 32: The link text in the ClickHouse DR overview intro is too vague, so
update the existing “here” anchor to use descriptive text that matches its
destination, such as the KubeDB quickstart guide. Make this change in the
introductory sentence so the markdown link remains the same target but the
visible text is meaningful and self-descriptive.

In `@docs/guides/clickhouse/dr/runbook/index.md`:
- Around line 114-115: The wording in the DR runbook is ambiguous about
“near-zero committed writes are lost”; update the sentence in the ClickHouse DR
runbook text to make the RPO meaning explicit, e.g. by rephrasing it around
“committed writes lost are near-zero” or “RPO is near-zero,” while keeping the
rest of the explanation in the same section unchanged.
- Around line 15-16: Update the ClickHouse DR runbook wording so the compound
modifier is hyphenated correctly: change the “cross data center disaster
recovery” phrasing in the introduction to “cross-data-center disaster recovery.”
Keep the rest of the sentence intact and ensure the document consistently uses
the hyphenated form in this section.
- Around line 265-266: The endpoint guidance in the runbook uses the placeholder
`<db>`, which reads like a database name instead of a write endpoint. Update the
sentence in this section to use `<endpoint>` or rephrase it more clearly as
“Point writes at the endpoint, not at this DC directly,” keeping the rest of the
standby guidance unchanged.
- Around line 137-139: The DR runbook text still references PostgreSQL-specific
pg_rewind, which does not apply to ClickHouse. Update the affected prose in the
runbook section describing the near-zero-RPO flow to remove pg_rewind and
replace it with ClickHouse-appropriate wording that explains there is no
divergence/rollback step for a DC that lacked Keeper quorum.

In `@docs/guides/clickhouse/README.md`:
- Line 50: The ClickHouse README uses non-descriptive link text in the sentence
that ends with the DR overview link. Update the anchor text in the relevant
documentation section to a meaningful phrase that describes the destination,
such as the DC-DR Overview guide, while keeping the same link target. Locate the
prose near the ClickHouse disaster-recovery explanation in the README and
replace the generic “Follow here” wording with descriptive text.

---

Nitpick comments:
In `@docs/guides/clickhouse/dr/guide/index.md`:
- Around line 15-18: Update the opening description in the ClickHouse DR guide
so the compound modifier is hyphenated correctly; in the document text for the
guide intro, change the phrase used in the overview to “cross-data-center
disaster recovery” so the wording is grammatically consistent.
- Around line 130-132: The phrasing in the ClickHouse DR guide is too informal
in the sentence about the single write endpoint and single AppBinding. Update
the wording in this doc section to use a more formal tone by replacing “keep
working” with “continue to work,” so the sentence reads naturally as the active
DC moves.

In `@docs/guides/clickhouse/dr/runbook/index.md`:
- Around line 64-65: The wording in the runbook sentence is missing the
transition preposition, making the state change hard to read. Update the text
around the “phase” description so it says it moves from `FailingOver` to
`Steady`, preserving the same meaning but improving clarity in the DR runbook
section.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: cd5aecbf-4ae8-4905-82ad-067c8a234889

📥 Commits

Reviewing files that changed from the base of the PR and between 405b88b and 9857a3e.

📒 Files selected for processing (5)
  • docs/guides/clickhouse/README.md
  • docs/guides/clickhouse/dr/_index.md
  • docs/guides/clickhouse/dr/guide/index.md
  • docs/guides/clickhouse/dr/overview/index.md
  • docs/guides/clickhouse/dr/runbook/index.md

- [DC-DR Runbook](/docs/guides/clickhouse/dr/runbook/index.md) for what to do in each
operational scenario.

> **New to KubeDB?** Please start [here](/docs/README.md).

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use descriptive link text.

"here" does not describe the destination. Replace with something like "KubeDB quickstart guide".

-> **New to KubeDB?** Please start [here](/docs/README.md).
+> **New to KubeDB?** Please start with the [KubeDB quickstart guide](/docs/README.md).
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 32-32: Link text should be descriptive

(MD059, descriptive-link-text)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/overview/index.md` at line 32, The link text in the
ClickHouse DR overview intro is too vague, so update the existing “here” anchor
to use descriptive text that matches its destination, such as the KubeDB
quickstart guide. Make this change in the introductory sentence so the markdown
link remains the same target but the visible text is meaningful and
self-descriptive.

Source: Linters/SAST tools

Comment on lines +188 to +191
- One consistent **DC name** per data center, used everywhere: the OCM spoke cluster
name, the agent `--dc-name`, the Lease `holderIdentity`, the marker `activeDC`, the
pod label `open-cluster-management.io/cluster-name`, and the `PlacementPolicy`
`distributionRule.clusterName`. Keep them identical.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Align "marker" reference with the User Guide's DC-name contract.

The User Guide specifies data.activeDC for the marker, while this overview uses activeDC. Clarify whether the marker is a ConfigMap/Secret key path (data.activeDC) or a field name, and keep both documents consistent.

Cross-reference: docs/guides/clickhouse/dr/guide/index.md:36-46 defines the DC-name contract as the marker data.activeDC.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/overview/index.md` around lines 188 - 191, The
overview’s marker reference is inconsistent with the User Guide’s DC-name
contract: it currently says “activeDC” while the contract uses “data.activeDC.”
Update the wording in the overview to explicitly refer to the marker as the
ConfigMap/Secret data path `data.activeDC`, and keep the terminology aligned
with the existing DC-name contract in the guide. Use the marker mention near the
DC-name list as the anchor for the edit so both documents describe the same
field consistently.

Comment on lines +317 to +318
When the active DC is lost, the surviving DCs that still hold Keeper quorum (a standby
data DC plus the arbiter DC in the even layout) **keep accepting writes on their own**,

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Replace unclear "even layout" terminology.

Use "two-Member-plus-Arbiter layout" or "2+1 layout" to match the topology described earlier.

-surviving DCs that still hold Keeper quorum (a standby data DC plus the arbiter DC in the even layout)
+surviving DCs that still hold Keeper quorum (a standby Member DC plus the Arbiter DC in the 2+1 layout)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-When the active DC is lost, the surviving DCs that still hold Keeper quorum (a standby
-data DC plus the arbiter DC in the even layout) **keep accepting writes on their own**,
+When the active DC is lost, the surviving DCs that still hold Keeper quorum (a
+standby Member DC plus the Arbiter DC in the 2+1 layout) **keep accepting writes on their own**,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/overview/index.md` around lines 317 - 318, Update
the wording in the DR overview text to replace the vague “even layout” phrase
with the same topology terminology used elsewhere, specifically
“two-Member-plus-Arbiter layout” or “2+1 layout.” Make the change in the section
describing the surviving DCs and Keeper quorum so the terminology is consistent
with the earlier topology description.

Comment on lines +15 to +16
Scenario-by-scenario procedures for operating a ClickHouse cluster in cross data center
disaster recovery (DC-DR) mode. Each scenario lists the **symptoms**, what KubeDB and

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Hyphenate "cross-data-center".

Per the grammar hint, "cross data center disaster recovery" should be "cross-data-center disaster recovery" (compound modifier before the noun).

🧰 Tools
🪛 LanguageTool

[grammar] ~15-~15: Use a hyphen to join words.
Context: ... operating a ClickHouse cluster in cross data center disaster recovery (DC-DR) mo...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/runbook/index.md` around lines 15 - 16, Update the
ClickHouse DR runbook wording so the compound modifier is hyphenated correctly:
change the “cross data center disaster recovery” phrasing in the introduction to
“cross-data-center disaster recovery.” Keep the rest of the sentence intact and
ensure the document consistently uses the hyphenated form in this section.

Source: Linters/SAST tools

Comment on lines +114 to +115
to `dc-b`. Because it waits for the target to catch up before flipping, near-zero committed
writes are lost. There is no promotion step.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Clarify "near-zero committed writes are lost".

The phrasing is ambiguous. Suggest: "Because it waits for the target to catch up before flipping, committed writes lost are near-zero" or "RPO is near-zero".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/runbook/index.md` around lines 114 - 115, The
wording in the DR runbook is ambiguous about “near-zero committed writes are
lost”; update the sentence in the ClickHouse DR runbook text to make the RPO
meaning explicit, e.g. by rephrasing it around “committed writes lost are
near-zero” or “RPO is near-zero,” while keeping the rest of the explanation in
the same section unchanged.

Comment on lines +137 to +139
Same near-zero-RPO flow as scenario 3. There is no `pg_rewind` step and no rollback; a DC
that lacked Keeper quorum committed nothing to diverge.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Remove PostgreSQL-specific pg_rewind reference.

ClickHouse does not use pg_rewind (a PostgreSQL tool). This appears to be copy-paste residue from another database's DC-DR docs. Replace with a ClickHouse-appropriate description.

-Same near-zero-RPO flow as scenario 3. There is no `pg_rewind` step and no rollback; a DC
-that lacked Keeper quorum committed nothing to diverge.
+Same near-zero-RPO flow as scenario 3. There is no rewind step and no rollback; a DC
+that lacked Keeper quorum committed nothing, so there is nothing to diverge.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/runbook/index.md` around lines 137 - 139, The DR
runbook text still references PostgreSQL-specific pg_rewind, which does not
apply to ClickHouse. Update the affected prose in the runbook section describing
the near-zero-RPO flow to remove pg_rewind and replace it with
ClickHouse-appropriate wording that explains there is no divergence/rollback
step for a DC that lacked Keeper quorum.

Comment on lines +265 to +266
is a standby (correct). Point writes at the endpoint `<db>`, not at this DC directly.
- **Lost Keeper quorum** `keeperQuorum:false` or `is_readonly:1` means this DC cannot reach

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Fix placeholder in endpoint guidance.

"Point writes at the endpoint <db>" uses <db> which suggests a database name, not the write endpoint. Replace with <endpoint> or rephrase to "Point writes at the endpoint, not at this DC directly."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/dr/runbook/index.md` around lines 265 - 266, The
endpoint guidance in the runbook uses the placeholder `<db>`, which reads like a
database name instead of a write endpoint. Update the sentence in this section
to use `<endpoint>` or rephrase it more clearly as “Point writes at the
endpoint, not at this DC directly,” keeping the rest of the standby guidance
unchanged.


## Cross-DC Disaster Recovery (DC-DR)

Do you want to run your ClickHouse database across multiple data centers and recover from a full data center failure with a single, automatically re-routing write endpoint? KubeDB runs one logical `ReplicatedMergeTree` cluster across the data centers, spreads ClickHouse Keeper 3-site so no single data center holds a Keeper majority (the split-brain guarantee), and lets the `dr-controlplane` Lease route the write endpoint to a data center that still holds Keeper quorum. Follow [here](/docs/guides/clickhouse/dr/overview/index.md).

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use descriptive link text.

"Follow here" is non-descriptive. Replace with meaningful text such as "the DC-DR Overview guide".

-Follow [here](/docs/guides/clickhouse/dr/overview/index.md).
+Follow the [DC-DR Overview guide](/docs/guides/clickhouse/dr/overview/index.md).
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 50-50: Link text should be descriptive

(MD059, descriptive-link-text)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/clickhouse/README.md` at line 50, The ClickHouse README uses
non-descriptive link text in the sentence that ends with the DR overview link.
Update the anchor text in the relevant documentation section to a meaningful
phrase that describes the destination, such as the DC-DR Overview guide, while
keeping the same link target. Locate the prose near the ClickHouse
disaster-recovery explanation in the README and replace the generic “Follow
here” wording with descriptive text.

Source: Linters/SAST tools

@github-actions

github-actions Bot commented Jul 1, 2026

Visit the preview URL for this PR (updated for commit 1bcd619):

https://kubedb-v2-hugo--pr945-clickhouse-dc-dr-doc-9ayz6p9e.web.app

(expires Wed, 08 Jul 2026 05:15:14 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 0f29ae8ae0bd54a99bf2b223b6833be47acd5943

Add the fifth How-it-works rule: ReplicatedMergeTree fetches are not DC-aware, so
with two or more in-DC replicas of a shard the operator designates one in-DC
replica per shard as the cross-DC fetch source and the others fetch intra-DC,
holding cross-DC part traffic to one copy per shard per DC.

Signed-off-by: Tamal Saha <tamal@appscode.com>
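
The fifth rule described in this commit message can be illustrated with a short Python sketch that picks one replica per shard per DC as the cross-DC fetch source. The commit does not say how the operator chooses the replica. Taking the lexicographically smallest name is a placeholder assumption, and the replica names and the `designate_fetch_sources` helper are hypothetical.

```python
from collections import defaultdict


def designate_fetch_sources(replicas):
    """Map each (dc, shard) pair to the one replica that fetches cross-DC.

    `replicas` is an iterable of (dc, shard, replica_name) tuples. The
    selection rule (smallest name) is an assumption for illustration only.
    """
    groups = defaultdict(list)
    for dc, shard, name in replicas:
        groups[(dc, shard)].append(name)
    return {key: min(names) for key, names in groups.items()}


# Hypothetical layout: shard 0 has two replicas in each of two DCs.
replicas = [
    ("dc-a", 0, "ch-a-0-1"),
    ("dc-a", 0, "ch-a-0-0"),
    ("dc-b", 0, "ch-b-0-0"),
    ("dc-b", 0, "ch-b-0-1"),
]
sources = designate_fetch_sources(replicas)
```

Every replica not selected would fetch parts from a peer in its own DC, which is what holds cross-DC part traffic to one copy per shard per DC.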