
Add Elasticsearch cross data center disaster recovery (DC-DR) docs #948

Open
tamalsaha wants to merge 1 commit into master from elasticsearch-dc-dr-docs

@tamalsaha tamalsaha commented Jul 1, 2026

Member

Adds four new pages under docs/guides/elasticsearch/dr/ documenting cross data center disaster recovery for KubeDB Elasticsearch (and OpenSearch, which shares the shape):

  • section _index (menu parent)
  • overview: concepts and quick start
  • user guide: components, DC-name contract, deploying, what the operator creates, connecting/indexing, monitoring, the follower-read-only fence, switchover, failback, day-2 ops, limitations
  • runbook: quick reference, golden rules, 12 numbered scenarios, escalation checklist

The docs describe the CCR (Cross-Cluster Replication) active/passive model: two self-contained Elasticsearch clusters (one per Member DC) with intra-DC master quorums, joined by asynchronous leader/follower CCR; the follower-read-only fence (fail closed); a single Lease-gated search/index endpoint; status.disasterRecovery (activeDC, phase, per-DC role/writable/nodesReady/followLagOps/healthy); planned switchover via the dr.kubedb.com/switchover-to annotation (drains CCR for zero doc loss); and failback. CCR licensing (Elastic Platinum/Enterprise, or the OpenSearch replication plugin) and cross-DC transport (9300) and client (9200) reachability are called out.
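
As a rough illustration of the planned switchover mentioned above, the sketch below builds the `kubectl annotate` command that sets `dr.kubedb.com/switchover-to`. It assumes the annotation value is the name of the DC that should become active, and it reuses the `es-dcdr`/`demo` example names; `dc-b` and the helper `switchover_cmd` are hypothetical. The helper only prints the command so it can be reviewed before anyone runs it.

```shell
# Hypothetical helper: prints (does not run) the annotate command for a planned switchover.
# Arguments: NAMESPACE NAME TARGET_DC
# Assumption: the annotation value is the name of the DC that should become active.
switchover_cmd() {
  printf 'kubectl annotate elasticsearch -n %s %s dr.kubedb.com/switchover-to=%s\n' \
    "$1" "$2" "$3"
}

# Example: request a planned move of the active DC to dc-b (an assumed DC name).
switchover_cmd demo es-dcdr dc-b
```

Printing rather than executing keeps the sketch safe to paste into a shell that has no cluster access.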

Companion to apimachinery PR #1804 and the elasticsearch operator DC-DR PR.

Summary by CodeRabbit

  • Documentation
    • Added new Elasticsearch disaster recovery documentation, including an overview, a detailed user guide, and an operational runbook.
    • Covers cross–data center failover and switchover concepts, setup steps, monitoring signals, and recovery procedures.
    • Includes practical guidance for active/passive operation, write protection on standby sites, replication lag checks, and common troubleshooting scenarios.

Signed-off-by: Tamal Saha <tamal@appscode.com>
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

📝 Walkthrough

Adds new Elasticsearch disaster recovery (DC-DR) documentation: a menu entry, a conceptual overview page, a detailed deployment/operations guide, and an operational runbook with 12 failure/recovery scenarios and an escalation checklist. No code changes are included.

Changes

Elasticsearch DC-DR Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Menu entry: `docs/guides/elasticsearch/dr/_index.md` | Adds front-matter and docs menu metadata for the new Disaster Recovery section. |
| Overview (concepts, roles, prerequisites, deployment, lifecycle): `docs/guides/elasticsearch/dr/overview/index.md` | Introduces the DC-DR conceptual model, Member/Arbiter roles, single-CR/single-endpoint design, prerequisites, PlacementPolicy/Elasticsearch CR examples, status observability, and failover/switchover/failback/cleanup lifecycle. |
| Detailed guide (naming contract, deployment, fencing, operations): `docs/guides/elasticsearch/dr/guide/index.md` | Documents the DC-name contract, deployment model, indexing/observability, follower read-only fencing rules, planned switchover/failback procedures, day-2 operations, deletion behavior, and limitations. |
| Runbook (golden rules and scenario procedures): `docs/guides/elasticsearch/dr/runbook/index.md` | Adds quick reference, golden rules, and 12 detailed scenarios (node loss, DC loss, partition, switchover, non-writable indices, failback, standby loss, follow lag, Arbiter loss, control-plane unavailability, active-DC determination, split writes) plus an escalation checklist. |

Estimated code review effort: 2 (Simple) | ~15 minutes

Related PRs: None found.

Suggested labels: documentation

Suggested reviewers: None found.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the new Elasticsearch DC-DR documentation pages added in the pull request. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
docs/guides/elasticsearch/dr/overview/index.md (3)

274-274: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Optional: use "deliberately" instead of "on purpose."

- To move the active DC on purpose without losing documents, annotate the Elasticsearch with
+ To deliberately move the active DC without losing documents, annotate the Elasticsearch with
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/elasticsearch/dr/overview/index.md` at line 274, The sentence in
the Elasticsearch DR overview uses “on purpose” and should be revised to the
preferred wording “deliberately.” Update the phrasing in the affected
documentation text while keeping the rest of the instruction intact.

31-31: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Use descriptive link text.

Replace "here" with descriptive text for accessibility.

- > **New to KubeDB?** Please start [here](/docs/README.md).
+ > **New to KubeDB?** Please start with the [KubeDB documentation overview](/docs/README.md).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/elasticsearch/dr/overview/index.md` at line 31, The introductory
link text in the Elasticsearch DR overview is too generic for accessibility.
Update the markdown link in the opening paragraph to use descriptive text
instead of “here,” while keeping the link target to the docs README unchanged.

237-252: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider combining command and output in one block.

The kubectl get command and its JSON output are in separate blocks. Combining them (with the output as a comment or shown directly after) can satisfy MD014 and improve copy-paste safety.

```bash
$ kubectl get elasticsearch -n demo es-dcdr -o jsonpath='{.status.disasterRecovery}' | jq
{
  "activeDC": "dc-a",
  ...
}
```

Optional consolidation:

```bash
$ kubectl get elasticsearch -n demo es-dcdr -o jsonpath='{.status.disasterRecovery}' | jq
{
  "activeDC": "dc-a",
  ...
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/elasticsearch/dr/overview/index.md` around lines 237 - 252, The
example in the disaster recovery overview splits the `kubectl get elasticsearch`
command and its JSON result into separate blocks, which should be consolidated
for MD014 and easier copy-paste. Update the example around the `kubectl get
elasticsearch ... | jq` snippet so the command and its output are shown together
in a single bash block or otherwise directly associated in one block, keeping
the JSON result immediately after the command.
docs/guides/elasticsearch/dr/guide/index.md (2)

15-15: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add hyphen for compound modifier.

"cross data center" should be hyphenated as "cross-data-center" when modifying "disaster recovery."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/elasticsearch/dr/guide/index.md` at line 15, The compound
modifier in this Elasticsearch disaster recovery guide needs hyphenation. Update
the sentence in the guide text so the phrase modifying disaster recovery uses
“cross-data-center” instead of “cross data center,” and keep the wording
consistent anywhere this same description appears in the document.

24-24: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Use descriptive link text.

Replace "here" with descriptive text like "KubeDB documentation."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/elasticsearch/dr/guide/index.md` at line 24, The introductory
link text is too generic in the Elasticsearch DR guide. Update the markdown link
in the opening note so it uses descriptive text instead of “here,” such as
“KubeDB documentation,” while keeping the target to the existing README.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/guides/elasticsearch/dr/guide/index.md`:
- Line 31: Align the naming in the DR guide so the `dr-controlplane` agent
description and the DC-name contract use the same marker terminology. Update the
sentence in the table entry for `dr-controlplane` to refer to the
`data.activeDC` field as the active-dc marker, or explicitly explain that the
primary-DC decision is projected into `data.activeDC` so readers can connect the
two terms without ambiguity.

In `@docs/guides/elasticsearch/dr/runbook/index.md`:
- Line 23: The introductory link text in the Elasticsearch DR runbook is too
generic; update the markdown anchor in the opening callout to use descriptive
text that identifies the destination, such as the KubeDB quick start guide,
instead of “here”.

---

Nitpick comments:
In `@docs/guides/elasticsearch/dr/guide/index.md`:
- Line 15: The compound modifier in this Elasticsearch disaster recovery guide
needs hyphenation. Update the sentence in the guide text so the phrase modifying
disaster recovery uses “cross-data-center” instead of “cross data center,” and
keep the wording consistent anywhere this same description appears in the
document.
- Line 24: The introductory link text is too generic in the Elasticsearch DR
guide. Update the markdown link in the opening note so it uses descriptive text
instead of “here,” such as “KubeDB documentation,” while keeping the target to
the existing README.

In `@docs/guides/elasticsearch/dr/overview/index.md`:
- Line 274: The sentence in the Elasticsearch DR overview uses “on purpose” and
should be revised to the preferred wording “deliberately.” Update the phrasing
in the affected documentation text while keeping the rest of the instruction
intact.
- Line 31: The introductory link text in the Elasticsearch DR overview is too
generic for accessibility. Update the markdown link in the opening paragraph to
use descriptive text instead of “here,” while keeping the link target to the
docs README unchanged.
- Around line 237-252: The example in the disaster recovery overview splits the
`kubectl get elasticsearch` command and its JSON result into separate blocks,
which should be consolidated for MD014 and easier copy-paste. Update the example
around the `kubectl get elasticsearch ... | jq` snippet so the command and its
output are shown together in a single bash block or otherwise directly
associated in one block, keeping the JSON result immediately after the command.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c62ecd7b-2ef3-43dc-af3d-0636147a29a1

📥 Commits

Reviewing files that changed from the base of the PR and between 405b88b and c3f3019.

📒 Files selected for processing (4)
  • docs/guides/elasticsearch/dr/_index.md
  • docs/guides/elasticsearch/dr/guide/index.md
  • docs/guides/elasticsearch/dr/overview/index.md
  • docs/guides/elasticsearch/dr/runbook/index.md

| Component | Runs in | Responsibility |
| --- | --- | --- |
| **`dr-controlplane`** + 3-site etcd quorum | across the data centers (an OCM control plane) | Publishes one `coordination.k8s.io` **Lease** per failover scope. The Lease holder is the active write DC. This is the single cross-DC failover authority. |
| **`dr-controlplane` agent** | each spoke (DC) | Contends for the primary-DC Lease for its DC and projects the Lease decision into the local spoke as the primary-dc marker the fence reads. |

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Inconsistent naming: "primary-dc marker" vs data.activeDC.

The table says the agent projects the Lease decision as the "primary-dc marker the fence reads," but the DC-name contract below lists data.activeDC as the marker field. Align on "active-dc marker" or clarify the relationship between "primary-dc" and "activeDC" to avoid reader confusion.
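
One way to picture the relationship this comment asks the guide to spell out: the fence compares the local DC name with the value projected into `data.activeDC` and allows writes only when they match. The sketch below is an illustration under that assumption, not the operator's implementation. The function `fence_decision` and its output strings are hypothetical, and an empty or missing marker is treated as read-only to mirror the fail-closed behavior named in the PR description.

```shell
# Hypothetical sketch of a fail-closed fence decision.
# Arguments: LOCAL_DC ACTIVE_DC (ACTIVE_DC stands in for the marker's data.activeDC value)
fence_decision() {
  # Writable only when both names are non-empty and identical; anything else fails closed.
  if [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$2" ]; then
    echo "writable"
  else
    echo "read-only"
  fi
}

fence_decision dc-a dc-a   # prints "writable"
fence_decision dc-b dc-a   # prints "read-only"
fence_decision dc-a ""     # prints "read-only" (marker missing: fail closed)
```

Whichever term the guide settles on, using one name for this value in both the component table and the DC-name contract keeps the comparison above unambiguous.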

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/elasticsearch/dr/guide/index.md` at line 31, Align the naming in
the DR guide so the `dr-controlplane` agent description and the DC-name contract
use the same marker terminology. Update the sentence in the table entry for
`dr-controlplane` to refer to the `data.activeDC` field as the active-dc marker,
or explicitly explain that the primary-DC decision is projected into
`data.activeDC` so readers can connect the two terms without ambiguity.

commands referenced here. Throughout, `<coord>` is the coordination control plane kubeconfig,
and `es-dcdr`/`demo` are the example database and namespace.

> **New to KubeDB?** Please start [here](/docs/README.md).

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use descriptive link text.

"here" is not descriptive. Replace with text that describes the destination, e.g., "the KubeDB quick start guide".

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 23-23: Link text should be descriptive

(MD059, descriptive-link-text)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/elasticsearch/dr/runbook/index.md` at line 23, The introductory
link text in the Elasticsearch DR runbook is too generic; update the markdown
anchor in the opening callout to use descriptive text that identifies the
destination, such as the KubeDB quick start guide, instead of “here”.

Source: Linters/SAST tools
