Docs: Cluster Recovery After Full Power Outage - Best Practices and Troubleshooting #441

@jasnoyaeger

Summary

Customers need documented best practices for recovering a VergeOS cluster after a full power outage where all hosts go down simultaneously, particularly regarding vSAN data integrity.

Type

How-to Guide / Troubleshooting

Suggested Content

  • Audience: VergeOS administrators, datacenter operations
  • Prerequisites: Familiarity with VergeOS cluster architecture, vSAN tiers, and physical node management
  • Key sections:
    • Expected cluster behavior after simultaneous full power loss
    • Recommended host power-on sequence (e.g., first node to bring up, quorum considerations)
    • Rejoin order and how nodes resync vSAN tiers
    • Step-by-step recovery procedure (pre-checks, power-on, validation)
    • How to verify vSAN health and tier sync status post-recovery
    • Recommendations to prevent data inconsistency or corruption (UPS sizing, graceful shutdown automation, fencing)
    • Troubleshooting: what to do if a node fails to rejoin, split-brain symptoms, stuck sync
    • When to engage VergeIO support before taking recovery actions
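The staged flow outlined above (pre-checks, power-on, validation) could be gated by a small polling helper so that each stage must report healthy before the next one begins. The following is only a minimal sketch under stated assumptions: `Stage`, `wait_until`, and `run_recovery` are hypothetical names, and the `check` callables are placeholders the operator would supply. It does not call any VergeOS API and does not encode an official power-on order.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    """One gated recovery step, e.g. 'first node up' or 'vSAN tiers synced'."""
    name: str
    check: Callable[[], bool]  # hypothetical health probe supplied by the operator
    timeout_s: float = 600.0   # assumed default; tune per environment
    interval_s: float = 10.0   # assumed polling interval


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float,
               sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def run_recovery(stages: List[Stage], sleep=time.sleep,
                 clock=time.monotonic) -> Optional[str]:
    """Run stages in order; return None on success, else the first stage that timed out."""
    for stage in stages:
        if not wait_until(stage.check, stage.timeout_s, stage.interval_s, sleep, clock):
            # Stop here rather than continuing: a stuck stage is a candidate
            # point for escalating to support before taking further action.
            return stage.name
    return None
```

A runbook built this way would stop at the first stage that fails to become healthy, which maps onto the "when to engage VergeIO support" item: the returned stage name identifies where recovery stalled.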

Related Existing Docs

The following pages cover adjacent topics but none provide a consolidated unplanned outage recovery workflow:

Context

Requested via support interaction. Customer asked for:

  • Recommended host power-on and rejoin order
  • Suggested recovery procedure
  • Recommendations to prevent data inconsistency or corruption

This is a recurring inquiry from customers who operate without reliable UPS coverage or who have experienced a datacenter-wide outage. Existing docs cover graceful shutdown but do not provide a consolidated workflow for recovering from a full unplanned outage.

Labels

documentation: Improvements or additions to documentation
