docs: add operational runbooks, postmortem template, and on-call guidance #88

Open
techAlhaji wants to merge 1 commit into StellarTips:main from techAlhaji:docs/runbooks-and-postmortems
@techAlhaji

Closes #78

Summary

Add operational runbooks, a postmortem template, and on-call documentation to improve incident response readiness.

Changes

  • Added runbooks for common operational incidents:
    • Database outages
    • Stellar Horizon connectivity issues
    • Elevated error rates
    • Authentication failures
    • Deployment rollbacks
  • Added a reusable postmortem template
  • Added on-call and operations guidance
  • Organized the documents into Markdown sections that can be linked to directly

Why

Operational incidents currently rely on tribal knowledge and ad hoc response procedures. These documents provide responders with consistent guidance and create a structured process for capturing lessons learned.

Documentation Added

  • docs/runbooks/database-down.md
  • docs/runbooks/stellar-horizon-unreachable.md
  • docs/runbooks/high-error-rate.md
  • docs/runbooks/auth-issues.md
  • docs/runbooks/deployment-rollback.md
  • docs/postmortems/TEMPLATE.md
  • docs/OPERATIONS.md

Scope

This PR is documentation-only and does not modify application code, infrastructure, deployment behavior, or monitoring systems.

Contributor

Runbooks + postmortem template + on-call doc — solid ops foundation. Thanks for putting this together, merging.

Contributor

Heads up — my earlier note said 'merging' but the squash actually failed: there's a conflict on docs/OPERATIONS.md since #82 added the Better Uptime status-page content and this PR rewrites that same file with on-call guidance. Could you rebase docs/runbooks-and-postmortems onto the latest main and merge the two sections together? Runbooks + postmortems themselves look great.

Development

Successfully merging this pull request may close these issues.

Add incident runbooks and postmortem template

2 participants