feat(health): implement liveness, readiness, and startup probes #38
Open
Vivian-04 wants to merge 1 commit into
Add /health/live, /health/ready, and /health/startup endpoints for Kubernetes container orchestration. Readiness checks PostgreSQL (SELECT 1) and Redis (PING) with configurable timeout via HEALTH_CHECK_TIMEOUT_MS. Startup probe additionally verifies TypeORM DataSource initialization. All endpoints are public (no auth), return structured JSON with timestamp and per-component status/responseTime, and respond 503 on failure. Includes unit tests for controller and service, Swagger documentation, and Kubernetes probe YAML example in docs/.
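The startup check described above can be sketched as a small pure function. This is a minimal sketch, not the PR's health.service.ts. InitializableDataSource is a hypothetical structural stand-in for TypeORM's DataSource (only its isInitialized flag is read), evaluateStartup is an illustrative name, and it assumes startup requires every dependency to be up. The PR does not say whether a degraded dependency fails the startup probe.

```typescript
type ComponentStatus = "up" | "down";

// Hypothetical structural stand-in for TypeORM's DataSource:
// only the isInitialized flag is read here.
interface InitializableDataSource {
  readonly isInitialized: boolean;
}

interface StartupResult {
  status: "ok" | "error";
  components: Record<string, { status: ComponentStatus }>;
}

// Hypothetical helper: the startup probe passes only when the DataSource has
// finished initializing AND every dependency check reported "up"
// (assumption: a degraded dependency fails startup).
function evaluateStartup(
  dataSource: InitializableDataSource,
  dependencies: Record<string, ComponentStatus>,
): StartupResult {
  const components: Record<string, { status: ComponentStatus }> = {
    dataSource: { status: dataSource.isInitialized ? "up" : "down" },
  };
  for (const name of Object.keys(dependencies)) {
    components[name] = { status: dependencies[name] };
  }
  const allUp = Object.keys(components).every(
    (name) => components[name].status === "up",
  );
  return { status: allUp ? "ok" : "error", components };
}

// Usage: the DataSource is still initializing, so startup fails.
const startup = evaluateStartup(
  { isInitialized: false },
  { database: "up", redis: "up" },
);
console.log(startup.status); // prints: error
```

Keeping the check free of framework types makes it easy to unit test. In the real service, the DataSource would be injected rather than passed in.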
PR #22 Setup Health Check Endpoints & Readiness Probes
feat(health): Setup Health Check Endpoints & Readiness Probes
Summary
Implements /health/live, /health/ready, and /health/startup endpoints for Kubernetes liveness, readiness, and startup probes. Checks PostgreSQL and Redis connectivity with configurable timeouts, returns structured JSON with per-component status and response time, and is fully documented in Swagger.

Closes #22
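The configurable timeout and per-component response time mentioned in the summary can be sketched as follows. This is a minimal, framework-agnostic sketch, assuming the 5 s default from the design notes. resolveHealthTimeoutMs and checkWithTimeout are hypothetical names, and the check argument stands in for the real SELECT 1 or Redis PING call.

```typescript
type ComponentStatus = "up" | "down";

interface ComponentHealth {
  status: ComponentStatus;
  responseTime: number; // milliseconds
}

// Hypothetical helper: read HEALTH_CHECK_TIMEOUT_MS from an env-like map,
// falling back to 5000 ms when it is missing or not a positive integer.
function resolveHealthTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env["HEALTH_CHECK_TIMEOUT_MS"];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 5000;
}

// Hypothetical helper: race a dependency check against a deadline with
// Promise.race + setTimeout. The timer is always cleared in finally, so a
// fast check does not leave a pending timer behind.
async function checkWithTimeout(
  check: () => Promise<unknown>,
  timeoutMs: number,
): Promise<ComponentHealth> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    await Promise.race([check(), deadline]);
    return { status: "up", responseTime: Date.now() - start };
  } catch {
    // A failed check and an expired deadline are both reported as "down".
    return { status: "down", responseTime: Date.now() - start };
  } finally {
    clearTimeout(timer);
  }
}

// Usage: a check that never settles (like a hung connection) is cut off.
const timeoutMs = resolveHealthTimeoutMs({ HEALTH_CHECK_TIMEOUT_MS: "50" });
const hung = () => new Promise<void>(() => {});
checkWithTimeout(hung, timeoutMs).then((r) => console.log(r.status)); // prints: down
```

Note that Promise.race does not cancel the losing check. The hung query keeps running in the background, so the probe response is bounded but the underlying connection is not released by this pattern alone.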
What changed
New module: src/health/

- health.module.ts
- health.service.ts: SELECT 1 DB ping, Redis PING, startup completeness check
- health.controller.ts: @Public() + @SkipKyc() (no auth required)
- dto/health-response.dto.ts
- health.constants.ts: defines HEALTH_REDIS_CLIENT in its own file to avoid circular imports
- health.controller.spec.ts, health.service.spec.ts: unit tests

Modified files

- src/app.module.ts: registers HealthModule
- src/config/swagger.config.ts: adds Health tag with description
- src/config/env.validation.ts: adds optional REDIS_URL and HEALTH_CHECK_TIMEOUT_MS

New docs
- docs/kubernetes-health-probes.md: full Kubernetes probe YAML, env var reference, and design rationale

Endpoints

All endpoints live under the global prefix /api/v1:

- GET /api/v1/health/live
- GET /api/v1/health/ready
- GET /api/v1/health/startup

Response format
    {
      "status": "ok",
      "timestamp": "2024-01-01T00:00:00.000Z",
      "uptime": 123.456,
      "components": {
        "database": { "status": "up", "responseTime": 4 },
        "redis": { "status": "up", "responseTime": 1 }
      }
    }

status values: ok (all up) · degraded (some up, readiness only) · error (HTTP 503)

Design decisions
Liveness never checks dependencies.
A database outage should pull the pod from rotation (readiness), not restart the container (liveness). Restarting does not fix a database.
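To make the liveness rule concrete, here is a minimal sketch of a liveness handler that answers purely from in-process state. It is not the PR's controller code. The liveness function is a hypothetical name, and uptime is assumed to be measured from when the module loaded (the PR may use process.uptime() instead).

```typescript
interface LivenessResponse {
  status: "ok";
  timestamp: string; // ISO-8601, as in the documented response format
  uptime: number; // seconds
}

// Assumption: uptime is measured from when this module was loaded.
const startedAt = Date.now();

// Hypothetical liveness handler: it takes no database or Redis client, so a
// dependency outage can never fail this probe and trigger a restart.
function liveness(now: Date = new Date()): LivenessResponse {
  return {
    status: "ok",
    timestamp: now.toISOString(),
    uptime: (now.getTime() - startedAt) / 1000,
  };
}

console.log(liveness().status); // prints: ok
```

The only way this probe fails is if the process cannot answer HTTP at all, which is exactly the case a restart can fix.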
degraded state on readiness.
When only one dependency is down, returning degraded (HTTP 200) keeps the pod in rotation rather than immediately removing it. This prevents a Redis blip from taking all pods offline simultaneously.

Promise.race + setTimeout for timeouts.
Each component check races against a configurable deadline (HEALTH_CHECK_TIMEOUT_MS, default 5 s), so a hung database connection cannot hold a probe response past that deadline.

@Res({ passthrough: true }) for 503.
Throwing an HttpException would route through GlobalExceptionFilter, wrapping the body in { statusCode, correlationId, message, path }. Health probes need a clean, predictable body regardless of HTTP status, so we set the status code on the response directly and return the DTO normally.

No circular imports.
HEALTH_REDIS_CLIENT lives in health.constants.ts, so health.service.ts never has to import it from health.module.ts (which already imports the service).

Acceptance criteria
- /health/live endpoint: basic health (HTTP 200)
- /health/ready endpoint: database, cache, external services
- /health/startup endpoint: startup completeness check
- Database check: SELECT 1 with timeout
- Redis check: PING with timeout
- Timeout configurable via HEALTH_CHECK_TIMEOUT_MS env var
- Kubernetes probe configuration documented in docs/kubernetes-health-probes.md

Test plan
- npm test -- --testPathPatterns=src/health: all 28 unit tests pass
- GET /api/v1/health/live returns { "status": "ok" }
- GET /api/v1/health/ready returns { "status": "ok" } (PostgreSQL and Redis both up)
- GET /api/v1/health/ready returns HTTP 200, { "status": "degraded" } (one dependency down)
- GET /api/v1/health/ready returns HTTP 503, { "status": "error" } (both dependencies down)
- Endpoints appear in Swagger UI at /api/docs under the Health tag
- The probe YAML in docs/kubernetes-health-probes.md applies cleanly to a cluster
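The readiness outcomes exercised in the test plan (ok and degraded answer 200, error answers 503) can be expressed as a small aggregation step. This is a sketch, assuming error means every component is down, which matches the ok / degraded / error definitions in the response format. aggregateReadiness and httpStatusFor are hypothetical names, not the PR's service API. In the controller, the resulting code would be what gets set on the @Res({ passthrough: true }) response.

```typescript
type ComponentStatus = "up" | "down";
type ReadinessStatus = "ok" | "degraded" | "error";

// Hypothetical aggregation: all up -> ok, some up -> degraded, none up -> error.
function aggregateReadiness(
  components: Record<string, { status: ComponentStatus }>,
): ReadinessStatus {
  const names = Object.keys(components);
  const upCount = names.filter((n) => components[n].status === "up").length;
  if (upCount === names.length) return "ok";
  if (upCount > 0) return "degraded";
  return "error";
}

// ok and degraded keep the pod in rotation (200); only error answers 503.
function httpStatusFor(status: ReadinessStatus): 200 | 503 {
  return status === "error" ? 503 : 200;
}

// Usage: Redis is down but PostgreSQL is up.
const status = aggregateReadiness({
  database: { status: "up" },
  redis: { status: "down" },
});
console.log(status, httpStatusFor(status)); // prints: degraded 200
```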