
feat(health): implement liveness, readiness, and startup probes #38

Open
Vivian-04 wants to merge 1 commit into SourceXXL:main from Vivian-04:feat/health-check-endpoints


@Vivian-04


Issue #22: Setup Health Check Endpoints & Readiness Probes

feat(health): Setup Health Check Endpoints & Readiness Probes

Summary

Implements /health/live, /health/ready, and /health/startup endpoints for Kubernetes liveness, readiness, and startup probes. Checks PostgreSQL and Redis connectivity with configurable timeouts, returns structured JSON with per-component status and response time, and is fully documented in Swagger.

Closes #22


What changed

New module — src/health/

| File | Role |
|---|---|
| health.module.ts | NestJS module; wires up a lazy ioredis client via a factory provider |
| health.service.ts | Core logic: SELECT 1 DB ping, Redis PING, startup completeness check |
| health.controller.ts | Three endpoints with @Public() + @SkipKyc() (no auth required) |
| dto/health-response.dto.ts | Typed response shape with full Swagger decoration |
| health.constants.ts | Shared DI token (HEALTH_REDIS_CLIENT) to avoid circular imports |
| health.controller.spec.ts | 12 controller unit tests |
| health.service.spec.ts | 16 service unit tests covering ok / degraded / error / timeout paths |

Modified files

  • src/app.module.ts — registers HealthModule
  • src/config/swagger.config.ts — adds Health tag with description
  • src/config/env.validation.ts — adds optional REDIS_URL and HEALTH_CHECK_TIMEOUT_MS

New docs

  • docs/kubernetes-health-probes.md — full Kubernetes probe YAML, env var reference, and design rationale

Endpoints

All endpoints are GET routes under the global prefix /api/v1/:

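| Path | K8s probe | Success | Failure |
|---|---|---|---|
| health/live | Liveness | 200: process alive | Never fails (no dependency checks) |
| health/ready | Readiness | 200: DB and Redis up (ok), or only one of them up (degraded) | 503: both down |
| health/startup | Startup | 200: DB reachable and TypeORM DataSource initialized | 503: not ready |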
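A minimal sketch of the startup check described in the table, assuming a DataSource-like object that exposes TypeORM's isInitialized flag and a query() method. DataSourceLike and checkStartup are hypothetical names for illustration, not the code in this PR, and the timeout handling is left out for brevity.

```typescript
// Hypothetical subset of TypeORM's DataSource that the startup check needs.
interface DataSourceLike {
  isInitialized: boolean;
  query(sql: string): Promise<unknown>;
}

type StartupResult = { status: 'ok' | 'error'; httpStatus: 200 | 503 };

// Startup passes only once the ORM is initialized AND the database answers SELECT 1.
async function checkStartup(ds: DataSourceLike): Promise<StartupResult> {
  if (!ds.isInitialized) {
    // ORM not ready yet: keep the startup probe failing so K8s waits.
    return { status: 'error', httpStatus: 503 };
  }
  try {
    await ds.query('SELECT 1');
    return { status: 'ok', httpStatus: 200 };
  } catch {
    // Any query failure means the app cannot serve traffic yet.
    return { status: 'error', httpStatus: 503 };
  }
}
```

Until startup returns 200, Kubernetes holds off on the liveness and readiness probes, so a slow migration or cold database does not trigger restarts.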

Response format

```json
{
  "status": "ok",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "uptime": 123.456,
  "components": {
    "database": { "status": "up", "responseTime": 4 },
    "redis":    { "status": "up", "responseTime": 1 }
  }
}
```

status values: ok (all components up, HTTP 200) · degraded (some components up; readiness only, HTTP 200) · error (HTTP 503)
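The mapping from component results to the overall status and HTTP code could look roughly like this. It is a sketch under assumptions: resolveStatus, buildHealthResponse and the allowDegraded flag are hypothetical names, and the flag is one way to express "degraded applies to readiness only"; the PR may structure this differently.

```typescript
type ComponentStatus = 'up' | 'down';
type OverallStatus = 'ok' | 'degraded' | 'error';

interface ComponentHealth {
  status: ComponentStatus;
  responseTime: number; // milliseconds
}

interface HealthResponse {
  status: OverallStatus;
  timestamp: string; // ISO 8601
  uptime: number; // seconds
  components: Record<string, ComponentHealth>;
}

// allowDegraded = true for readiness only: a partial outage still returns 200.
function resolveStatus(
  components: Record<string, ComponentHealth>,
  allowDegraded: boolean,
): { status: OverallStatus; httpStatus: 200 | 503 } {
  const results = Object.values(components);
  const upCount = results.filter((c) => c.status === 'up').length;
  if (upCount === results.length) return { status: 'ok', httpStatus: 200 };
  if (allowDegraded && upCount > 0) return { status: 'degraded', httpStatus: 200 };
  return { status: 'error', httpStatus: 503 };
}

// Assembles the JSON body shown under "Response format" plus the HTTP code to send.
function buildHealthResponse(
  components: Record<string, ComponentHealth>,
  allowDegraded: boolean,
  uptimeSeconds: number,
  now: Date = new Date(),
): { body: HealthResponse; httpStatus: 200 | 503 } {
  const { status, httpStatus } = resolveStatus(components, allowDegraded);
  return {
    body: { status, timestamp: now.toISOString(), uptime: uptimeSeconds, components },
    httpStatus,
  };
}
```

For example, { database: up, redis: down } resolves to degraded/200 on readiness but error/503 when allowDegraded is false.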


Design decisions

Liveness never checks dependencies.
A database outage should pull the pod from rotation (readiness), not restart the container (liveness). Restarting does not fix a database.
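To make the rule concrete: a liveness handler can be written so that it has no access to the database or Redis at all. The sketch below is hypothetical (buildLivenessBody is not from the PR); it reuses the timestamp and uptime fields from the response format and omits components, which is an assumption about the liveness body.

```typescript
interface LivenessResponse {
  status: 'ok';
  timestamp: string;
  uptime: number; // seconds since process start
}

// Takes no dependency handles on purpose: if this code runs, the event loop is
// responsive and the process is alive, which is all liveness should report.
function buildLivenessBody(uptimeSeconds: number, now: Date = new Date()): LivenessResponse {
  return { status: 'ok', timestamp: now.toISOString(), uptime: uptimeSeconds };
}
```

In a Node process the caller would pass process.uptime(); nothing about a dependency outage can change this response.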

degraded state on readiness.
When only one dependency is down, returning degraded (HTTP 200) keeps the pod in rotation rather than immediately removing it. This prevents a Redis blip from taking all pods offline simultaneously.

Promise.race + setTimeout for timeouts.
Each component check races against a configurable deadline (HEALTH_CHECK_TIMEOUT_MS, default 5 s), so a hung database connection cannot hold a probe response past that deadline.
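A sketch of the Promise.race + setTimeout pattern, assuming a per-component helper that measures response time; withTimeout and checkComponent are hypothetical names. Note that Promise.race only stops waiting: the losing query keeps running in the background and is not cancelled.

```typescript
interface ComponentHealth {
  status: 'up' | 'down';
  responseTime: number; // milliseconds
}

// Rejects if `work` has not settled within `timeoutMs`.
// Promise.race does not cancel `work`; it only stops waiting for it.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Health check timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  // Clear the timer either way so a fast check does not leave a pending timeout.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Times one dependency probe and maps success to "up", failure or timeout to "down".
async function checkComponent(
  probe: () => Promise<unknown>,
  timeoutMs: number,
): Promise<ComponentHealth> {
  const started = Date.now();
  try {
    await withTimeout(probe(), timeoutMs);
    return { status: 'up', responseTime: Date.now() - started };
  } catch {
    return { status: 'down', responseTime: Date.now() - started };
  }
}
```

In the service, the probes would be something like () => dataSource.query('SELECT 1') for PostgreSQL and () => redis.ping() for ioredis, both run with the configured timeout.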

@Res({ passthrough: true }) for 503.
Throwing an HttpException would route through GlobalExceptionFilter, wrapping the body in { statusCode, correlationId, message, path }. Health probes need a clean, predictable body regardless of HTTP status, so we set the status code on the response directly and return the DTO normally.
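The pattern can be sketched without NestJS by modelling only the part of the response object it uses. StatusSettable and sendProbe are hypothetical; StatusSettable stands in for the Express Response that @Res({ passthrough: true }) injects, whose status(code) method sets the HTTP code.

```typescript
// Hypothetical stand-in for the injected Express Response; only status() is used.
interface StatusSettable {
  status(code: number): unknown;
}

interface ProbeBody {
  status: 'ok' | 'degraded' | 'error';
  timestamp: string;
}

// Sets the HTTP code on the response and returns the body unchanged, so no
// exception reaches GlobalExceptionFilter and the JSON shape is the same for 200 and 503.
function sendProbe(res: StatusSettable, body: ProbeBody): ProbeBody {
  res.status(body.status === 'error' ? 503 : 200);
  return body;
}
```

In a NestJS controller the handler would take @Res({ passthrough: true }) res and return sendProbe(res, body); with passthrough enabled, Nest still serializes the returned value as the response body.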

No circular imports.
HEALTH_REDIS_CLIENT lives in health.constants.ts so neither health.service.ts nor health.module.ts imports the other.


Acceptance criteria

| Criterion | Status |
|---|---|
| /health/live endpoint: basic health (HTTP 200) | Done |
| /health/ready endpoint: database, cache, external services | Done |
| /health/startup endpoint: startup completeness check | Done |
| Database connection test (< 100 ms response) | Done (SELECT 1 with timeout) |
| Cache (Redis) connectivity test | Done (PING with timeout) |
| Response format includes timestamp and component status | Done |
| Configurable timeout values | Done (HEALTH_CHECK_TIMEOUT_MS env var) |
| Health check responses documented in Swagger | Done |
| Kubernetes probe configuration example in docs | Done (docs/kubernetes-health-probes.md) |
| Unit tests for each health endpoint | Done (28 tests total) |
| Failure scenario tests | Done (DB down, Redis down, both down, timeout) |

Test plan

  • npm test -- --testPathPatterns=src/health — all 28 unit tests pass
  • Start app with live DB — GET /api/v1/health/live returns { "status": "ok" }
  • Start app with live DB + Redis — GET /api/v1/health/ready returns { "status": "ok" }
  • Redis unreachable — GET /api/v1/health/ready returns HTTP 200, { "status": "degraded" }
  • Both DB + Redis down — GET /api/v1/health/ready returns HTTP 503, { "status": "error" }
  • Endpoints appear in Swagger UI at /api/docs under the Health tag
  • Kubernetes probe YAML from docs/kubernetes-health-probes.md applies cleanly to a cluster


Add /health/live, /health/ready, and /health/startup endpoints for
Kubernetes container orchestration. Readiness checks PostgreSQL (SELECT 1)
and Redis (PING) with configurable timeout via HEALTH_CHECK_TIMEOUT_MS.
Startup probe additionally verifies TypeORM DataSource initialization.

All endpoints are public (no auth) and return structured JSON with timestamp
and per-component status/responseTime. A failing probe responds 503; readiness
stays at 200 with status "degraded" when only one dependency is down.

Includes unit tests for controller and service, Swagger documentation,
and Kubernetes probe YAML example in docs/.