feat(health): implement liveness, readiness, and startup probes by Vivian-04 · Pull Request #38 · SourceXXL/alian_structure-api

Vivian-04 · 2026-06-19T11:29:30Z

PR #22 Setup Health Check Endpoints & Readiness Probes

feat(health): Setup Health Check Endpoints & Readiness Probes

Summary

Implements /health/live, /health/ready, and /health/startup endpoints for Kubernetes liveness, readiness, and startup probes. Checks PostgreSQL and Redis connectivity with configurable timeouts, returns structured JSON with per-component status and response time, and is fully documented in Swagger.

Closes #22

What changed

New module — `src/health/`

| File | Role |
| --- | --- |
| `health.module.ts` | NestJS module; wires up a lazy ioredis client via factory provider |
| `health.service.ts` | Core logic: `SELECT 1` DB ping, Redis `PING`, startup completeness check |
| `health.controller.ts` | Three endpoints with `@Public()` + `@SkipKyc()` (no auth required) |
| `dto/health-response.dto.ts` | Typed response shape with full Swagger decoration |
| `health.constants.ts` | Shared DI token (`HEALTH_REDIS_CLIENT`) to avoid circular imports |
| `health.controller.spec.ts` | 12 controller unit tests |
| `health.service.spec.ts` | 16 service unit tests covering ok / degraded / error / timeout paths |

Modified files

src/app.module.ts — registers HealthModule
src/config/swagger.config.ts — adds Health tag with description
src/config/env.validation.ts — adds optional REDIS_URL and HEALTH_CHECK_TIMEOUT_MS

New docs

docs/kubernetes-health-probes.md — full Kubernetes probe YAML, env var reference, and design rationale

Endpoints

All endpoints are GET routes under the global prefix /api/v1/:

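| Path | K8s probe | Success | Failure |
| --- | --- | --- | --- |
| `health/live` | Liveness | 200 (process alive) | never fails |
| `health/ready` | Readiness | 200 (DB + Redis both up, or only one up and reported as `degraded`) | 503 (both down) |
| `health/startup` | Startup | 200 (DB + ORM initialized) | 503 (not ready) |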

Response format

{
  "status": "ok",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "uptime": 123.456,
  "components": {
    "database": { "status": "up", "responseTime": 4 },
    "redis":    { "status": "up", "responseTime": 1 }
  }
}

status values: ok (all components up, HTTP 200) · degraded (some components up, readiness only, HTTP 200) · error (HTTP 503)

Design decisions

Liveness never checks dependencies.
A database outage should pull the pod from rotation (readiness), not restart the container (liveness). Restarting does not fix a database.

degraded state on readiness.
When only one dependency is down, returning degraded (HTTP 200) keeps the pod in rotation rather than immediately removing it. This prevents a Redis blip from taking all pods offline simultaneously.
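
As a rough illustration of the rule above, the mapping from per-component results to an overall status and HTTP code could look like the sketch below. The helper name `summarizeReadiness` and the `"down"` component value are assumptions made for this example; the PR only shows `"up"` in its sample response, and the real logic lives in `health.service.ts`.

```typescript
// Assumed component states; only "up" appears in the PR's sample response.
type ComponentStatus = "up" | "down";
type OverallStatus = "ok" | "degraded" | "error";

interface ComponentHealth {
  status: ComponentStatus;
  responseTime: number; // milliseconds
}

interface ReadinessSummary {
  status: OverallStatus;
  httpStatus: 200 | 503;
}

// Hypothetical helper: all up -> ok/200, some up -> degraded/200, none up -> error/503.
function summarizeReadiness(
  components: Record<string, ComponentHealth>,
): ReadinessSummary {
  const names = Object.keys(components);
  const upCount = names.filter((name) => components[name].status === "up").length;

  if (upCount === names.length) {
    return { status: "ok", httpStatus: 200 };
  }
  if (upCount > 0) {
    // A single failing dependency keeps the pod in rotation.
    return { status: "degraded", httpStatus: 200 };
  }
  return { status: "error", httpStatus: 503 };
}
```

With this shape, a Redis outage alone yields `degraded` with HTTP 200, so the readiness probe keeps passing; only a simultaneous database and Redis outage produces 503.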

Promise.race + setTimeout for timeouts.
Each component check races against a configurable deadline (HEALTH_CHECK_TIMEOUT_MS, default 5 s). A hung database connection will never block a probe response indefinitely.
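
A minimal sketch of the timeout pattern described above, assuming hypothetical helper names (`withTimeout`, `checkComponent`) and a `"down"` status value that are not taken from the PR's code. The `ping` callback stands in for whatever runs `SELECT 1` or Redis `PING`.

```typescript
interface ComponentHealth {
  status: "up" | "down"; // "down" is an assumed value for this sketch
  responseTime: number; // milliseconds
}

// Hypothetical helper: rejects if `work` has not settled within `timeoutMs`.
async function withTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, deadline]);
  } finally {
    // Clear the timer so a fast check does not leave a pending timeout behind.
    clearTimeout(timer);
  }
}

// Hypothetical helper: runs one dependency check and records how long it took.
async function checkComponent(
  ping: () => Promise<unknown>,
  timeoutMs: number,
  label: string,
): Promise<ComponentHealth> {
  const startedAt = Date.now();
  try {
    await withTimeout(ping(), timeoutMs, label);
    return { status: "up", responseTime: Date.now() - startedAt };
  } catch {
    // Both a failed ping and a timeout are reported as "down".
    return { status: "down", responseTime: Date.now() - startedAt };
  }
}
```

Note that `Promise.race` only stops waiting; it does not cancel the underlying query or command, which keeps running until the driver gives up on its own.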

@Res({ passthrough: true }) for 503.
Throwing an HttpException would route through GlobalExceptionFilter, wrapping the body in { statusCode, correlationId, message, path }. Health probes need a clean, predictable body regardless of HTTP status, so we set the status code on the response directly and return the DTO normally.
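
The idea can be sketched without the framework. In the snippet below, `ResponseLike` is a stand-in for the Express response object that NestJS injects when a handler parameter is decorated with `@Res({ passthrough: true })`, and `respondWithHealth` is a hypothetical name for the handler body; neither is copied from `health.controller.ts`.

```typescript
// Stand-in for the injected response; only the method this sketch needs.
interface ResponseLike {
  status(code: number): unknown;
}

type OverallStatus = "ok" | "degraded" | "error";

interface HealthBody {
  status: OverallStatus;
  timestamp: string;
  uptime: number;
}

// Hypothetical handler body: set the HTTP status on the response directly,
// then return the DTO so it is serialized as-is instead of going through
// an exception filter.
function respondWithHealth(res: ResponseLike, body: HealthBody): HealthBody {
  res.status(body.status === "error" ? 503 : 200);
  return body;
}
```

Because nothing is thrown, the body sent with a 503 has the same shape as the body sent with a 200, which is what a probe or dashboard parsing the response would rely on.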

No circular imports.
HEALTH_REDIS_CLIENT lives in health.constants.ts so neither health.service.ts nor health.module.ts imports the other.

Acceptance criteria

| Criterion | Status |
| --- | --- |
| `/health/live` endpoint: basic health (HTTP 200) | Done |
| `/health/ready` endpoint: database, cache, external services | Done |
| `/health/startup` endpoint: startup completeness check | Done |
| Database connection test (< 100 ms response) | Done (`SELECT 1` with timeout) |
| Cache (Redis) connectivity test | Done (`PING` with timeout) |
| Response format includes timestamp and component status | Done |
| Configurable timeout values | Done (`HEALTH_CHECK_TIMEOUT_MS` env var) |
| Health check responses documented in Swagger | Done |
| Kubernetes probe configuration example in docs | Done (`docs/kubernetes-health-probes.md`) |
| Unit tests for each health endpoint | Done (28 tests total) |
| Failure scenario tests | Done (db down, redis down, both down, timeout) |

Test plan

npm test -- --testPathPatterns=src/health — all 28 unit tests pass
Start app with live DB — GET /api/v1/health/live returns { "status": "ok" }
Start app with live DB + Redis — GET /api/v1/health/ready returns { "status": "ok" }
Redis unreachable — GET /api/v1/health/ready returns HTTP 200, { "status": "degraded" }
Both DB + Redis down — GET /api/v1/health/ready returns HTTP 503, { "status": "error" }
Endpoints appear in Swagger UI at /api/docs under the Health tag
Kubernetes probe YAML from docs/kubernetes-health-probes.md applies cleanly to a cluster

Add /health/live, /health/ready, and /health/startup endpoints for Kubernetes container orchestration. Readiness checks PostgreSQL (SELECT 1) and Redis (PING) with configurable timeout via HEALTH_CHECK_TIMEOUT_MS. Startup probe additionally verifies TypeORM DataSource initialization. All endpoints are public (no auth), return structured JSON with timestamp and per-component status/responseTime, and respond 503 on failure. Includes unit tests for controller and service, Swagger documentation, and Kubernetes probe YAML example in docs/.

Vivian-04 wants to merge 1 commit into SourceXXL:main from Vivian-04:feat/health-check-endpoints
