A production-grade NestJS microservice architecture featuring 4 services communicating over RabbitMQ, with PostgreSQL per-service schemas, resilience patterns (timeouts, retries with exponential backoff, circuit breakers), idempotency keys, distributed tracing via OpenTelemetry + Jaeger, internationalization, and comprehensive integration tests. Deployed on AWS ECS Fargate with full Terraform IaC, Bitbucket Pipelines CI/CD, K6 load testing, and Grafana Cloud observability.
- System Architecture
- Project Structure
- Technology Stack
- Services Overview
- Request Flow
- API Reference
- RabbitMQ Message Patterns
- Resilience Patterns
- Distributed Tracing
- Database Schema
- Request/Response Pipeline
- Environment Variables
- Getting Started
- Docker Architecture
- Database Migrations
- npm Scripts Reference
- AWS Infrastructure
- CI/CD Pipeline (Bitbucket Pipelines)
- Load Testing (K6)
- Observability (Grafana Cloud + CloudWatch)
- Integration Testing
graph TD
Client["Client / Browser"]
subgraph Application Services
GW["API Gateway<br/>:3000<br/>(HTTP REST + Swagger)"]
OS["Order Service<br/>(RMQ Consumer)"]
IS["Inventory Service<br/>(RMQ Consumer)"]
PS["Payment Service<br/>(RMQ Consumer)"]
end
subgraph Infrastructure
RMQ["RabbitMQ<br/>:5672 / :15672"]
PG["PostgreSQL 16<br/>:5432"]
JG["Jaeger v2<br/>:16686 / :4318"]
PGA["pgAdmin<br/>:5050"]
end
subgraph Database Schemas
OS_DB["order_schema"]
IS_DB["inventory_schema"]
PS_DB["payment_schema"]
end
Client -->|HTTP| GW
GW -->|"RPC (order_queue)"| RMQ
GW -->|"RPC (inventory_queue)"| RMQ
GW -->|"RPC (payment_queue)"| RMQ
RMQ --> OS
RMQ --> IS
RMQ --> PS
OS --> OS_DB
IS --> IS_DB
PS --> PS_DB
OS_DB --> PG
IS_DB --> PG
PS_DB --> PG
PGA -->|Admin| PG
GW -.->|OTLP traces| JG
OS -.->|OTLP traces| JG
IS -.->|OTLP traces| JG
PS -.->|OTLP traces| JG
microservice-r-d/
├── apps/
│ ├── api-gateway/
│ │ └── src/
│ │ ├── main.ts # HTTP app bootstrap + tracing init
│ │ ├── gateway.module.ts # Imports RMQ clients + ResilienceModule
│ │ └── controllers/
│ │ ├── health.controller.ts # GET /health, GET /health/circuits
│ │ ├── order.controller.ts # /api/orders endpoints
│ │ ├── inventory.controller.ts # /api/products endpoints
│ │ └── payment.controller.ts # /api/payments endpoints
│ │
│ ├── order-service/
│ │ └── src/
│ │ ├── main.ts # Microservice bootstrap + tracing
│ │ ├── order.module.ts # TypeORM + I18n config
│ │ ├── data-source.ts # TypeORM DataSource for migrations
│ │ ├── order/
│ │ │ ├── order.entity.ts # Order entity (order_schema)
│ │ │ ├── order.service.ts # CRUD + idempotency logic
│ │ │ └── order.controller.ts # @MessagePattern handlers
│ │ └── migrations/
│ │
│ ├── inventory-service/
│ │ └── src/
│ │ ├── main.ts
│ │ ├── inventory.module.ts
│ │ ├── data-source.ts
│ │ ├── inventory/
│ │ │ ├── product.entity.ts # Product entity (inventory_schema)
│ │ │ ├── stock-reservation.entity.ts # StockReservation entity
│ │ │ ├── inventory.service.ts # CRUD + SKU dedup logic
│ │ │ └── inventory.controller.ts
│ │ └── migrations/
│ │
│ └── payment-service/
│ └── src/
│ ├── main.ts
│ ├── payment.module.ts
│ ├── data-source.ts
│ ├── payment/
│ │ ├── payment.entity.ts # Payment entity (payment_schema)
│ │ ├── payment.service.ts # CRUD + idempotency logic
│ │ └── payment.controller.ts
│ └── migrations/
│
├── libs/
│ └── shared/
│ └── src/
│ ├── index.ts # Barrel exports for @shared
│ ├── config/
│ │ ├── env.config.ts # Environment variable loader
│ │ ├── database.config.ts # getDatabaseConfig(schema)
│ │ └── rmq.config.ts # RMQ client/server config factories
│ ├── constants/
│ │ └── rmq.constants.ts # Service names, queues, message patterns
│ ├── dto/
│ │ ├── create-order.dto.ts # CreateOrderDto + OrderItemDto
│ │ ├── create-product.dto.ts # CreateProductDto
│ │ └── create-payment.dto.ts # CreatePaymentDto
│ ├── filters/
│ │ ├── all-exceptions.filter.ts # Global exception handler
│ │ └── i18n-validation.filter.ts # i18n validation error formatter
│ ├── interceptors/
│ │ └── success-response.interceptor.ts # {success: true, data} wrapper
│ ├── decorators/
│ │ ├── skip-interceptor.decorator.ts # @SkipInterceptor()
│ │ └── api-page-response.decorator.ts # Swagger pagination decorator
│ ├── pagination/
│ │ └── pagination.dto.ts # PageDto<T> + PageMetaDto
│ ├── resilience/
│ │ ├── circuit-breaker.service.ts # Opossum CB wrapper
│ │ ├── resilience.module.ts # Global NestJS module
│ │ └── index.ts
│ ├── tracing/
│ │ ├── tracing.ts # OpenTelemetry SDK init
│ │ └── index.ts
│ ├── utils/
│ │ ├── rpc-error.util.ts # handleRpcResponse() with timeout/retry/CB
│ │ └── rpc-options.ts # RPC_READ_OPTIONS / RPC_WRITE_OPTIONS
│ └── i18n/
│ ├── en/
│ │ ├── translations.json
│ │ └── inventory.json
│ └── de/
│ ├── translations.json
│ └── inventory.json
│
├── test/
│ └── integration/
│ ├── setup.ts # Test app bootstrap
│ ├── order.spec.ts
│ ├── inventory.spec.ts
│ ├── payment.spec.ts
│ └── resilience.spec.ts
│
├── docker-compose.yml # Dev mode (hot-reload)
├── Dockerfile # Production multi-stage build
├── Dockerfile.dev # Dev image (node + nest CLI)
├── init-schemas.sql # Creates 3 PostgreSQL schemas
├── nest-cli.json # NestJS monorepo config
├── package.json
├── tsconfig.json
├── jest.integration.config.ts
├── .env.example
└── .env
| Category | Technology | Version | Purpose |
|---|---|---|---|
| Runtime | Node.js | 22 (Alpine) | JavaScript runtime |
| Language | TypeScript | ^5.1 | Type-safe development |
| Framework | NestJS | ^10.0 | Application framework (monorepo mode) |
| ORM | TypeORM | ^0.3.20 | Database access + migrations |
| Database | PostgreSQL | 16 | Relational data storage |
| Message Broker | RabbitMQ | 3 (management) | Inter-service RPC communication |
| Tracing | OpenTelemetry + Jaeger | SDK 0.212 / Jaeger v2 | Distributed tracing |
| Circuit Breaker | Opossum | ^8.5.0 | Failure protection |
| Validation | class-validator + class-transformer | ^0.14 / ^0.5 | DTO validation |
| API Docs | @nestjs/swagger | ^7.4.0 | Swagger / OpenAPI |
| i18n | nestjs-i18n | ^10.5.1 | Internationalization (en, de) |
| Testing | Jest + Supertest | ^30.2 / ^7.2 | Integration testing |
| Container | Docker Compose | - | Orchestration |
| DB Admin | pgAdmin 4 | latest | PostgreSQL web UI |
- Role: HTTP entry point, routes requests to microservices via RabbitMQ RPC
- Port: 3000 (configurable)
- Database: None (stateless proxy)
- Key features: Swagger docs (/api/docs), CORS, global validation, response wrapping, circuit breaker stats (/health/circuits)
- Role: Manages order lifecycle
- Schema: order_schema
- Entity: Order(id, status, items, total, idempotencyKey, timestamps)
- Patterns: create_order, get_order_by_id, get_orders
- Idempotency: Client-provided X-Idempotency-Key header stored in unique column
- Role: Manages products and stock reservations
- Schema: inventory_schema
- Entities: Product(id, name, sku, stock, reservedStock), StockReservation(id, orderId, quantity, status, productId)
- Patterns: create_product, get_product_by_id, get_products
- Idempotency: Natural deduplication via unique sku column
- Role: Manages payment processing
- Schema: payment_schema
- Entity: Payment(id, orderId, amount, status, idempotencyKey, timestamps)
- Patterns: create_payment, get_payment_by_id, get_payments_by_order_id
- Idempotency: Client-provided X-Idempotency-Key header stored in unique column
Imported via @shared path alias. Contains: environment config, database/RMQ config factories, DTOs, validation filters, response interceptor, pagination utilities, resilience module (circuit breaker), tracing bootstrap, RPC error handling with timeout/retry, i18n translations, and RMQ constants.
sequenceDiagram
participant C as Client
participant GW as API Gateway
participant RPC as handleRpcResponse()
participant RMQ as RabbitMQ
participant OS as Order Service
participant DB as PostgreSQL
C->>GW: POST /api/orders<br/>{items, total}<br/>X-Idempotency-Key: abc-123
GW->>GW: ValidationPipe validates CreateOrderDto
GW->>RPC: send({cmd: 'create_order'}, {dto, idempotencyKey, lang})
rect rgb(255, 245, 230)
Note over RPC: Resilience Pipeline
RPC->>RPC: timeout(10000ms)
RPC->>RPC: retry(1x, 500ms backoff)
RPC->>RPC: CircuitBreaker.exec('ORDER_SERVICE')
end
RPC->>RMQ: Publish to order_queue
RMQ->>OS: Consume message
OS->>DB: Check idempotencyKey exists?
alt Key exists
DB-->>OS: Return existing order
else New order
OS->>DB: INSERT INTO order_schema.order
DB-->>OS: Return new order
end
OS-->>RMQ: Reply with order data
RMQ-->>RPC: Response
RPC-->>GW: Order object
GW->>GW: SuccessResponseInterceptor wraps response
GW-->>C: 201 {success: true, data: {id, status, items, total, ...}}
| Method | Endpoint | Description | Response |
|---|---|---|---|
| GET | /health | Health check | {status: "ok"} |
| GET | /health/circuits | Circuit breaker stats per service | {ORDER_SERVICE: {state, stats}, ...} |
| Method | Endpoint | Description | Headers | Body | Codes |
|---|---|---|---|---|---|
| POST | /api/orders | Create order | X-Idempotency-Key (opt), X-Lang (opt) | CreateOrderDto | 201, 400 |
| GET | /api/orders/:id | Get order by UUID | X-Lang (opt) | - | 200, 404 |
| GET | /api/orders?page=1&perPage=10 | List orders (paginated) | - | - | 200 |
Example — Create Order:
// POST /api/orders
// X-Idempotency-Key: order-abc-123
{
"items": [
{ "productId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "quantity": 2, "price": 29.99 }
],
"total": 59.98
}
// Response 201
{
"success": true,
"data": {
"id": "7c69ae16-a49a-4c3b-b507-88e40093a637",
"status": "PENDING",
"items": [{ "productId": "a1b2c3d4-...", "quantity": 2, "price": 29.99 }],
"total": 59.98,
"idempotencyKey": "order-abc-123",
"createdAt": "2026-02-25T12:57:27.664Z",
"updatedAt": "2026-02-25T12:57:27.664Z"
}
}
| Method | Endpoint | Description | Headers | Body | Codes |
|---|---|---|---|---|---|
| POST | /api/products | Create product | - | CreateProductDto | 201, 400 |
| GET | /api/products/:id | Get product by UUID | X-Lang (opt) | - | 200, 404 |
| GET | /api/products?page=1&perPage=10 | List products (paginated) | - | - | 200 |
Example — Create Product:
// POST /api/products
{ "name": "Widget Pro", "sku": "WDG-PRO-001", "stock": 100 }
// Response 201
{
"success": true,
"data": {
"id": "1a0aab4c-49f6-4023-92e4-2545984ee011",
"name": "Widget Pro",
"sku": "WDG-PRO-001",
"stock": 100,
"reservedStock": 0,
"createdAt": "2026-02-25T12:57:34.735Z"
}
}
| Method | Endpoint | Description | Headers | Body | Codes |
|---|---|---|---|---|---|
| POST | /api/payments | Create payment | X-Idempotency-Key (opt), X-Lang (opt) | CreatePaymentDto | 201, 400 |
| GET | /api/payments/:id | Get payment by UUID | X-Lang (opt) | - | 200, 404 |
| GET | /api/payments/order/:orderId | Get payments by order | - | - | 200 |
Example — Create Payment:
// POST /api/payments
// X-Idempotency-Key: pay-xyz-789
{ "orderId": "7c69ae16-a49a-4c3b-b507-88e40093a637", "amount": 59.98 }
// Response 201
{
"success": true,
"data": {
"id": "b0364d00-6097-4a43-ae78-07f3fb9c5823",
"orderId": "7c69ae16-a49a-4c3b-b507-88e40093a637",
"amount": 59.98,
"status": "PENDING",
"idempotencyKey": "pay-xyz-789",
"createdAt": "2026-02-25T12:57:47.087Z",
"updatedAt": "2026-02-25T12:57:47.087Z"
}
}
Swagger UI with interactive API explorer: http://localhost:3000/api/docs
| Queue | Command | Handler Service | Payload |
|---|---|---|---|
| order_queue | create_order | Order Service | {dto: CreateOrderDto, idempotencyKey?, lang} |
| order_queue | get_order_by_id | Order Service | {id: string, lang} |
| order_queue | get_orders | Order Service | {page: number, perPage: number} |
| inventory_queue | create_product | Inventory Service | {dto: CreateProductDto} |
| inventory_queue | get_product_by_id | Inventory Service | {id: string, lang} |
| inventory_queue | get_products | Inventory Service | {page: number, perPage: number} |
| payment_queue | create_payment | Payment Service | {dto: CreatePaymentDto, idempotencyKey?, lang} |
| payment_queue | get_payment_by_id | Payment Service | {id: string, lang} |
| payment_queue | get_payments_by_order_id | Payment Service | {orderId: string} |
All queues are durable and configured via environment variables. Communication uses NestJS's ClientProxy.send() (RPC pattern with reply queue).
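To make the command-to-handler routing in the table above concrete, here is a minimal in-memory sketch of pattern-based dispatch. It involves neither RabbitMQ nor NestJS: `register` and `dispatch` are hypothetical helpers written for this illustration, and only the command name and payload fields are taken from the table.

```typescript
// Hypothetical in-memory stand-in for pattern-based RPC routing.
// Only the command name and payload fields come from the table above.
type Pattern = { cmd: string };
type Payload = { [field: string]: unknown };
type Handler = (payload: Payload) => unknown;

const handlers: { [cmd: string]: Handler } = {};

// Register a handler for one command (similar in spirit to a message-pattern handler).
function register(cmd: string, handler: Handler): void {
  handlers[cmd] = handler;
}

// Route a message to its handler; unknown commands are rejected.
function dispatch(pattern: Pattern, payload: Payload): unknown {
  const handler = handlers[pattern.cmd];
  if (handler === undefined) {
    throw new Error('No handler registered for pattern "' + pattern.cmd + '"');
  }
  return handler(payload);
}

// Example handler for the get_orders command with its {page, perPage} payload.
register('get_orders', function (payload) {
  return { page: payload.page, perPage: payload.perPage, data: [] };
});

const ordersReply = dispatch({ cmd: 'get_orders' }, { page: 1, perPage: 10 }) as {
  page: number;
  perPage: number;
  data: unknown[];
};
```

In the real system the broker sits between the two calls: the gateway publishes the pattern and payload to the queue, and the reply travels back over a reply queue.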
Every RPC call from the API Gateway passes through handleRpcResponse() which applies three layers of protection:
graph LR
A["RPC Observable<br/>(ClientProxy.send)"] --> B["timeout()"]
B --> C{"isTransientError?"}
C -->|"Yes (5xx, timeout,<br/>connection error)"| D["retry() with<br/>exponential backoff"]
C -->|"No (4xx business error)"| E["throw immediately"]
D --> F["catchError()<br/>map to HttpException"]
E --> F
F --> G["CircuitBreaker.exec()"]
G --> H["Response or Error"]
Preset configurations (libs/shared/src/utils/rpc-options.ts):
| Preset | Timeout | Max Retries | Initial Delay | Backoff Sequence | Used For |
|---|---|---|---|---|---|
| RPC_READ_OPTIONS | 5,000ms | 3 | 300ms | 300ms, 600ms, 1200ms | GET endpoints |
| RPC_WRITE_OPTIONS | 10,000ms | 1 | 500ms | 500ms | POST endpoints |
Retry logic: Only transient errors are retried (timeouts, ECONNREFUSED, ECONNRESET, ETIMEDOUT, 5xx). Business errors (4xx) are thrown immediately without retry. Backoff formula: initialDelay * 2^(retryCount - 1).
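The backoff formula and the transient-error rule can be checked with a small standalone sketch. The names below (`RetryOptions`, `backoffSequence`, `RpcErrorLike`) are illustrative and are not taken from rpc-error.util.ts; the error shape in particular is an assumption. The numeric values are copied from the preset table.

```typescript
// Sketch of the retry policy described above. Names and the error shape are
// assumptions for illustration, not the project's actual code.
interface RetryOptions {
  timeoutMs: number;
  maxRetries: number;
  initialDelayMs: number;
}

// Values copied from the preset table.
const READ_OPTIONS: RetryOptions = { timeoutMs: 5000, maxRetries: 3, initialDelayMs: 300 };
const WRITE_OPTIONS: RetryOptions = { timeoutMs: 10000, maxRetries: 1, initialDelayMs: 500 };

// Delay before retry number `retryCount` (1-based): initialDelay * 2^(retryCount - 1).
function backoffDelay(initialDelayMs: number, retryCount: number): number {
  return initialDelayMs * Math.pow(2, retryCount - 1);
}

// Full delay sequence for a preset.
function backoffSequence(options: RetryOptions): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= options.maxRetries; attempt++) {
    delays.push(backoffDelay(options.initialDelayMs, attempt));
  }
  return delays;
}

const TRANSIENT_CODES = ['ECONNREFUSED', 'ECONNRESET', 'ETIMEDOUT'];

// Assumed error shape: optional HTTP-like status, optional Node-style code,
// and a flag marking timeouts.
interface RpcErrorLike {
  status?: number;
  code?: string;
  isTimeout?: boolean;
}

// Timeouts, connection errors, and 5xx are retried; 4xx business errors are not.
function isTransientError(err: RpcErrorLike): boolean {
  if (err.isTimeout === true) return true;
  if (err.code !== undefined && TRANSIENT_CODES.indexOf(err.code) !== -1) return true;
  return err.status !== undefined && err.status >= 500;
}
```

With these definitions, `backoffSequence(READ_OPTIONS)` yields the 300, 600, 1200 sequence from the table, and a 404 is classified as non-transient.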
Each downstream service has its own circuit breaker (managed by CircuitBreakerService using Opossum):
stateDiagram-v2
[*] --> CLOSED
CLOSED --> OPEN : Error rate >= 50%<br/>over 5+ requests<br/>in 10s window
OPEN --> HALF_OPEN : After 30s cooldown
HALF_OPEN --> CLOSED : Probe request succeeds
HALF_OPEN --> OPEN : Probe request fails
CLOSED : All calls pass through
OPEN : All calls fail with 503<br/>"circuit open"
HALF_OPEN : One probe request allowed
Configuration (libs/shared/src/resilience/circuit-breaker.service.ts):
| Parameter | Default | Description |
|---|---|---|
| errorThresholdPercentage | 50% | Failure rate to trip the breaker |
| volumeThreshold | 5 | Minimum requests before threshold applies |
| resetTimeout | 30,000ms | Time before OPEN transitions to HALF_OPEN |
| rollingCountTimeout | 10,000ms | Stats rolling window |
Circuit breaker stats are exposed at GET /health/circuits and return the state (CLOSED/OPEN/HALF_OPEN) plus Opossum stats for each service.
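The state transitions can be sketched as a small hand-rolled state machine. This is not Opossum and not the project's CircuitBreakerService: `SimpleBreaker` is a hypothetical class that takes the current time as an argument so the behaviour is easy to follow, and it does not limit the number of concurrent probes in HALF_OPEN.

```typescript
// Hand-rolled illustration of the CLOSED / OPEN / HALF_OPEN transitions.
// Simplified on purpose; the real service wraps Opossum.
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

interface BreakerOptions {
  errorThresholdPercentage: number; // failure rate that trips the breaker
  volumeThreshold: number;          // minimum calls before the rate applies
  resetTimeoutMs: number;           // cooldown before a probe is allowed
  rollingWindowMs: number;          // window over which calls are counted
}

class SimpleBreaker {
  private options: BreakerOptions;
  private state: BreakerState = 'CLOSED';
  private openedAt = 0;
  private results: { at: number; ok: boolean }[] = [];

  constructor(options: BreakerOptions) {
    this.options = options;
  }

  // State at time `now`; an OPEN breaker becomes HALF_OPEN after the cooldown.
  getState(now: number): BreakerState {
    if (this.state === 'OPEN' && now - this.openedAt >= this.options.resetTimeoutMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  // Whether a call may go through at time `now`.
  canCall(now: number): boolean {
    return this.getState(now) !== 'OPEN';
  }

  // Record the outcome of a call made at time `now`.
  record(ok: boolean, now: number): void {
    const state = this.getState(now);
    if (state === 'OPEN') return; // calls are rejected while open
    if (state === 'HALF_OPEN') {
      // The probe decides: success closes the breaker, failure re-opens it.
      if (ok) {
        this.state = 'CLOSED';
        this.results = [];
      } else {
        this.trip(now);
      }
      return;
    }
    const windowMs = this.options.rollingWindowMs;
    this.results = this.results.filter(function (r) { return now - r.at < windowMs; });
    this.results.push({ at: now, ok: ok });
    const failures = this.results.filter(function (r) { return !r.ok; }).length;
    const errorRate = (failures / this.results.length) * 100;
    if (this.results.length >= this.options.volumeThreshold &&
        errorRate >= this.options.errorThresholdPercentage) {
      this.trip(now);
    }
  }

  private trip(now: number): void {
    this.state = 'OPEN';
    this.openedAt = now;
    this.results = [];
  }
}
```

Constructed with the defaults from the table (50, 5, 30000, 10000), five consecutive failures inside the window open the breaker, and the first successful call after the 30 s cooldown closes it again.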
| Service | Strategy | Mechanism |
|---|---|---|
| Order | Client-provided key | X-Idempotency-Key header stored in idempotencyKey column (UNIQUE). Duplicate key returns existing order. |
| Payment | Client-provided key | Same pattern as Order. |
| Inventory | Natural dedup | Unique sku column. Creating a product with an existing SKU returns the existing product. |
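The client-provided key strategy boils down to "look up by key first, insert only if absent". Below is a minimal in-memory sketch of that rule; `InMemoryOrderStore` is a hypothetical stand-in for the TypeORM-backed service, and the generated ids are placeholders rather than UUIDs.

```typescript
// In-memory illustration of idempotent creation. The real services persist to
// PostgreSQL; this class and its id scheme are assumptions for the example.
interface StoredOrder {
  id: string;
  total: number;
  idempotencyKey: string | null;
}

class InMemoryOrderStore {
  private byKey: { [key: string]: StoredOrder } = {};
  private nextId = 1;

  // Returns the existing order when the key was seen before, otherwise creates one.
  create(total: number, idempotencyKey?: string): StoredOrder {
    if (idempotencyKey !== undefined &&
        Object.prototype.hasOwnProperty.call(this.byKey, idempotencyKey)) {
      return this.byKey[idempotencyKey];
    }
    const order: StoredOrder = {
      id: 'order-' + this.nextId,
      total: total,
      idempotencyKey: idempotencyKey === undefined ? null : idempotencyKey,
    };
    this.nextId += 1;
    if (idempotencyKey !== undefined) {
      this.byKey[idempotencyKey] = order;
    }
    return order;
  }
}

const store = new InMemoryOrderStore();
const first = store.create(59.98, 'order-abc-123');
const replay = store.create(59.98, 'order-abc-123'); // same key, same order
const noKeyA = store.create(10);                      // no key, always a new order
const noKeyB = store.create(10);
```

With a real database, two concurrent requests carrying the same key can both pass the lookup, so the UNIQUE constraint on the column remains the final safeguard.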
OpenTelemetry (OTel) auto-instruments the entire request chain. Each service initializes the SDK before NestJS bootstrap via initTracing() in main.ts.
How it works:
graph LR
subgraph "api-gateway"
A["HTTP Span<br/>(HttpInstrumentation)"] --> B["Express Span<br/>(ExpressInstrumentation)"]
B --> C["AMQP Publish Span<br/>(AmqplibInstrumentation)<br/>Injects W3C traceparent"]
end
C -->|"RabbitMQ<br/>(trace context in headers)"| D
subgraph "order-service"
D["AMQP Consume Span<br/>(AmqplibInstrumentation)<br/>Extracts traceparent"] --> E["PG Query Span<br/>(PgInstrumentation)"]
end
E -.->|"OTLP HTTP :4318"| F["Jaeger"]
A -.->|"OTLP HTTP :4318"| F
| Instrumentation | What it captures |
|---|---|
| HttpInstrumentation | Incoming/outgoing HTTP requests |
| ExpressInstrumentation | Express route handling |
| PgInstrumentation | PostgreSQL queries with SQL details |
| AmqplibInstrumentation | RabbitMQ publish/consume + trace context propagation |
Key points:
- initTracing() must be called before NestJS imports to monkey-patch libraries correctly
- Trace context propagates through RabbitMQ automatically via W3C traceparent headers in AMQP message properties
- BatchSpanProcessor batches spans for efficient export
- Graceful shutdown on SIGTERM/SIGINT flushes pending spans
- Jaeger UI: http://localhost:16686 (System Architecture tab shows service dependency graph)
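For reference, the W3C traceparent value that travels with each message has four dash-separated hex fields: version, trace ID, parent span ID, and flags. The parser below is a simplified sketch for inspecting such a value by hand; in the services themselves the instrumentation handles this, and `parseTraceparent` is a hypothetical helper, not part of the codebase.

```typescript
// Simplified parser for a W3C traceparent value:
//   version(2 hex)-traceId(32 hex)-parentId(16 hex)-flags(2 hex)
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

function parseTraceparent(header: string): TraceParent | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (match === null) return null;
  // All-zero trace or parent IDs are not valid.
  if (/^0+$/.test(match[2]) || /^0+$/.test(match[3])) return null;
  return {
    version: match[1],
    traceId: match[2],
    parentId: match[3],
    // The lowest bit of the flags byte marks the trace as sampled.
    sampled: (parseInt(match[4], 16) & 1) === 1,
  };
}

const parsedTrace = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
```

Spans from the gateway and from a backend service belong to the same trace when they share the same `traceId`.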
erDiagram
order_schema_order {
uuid id PK
enum status "PENDING | CONFIRMED | FAILED | CANCELLED"
jsonb items "Array of {productId, quantity, price}"
decimal total "decimal(12,2)"
varchar idempotencyKey UK "nullable"
timestamptz createdAt
timestamptz updatedAt
}
inventory_schema_product {
uuid id PK
varchar name
varchar sku UK
int stock "default 0"
int reservedStock "default 0"
timestamptz createdAt
}
inventory_schema_stock_reservation {
uuid id PK
uuid orderId
int quantity
enum status "RESERVED | RELEASED"
timestamptz createdAt
uuid productId FK
}
payment_schema_payment {
uuid id PK
uuid orderId
decimal amount "decimal(12,2)"
enum status "PENDING | COMPLETED | FAILED | REFUNDED"
varchar idempotencyKey UK "nullable"
timestamptz createdAt
timestamptz updatedAt
}
inventory_schema_product ||--o{ inventory_schema_stock_reservation : "has many"
Each service has its own PostgreSQL schema. There are no cross-schema foreign keys — services are isolated by design. The init-schemas.sql file creates all three schemas on first database startup.
The API Gateway applies a global middleware chain to every request:
graph LR
A["HTTP Request"] --> B["CORS<br/>(origin: *)"]
B --> C["AllExceptionsFilter<br/>(wraps entire chain)"]
C --> D["ValidationPipe<br/>(whitelist + transform)"]
D --> E["Controller<br/>(route handler)"]
E --> F["SuccessResponseInterceptor"]
F --> G["HTTP Response"]
D -->|"Validation fails"| H["400 Bad Request"]
E -->|"Exception thrown"| I["AllExceptionsFilter"]
I --> J["Error Response"]
Response formats:
// Success (SuccessResponseInterceptor)
{ "success": true, "data": { ... } }
// Error (AllExceptionsFilter)
{ "success": false, "data": { "message": "...", "statusCode": 404 } }
// Validation Error
{ "success": false, "data": { "message": ["field must be..."], "error": "Bad Request", "statusCode": 400 } }
| Variable | Default | Used By | Description |
|---|---|---|---|
| DB_HOST | localhost | Backend services | PostgreSQL host |
| DB_PORT | 5432 | Backend services | PostgreSQL port |
| DB_NAME | microservices_learn | All + postgres container | Database name |
| DB_USER | postgres | All + postgres container | Database user |
| DB_PASSWORD | postgres | All + postgres container | Database password |
| TYPEORM_SYNCHRONIZE | true | Backend services | Auto-sync schema (set false in production!) |
| RABBITMQ_URL | amqp://guest:guest@localhost:5672 | All services | RabbitMQ connection URL |
| RABBITMQ_USER | guest | RabbitMQ container | RabbitMQ admin user |
| RABBITMQ_PASSWORD | guest | RabbitMQ container | RabbitMQ admin password |
| API_GATEWAY_PORT | 3000 | API Gateway | Gateway HTTP port |
| ORDER_QUEUE | order_queue | Gateway + Order | Order service queue name |
| INVENTORY_QUEUE | inventory_queue | Gateway + Inventory | Inventory service queue name |
| PAYMENT_QUEUE | payment_queue | Gateway + Payment | Payment service queue name |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4318/v1/traces | All services | Jaeger OTLP endpoint |
| PGADMIN_EMAIL | admin@baupay.com | pgAdmin container | pgAdmin login email |
| PGADMIN_PASSWORD | admin | pgAdmin container | pgAdmin login password |
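A loader that applies these defaults might look like the sketch below. It is not the contents of env.config.ts: the `AppConfig` shape and the helper names are assumptions, and only the variable names and default values come from the table. The environment is passed in as a plain object so the function can be exercised without touching the real process environment.

```typescript
// Sketch of an environment loader using the defaults from the table above.
// The AppConfig shape and function names are illustrative assumptions.
type Env = { [key: string]: string | undefined };

interface AppConfig {
  db: { host: string; port: number; name: string; user: string; password: string; synchronize: boolean };
  rabbitmqUrl: string;
  gatewayPort: number;
  queues: { order: string; inventory: string; payment: string };
  otlpEndpoint: string;
}

// Return the variable's value, or the fallback when it is unset or empty.
function readString(env: Env, key: string, fallback: string): string {
  const value = env[key];
  return value === undefined || value === '' ? fallback : value;
}

// Same as readString, but parsed as a base-10 integer.
function readInt(env: Env, key: string, fallback: number): number {
  const parsed = parseInt(readString(env, key, String(fallback)), 10);
  if (isNaN(parsed)) {
    throw new Error(key + ' must be an integer');
  }
  return parsed;
}

function loadConfig(env: Env): AppConfig {
  return {
    db: {
      host: readString(env, 'DB_HOST', 'localhost'),
      port: readInt(env, 'DB_PORT', 5432),
      name: readString(env, 'DB_NAME', 'microservices_learn'),
      user: readString(env, 'DB_USER', 'postgres'),
      password: readString(env, 'DB_PASSWORD', 'postgres'),
      synchronize: readString(env, 'TYPEORM_SYNCHRONIZE', 'true') === 'true',
    },
    rabbitmqUrl: readString(env, 'RABBITMQ_URL', 'amqp://guest:guest@localhost:5672'),
    gatewayPort: readInt(env, 'API_GATEWAY_PORT', 3000),
    queues: {
      order: readString(env, 'ORDER_QUEUE', 'order_queue'),
      inventory: readString(env, 'INVENTORY_QUEUE', 'inventory_queue'),
      payment: readString(env, 'PAYMENT_QUEUE', 'payment_queue'),
    },
    otlpEndpoint: readString(env, 'OTEL_EXPORTER_OTLP_ENDPOINT', 'http://localhost:4318/v1/traces'),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`; a production deployment would pass `TYPEORM_SYNCHRONIZE=false`.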
- Docker and Docker Compose
- Node.js 22+ and npm (for local development)
# 1. Clone and configure
cp .env.example .env
# 2. Start infrastructure (PostgreSQL, RabbitMQ, Jaeger, pgAdmin)
npm run docker:infra
# 3. Install dependencies (for local dev)
npm install
# 4. Run database migrations
npm run migration:run:all
# 5a. Local development (hot-reload, 4 processes)
npm run start:dev
# 5b. OR Docker development (hot-reload inside containers)
npm run docker
# Health check
curl http://localhost:3000/health
# {"status":"ok"}
# Create a test order
curl -X POST http://localhost:3000/api/orders \
-H 'Content-Type: application/json' \
-d '{"items":[{"productId":"a1b2c3d4-e5f6-7890-abcd-ef1234567890","quantity":1,"price":10}],"total":10}'
| Service | URL | Credentials |
|---|---|---|
| Swagger API Docs | http://localhost:3000/api/docs | - |
| Jaeger Tracing UI | http://localhost:16686 | - |
| RabbitMQ Management | http://localhost:15672 | guest / guest |
| pgAdmin | http://localhost:5050 | From .env |
Uses Dockerfile.dev with volume mounts for hot-reload:
FROM node:22-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
RUN npm install -g @nestjs/cli
EXPOSE 3000
Each service mounts the source code and uses nest start --watch:
# docker-compose.yml (per service)
build:
context: .
dockerfile: Dockerfile.dev
volumes:
- .:/app # Source code mount
- /app/node_modules # Preserve container's node_modules
command: npx nest start <service-name> --watch
Code changes on the host trigger automatic recompilation and restart inside the container.
Uses a multi-stage Dockerfile:
- deps: npm ci (install dependencies)
- build: npx nest build <service> (compile TypeScript)
- production: node dist/apps/<service>/... (run compiled JS)
Infrastructure services (postgres, rabbitmq, jaeger, pgAdmin) start with docker compose up -d. Application services require --profile app:
npm run docker:infra # Infrastructure only
npm run docker # Infrastructure + all app services (--profile app --build)
| Service | Strategy | Interval |
|---|---|---|
| PostgreSQL | pg_isready | 5s |
| RabbitMQ | rabbitmq-diagnostics check_port_connectivity | 10s |
| API Gateway | wget /health | 10s |
| Backend Services | node -e 'process.exit(0)' | 30s |
Services use depends_on with condition: service_healthy to ensure proper startup ordering.
Migrations use TypeORM CLI with per-service data sources:
# Generate a new migration (auto-detects entity changes)
npm run migration:generate:order
npm run migration:generate:inventory
npm run migration:generate:payment
# Run pending migrations
npm run migration:run:all # All services
npm run migration:run:order # Single service
# Revert last migration
npm run migration:revert:all # All services
npm run migration:revert:order # Single service
init-schemas.sql creates the three PostgreSQL schemas (order_schema, inventory_schema, payment_schema) on first database startup via Docker's docker-entrypoint-initdb.d mechanism.
| Script | Description |
|---|---|
| build | Build all 4 services |
| build:gateway | Build API Gateway only |
| build:order | Build Order Service only |
| build:inventory | Build Inventory Service only |
| build:payment | Build Payment Service only |
| Script | Description |
|---|---|
| start:dev | Start all 4 services with hot-reload (concurrently) |
| start:dev:gateway | Start API Gateway with --watch |
| start:dev:order | Start Order Service with --watch |
| start:dev:inventory | Start Inventory Service with --watch |
| start:dev:payment | Start Payment Service with --watch |
| Script | Description |
|---|---|
| start:prod:gateway | Run compiled API Gateway |
| start:prod:order | Run compiled Order Service |
| start:prod:inventory | Run compiled Inventory Service |
| start:prod:payment | Run compiled Payment Service |
| Script | Description |
|---|---|
| docker:infra | Start infrastructure (postgres, rabbitmq, jaeger, pgadmin) |
| docker:infra:down | Stop infrastructure |
| docker | Build and start everything (infra + app services) |
| docker:down | Stop everything |
| docker:logs | Tail logs for all services |
| docker:sh:* | Shell into a specific container |
| Script | Description |
|---|---|
| migration:generate:{service} | Generate migration for a service |
| migration:run:{service} | Run pending migrations for a service |
| migration:run:all | Run pending migrations for all services |
| migration:revert:{service} | Revert last migration for a service |
| migration:revert:all | Revert last migration for all services |
| Script | Description |
|---|---|
| test:integration | Run integration tests (--runInBand) |
All infrastructure is defined as code using Terraform with an S3 + DynamoDB state backend.
graph TD
Internet["Internet"]
subgraph "AWS — eu-north-1"
ALB["Application Load Balancer<br/>rnd-ms-alb<br/>:80 → Gateway<br/>:15672 → RabbitMQ UI"]
subgraph "ECS Fargate Cluster (rnd-ms-cluster)"
GW_ECS["API Gateway<br/>256 CPU / 512 MB"]
OS_ECS["Order Service<br/>256 CPU / 512 MB"]
IS_ECS["Inventory Service<br/>256 CPU / 512 MB"]
PS_ECS["Payment Service<br/>256 CPU / 512 MB"]
RMQ_ECS["RabbitMQ<br/>256 CPU / 512 MB<br/>(EFS persistence)"]
MIG_ECS["Migration Task<br/>(one-off)"]
K6_ECS["K6 Load Test<br/>(one-off)"]
end
subgraph "Data Layer"
RDS["RDS PostgreSQL 16<br/>db.t3.micro<br/>(private subnets)"]
EFS["EFS<br/>RabbitMQ data"]
SSM["SSM Parameter Store<br/>DB creds, RMQ URL"]
end
subgraph "Networking"
VPC["VPC 10.0.0.0/16"]
PUB_A["Public Subnet A<br/>10.0.1.0/24"]
PUB_B["Public Subnet B<br/>10.0.2.0/24"]
PRIV_A["Private Subnet A<br/>10.0.10.0/24"]
PRIV_B["Private Subnet B<br/>10.0.11.0/24"]
end
subgraph "Observability"
CW["CloudWatch Logs<br/>(7-day retention)"]
GF["Grafana Cloud<br/>(empattech.grafana.net)"]
end
SD["Service Discovery<br/>rabbitmq.local:5672"]
end
Internet --> ALB
ALB --> GW_ECS
GW_ECS --> RMQ_ECS
RMQ_ECS --> OS_ECS
RMQ_ECS --> IS_ECS
RMQ_ECS --> PS_ECS
OS_ECS --> RDS
IS_ECS --> RDS
PS_ECS --> RDS
MIG_ECS --> RDS
RMQ_ECS --> EFS
RMQ_ECS --> SD
GW_ECS --> CW
OS_ECS --> CW
IS_ECS --> CW
PS_ECS --> CW
K6_ECS --> CW
CW --> GF
| Component | Resource | Details |
|---|---|---|
| Compute | ECS Fargate | Cluster with 4 services + RabbitMQ + migration task + K6 task |
| Networking | VPC | 2 public subnets (ECS) + 2 private subnets (RDS), IGW, route tables |
| Load Balancing | ALB | Gateway target group (:3000), RabbitMQ management (:15672) |
| Database | RDS PostgreSQL 16 | db.t3.micro, encrypted (gp3), 7-day backups, private subnets |
| Storage | EFS | RabbitMQ data persistence with IAM auth + access points |
| Secrets | SSM Parameter Store | DB credentials, RabbitMQ URL, Grafana keys (SecureString) |
| DNS | Service Discovery | Private namespace (*.local) for internal RabbitMQ routing |
| Logging | CloudWatch | Log group per service (/ecs/rnd-ms-*), 7-day retention |
| Security | 4 Security Groups | ALB (HTTP), ECS tasks (3000, 5672, 15672), EFS (NFS 2049), RDS (5432) |
| IAM | 3 Roles + 1 User | ECS execution, ECS task, Grafana CloudWatch reader |
| Registry | ECR | 6 repositories (gateway, order, inventory, payment, migrations, k6) |
infra/
├── backend.tf # S3 state backend
├── providers.tf # AWS provider
├── variables.tf # All configurable variables
├── vpc.tf # VPC, subnets, IGW, route tables
├── security-groups.tf # ALB, ECS, EFS, RDS security groups
├── alb.tf # Load balancer, target groups, listeners
├── ecs.tf # ECS cluster + RabbitMQ task/service
├── ecs-services.tf # Gateway, Order, Inventory, Payment tasks/services
├── ecs-migrations.tf # Migration ECR repo + task definition
├── ecs-k6.tf # K6 ECR repo + task definition
├── rds.tf # PostgreSQL RDS instance + subnet group
├── efs.tf # EFS file system + mount targets + access point
├── iam.tf # Execution role, task role, Grafana IAM user
├── ssm.tf # SSM parameters (DB, RabbitMQ, Grafana)
├── service-discovery.tf # Private DNS namespace + RabbitMQ service
├── cloudwatch.tf # Log groups for all services
└── outputs.tf # Gateway URL, RDS endpoint, ECR repos, etc.
Fully automated deployment pipeline with smart change detection — only builds, migrates, and deploys services that were modified.
graph LR
subgraph "PR Pipeline"
PR_I["Install"] --> PR_P["Parallel"]
PR_P --> PR_L["Lint + Type Check"]
PR_P --> PR_T["Integration Tests"]
PR_P --> PR_D["Detect Changes"]
end
subgraph "Main Branch Pipeline"
I["Install"] --> P["Parallel"]
P --> L["Lint + Type Check"]
P --> D["Detect Changes"]
D --> M["Run Migrations<br/>(ECS Task)"]
M --> B["Build & Push<br/>(Docker → ECR)"]
B --> DEP["Deploy to ECS<br/>(wait for stable)"]
DEP --> S["Smoke Test<br/>(health + circuits)"]
S --> P2["Parallel"]
P2 --> R["Rollback<br/>(manual)"]
P2 --> K6B["K6 Build & Push"]
K6B --> K6R["K6 Load Test<br/>(manual)"]
end
| Feature | Description |
|---|---|
| Change detection | Compares git diff against origin/main, detects changes in apps/, libs/, package* |
| Per-service migrations | Only runs TypeORM migrations for services with actual code changes |
| ECS migration task | Migrations execute as a one-off Fargate task, not at service boot time |
| Deployment stability | Waits for services-stable after each ECS update with event logging on failure |
| Smoke testing | Retries health check up to 10 times (10s interval) with circuit breaker validation |
| Manual rollback | Reverts all services to previous task definition revision |
| K6 integration | Builds K6 image to ECR, runs load test as ECS task (manual trigger) |
| Custom pipeline | k6-load-test can be triggered from Bitbucket UI with configurable profile |
| Step timeouts | max-time on all steps prevents infinite hangs |
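The change-detection step can be pictured as a function from a list of changed paths (for example the output of a git diff against origin/main) to the set of services to rebuild. The sketch below is not the pipeline's actual script: the service directory names follow the project structure, and treating any change under libs/ or to package* files as affecting every service is an assumption made for the example.

```typescript
// Illustrative change detection: map changed file paths to affected services.
// Assumption: shared code (libs/) and package* files affect every service.
const SERVICES = ['api-gateway', 'order-service', 'inventory-service', 'payment-service'];

function detectChangedServices(changedFiles: string[]): string[] {
  const affected: { [service: string]: boolean } = {};
  changedFiles.forEach(function (file) {
    if (file.indexOf('libs/') === 0 || file.indexOf('package') === 0) {
      // Shared library or dependency change: mark all services.
      SERVICES.forEach(function (service) { affected[service] = true; });
      return;
    }
    // apps/<service>/... changes only affect that service.
    const match = /^apps\/([^/]+)\//.exec(file);
    if (match !== null && SERVICES.indexOf(match[1]) !== -1) {
      affected[match[1]] = true;
    }
  });
  // Return in a stable order, independent of the diff order.
  return SERVICES.filter(function (service) { return affected[service] === true; });
}
```

Given only `apps/order-service/src/main.ts` and `README.md`, the function returns just `order-service`, so the other three services would be skipped.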
Migrations run as a dedicated Fargate task using Dockerfile.migrations:
Dockerfile.migrations → builds all 3 services → runs scripts/run-migrations.js [service-list]
The pipeline determines which services need migrations based on git diff, then:
- Builds and pushes the migration Docker image to ECR
- Runs aws ecs run-task with the service names as command override
- Waits for task completion and checks exit code
- Streams migration logs from CloudWatch
K6 load tests run as ECS Fargate tasks, hitting the production ALB. Results stream to CloudWatch Logs and are viewable in Grafana.
| Profile | Duration | Max VUs | Stages | Purpose |
|---|---|---|---|---|
| Smoke | 2 min | 2 | 30s ramp → 1m steady → 30s down | Post-deploy sanity check |
| Load | 16 min | 50 | Ramp to 20 → hold → ramp to 50 → hold → down | Steady-state baseline |
| Stress | 22 min | 200 | 50 → 100 → 200 → hold → down | Find breaking points |
| Spike | 5 min | 300 | 10 → 300 burst → hold → recover → down | Sudden traffic resilience |
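In k6, a profile like these is expressed as a list of stages, each with a duration and a target number of virtual users. The sketch below models the Smoke row in that shape and checks that the stages add up to the stated 2 minutes. The contents of load-test.js are not shown here, so the exact stage values and the helper functions are assumptions derived from the table.

```typescript
// A k6-style stage list for the Smoke profile, reconstructed from the table.
// Helper names are hypothetical; only 's' and 'm' duration units are handled.
interface Stage {
  duration: string; // e.g. '30s' or '1m'
  target: number;   // virtual users to reach by the end of the stage
}

const smokeStages: Stage[] = [
  { duration: '30s', target: 2 }, // ramp up
  { duration: '1m', target: 2 },  // steady
  { duration: '30s', target: 0 }, // ramp down
];

// Convert a duration string such as '30s' or '1m' to seconds.
function durationSeconds(duration: string): number {
  const match = /^(\d+)(s|m)$/.exec(duration);
  if (match === null) {
    throw new Error('Unsupported duration: ' + duration);
  }
  const amount = parseInt(match[1], 10);
  return match[2] === 'm' ? amount * 60 : amount;
}

// Total run time of a profile in seconds.
function totalSeconds(stages: Stage[]): number {
  return stages.reduce(function (sum, stage) {
    return sum + durationSeconds(stage.duration);
  }, 0);
}

// Highest virtual-user target across all stages.
function maxVus(stages: Stage[]): number {
  return stages.reduce(function (max, stage) {
    return Math.max(max, stage.target);
  }, 0);
}
```

For the smoke stages above, `totalSeconds` gives 120 and `maxVus` gives 2, matching the Duration and Max VUs columns.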
- GET /health - health check + circuit breaker status
- GET /api/products - inventory listing (measures inventory latency)
- POST /api/orders - order creation with random items and idempotency keys (measures order latency)
- GET /api/orders/:id - order retrieval
- GET /api/payments - payment listing (measures payment latency)
| Metric | Type | Description |
|---|---|---|
| errors | Rate | Percentage of failed requests |
| order_latency | Trend | Order creation response time |
| inventory_latency | Trend | Product listing response time |
| payment_latency | Trend | Payment listing response time |
| orders_created | Counter | Total successful orders |
| Metric | Value |
|---|---|
| p95 Latency | 9.8 ms |
| Average Latency | 6.7 ms |
| Total Requests | 512 |
| Iterations | 128 |
| Max VUs | 2 |
# Via helper script (runs as ECS task)
bash scripts/run-k6.sh smoke # Quick sanity check
bash scripts/run-k6.sh load # Normal load baseline
bash scripts/run-k6.sh stress # Stress test
bash scripts/run-k6.sh spike # Spike test
# Via Bitbucket custom pipeline
# Pipelines → Run pipeline → k6-load-test → set K6_PROFILE variable
k6/
├── Dockerfile # grafana/k6 base image + test script
└── load-test.js # Test scenarios, profiles, custom metrics, thresholds
Grafana Cloud (empattech.grafana.net) is connected to AWS CloudWatch for centralized monitoring of all services, load tests, and infrastructure metrics.
graph LR
subgraph "ECS Services"
GW["Gateway"]
OS["Order"]
IS["Inventory"]
PS["Payment"]
K6["K6 Load Test"]
end
subgraph "AWS"
CW_L["CloudWatch Logs<br/>/ecs/rnd-ms-*"]
CW_M["CloudWatch Metrics<br/>ECS, RDS, ALB"]
end
subgraph "Grafana Cloud"
GF_E["Explore<br/>(Logs Insights)"]
GF_D["Dashboards<br/>(Metrics)"]
end
GW --> CW_L
OS --> CW_L
IS --> CW_L
PS --> CW_L
K6 --> CW_L
CW_L --> GF_E
CW_M --> GF_D
| Source | Log Group / Namespace | What You See |
|---|---|---|
| API Gateway | /ecs/rnd-ms-gateway | HTTP requests, RPC calls, errors |
| Order Service | /ecs/rnd-ms-order | Order processing, DB queries |
| Inventory Service | /ecs/rnd-ms-inventory | Product operations, stock management |
| Payment Service | /ecs/rnd-ms-payment | Payment processing |
| RabbitMQ | /ecs/rnd-ms-rabbitmq | Broker health, connections |
| Migrations | /ecs/rnd-ms-migrations | Migration execution logs |
| K6 Load Tests | /ecs/rnd-ms-k6 | Test progress, JSON summary with metrics |
| ECS Metrics | AWS/ECS | CPU, memory utilization per service |
| RDS Metrics | AWS/RDS | Connections, IOPS, latency, storage |
| ALB Metrics | AWS/ApplicationELB | Request count, error rates, latency |
- Grafana: https://empattech.grafana.net → Explore → select cloudwatch data source
- Authentication: Dedicated IAM user (grafana-cloudwatch-reader) with CloudWatchReadOnlyAccess + CloudWatchLogsReadOnlyAccess
- Log queries: Use CloudWatch Logs Insights QL in Grafana Explore
Integration tests exercise the full stack: HTTP request through the API Gateway, RabbitMQ message delivery, microservice processing, and PostgreSQL queries.
npm run docker:infra # Infrastructure must be running
npm run migration:run:all # Tables must exist
npm run test:integration
| File | Tests | Description |
|---|---|---|
| order.spec.ts | 6 | Create order, idempotency (same key = same ID), validation (400), get by ID, 404 for missing, pagination |
| inventory.spec.ts | 6 | Create product, SKU deduplication, validation (400), get by ID, 404 for missing, pagination |
| payment.spec.ts | 7 | Create payment, idempotency, no-key creates different payments, validation (400), get by ID, 404 for missing, get by orderId |
| resilience.spec.ts | 5 | Health endpoint, circuit breaker stats, CLOSED state for healthy services, timeout handling, no retry on 404 |
test/integration/setup.ts bootstraps a real NestApplication from GatewayModule with the same global pipes, filters, and interceptors as production. OpenTelemetry tracing is initialized so test traces appear in Jaeger.
- Jest config: jest.integration.config.ts
- Timeout: 30 seconds per test
- Runs in band (--runInBand) to avoid parallel DB conflicts
- Module path alias @shared mapped to libs/shared/src/




