From eca30515cd1ff139e915e7e24f86dbdb914fbca6 Mon Sep 17 00:00:00 2001
From: Ikerlaforga <19539979+Isonimus@users.noreply.github.com>
Date: Mon, 22 Jun 2026 07:35:57 +0200
Subject: [PATCH] feat(server): Sentry error tracking + alerting (phase 1 observability)

The hosted server was instrumented but blind: OTel is wired yet ships disabled (OTEL_SDK_DISABLED=true) and pino logs are ephemeral, so prod errors and outages went unseen. Add @sentry/node (v10) error capture for the server and worker, EU-region + strict PII scrubbing.

- instrument.ts: Sentry.init at module level, no-ops without SENTRY_DSN (${SCENT_SECRET_KEY:-}-style env-unset-disabled), preloaded via --import before app modules in the Dockerfile CMD, worker command, worker:start.
- scrubPii beforeSend (exported, unit-tested): strips request body, cookies, x-api-key/cookie/authorization headers, query string, client IP, on top of sendDefaultPii:false.
- Capture: setupExpressErrorHandler(app) after routes; explicit captureException in the worker failed handlers + flush(2000) on shutdown.
- Deploy: SENTRY_DSN/ENVIRONMENT/RELEASE/TRACES_SAMPLE_RATE in .env.example and both compose services; runbook section + /health uptime note.
- ADR-0006, ADR index row, CHANGELOG.

Errors-only by default (tracesSampleRate 0). OTel traces/metrics/off-box logs to a managed backend remain a deferred phase 2.
---
 CHANGELOG.md                           |   1 +
 deploy/.env.example                    |  13 ++
 deploy/README.md                       |  28 ++++
 deploy/docker-compose.yml              |  10 +-
 docs/adr/0006-observability-sentry.md  |  87 ++++++++++++
 docs/adr/README.md                     |   1 +
 packages/server/Dockerfile             |   2 +-
 packages/server/package.json           |   3 +-
 packages/server/src/app.ts             |   7 +
 packages/server/src/index.ts           |   4 +
 packages/server/src/instrument.test.ts |  62 +++++++++
 packages/server/src/instrument.ts      |  66 ++++++++++
 packages/server/src/worker.ts          |  18 +++
 pnpm-lock.yaml                         | 175 +++++++++++++++++++++++++
 14 files changed, 474 insertions(+), 3 deletions(-)
 create mode 100644 docs/adr/0006-observability-sentry.md
 create mode 100644 packages/server/src/instrument.test.ts
 create mode 100644 packages/server/src/instrument.ts

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 618158b..27e1f2c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -23,6 +23,7 @@ the first consumer-visible behaviour change and will drive the next SDK version
 - **Automation-detector wording**: the anti-tamper flag now reads "Anti-tamper signals" (not "Automation detected") when the combined confidence is weak — e.g. devtools open in a dev environment — so a human isn't labelled a bot. The machine-readable `code` (`automation_suspected`) is unchanged; the reason text also drops the `tamper.` prefix for readability.
 
 ### Internal
+- **Error tracking (Sentry)** ([ADR-0006](docs/adr/0006-observability-sentry.md)): the hosted server was *instrumented but blind* — OpenTelemetry was wired but ships disabled (`OTEL_SDK_DISABLED=true`) and `pino` logs are ephemeral, so prod errors and outages went unseen. Added `@sentry/node` (v10) error capture + alerting for the server and worker. A new `instrument.ts` runs `Sentry.init` (preloaded via `--import` before app modules) and **no-ops without `SENTRY_DSN`** (`${SCENT_SECRET_KEY:-}`-style "env unset = disabled"), so dev/test/self-host stay inert.
Express errors are caught via `setupExpressErrorHandler`; BullMQ job failures via explicit `captureException` in the worker `failed` handlers (+ `flush` on shutdown). Privacy posture for this PII-sensitive product: **EU-region project + strict scrubbing** (`sendDefaultPii: false` plus an exported, unit-tested `scrubPii` `beforeSend` that strips request bodies, cookies, the `x-api-key`/`cookie`/`authorization` headers, query string, and client IP). Errors-only by default (`tracesSampleRate` 0, env-overridable). New env `SENTRY_DSN`/`SENTRY_ENVIRONMENT`/`SENTRY_RELEASE`/`SENTRY_TRACES_SAMPLE_RATE` in the deploy `.env.example`, both compose services, and the runbook (with an external `/health` uptime-monitor note). Distributed traces/metrics/off-box logs to a managed backend remain a deferred phase 2 (the OTel wiring already exists). - **Organizations (multi-tenant) layer** ([ADR-0005](docs/adr/0005-organizations-and-tenancy.md), migrations 013–014): a new `organizations` table is now the tenant boundary above `projects`. The admin `owner` role is **re-scoped from a global superuser to org-scoped** — `canViewProject`/`canManageProject`, the `/admin/*` listing queries, and `requireProjectRead` filter by `organization_id`, and a cross-org project/user id returns `404` (no existence leak). 2FA policy moved from the install-wide `admin_settings.require_2fa` to per-org `organizations.require_2fa`; invites carry the inviter's org so an accepted account joins that company. Migration 013 backfills a single `Default` org for existing installs, so **self-host is unchanged** (one auto-created org); the `NOT NULL` FK backstop lands in migration 014 once every writer is org-aware. `create-admin`/`create-project` take an optional `[orgName]` (shared `findOrCreateOrgByName`). `organization_id` stays off the `/v1` data path (a project key already scopes data) — orgs are an admin/billing concern, the foundational prerequisite for hosted metering/billing. 
Public self-serve signup is deferred to the billing workstream. - **Docs: GDPR & consent** ([ADR-0004](docs/adr/0004-consent-and-data-lifecycle.md)): new [GDPR & consent integration guide](docs/integrations/gdpr-consent.md) (controller/processor split, CMP wiring per mode, lawful-basis guidance, data-subject rights, DPA stub); OpenAPI updated with the snapshot consent fields, the `LawfulBasis` schema, and the `DELETE`/`export` identity paths. - **Retention sweeper** ([ADR-0004](docs/adr/0004-consent-and-data-lifecycle.md)): a daily BullMQ repeatable job in the worker deletes identities (and, by cascade, their snapshots/drifts/risk/links) whose `last_seen` is older than their project's `retention_days`; projects with a null `retention_days` keep data indefinitely. `sweepRetention()` is idempotent and unit-tested against a real DB. diff --git a/deploy/.env.example b/deploy/.env.example index e780d30..3acd075 100644 --- a/deploy/.env.example +++ b/deploy/.env.example @@ -20,6 +20,19 @@ SCENT_SECRET_KEY=change-me # Leave blank for an API-only deploy (server-to-server traffic sends no Origin). CORS_ALLOWED_ORIGINS= +# Error tracking (Sentry). Unset = disabled (no events leave the box). The DSN is a +# write-only ingest key, not a secret of the cloud-token class — safe to paste here. +# IMPORTANT: create the Sentry project in the EU region (this is a PII-sensitive +# product); the server scrubs request bodies, cookies, and auth headers before sending +# (see deploy/README.md and docs/adr/0006-observability-sentry.md). +SENTRY_DSN= +# Environment tag shown in Sentry. Defaults to NODE_ENV (production here) if unset. +SENTRY_ENVIRONMENT=production +# Release identifier for grouping/regressions — set to the image tag or git SHA. +SENTRY_RELEASE= +# Performance-trace sample rate (0.0–1.0). Default 0 = errors only, no perf traces. +SENTRY_TRACES_SAMPLE_RATE=0 + # Image tag to run. Default "latest"; pin to a commit SHA for reproducible deploys. 
SCENT_IMAGE_TAG=latest diff --git a/deploy/README.md b/deploy/README.md index d57439f..197610d 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -115,6 +115,34 @@ persistence is enabled here). Combined with the `event_id` dedupe in the worker, that gives at-least-once processing across restarts. For stronger guarantees a Postgres outbox would be the next step. +## Observability: error tracking + uptime + +Out of the box the server emits structured `pino` logs (visible via `docker compose logs +-f scent-server`) but they are ephemeral, and there is no alerting. Two low-effort layers +close that gap. + +**Error tracking (Sentry).** Set `SENTRY_DSN` in `.env` and the server + worker report +unhandled errors (with stack traces and request/job context) to Sentry; leave it unset and +the SDK stays completely inert (no events leave the box). Setup: + +1. Create a Sentry project **in the EU region** (Settings → choose EU when creating the + org/project). This is a PII-sensitive product — keep error data in the EU. +2. Copy the project's DSN into `.env` as `SENTRY_DSN=` and optionally set `SENTRY_RELEASE` + to the image tag/SHA you're running. `docker compose pull && docker compose up -d`. +3. In Sentry, add an **alert rule** (e.g. notify on a new issue / error-rate spike). + +The DSN is a write-only ingest key, **not** a cloud/API token — safe to keep in `.env`. +Before any event is sent the server scrubs request bodies (POST `/v1/events` carries raw +fingerprint signals = PII), cookies, the `x-api-key`/`cookie`/`authorization` headers, the +query string, and the client IP (`sendDefaultPii: false` plus an explicit `beforeSend` — +see [docs/adr/0006-observability-sentry.md](../docs/adr/0006-observability-sentry.md)). +Distributed traces/metrics to a managed backend are a deferred phase 2 (the OTel wiring +already exists but ships disabled via `OTEL_SDK_DISABLED=true`). + +**Uptime.** Sentry can't tell you the box is hard-down. 
Point an external monitor (Better +Stack / UptimeRobot free tier) at `https:///health` — it returns +`{"status":"ok",...}` — with a 1-minute interval and alerting to the same channel. + ## Optional: GeoIP (impossible-travel detection) The `impossible_transition` risk flag — IP geolocation moving faster than a flight diff --git a/deploy/docker-compose.yml b/deploy/docker-compose.yml index dd4f1b9..14ec5ba 100644 --- a/deploy/docker-compose.yml +++ b/deploy/docker-compose.yml @@ -42,6 +42,10 @@ services: CORS_ALLOWED_ORIGINS: ${CORS_ALLOWED_ORIGINS:-} SCENT_SECRET_KEY: ${SCENT_SECRET_KEY:-} OTEL_SDK_DISABLED: "true" + SENTRY_DSN: ${SENTRY_DSN:-} + SENTRY_ENVIRONMENT: ${SENTRY_ENVIRONMENT:-production} + SENTRY_RELEASE: ${SENTRY_RELEASE:-} + SENTRY_TRACES_SAMPLE_RATE: ${SENTRY_TRACES_SAMPLE_RATE:-0} healthcheck: test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/health',r=>process.exit(r.statusCode===200?0:1)).on('error',()=>process.exit(1))"] interval: 15s @@ -59,13 +63,17 @@ services: restart: unless-stopped # Same image, different entrypoint: drains the BullMQ ingest queue. Scale with # `docker compose up -d --scale scent-worker=N` (no ports/name to collide). 
- command: ["node", "--import", "./dist/tracing.js", "dist/worker.js"] + command: ["node", "--import", "./dist/instrument.js", "--import", "./dist/tracing.js", "dist/worker.js"] environment: DATABASE_URL: postgresql://scent:${POSTGRES_PASSWORD}@postgres:5432/scent REDIS_URL: redis://redis:6379 NODE_ENV: production WORKER_CONCURRENCY: "5" OTEL_SDK_DISABLED: "true" + SENTRY_DSN: ${SENTRY_DSN:-} + SENTRY_ENVIRONMENT: ${SENTRY_ENVIRONMENT:-production} + SENTRY_RELEASE: ${SENTRY_RELEASE:-} + SENTRY_TRACES_SAMPLE_RATE: ${SENTRY_TRACES_SAMPLE_RATE:-0} depends_on: postgres: condition: service_healthy diff --git a/docs/adr/0006-observability-sentry.md b/docs/adr/0006-observability-sentry.md new file mode 100644 index 0000000..079239d --- /dev/null +++ b/docs/adr/0006-observability-sentry.md @@ -0,0 +1,87 @@ +# ADR-0006: Sentry-led error tracking now; OTel traces/logs to a backend deferred + +**Status:** Accepted +**Date:** 2026-06-21 + +## Context + +The hosted box (`api.scent.tindalabs.dev`) was **instrumented but blind**. The server is +fully wired for OpenTelemetry ([tracing.ts](../../packages/server/src/tracing.ts): NodeSDK + +auto-instrumentations + OTLP exporter) and emits `pino` logs with `trace_id`/`span_id` +correlation — but prod sets `OTEL_SDK_DISABLED=true` on both services +([deploy/docker-compose.yml](../../deploy/docker-compose.yml)), so every trace is dropped and +logs go only to `docker logs` (ephemeral, no search, no alerting). There was **no error +tracking, no alerting, no uptime monitoring**: if prod threw or the box went down, nothing +told us. With live design-partner traffic on the box, that is the gap to close first. + +The OTel wiring is disabled rather than removed deliberately: standing up a managed OTLP +backend (Grafana Cloud / Honeycomb / Dash0), reconciling sampling/cost, and shipping logs +off-box is a larger project than "tell me the moment prod breaks, with a stack trace." 
+ +## Decision + +**Sentry-led.** Add error tracking + alerting via `@sentry/node` now; defer turning on the +existing OTel traces/logs to a managed backend to an explicit **phase 2**. + +Sentry is the fastest path to actionable production errors for a small team: SDK + DSN + +alert rule, stack traces with request/job context, issue grouping and regression detection, +deploy-aware via release tags. Because this is a PII-sensitive fingerprinting product in the +EU under BSL, the posture is **Sentry EU region + strict PII scrubbing**. + +### Phase 1 (this ADR — built) + +- **`@sentry/node` v10** (the current major; sets up its own OpenTelemetry under the hood — + see coexistence note below). Added to the server package. +- **[instrument.ts](../../packages/server/src/instrument.ts)** runs `Sentry.init` at module + level and **no-ops without `SENTRY_DSN`** — mirroring the `${SCENT_SECRET_KEY:-}` + "env unset = feature disabled" convention, so dev, test, and self-host stay completely + inert (every `Sentry.*` call becomes a no-op when init never ran). +- **PII scrubbing**: `sendDefaultPii: false` plus an exported, unit-tested `beforeSend` + (`scrubPii`) that strips the request body (POST `/v1/events` bodies carry raw fingerprint + signals = PII), cookies, the `x-api-key`/`cookie`/`authorization` headers, the query + string, and the client IP. Defense in depth: the explicit strip holds even if a future SDK + default changes. +- **Capture surface**: `Sentry.setupExpressErrorHandler(app)` after all routes + ([app.ts](../../packages/server/src/app.ts)) for sync throws / `next(err)`; the default + global handlers for unhandled rejections; explicit `Sentry.captureException` in the + worker's BullMQ `failed` handlers (BullMQ swallows the throw into the event, so the global + handlers never see it) plus `Sentry.flush(2000)` in worker `shutdown()`. 
+- **Preload**: `--import ./dist/instrument.js` before `./dist/tracing.js` in the server + Dockerfile `CMD`, the worker compose `command`, and `worker:start`, so Sentry patches + before app modules load. The dev/`tsx` path gets it via a top-of-file import in + index.ts/worker.ts. +- **Errors-only by default**: `tracesSampleRate` defaults to 0 (env-overridable). +- **Uptime**: an external monitor on `/health` (Better Stack / UptimeRobot) catches a + hard-down box Sentry can't — an ops step (runbook), not code. + +The DSN is a write-only ingest key (not a cloud/API token of the class the operator avoids), +so it is fine to keep in `.env`. + +### Phase 2 (deferred — NOT built here) + +Turn on the existing OTel traces/logs to a managed **EU** OTLP backend: set +`OTEL_EXPORTER_OTLP_ENDPOINT`, flip `OTEL_SDK_DISABLED=false`, ship `pino` logs off-box. +Optional source-map upload for readable minified stack traces — needs a `SENTRY_AUTH_TOKEN` +(a token-class CI secret), so it stays gated/deferred. Optional `@sentry/profiling-node`. + +## Key technical nuance: Sentry vs. the app's OTel + +Sentry Node v8+/v10 stands up its **own** OpenTelemetry instance. In phase 1 there is **no +conflict** because the two are mutually exclusive by config: the app's OTel is off in prod +(where Sentry runs), and Sentry is off everywhere the app's OTel is on (dev/self-host with a +local collector). Phase 2 must reconcile them — either `skipOpenTelemetrySetup: true` on the +Sentry init and register Sentry's span processor on the app's `NodeSDK`, or let Sentry own +OTel and export onward. Documented here so the phase-2 implementer doesn't double-initialise. + +## Consequences + +- Prod errors are now visible with stack traces and context, with alerting — the core + operational blind spot is closed. +- Privacy posture is explicit and auditable (EU residency + scrubbing), consistent with + [ADR-0004](0004-consent-and-data-lifecycle.md) (data lifecycle) and the BSL + "Tindalabs-hosted only" model. 
+- Distributed tracing / metrics / off-box logs remain deferred; the wiring already exists, + so phase 2 is a config + reconciliation task, not a rebuild. + +Relates to [ADR-0003](0003-otel-bridge.md) (the OTel bridge) and +[ADR-0004](0004-consent-and-data-lifecycle.md). diff --git a/docs/adr/README.md b/docs/adr/README.md index 3c874f2..7dd786b 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -9,3 +9,4 @@ Each ADR documents a significant architectural choice: the context, the decision | [0003](0003-otel-bridge.md) | OTel traceparent bridge for blindspot-ux composability | Accepted | | [0004](0004-consent-and-data-lifecycle.md) | Consent is the controller's responsibility; the SDK enforces, never triggers | Accepted | | [0005](0005-organizations-and-tenancy.md) | Organizations are the tenant boundary; owner is org-scoped, not global | Accepted | +| [0006](0006-observability-sentry.md) | Sentry-led error tracking now; OTel traces/logs to a backend deferred | Accepted | diff --git a/packages/server/Dockerfile b/packages/server/Dockerfile index 285d93e..d38788e 100644 --- a/packages/server/Dockerfile +++ b/packages/server/Dockerfile @@ -31,4 +31,4 @@ ENV NODE_ENV=production WORKDIR /app/packages/server EXPOSE 3000 -CMD ["node", "--import", "./dist/tracing.js", "dist/index.js"] +CMD ["node", "--import", "./dist/instrument.js", "--import", "./dist/tracing.js", "dist/index.js"] diff --git a/packages/server/package.json b/packages/server/package.json index f84e175..bd6b0f2 100644 --- a/packages/server/package.json +++ b/packages/server/package.json @@ -11,7 +11,7 @@ "create-project": "tsx src/scripts/create-project.ts", "create-admin": "tsx src/scripts/create-admin.ts", "worker": "tsx src/worker.ts", - "worker:start": "node --import ./dist/tracing.js dist/worker.js", + "worker:start": "node --import ./dist/instrument.js --import ./dist/tracing.js dist/worker.js", "test": "vitest run", "test:coverage": "vitest run --coverage", "type-check": "tsc --noEmit", @@ 
-24,6 +24,7 @@ "@opentelemetry/resources": "^2.7.1", "@opentelemetry/sdk-node": "^0.218.0", "@opentelemetry/semantic-conventions": "^1.41.1", + "@sentry/node": "^10.59.0", "@tindalabs/scent-engine": "workspace:*", "bcryptjs": "^3.0.3", "bullmq": "^5.78.0", diff --git a/packages/server/src/app.ts b/packages/server/src/app.ts index 6b2f2eb..ff29a51 100644 --- a/packages/server/src/app.ts +++ b/packages/server/src/app.ts @@ -1,4 +1,5 @@ import express, { type Express } from 'express'; +import * as Sentry from '@sentry/node'; import cors from 'cors'; import cookieParser from 'cookie-parser'; import { rateLimitMiddleware, adminRateLimitMiddleware } from './middleware/rate-limit.js'; @@ -85,5 +86,11 @@ export function createApp(): Express { app.use('/v1/account', requireProjectRead, accountRouter); app.use('/v1/accounts', requireProjectRead, accountsRouter); + // Sentry error capture, after all routes. No-op until Sentry.init runs (which only + // happens with SENTRY_DSN set — see instrument.ts), so dev/test/self-host are + // unaffected. Catches sync throws and next(err); the global unhandled-rejection + // integration covers async route rejections that bubble past Express. + Sentry.setupExpressErrorHandler(app); + return app; } diff --git a/packages/server/src/index.ts b/packages/server/src/index.ts index 5075578..86dff10 100644 --- a/packages/server/src/index.ts +++ b/packages/server/src/index.ts @@ -1,3 +1,7 @@ +// Imported first so Sentry.init runs before any instrumented module loads. In the +// built image this is redundant with `--import ./dist/instrument.js` (Dockerfile CMD); +// here it covers the dev/tsx path. No-ops without SENTRY_DSN. 
+import './instrument.js'; import { startTracing } from './tracing.js'; startTracing(); diff --git a/packages/server/src/instrument.test.ts b/packages/server/src/instrument.test.ts new file mode 100644 index 0000000..9e60f6d --- /dev/null +++ b/packages/server/src/instrument.test.ts @@ -0,0 +1,62 @@ +import { describe, it, expect } from 'vitest'; +import type { ErrorEvent } from '@sentry/node'; +import { scrubPii } from './instrument.js'; + +// scrubPii is the privacy boundary for error reporting: this is a fingerprinting +// product, so a stack trace must never carry a subject's raw signals or an API key. +// Verified directly rather than trusted to integration coverage. +describe('scrubPii', () => { + it('strips the request body, cookies, and query string', () => { + const event = { + request: { + data: { fingerprint: 'raw-device-signals', email: 'user@example.com' }, + cookies: { scent_admin: 'session-token' }, + query_string: 'token=secret', + }, + } as ErrorEvent; + + const out = scrubPii(event); + + expect(out.request?.data).toBeUndefined(); + expect(out.request?.cookies).toBeUndefined(); + expect(out.request?.query_string).toBeUndefined(); + }); + + it('strips sensitive headers case-insensitively but keeps benign ones', () => { + const event = { + request: { + headers: { + 'X-Api-Key': 'pk_live_abc', + Cookie: 'scent_admin=tok', + Authorization: 'Bearer xyz', + 'Content-Type': 'application/json', + 'user-agent': 'curl/8', + }, + }, + } as unknown as ErrorEvent; + + const headers = scrubPii(event).request?.headers ?? 
{}; + + expect(headers['X-Api-Key']).toBeUndefined(); + expect(headers['Cookie']).toBeUndefined(); + expect(headers['Authorization']).toBeUndefined(); + expect(headers['Content-Type']).toBe('application/json'); + expect(headers['user-agent']).toBe('curl/8'); + }); + + it('strips the client IP from user context', () => { + const event = { + user: { id: 'admin-1', ip_address: '203.0.113.7' }, + } as unknown as ErrorEvent; + + const out = scrubPii(event); + + expect(out.user?.ip_address).toBeUndefined(); + expect(out.user?.id).toBe('admin-1'); // non-PII identifier retained + }); + + it('is a no-op on an event with no request or user', () => { + const event = { message: 'boom' } as ErrorEvent; + expect(scrubPii(event)).toEqual({ message: 'boom' }); + }); +}); diff --git a/packages/server/src/instrument.ts b/packages/server/src/instrument.ts new file mode 100644 index 0000000..2c0d8a1 --- /dev/null +++ b/packages/server/src/instrument.ts @@ -0,0 +1,66 @@ +import * as Sentry from '@sentry/node'; +import type { ErrorEvent } from '@sentry/node'; + +// Headers that can carry credentials or session material. Stripped from every event +// even though sendDefaultPii:false already withholds most of this — defense in depth. +const SENSITIVE_HEADERS = ['x-api-key', 'cookie', 'authorization']; + +// Removes PII from a Sentry event before it leaves the process. This is an +// identity/fingerprinting product: POST /v1/events request bodies carry raw device +// signals that are PII by definition, and headers/cookies carry credentials. We send +// these to an EU-region Sentry project but still scrub aggressively (ADR-0006) so a +// stack trace never ships a subject's fingerprint or an API key. +// +// Exported and unit-tested: this is the privacy boundary, so it is verified directly +// rather than trusted to integration coverage. 
+export function scrubPii(event: ErrorEvent): ErrorEvent { + if (event.request) { + // Request body (POST /v1/events fingerprint signals) and cookies are always PII here. + delete event.request.data; + delete event.request.cookies; + delete event.request.query_string; + + if (event.request.headers) { + for (const name of Object.keys(event.request.headers)) { + if (SENSITIVE_HEADERS.includes(name.toLowerCase())) { + delete event.request.headers[name]; + } + } + } + } + + if (event.user) { + delete event.user.ip_address; + } + + return event; +} + +// Initialises Sentry error tracking. No-ops without SENTRY_DSN (mirrors the +// `${SCENT_SECRET_KEY:-}` "env unset = feature disabled" convention) so dev, test, and +// self-host deployments stay completely inert — every Sentry.* call becomes a no-op +// when init was never run. +// +// Runs at module level so `node --import ./dist/instrument.js` initialises Sentry +// before any app module (Express, pg, bullmq) loads, letting it patch them. The same +// resolved module is reused by the top-of-file import in index.ts/worker.ts, so the +// dev/tsx path initialises early too. Idempotent: a second import is the same module. +if (process.env['SENTRY_DSN']) { + try { + Sentry.init({ + dsn: process.env['SENTRY_DSN'], + environment: process.env['SENTRY_ENVIRONMENT'] ?? process.env['NODE_ENV'], + release: process.env['SENTRY_RELEASE'], + // Errors-only by default; raise via env to sample performance traces. Phase 2 + // reconciles this with the app's own OTel setup (ADR-0006). + tracesSampleRate: Number(process.env['SENTRY_TRACES_SAMPLE_RATE'] ?? 0), + sendDefaultPii: false, + beforeSend: scrubPii, + }); + } catch (err) { + // The one sanctioned console.* in this module: it loads before pino (so the OTel + // pino instrumentation can patch the logger), and a Sentry init failure must never + // crash boot. Mirrors tracing.ts's shutdown-path console.error. 
+ console.error('[instrument] Sentry init failed:', err); + } +} diff --git a/packages/server/src/worker.ts b/packages/server/src/worker.ts index 0319282..a30c60e 100644 --- a/packages/server/src/worker.ts +++ b/packages/server/src/worker.ts @@ -1,6 +1,11 @@ +// Imported first so Sentry.init runs before any instrumented module loads (same +// rationale as index.ts; redundant with `--import ./dist/instrument.js` in the image, +// covers the dev/tsx path). No-ops without SENTRY_DSN. +import './instrument.js'; import { startTracing } from './tracing.js'; startTracing(); +import * as Sentry from '@sentry/node'; import { Queue, Worker } from 'bullmq'; import { createQueueConnection, INGEST_QUEUE_NAME } from './queue/ingest.js'; import type { IngestJobData } from './queue/ingest.js'; @@ -35,6 +40,12 @@ worker.on('failed', (job, err) => { { jobId: job?.id, attemptsMade: job?.attemptsMade, err }, 'ingest job failed', ); + // BullMQ swallows the throw into this event, so the global handlers never see it — + // capture explicitly. No-op without SENTRY_DSN. + Sentry.captureException(err, { + tags: { queue: INGEST_QUEUE_NAME }, + extra: { jobId: job?.id, attemptsMade: job?.attemptsMade }, + }); }); // Daily retention sweep (GDPR data-lifecycle, ADR-0004). A repeatable job is enqueued @@ -52,6 +63,10 @@ const retentionWorker = new Worker( ); retentionWorker.on('failed', (job, err) => { logger.error({ jobId: job?.id, err }, 'retention sweep failed'); + Sentry.captureException(err, { + tags: { queue: RETENTION_QUEUE_NAME }, + extra: { jobId: job?.id }, + }); }); // Drain in-flight jobs and close the DB pool on shutdown so a redeploy doesn't drop @@ -62,6 +77,9 @@ async function shutdown(): Promise { await retentionWorker.close(); await retentionQueue.close(); await db.end(); + // Flush any buffered events before exit (no-op without SENTRY_DSN). Bounded so a + // Sentry outage can't stall a redeploy. 
+ await Sentry.flush(2000); process.exit(0); } diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 976b617..614f4b1 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -243,6 +243,9 @@ importers: '@opentelemetry/semantic-conventions': specifier: ^1.41.1 version: 1.41.1 + '@sentry/node': + specifier: ^10.59.0 + version: 10.59.0(@opentelemetry/exporter-trace-otlp-http@0.218.0(@opentelemetry/api@1.9.1))(vite@8.0.16(@types/node@25.9.2)(esbuild@0.28.1)(jiti@2.7.0)(tsx@4.22.4)(yaml@2.9.0)) '@tindalabs/scent-engine': specifier: workspace:* version: link:../engine @@ -316,6 +319,17 @@ importers: packages: + '@apm-js-collab/code-transformer-bundler-plugins@0.5.0': + resolution: {integrity: sha512-YxLBY5nGlurL7QeJLq6e5g0ouBpAp0pwgyA/5rHXEXwhiPLn9ZHbT+Y2LlP90GT872cSocfjWRYu/fnpuBudNQ==} + engines: {node: '>=18.0.0'} + + '@apm-js-collab/code-transformer@0.15.0': + resolution: {integrity: sha512-XmXYVs8CzJ1Aj79noVbn2weUO/XWtRyURpGqx7aU7DOXlUQhR0WKOQNF0okh7PCeY37vxf7kU3v57OAkEPm3ww==} + hasBin: true + + '@apm-js-collab/tracing-hooks@0.10.0': + resolution: {integrity: sha512-2/Z3NTewJTruUkmsSnBC5bJlLNUd9keuD1OLlTEpim4FyLhm6m2Rnfv+wrFdUvFfhmH8CRdiDZBqBrn+wyaGuA==} + '@asamuzakjp/css-color@3.2.0': resolution: {integrity: sha512-K1A6z8tS3XsmCMM86xoWdn7Fkdn9m6RSVtocUrJYIwZnFVkng/PvkEoWtOWmP+Scc6saYWHWZYbndEEXxl24jw==} @@ -1219,6 +1233,12 @@ packages: peerDependencies: '@opentelemetry/api': ^1.3.0 + '@opentelemetry/instrumentation@0.214.0': + resolution: {integrity: sha512-MHqEX5Dk59cqVah5LiARMACku7jXSVk9iVDWOea4x3cr7VfdByeDCURK6o1lntT1JS/Tsovw01UJrBhN3/uC5w==} + engines: {node: ^18.19.0 || >=20.6.0} + peerDependencies: + '@opentelemetry/api': ^1.3.0 + '@opentelemetry/instrumentation@0.218.0': resolution: {integrity: sha512-mIZil8Es+sYDK5m+DQiwAwF57F14TF2YlEqvIjZ/RQWcxDBwRGsKfdK2Tv65OU9meQKCMzSIFS9mxAcnAb6Bkg==} engines: {node: ^18.19.0 || >=20.6.0} @@ -1766,6 +1786,56 @@ packages: '@rolldown/pluginutils@1.0.1': resolution: {integrity: 
sha512-2j9bGt5Jh8hj+vPtgzPtl72j0yRxHAyumoo6TNfAjsLB04UtpSvPbPcDcBMxz7n+9CYB0c1GxQFxYRg2jimqGw==} + '@sentry/conventions@0.12.0': + resolution: {integrity: sha512-z1JQrl/1SLY+8wpzvork6vl+fpsg/oCCxM7HWWhUnI/R+OGNyoIzieQuggX3uUMY7NBtp8UWCQx6FeFazzOF9g==} + engines: {node: '>=14'} + + '@sentry/core@10.59.0': + resolution: {integrity: sha512-QeG7XZL5j6CkToYCE7OwCerb/r742Tjj9p1BBohBKcypYTPRuqfD+A3FeUj7pk5CGO6Vj1/gOAmdbuuNbR51dQ==} + engines: {node: '>=18'} + + '@sentry/node-core@10.59.0': + resolution: {integrity: sha512-qFbepzntYhDleNG9ZCZWCSoAJK0Nsx+UJxsuiygaaAf1rJMj95RVckLyslhY86pyDLVATNMmWm2elm6etgKaJw==} + engines: {node: '>=18'} + peerDependencies: + '@opentelemetry/api': ^1.9.0 + '@opentelemetry/core': ^1.30.1 || ^2.1.0 + '@opentelemetry/exporter-trace-otlp-http': '>=0.57.0 <1' + '@opentelemetry/instrumentation': '>=0.57.1 <1' + '@opentelemetry/sdk-trace-base': ^1.30.1 || ^2.1.0 + peerDependenciesMeta: + '@opentelemetry/api': + optional: true + '@opentelemetry/core': + optional: true + '@opentelemetry/exporter-trace-otlp-http': + optional: true + '@opentelemetry/instrumentation': + optional: true + '@opentelemetry/sdk-trace-base': + optional: true + + '@sentry/node@10.59.0': + resolution: {integrity: sha512-qzqbP6OVoMijlDBUxWtbvVF5j73+vyzGFi+yFIslhVvzBj97TFkIeP3TpBLsmu/0L5ZvxpQCCEmzJ677tFkq/g==} + engines: {node: '>=18'} + + '@sentry/opentelemetry@10.59.0': + resolution: {integrity: sha512-wV9/HR9btrNhSkJC2S0urqsD9pE4K0f6AmdfTK3qhH505mLoyV4ekTG66hdDR9xD2zOYCm58CNzaK+336zu3Gg==} + engines: {node: '>=18'} + peerDependencies: + '@opentelemetry/api': ^1.9.0 + '@opentelemetry/core': ^1.30.1 || ^2.1.0 + '@opentelemetry/sdk-trace-base': ^1.30.1 || ^2.1.0 + + '@sentry/server-utils@10.59.0': + resolution: {integrity: sha512-mR3fWaU7uGxIstRba6YO+/6V3qIa7432F7/U8EWHry+dY4C9DWAVG90E2GCzeD2MwLSP0tB25i8p1TWTGiQgVg==} + engines: {node: '>=18'} + peerDependencies: + vite: ^3.0.0 || ^4.0.0 || ^5.0.0 || ^6.0.0 + peerDependenciesMeta: + vite: + optional: true + 
'@standard-schema/spec@1.1.0': resolution: {integrity: sha512-l2aFy5jALhniG5HgqrD6jXLi/rUWrKvqN/qJx6yoJsgKhblVd+iqqU4RCXavm/jPityDo5TCvKMnpjKnOriy0w==} @@ -2277,6 +2347,10 @@ packages: ast-v8-to-istanbul@1.0.4: resolution: {integrity: sha512-0bC0/4bTSrnwdhU3IsZDwEdojvuPrSg59OYZfKsLRtJZ0u8VBx9DebfqqG8bRdCC0I7vjgxmPi41P0lpkhJHtA==} + astring@1.9.0: + resolution: {integrity: sha512-LElXdjswlqjWrPpJFg1Fx4wpkOCxj1TDHlSV4PlaRxHGWko024xICaa97ZkMfs6DRKlCguiAI+rbXv5GWwXIkg==} + hasBin: true + asynckit@0.4.0: resolution: {integrity: sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==} @@ -3294,6 +3368,10 @@ packages: resolution: {integrity: sha512-8q7VEgMJW4J8tcfVPy8g09NcQwZdbwFEqhe/WZkoIzjn/3TGDwtOCYtXGxA3O8tPzpczCCDgv+P2P5y00ZJOOg==} engines: {node: '>= 8'} + meriyah@6.1.4: + resolution: {integrity: sha512-Sz8FzjzI0kN13GK/6MVEsVzMZEPvOhnmmI1lU5+/1cGOiK3QUahntrNNtdVeihrO7t9JpoH75iMNXg6R6uWflQ==} + engines: {node: '>=18.0.0'} + methods@1.1.2: resolution: {integrity: sha512-iclAHeNqNm68zFtnZ0e+1L2yUIdvzNoauKU4WBA3VvH/vPFieF7qfRlwUZU+DA9P9bPXIS90ulxoUoCH23sV2w==} engines: {node: '>= 0.6'} @@ -3865,6 +3943,9 @@ packages: scheduler@0.23.2: resolution: {integrity: sha512-UOShsPwz7NrMUqhR6t0hWjFduvOzbtv7toDH1/hIrfRNIDBnnBWd0CwJTGvTpngVlmwGCdP9/Zl/tVrDqcuYzQ==} + semifies@1.0.0: + resolution: {integrity: sha512-xXR3KGeoxTNWPD4aBvL5NUpMTT7WMANr3EWnaS190QVkY52lqqcVRD7Q05UVbBhiWDGWMlJEUam9m7uFFGVScw==} + semver@6.3.1: resolution: {integrity: sha512-BR7VvDCVHO+q2xBEWskxS6DJE1qRnb7DxzUrogb71CWoSficBxYsiAGd+Kl0mmq/MprG9yArRkyrQxTO6XjMzA==} hasBin: true @@ -4448,6 +4529,30 @@ packages: snapshots: + '@apm-js-collab/code-transformer-bundler-plugins@0.5.0': + dependencies: + '@apm-js-collab/code-transformer': 0.15.0 + es-module-lexer: 2.1.0 + magic-string: 0.30.21 + module-details-from-path: 1.0.4 + + '@apm-js-collab/code-transformer@0.15.0': + dependencies: + '@types/estree': 1.0.9 + astring: 1.9.0 + esquery: 1.7.0 + meriyah: 6.1.4 + semifies: 
1.0.0 + source-map: 0.6.1 + + '@apm-js-collab/tracing-hooks@0.10.0': + dependencies: + '@apm-js-collab/code-transformer': 0.15.0 + debug: 4.4.3 + module-details-from-path: 1.0.4 + transitivePeerDependencies: + - supports-color + '@asamuzakjp/css-color@3.2.0': dependencies: '@csstools/css-calc': 2.1.4(@csstools/css-parser-algorithms@3.0.5(@csstools/css-tokenizer@3.0.4))(@csstools/css-tokenizer@3.0.4) @@ -5562,6 +5667,15 @@ snapshots: transitivePeerDependencies: - supports-color + '@opentelemetry/instrumentation@0.214.0(@opentelemetry/api@1.9.1)': + dependencies: + '@opentelemetry/api': 1.9.1 + '@opentelemetry/api-logs': 0.214.0 + import-in-the-middle: 3.0.1 + require-in-the-middle: 8.0.1 + transitivePeerDependencies: + - supports-color + '@opentelemetry/instrumentation@0.218.0(@opentelemetry/api@1.9.1)': dependencies: '@opentelemetry/api': 1.9.1 @@ -6154,6 +6268,61 @@ snapshots: '@rolldown/pluginutils@1.0.1': {} + '@sentry/conventions@0.12.0': {} + + '@sentry/core@10.59.0': {} + + '@sentry/node-core@10.59.0(@opentelemetry/api@1.9.1)(@opentelemetry/core@2.7.1(@opentelemetry/api@1.9.1))(@opentelemetry/exporter-trace-otlp-http@0.218.0(@opentelemetry/api@1.9.1))(@opentelemetry/instrumentation@0.214.0(@opentelemetry/api@1.9.1))(@opentelemetry/sdk-trace-base@2.7.1(@opentelemetry/api@1.9.1))': + dependencies: + '@sentry/conventions': 0.12.0 + '@sentry/core': 10.59.0 + '@sentry/opentelemetry': 10.59.0(@opentelemetry/api@1.9.1)(@opentelemetry/core@2.7.1(@opentelemetry/api@1.9.1))(@opentelemetry/sdk-trace-base@2.7.1(@opentelemetry/api@1.9.1)) + import-in-the-middle: 3.0.1 + optionalDependencies: + '@opentelemetry/api': 1.9.1 + '@opentelemetry/core': 2.7.1(@opentelemetry/api@1.9.1) + '@opentelemetry/exporter-trace-otlp-http': 0.218.0(@opentelemetry/api@1.9.1) + '@opentelemetry/instrumentation': 0.214.0(@opentelemetry/api@1.9.1) + '@opentelemetry/sdk-trace-base': 2.7.1(@opentelemetry/api@1.9.1) + + 
'@sentry/node@10.59.0(@opentelemetry/exporter-trace-otlp-http@0.218.0(@opentelemetry/api@1.9.1))(vite@8.0.16(@types/node@25.9.2)(esbuild@0.28.1)(jiti@2.7.0)(tsx@4.22.4)(yaml@2.9.0))': + dependencies: + '@opentelemetry/api': 1.9.1 + '@opentelemetry/core': 2.7.1(@opentelemetry/api@1.9.1) + '@opentelemetry/instrumentation': 0.214.0(@opentelemetry/api@1.9.1) + '@opentelemetry/sdk-trace-base': 2.7.1(@opentelemetry/api@1.9.1) + '@opentelemetry/semantic-conventions': 1.41.1 + '@sentry/core': 10.59.0 + '@sentry/node-core': 10.59.0(@opentelemetry/api@1.9.1)(@opentelemetry/core@2.7.1(@opentelemetry/api@1.9.1))(@opentelemetry/exporter-trace-otlp-http@0.218.0(@opentelemetry/api@1.9.1))(@opentelemetry/instrumentation@0.214.0(@opentelemetry/api@1.9.1))(@opentelemetry/sdk-trace-base@2.7.1(@opentelemetry/api@1.9.1)) + '@sentry/opentelemetry': 10.59.0(@opentelemetry/api@1.9.1)(@opentelemetry/core@2.7.1(@opentelemetry/api@1.9.1))(@opentelemetry/sdk-trace-base@2.7.1(@opentelemetry/api@1.9.1)) + '@sentry/server-utils': 10.59.0(vite@8.0.16(@types/node@25.9.2)(esbuild@0.28.1)(jiti@2.7.0)(tsx@4.22.4)(yaml@2.9.0)) + import-in-the-middle: 3.0.1 + transitivePeerDependencies: + - '@opentelemetry/exporter-trace-otlp-http' + - supports-color + - vite + + '@sentry/opentelemetry@10.59.0(@opentelemetry/api@1.9.1)(@opentelemetry/core@2.7.1(@opentelemetry/api@1.9.1))(@opentelemetry/sdk-trace-base@2.7.1(@opentelemetry/api@1.9.1))': + dependencies: + '@opentelemetry/api': 1.9.1 + '@opentelemetry/core': 2.7.1(@opentelemetry/api@1.9.1) + '@opentelemetry/sdk-trace-base': 2.7.1(@opentelemetry/api@1.9.1) + '@sentry/conventions': 0.12.0 + '@sentry/core': 10.59.0 + + '@sentry/server-utils@10.59.0(vite@8.0.16(@types/node@25.9.2)(esbuild@0.28.1)(jiti@2.7.0)(tsx@4.22.4)(yaml@2.9.0))': + dependencies: + '@apm-js-collab/code-transformer': 0.15.0 + '@apm-js-collab/code-transformer-bundler-plugins': 0.5.0 + '@apm-js-collab/tracing-hooks': 0.10.0 + '@sentry/conventions': 0.12.0 + '@sentry/core': 10.59.0 + 
magic-string: 0.30.21 + optionalDependencies: + vite: 8.0.16(@types/node@25.9.2)(esbuild@0.28.1)(jiti@2.7.0)(tsx@4.22.4)(yaml@2.9.0) + transitivePeerDependencies: + - supports-color + '@standard-schema/spec@1.1.0': {} '@standard-schema/utils@0.3.0': {} @@ -6729,6 +6898,8 @@ snapshots: estree-walker: 3.0.3 js-tokens: 10.0.0 + astring@1.9.0: {} + asynckit@0.4.0: {} atomic-sleep@1.0.0: {} @@ -7751,6 +7922,8 @@ snapshots: merge2@1.4.1: {} + meriyah@6.1.4: {} + methods@1.1.2: {} micromatch@4.0.8: @@ -8303,6 +8476,8 @@ snapshots: dependencies: loose-envify: 1.4.0 + semifies@1.0.0: {} + semver@6.3.1: {} semver@7.8.0: {}