
feat(server): Sentry error tracking + alerting (phase 1 observability)#73

Merged
Isonimus merged 1 commit into main from feat/sentry-observability on Jun 22, 2026

@Isonimus

Contributor

What & why

The hosted box (api.scent.tindalabs.dev) was instrumented but blind: OpenTelemetry is fully wired (tracing.ts) but ships disabled in prod (OTEL_SDK_DISABLED=true on both services), and pino logs go only to docker logs (ephemeral). There was no error tracking, no alerting, no uptime monitoring — if prod threw or the box went down, nothing told us.

This adds Sentry-led error tracking + alerting (phase 1). Turning on the existing OTel traces/logs to a managed backend is an explicit, deferred phase 2. Decision recorded in ADR-0006.

Because this is a PII-sensitive fingerprinting product operating in the EU under BSL, Sentry uses its EU region with strict PII scrubbing.

What's in it

  • @sentry/node v10 (current major — the plan said v9, but v9 is no longer current; same API surface).
  • instrument.ts: Sentry.init at module level; it no-ops without SENTRY_DSN (mirroring the ${SCENT_SECRET_KEY:-} "env unset = disabled" convention), so dev/test/self-host stay completely inert. It is preloaded via --import ./dist/instrument.js before tracing.js in the Dockerfile CMD, the worker compose command, and worker:start; the dev/tsx path gets it via a top-of-file import in index.ts/worker.ts.
  • scrubPii beforeSend (exported, unit-tested): on top of sendDefaultPii: false, it strips the request body (POST /v1/events carries raw fingerprint signals, which are PII), cookies, the x-api-key/cookie/authorization headers, the query string, and the client IP.
  • Capture surface: Sentry.setupExpressErrorHandler(app) after all routes; explicit Sentry.captureException in the worker's BullMQ failed handlers (BullMQ catches the processor's throw and surfaces it only through the failed event), plus Sentry.flush(2000) on shutdown.
  • Errors-only by default (tracesSampleRate 0, env-overridable).
  • Deploy: SENTRY_DSN/SENTRY_ENVIRONMENT/SENTRY_RELEASE/SENTRY_TRACES_SAMPLE_RATE in .env.example and both compose services; a runbook section plus a note on external /health uptime monitoring.
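As a rough illustration of the "env unset = disabled" guard and the errors-only default, here is a minimal TypeScript sketch. `buildSentryOptions` and `parseSampleRate` are hypothetical names, not the PR's actual instrument.ts; the sketch only derives the options object and leaves the real `Sentry.init` call (shown as a comment) to the caller. It assumes an empty string counts as unset, since a compose `${SENTRY_DSN:-}` substitution yields an empty value rather than an absent variable.

```typescript
// Hypothetical sketch: derive Sentry init options from the environment.
type Env = Record<string, string | undefined>;

interface SentryEnvOptions {
  dsn: string;
  environment?: string;
  release?: string;
  tracesSampleRate: number;
  sendDefaultPii: false;
}

// Errors-only by default: tracing stays off unless SENTRY_TRACES_SAMPLE_RATE is set.
function parseSampleRate(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return 0;
  const rate = Number(raw);
  // Sample rates must lie in [0, 1]; fall back to errors-only rather than crash boot.
  if (!Number.isFinite(rate) || rate < 0 || rate > 1) {
    console.warn(`Ignoring invalid SENTRY_TRACES_SAMPLE_RATE "${raw}"; using 0`);
    return 0;
  }
  return rate;
}

// Returns undefined (Sentry disabled) when SENTRY_DSN is unset or empty;
// compose's ${SENTRY_DSN:-} substitution produces an empty string, not an unset var.
function buildSentryOptions(env: Env): SentryEnvOptions | undefined {
  const dsn = env.SENTRY_DSN?.trim();
  if (!dsn) return undefined;
  return {
    dsn,
    environment: env.SENTRY_ENVIRONMENT || undefined,
    release: env.SENTRY_RELEASE || undefined,
    tracesSampleRate: parseSampleRate(env.SENTRY_TRACES_SAMPLE_RATE),
    sendDefaultPii: false,
  };
}

// Intended use in instrument.ts (not executed here):
//   const options = buildSentryOptions(process.env);
//   if (options) Sentry.init({ ...options, beforeSend: scrubPii });
```

Falling back to 0 on a malformed rate, rather than throwing, is a design choice in this sketch: a typo in an observability knob should not stop the API from booting.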

Verification

  • Type-check passes, lint reports 0 errors, all 139 server tests pass (135 existing + 4 new scrubPii), pnpm audit --audit-level=high is clean, and the image build emits dist/instrument.js with the DSN guard.
  • Inert without a DSN: the whole suite runs with Sentry uninitialised, so every Sentry.* call is a no-op.
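The scrubPii hook listed above is easiest to reason about (and unit-test) as a pure function over the event. The sketch below is an assumption-laden stand-in, not the PR's code: it uses a hypothetical `MinimalEvent` type covering only the fields this PR names, with field names following Sentry's event payload (`request.data`, `request.cookies`, `request.headers`, `request.query_string`, `user.ip_address`). It also trims any query from `request.url`, which goes slightly beyond the PR's list.

```typescript
// Hypothetical minimal view of the parts of a Sentry error event that can carry PII.
// A real hook would be typed against Sentry's own event type instead.
interface RequestData {
  url?: string;
  method?: string;
  data?: unknown;
  query_string?: unknown;
  cookies?: Record<string, string>;
  headers?: Record<string, string>;
}

interface MinimalEvent {
  request?: RequestData;
  user?: { ip_address?: string | null };
}

const SENSITIVE_HEADERS = new Set(["x-api-key", "cookie", "authorization"]);

function scrubPii(event: MinimalEvent): MinimalEvent {
  const req = event.request;
  if (req) {
    delete req.data; // POST /v1/events bodies carry raw fingerprint signals
    delete req.cookies;
    delete req.query_string;
    // Assumption: the query may also ride along on request.url, so trim it there too.
    if (req.url) {
      const q = req.url.indexOf("?");
      if (q !== -1) req.url = req.url.slice(0, q);
    }
    if (req.headers) {
      // Header names can arrive in any case, so compare lower-cased.
      for (const name of Object.keys(req.headers)) {
        if (SENSITIVE_HEADERS.has(name.toLowerCase())) delete req.headers[name];
      }
    }
  }
  if (event.user) delete event.user.ip_address; // client IP
  return event;
}
```

The hook returns the mutated event so it can still be sent; returning `null` from a `beforeSend` would drop the event entirely, which is not the goal here.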

Manual step (operator-owned, does NOT block merge)

Create the EU-region Sentry project and paste its DSN into prod .env, then pull && up -d. The DSN is a write-only ingest key (not a cloud/API token). Until then Sentry stays off — merging changes nothing in prod behaviour.
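Since the DSN is pasted in by hand, a boot-time check that it really targets the EU region is one way to keep the PII commitment from depending on operator care. This is a hypothetical guard (`isEuRegionDsn`), not part of the PR, and it rests on an assumption worth verifying first: that EU-region DSNs use an ingest host ending in `.de.sentry.io`.

```typescript
// ASSUMPTION: EU-region DSNs use an ingest host ending in ".de.sentry.io"
// (e.g. https://<key>@o123.ingest.de.sentry.io/<project>). Verify against the
// DSN Sentry shows for the project before relying on this.
const EU_INGEST_SUFFIX = ".de.sentry.io";

function isEuRegionDsn(dsn: string): boolean {
  let url: URL;
  try {
    url = new URL(dsn);
  } catch {
    return false; // not even a URL, so not a usable DSN
  }
  // A DSN carries its public key as the URL username; require https and a key.
  return (
    url.protocol === "https:" &&
    url.username !== "" &&
    url.hostname.endsWith(EU_INGEST_SUFFIX)
  );
}
```

Whether a non-EU DSN should merely leave Sentry disabled or fail the boot outright is a policy choice this sketch leaves open.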

Key nuance for phase 2

Sentry Node v8+/v10 sets up its own OpenTelemetry. No conflict in phase 1 (the app's OTel is off wherever Sentry runs, and vice-versa — mutually exclusive by config). Phase 2 must reconcile them (skipOpenTelemetrySetup: true + register Sentry's span processor on the app's NodeSDK, or let Sentry own OTel). Documented in ADR-0006.
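The phase-1 invariant ("mutually exclusive by config") could be enforced at startup instead of relying on every service's env staying in sync. A minimal sketch with a hypothetical `observabilityOwner` helper; it assumes the app's tracing.ts honours the standard `OTEL_SDK_DISABLED` variable (case-insensitive "true" means off) and that a non-empty `SENTRY_DSN` means Sentry is on.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical startup check: Sentry (which sets up its own OpenTelemetry in v8+)
// and the app's OTel SDK must not both be active until phase 2 reconciles them.
function observabilityOwner(env: Env): "sentry" | "otel" | "none" {
  const sentryOn = Boolean(env.SENTRY_DSN?.trim());
  const otelOn = env.OTEL_SDK_DISABLED?.trim().toLowerCase() !== "true";
  if (sentryOn && otelOn) {
    throw new Error(
      "Sentry and the app's OTel SDK are both enabled; phase 2 must reconcile them (ADR-0006)",
    );
  }
  if (sentryOn) return "sentry";
  return otelOn ? "otel" : "none";
}
```

In phase 2 this check would give way to whichever reconciliation ADR-0006 settles on.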

The hosted server was instrumented but blind: OTel is wired yet ships
disabled (OTEL_SDK_DISABLED=true) and pino logs are ephemeral, so prod
errors and outages went unseen. Add @sentry/node (v10) error capture for
the server and worker, EU-region + strict PII scrubbing.

- instrument.ts: Sentry.init at module level, no-ops without SENTRY_DSN
  (same env-unset-disabled convention as ${SCENT_SECRET_KEY:-}), preloaded
  via --import before app modules in the Dockerfile CMD, the worker command,
  and worker:start.
- scrubPii beforeSend (exported, unit-tested): strips request body,
  cookies, x-api-key/cookie/authorization headers, query string, client IP,
  on top of sendDefaultPii:false.
- Capture: setupExpressErrorHandler(app) after routes; explicit
  captureException in the worker failed handlers + flush(2000) on shutdown.
- Deploy: SENTRY_DSN/ENVIRONMENT/RELEASE/TRACES_SAMPLE_RATE in .env.example
  and both compose services; runbook section + /health uptime note.
- ADR-0006, ADR index row, CHANGELOG.

Errors-only by default (tracesSampleRate 0). OTel traces/metrics/off-box
logs to a managed backend remain a deferred phase 2.
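The worker-side capture described in this PR (explicit capture in BullMQ's failed handlers, then a bounded flush on shutdown) can be sketched as follows. This is hypothetical code, not the PR's worker.ts: a plain Node `EventEmitter` stands in for the BullMQ `Worker` (which emits `failed` with the job, possibly undefined, and the error), and the injected `capture`/`flush` callbacks stand in for `Sentry.captureException` and `Sentry.flush`.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical minimal view of a BullMQ job; only non-PII identifiers are read.
interface JobLike {
  id?: string;
  name: string;
  attemptsMade: number;
}

type Capture = (err: Error, context: Record<string, unknown>) => void;

// BullMQ reports a processor's throw via the "failed" event instead of rethrowing,
// so the error has to be captured explicitly here.
function reportFailures(worker: EventEmitter, capture: Capture): void {
  worker.on("failed", (job: JobLike | undefined, err: Error) => {
    // Deliberately omit job.data: job payloads may carry fingerprint signals (PII).
    capture(err, { jobId: job?.id, jobName: job?.name, attemptsMade: job?.attemptsMade });
  });
}

// Shutdown order: stop taking jobs, then give queued events up to 2 s to send.
// `flush` stands in for Sentry.flush(timeoutMs), which resolves to a boolean.
async function shutdown(
  closeWorker: () => Promise<void>,
  flush: (timeoutMs: number) => Promise<boolean>,
): Promise<boolean> {
  await closeWorker();
  return flush(2000);
}
```

Injecting `capture` keeps the handler testable without a DSN; in the real worker it could be `(err, ctx) => Sentry.captureException(err, { extra: ctx })`.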
Isonimus merged commit 4cc3ea9 into main on Jun 22, 2026
4 checks passed
Isonimus deleted the feat/sentry-observability branch on June 22, 2026 05:42