feat(server): Sentry error tracking + alerting (phase 1 observability) #73
Merged
The hosted server was instrumented but blind: OTel is wired yet ships
disabled (OTEL_SDK_DISABLED=true) and pino logs are ephemeral, so prod
errors and outages went unseen. Add @sentry/node (v10) error capture for
the server and worker, EU-region + strict PII scrubbing.
- instrument.ts: Sentry.init at module level, no-ops without SENTRY_DSN
(${SCENT_SECRET_KEY:-}-style env-unset-disabled), preloaded via --import
before app modules in the Dockerfile CMD, worker command, worker:start.
- scrubPii beforeSend (exported, unit-tested): strips request body,
cookies, x-api-key/cookie/authorization headers, query string, client IP,
on top of sendDefaultPii:false.
- Capture: setupExpressErrorHandler(app) after routes; explicit
captureException in the worker failed handlers + flush(2000) on shutdown.
- Deploy: SENTRY_DSN/ENVIRONMENT/RELEASE/TRACES_SAMPLE_RATE in .env.example
and both compose services; runbook section + /health uptime note.
- ADR-0006, ADR index row, CHANGELOG.
Errors-only by default (tracesSampleRate 0). OTel traces/metrics/off-box
logs to a managed backend remain a deferred phase 2.
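
The "no-ops without SENTRY_DSN" guard and the errors-only default could be sketched as a small options builder. This is an illustration under assumptions, not the PR's instrument.ts: buildSentryOptions, Env, and SentryInitOptions are hypothetical names, and the mapping of each env var to an init option is inferred from the variable names only.

```typescript
// Hypothetical sketch of the "env unset = disabled" guard; not the PR's instrument.ts.
type Env = Record<string, string | undefined>;

// Assumed subset of the options that would be handed to Sentry.init.
interface SentryInitOptions {
  dsn: string;
  environment?: string;
  release?: string;
  tracesSampleRate: number;
  sendDefaultPii: false;
}

// Returns undefined when SENTRY_DSN is unset or blank, so the caller can
// skip Sentry.init entirely and leave dev/test/self-host inert.
function buildSentryOptions(env: Env): SentryInitOptions | undefined {
  const dsn = (env.SENTRY_DSN ?? "").trim();
  if (!dsn) return undefined;

  // Errors-only by default: a missing, unparsable, or out-of-range rate falls back to 0.
  const parsed = parseFloat(env.SENTRY_TRACES_SAMPLE_RATE ?? "");
  const tracesSampleRate =
    isFinite(parsed) && parsed >= 0 && parsed <= 1 ? parsed : 0;

  return {
    dsn,
    environment: env.SENTRY_ENVIRONMENT || undefined,
    release: env.SENTRY_RELEASE || undefined,
    tracesSampleRate,
    sendDefaultPii: false,
  };
}

// In a preloaded module this would be wired roughly as:
//   const options = buildSentryOptions(process.env);
//   if (options) Sentry.init(options);
```

Keeping the guard as a pure function of the environment makes the disabled path trivial to unit-test without loading the SDK.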
What & why
The hosted box (api.scent.tindalabs.dev) was instrumented but blind: OpenTelemetry is fully wired (tracing.ts) but ships disabled in prod (OTEL_SDK_DISABLED=true on both services), and pino logs go only to docker logs (ephemeral). There was no error tracking, no alerting, no uptime monitoring: if prod threw or the box went down, nothing told us.
This adds Sentry-led error tracking + alerting (phase 1). Turning on the existing OTel traces/logs to a managed backend is an explicit, deferred phase 2. Decision recorded in ADR-0006.
Because this is a PII-sensitive fingerprinting product in the EU under BSL: Sentry EU region + strict PII scrubbing.
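
A beforeSend-style scrubber for this kind of strict PII policy might look like the sketch below. It is not the PR's scrubPii: the SketchEvent and SketchRequest types are a hand-written, assumed subset of Sentry's event payload (request.data, request.cookies, request.headers, request.query_string, user.ip_address), scrubPiiSketch is a hypothetical name, and trimming the query from request.url is an extra assumption.

```typescript
// Assumed minimal subset of a Sentry event; the real SDK types are richer.
interface SketchRequest {
  url?: string;
  data?: unknown;
  query_string?: unknown;
  cookies?: Record<string, string>;
  headers?: Record<string, string>;
}

interface SketchEvent {
  request?: SketchRequest;
  user?: { ip_address?: string | null; [key: string]: unknown };
}

// Headers that can carry credentials or session identifiers.
const SENSITIVE_HEADERS = ["x-api-key", "cookie", "authorization"];

// Mutates and returns the event, which is the shape a beforeSend hook expects.
function scrubPiiSketch(event: SketchEvent): SketchEvent {
  const req = event.request;
  if (req) {
    delete req.data; // request body (raw fingerprint signals)
    delete req.cookies;
    delete req.query_string;

    // Assumption: also drop the query portion embedded in the URL itself.
    if (req.url) {
      const q = req.url.indexOf("?");
      if (q !== -1) req.url = req.url.slice(0, q);
    }

    // Header names are matched case-insensitively.
    if (req.headers) {
      for (const name of Object.keys(req.headers)) {
        if (SENSITIVE_HEADERS.indexOf(name.toLowerCase()) !== -1) {
          delete req.headers[name];
        }
      }
    }
  }

  // Client IP, if the SDK or a proxy integration attached one.
  if (event.user) delete event.user.ip_address;
  return event;
}
```

A function like this would be passed as the beforeSend option; because it is a pure transformation of the event, it can be unit-tested with plain object fixtures.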
What's in it
- @sentry/node v10 (current major: the plan said v9, but v9 is no longer current; same API surface).
- instrument.ts: Sentry.init at module level, no-ops without SENTRY_DSN (mirrors the ${SCENT_SECRET_KEY:-} "env unset = disabled" convention), so dev/test/self-host stay completely inert. Preloaded via --import ./dist/instrument.js before tracing.js in the Dockerfile CMD, the worker compose command, and worker:start; the dev/tsx path gets it via a top-of-file import in index.ts/worker.ts.
- scrubPii beforeSend (exported, unit-tested): on top of sendDefaultPii: false, strips the request body (POST /v1/events carries raw fingerprint signals = PII), cookies, the x-api-key/cookie/authorization headers, query string, and client IP.
- Sentry.setupExpressErrorHandler(app) after all routes; explicit Sentry.captureException in the worker's BullMQ failed handlers (BullMQ swallows the throw into the event) + Sentry.flush(2000) on shutdown.
- Errors-only by default (tracesSampleRate 0, env-overridable).
- SENTRY_DSN/SENTRY_ENVIRONMENT/SENTRY_RELEASE/SENTRY_TRACES_SAMPLE_RATE in .env.example and both compose services; runbook section + an external /health uptime-monitor note.
Verification
- type-check, lint (0 errors), 139 server tests pass (135 existing + 4 new scrubPii), pnpm audit --audit-level=high clean, image build emits dist/instrument.js with the DSN guard.
- Sentry.* a no-op.
Manual step (operator-owned, does NOT block merge)
Create the EU-region Sentry project and paste its DSN into prod .env, then pull && up -d. The DSN is a write-only ingest key (not a cloud/API token). Until then Sentry stays off: merging changes nothing in prod behaviour.
Key nuance for phase 2
Sentry Node v8+/v10 sets up its own OpenTelemetry. No conflict in phase 1 (the app's OTel is off wherever Sentry runs, and vice versa; mutually exclusive by config). Phase 2 must reconcile them (skipOpenTelemetrySetup: true + register Sentry's span processor on the app's NodeSDK, or let Sentry own OTel). Documented in ADR-0006.
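
For orientation, the first phase 2 option (the app keeps ownership of OTel) might be wired roughly as follows. This is a sketch under assumptions, not what ADR-0006 prescribes: it assumes the @sentry/opentelemetry and @opentelemetry/sdk-node packages are installed, that the app's existing tracing.ts builds a NodeSDK that these options can be merged into, and it omits the app's own exporters and instrumentations.

```typescript
import * as Sentry from "@sentry/node";
import {
  SentryPropagator,
  SentrySampler,
  SentrySpanProcessor,
} from "@sentry/opentelemetry";
import { NodeSDK } from "@opentelemetry/sdk-node";

// Tell Sentry not to install its own OpenTelemetry setup; the app's SDK owns it.
const client = Sentry.init({
  dsn: process.env.SENTRY_DSN,
  skipOpenTelemetrySetup: true,
  tracesSampleRate: parseFloat(process.env.SENTRY_TRACES_SAMPLE_RATE ?? "0") || 0,
});

// Register Sentry's pieces on the app-owned SDK (app exporters omitted here).
const sdk = new NodeSDK({
  sampler: client ? new SentrySampler(client) : undefined,
  spanProcessors: [new SentrySpanProcessor()],
  textMapPropagator: new SentryPropagator(),
  contextManager: new Sentry.SentryContextManager(),
});
sdk.start();

// Logs a warning if a required Sentry component is missing from the setup.
Sentry.validateOpenTelemetrySetup();
```

The alternative named above, letting Sentry own OTel, would instead drop the app's NodeSDK bootstrap and rely on the setup that Sentry.init performs by default.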