Breach Scanning

The scan engine checks every employee email against every configured breach-intelligence provider, persists matches, and fires notifications. It lives in src/lib/scan/.

Providers

A provider implements one contract (src/lib/scan/types.ts):

```typescript
interface BreachProvider {
  id: ApiProvider          // HIBP | HIBP_STEALER | DEHASHED | LEAKCHECK | INTELX | SNUSBASE
  source: BreachSource     // HIBP | MANUAL | DARK_WEB | STEALER_LOG
  lookup(email: string, apiKey: string): Promise<Finding[]>
}
```

A Finding is the normalized result shared across providers:

```typescript
interface Finding {
  name: string             // breach identifier, used as the Breach unique key
  breachDate: Date         // epoch (1970) when the provider does not expose it
  dataTypes: string[]      // exposed data types, normalized to snake_case
  artifacts?: ArtifactKind[] // stolen artifact kinds (stealer logs); empty for dumps
  machineId?: string       // infected machine id, when the feed exposes it
  malwareFamily?: string   // stealer family (RedLine, Lumma, ...), when known
  capturedAt?: Date        // when the log was captured off the endpoint, when known
}
```
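The Finding comments describe two normalization rules: data types become snake_case, and a missing breach date falls back to the Unix epoch. A minimal sketch of helpers a provider might use for this is shown here; normalizeDataType and parseBreachDate are hypothetical names, and the exact conversion each DataShield provider applies may differ.

```typescript
// Hypothetical normalization helpers for building a Finding.
// A sketch of the documented rules, not the DataShield implementation.

// "Email addresses" -> "email_addresses", "creditCard" -> "credit_card"
function normalizeDataType(raw: string): string {
  return raw
    .trim()
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2") // split camelCase boundaries
    .replace(/[^A-Za-z0-9]+/g, "_") // spaces, dashes, punctuation -> "_"
    .replace(/^_+|_+$/g, "") // drop leading/trailing underscores
    .toLowerCase();
}

// Returns the Unix epoch (1970-01-01T00:00:00Z) when the provider omits the
// date or sends a value that Date cannot parse.
function parseBreachDate(raw?: string | null): Date {
  if (!raw) return new Date(0);
  const parsed = new Date(raw);
  return Number.isNaN(parsed.getTime()) ? new Date(0) : parsed;
}
```

Using the epoch as a sentinel keeps breachDate non-nullable, at the cost that every consumer has to read 1970 as "unknown" rather than as a real date.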

Wired providers are registered in src/lib/scan/registry.ts:

Provider	`ApiProvider` id	`BreachSource` value
Have I Been Pwned	`HIBP`	`HIBP`
HIBP Stealer Logs	`HIBP_STEALER`	`STEALER_LOG`
LeakCheck	`LEAKCHECK`	`DARK_WEB`
DeHashed	`DEHASHED`	`DARK_WEB`
Intelligence X	`INTELX`	`DARK_WEB`
Snusbase	`SNUSBASE`	`DARK_WEB`

To add a source, add a value to the ApiProvider enum, implement BreachProvider for it, and add the implementation to the PROVIDERS array in src/lib/scan/registry.ts.
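As an illustration of those three steps, here is a minimal offline provider. It is a sketch under assumptions: the types are local stand-ins for src/lib/scan/types.ts (reduced to string unions), EXAMPLE is a hypothetical new ApiProvider value, and lookup reads an in-memory table where a real provider would call its vendor's HTTP API.

```typescript
// Local stand-ins for src/lib/scan/types.ts; "EXAMPLE" is a hypothetical new value.
type ApiProvider = "HIBP" | "EXAMPLE";
type BreachSource = "HIBP" | "DARK_WEB";

interface Finding {
  name: string;
  breachDate: Date;
  dataTypes: string[];
}

interface BreachProvider {
  id: ApiProvider;
  source: BreachSource;
  lookup(email: string, apiKey: string): Promise<Finding[]>;
}

// Hypothetical data standing in for a vendor API response.
const exampleData: Record<string, Finding[]> = {
  "alice@example.com": [
    { name: "ExampleDump2021", breachDate: new Date("2021-03-01"), dataTypes: ["email", "password"] },
  ],
};

const exampleProvider: BreachProvider = {
  id: "EXAMPLE",
  source: "DARK_WEB",
  async lookup(email: string, apiKey: string): Promise<Finding[]> {
    // Throwing is safe: the runner catches provider errors and skips them.
    if (!apiKey) throw new Error("EXAMPLE: missing API key");
    return exampleData[email.trim().toLowerCase()] ?? [];
  },
};

// Registration: in DataShield this is the PROVIDERS array in src/lib/scan/registry.ts.
const PROVIDERS: BreachProvider[] = [exampleProvider];
```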

Stealer-log monitoring

Beyond breach dumps, DataShield queries Have I Been Pwned's stealer-log feed (HIBP_STEALER), which reports the website domains for which an address had credentials captured by an infostealer. Findings carry an artifacts list (ArtifactKind: PASSWORD, COOKIE, TOKEN, AUTOFILL) and, when the feed exposes it, infection metadata (machineId, malwareFamily, capturedAt).

Captured session artifacts (COOKIE, TOKEN) are the most dangerous: they hand an attacker a ready-to-replay session that bypasses MFA, which is why they carry the heaviest weight in Risk Scoring.
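A check that separates session-bearing stealer findings from password-only ones could look like the sketch below. exposesSession and StealerFinding are hypothetical names, ArtifactKind is modeled here as a string union, and the actual scoring weights live in Risk Scoring and are not reproduced.

```typescript
// Local stand-in for ArtifactKind, modeled as a string union.
type ArtifactKind = "PASSWORD" | "COOKIE" | "TOKEN" | "AUTOFILL";

interface StealerFinding {
  name: string;
  artifacts?: ArtifactKind[];
}

// Cookies and tokens can be replayed as a live session, which skips MFA.
const SESSION_ARTIFACTS: ReadonlySet<ArtifactKind> = new Set<ArtifactKind>(["COOKIE", "TOKEN"]);

// Hypothetical helper: true when a finding includes at least one replayable
// session artifact. Breach-dump findings (no artifacts) are never flagged.
function exposesSession(finding: StealerFinding): boolean {
  return (finding.artifacts ?? []).some((kind) => SESSION_ARTIFACTS.has(kind));
}
```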

How a scan runs

runScan(companyId, providers) in src/lib/scan/runner.ts runs these steps:

1. Loads all employees for the company, together with their existing breachRecords.
2. Resolves, once and up front, the alert recipients (company admins, only if email notifications are enabled) and the active webhooks.
3. For each employee, calls lookup() on each active provider. Provider errors are isolated (caught and skipped), so one failing provider never aborts the scan.
4. Persists each new finding with persistFinding: upserts the Breach, skips it if the employee is already linked to that breach, and otherwise creates the BreachRecord and Alert, then sends the email and dispatches webhooks.
5. Sleeps RATE_LIMIT_MS (1500 ms) between employees to stay within provider rate limits.
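The core of that loop can be sketched as follows. This is a simplified illustration, not the runner's code: scanEmployees, ActiveProvider, and the persist callback are hypothetical, persistence is abstracted behind persist (which resolves to true when a new record was created), newAlerts is left out of the result, and the pause is skipped after the last employee.

```typescript
interface Finding { name: string; breachDate: Date; dataTypes: string[] }
interface BreachProvider {
  id: string;
  lookup(email: string, apiKey: string): Promise<Finding[]>;
}
interface Employee { id: string; email: string }
// Hypothetical pairing of a provider with its decrypted API key.
interface ActiveProvider { provider: BreachProvider; apiKey: string }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function scanEmployees(
  employees: Employee[],
  active: ActiveProvider[],
  // Hypothetical persistence hook: resolves to true when a new record was created.
  persist: (employee: Employee, finding: Finding) => Promise<boolean>,
  rateLimitMs = 1500, // RATE_LIMIT_MS in the runner
): Promise<{ scanned: number; newRecords: number }> {
  let scanned = 0;
  let newRecords = 0;
  for (let i = 0; i < employees.length; i++) {
    const employee = employees[i];
    for (const { provider, apiKey } of active) {
      let findings: Finding[];
      try {
        findings = await provider.lookup(employee.email, apiKey);
      } catch {
        continue; // isolate provider failures: skip this provider, keep scanning
      }
      for (const finding of findings) {
        if (await persist(employee, finding)) newRecords++;
      }
    }
    scanned++;
    // Pause between employees to respect provider rate limits.
    if (i < employees.length - 1) await sleep(rateLimitMs);
  }
  return { scanned, newRecords };
}
```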

Returns { scanned, newRecords, newAlerts }.
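The dedupe-then-notify order inside persistFinding can be sketched against an in-memory store. The store, its method names, and the notify callback (standing in for the email and webhook dispatch) are hypothetical; the real function works against the database, and in this sketch an existing Breach row is reused unchanged rather than updated.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM";
interface Finding { name: string; breachDate: Date; dataTypes: string[] }
interface Breach { id: number; name: string; breachDate: Date; dataTypes: string[] }
interface BreachRecord { employeeId: string; breachId: number }
interface Alert { employeeId: string; breachId: number; severity: Severity }

// Hypothetical in-memory stand-in for the Breach, BreachRecord, and Alert tables.
class MemoryStore {
  breaches = new Map<string, Breach>(); // keyed by name, the Breach unique key
  records: BreachRecord[] = [];
  alerts: Alert[] = [];
  private nextId = 1;

  // Upsert by name: reuse the existing Breach, or create it.
  upsertBreach(f: Finding): Breach {
    const existing = this.breaches.get(f.name);
    if (existing) return existing;
    const created: Breach = { id: this.nextId++, name: f.name, breachDate: f.breachDate, dataTypes: f.dataTypes };
    this.breaches.set(f.name, created);
    return created;
  }
}

// Returns true when a new BreachRecord (and Alert) was created.
async function persistFinding(
  store: MemoryStore,
  employeeId: string,
  finding: Finding,
  severity: Severity,
  notify: (alert: Alert) => Promise<void>, // stands in for email + webhook dispatch
): Promise<boolean> {
  const breach = store.upsertBreach(finding);
  const linked = store.records.some((r) => r.employeeId === employeeId && r.breachId === breach.id);
  if (linked) return false; // already known: no duplicate record, no repeat notification
  store.records.push({ employeeId, breachId: breach.id });
  const alert: Alert = { employeeId, breachId: breach.id, severity };
  store.alerts.push(alert);
  await notify(alert);
  return true;
}
```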

Active providers are loaded by loadActiveProviders, which decrypts each stored API key server-side and stamps lastUsedAt.
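The decrypt-and-stamp step might look like this sketch. The StoredKey shape, its isActive flag, the loadActive name, and the injected decrypt function are assumptions made for illustration, since the actual schema and cipher are not described on this page; the sketch only shows keys being decrypted in server code right before use, with each loaded key stamped with a last-used time.

```typescript
// Hypothetical shape of a stored provider key; the real schema may differ.
interface StoredKey {
  provider: string; // an ApiProvider value such as "HIBP"
  encryptedKey: string;
  isActive: boolean;
  lastUsedAt: Date | null;
}

interface ActiveProvider {
  provider: string;
  apiKey: string; // plaintext, held in server memory only
}

// Sketch of the loadActiveProviders idea: decrypt each active key, stamp lastUsedAt.
// `decrypt` is injected because the actual cipher is not documented here.
function loadActive(
  keys: StoredKey[],
  decrypt: (ciphertext: string) => string,
  now: Date = new Date(),
): ActiveProvider[] {
  const active: ActiveProvider[] = [];
  for (const key of keys) {
    if (!key.isActive) continue;
    active.push({ provider: key.provider, apiKey: decrypt(key.encryptedKey) });
    key.lastUsedAt = now; // a real implementation would write this back to the database
  }
  return active;
}
```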

Severity scoring

Severity is derived from exposed data types (severityFor in the runner). The critical set is:

password, hashed_password, credit_card, ssn, bank_account

Critical types in the finding	Severity
2 or more	`CRITICAL`
exactly 1	`HIGH`
0	`MEDIUM`
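The rule in the table reduces to counting critical types. The sketch below re-implements it with one assumption the table does not state: duplicate entries are counted once, so a finding that lists password twice still scores HIGH. It illustrates the rule and is not the runner's source.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM";

const CRITICAL_TYPES: ReadonlySet<string> = new Set([
  "password",
  "hashed_password",
  "credit_card",
  "ssn",
  "bank_account",
]);

// Mirrors the table above: 2+ distinct critical types -> CRITICAL, 1 -> HIGH, 0 -> MEDIUM.
function severityFor(dataTypes: string[]): Severity {
  const critical = new Set(dataTypes.filter((t) => CRITICAL_TYPES.has(t))).size;
  if (critical >= 2) return "CRITICAL";
  if (critical === 1) return "HIGH";
  return "MEDIUM";
}
```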

Triggering a scan

Scans are started with POST /api/employees/scan, which any authenticated user can call. The route enforces three guards:

Rate limit: 5 scans per company per minute, else 429.
No provider configured: 503 with a prompt to add a key in Data API.
Concurrency: one running scan per company at a time, else 409.
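A single-process sketch of the three guards, assuming a sliding one-minute window and an in-memory lock. ScanGuards, its error messages, and the choice to count every non-429 request toward the limit are illustrative assumptions; a deployment with several server instances would need this state in a shared store instead.

```typescript
type GuardResult =
  | { ok: true }
  | { ok: false; status: 429 | 503 | 409; error: string };

// Hypothetical in-memory versions of the three route guards.
// Per-process state like this only works on a single server instance.
class ScanGuards {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly attempts = new Map<string, number[]>(); // companyId -> request times (ms)
  private readonly running = new Set<string>(); // companyIds with a scan in progress

  constructor(limit = 5, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Checks follow the order the guards are listed in: rate limit, providers, concurrency.
  check(companyId: string, providerCount: number, now: number = Date.now()): GuardResult {
    const recent = (this.attempts.get(companyId) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.attempts.set(companyId, recent);
      return { ok: false, status: 429, error: "Too many scans; try again shortly." };
    }
    recent.push(now);
    this.attempts.set(companyId, recent);
    if (providerCount === 0) {
      return { ok: false, status: 503, error: "No provider configured; add a key in Data API." };
    }
    if (this.running.has(companyId)) {
      return { ok: false, status: 409, error: "A scan is already running for this company." };
    }
    this.running.add(companyId); // released by finish() when the scan completes or fails
    return { ok: true };
  }

  finish(companyId: string): void {
    this.running.delete(companyId);
  }
}
```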

See API Reference for the full endpoint list and Configuration for where keys come from.

DataShield is source-available software by Melvin PETIT (WhiteMuush). Work in progress, not production ready.
