Breach Scanning
The scan engine checks every employee email against every configured
breach-intelligence provider, persists matches, and fires notifications. It
lives in src/lib/scan/.
A provider implements one contract (src/lib/scan/types.ts):
```ts
interface BreachProvider {
  id: ApiProvider       // HIBP | HIBP_STEALER | DEHASHED | LEAKCHECK | INTELX | SNUSBASE
  source: BreachSource  // HIBP | MANUAL | DARK_WEB | STEALER_LOG
  lookup(email: string, apiKey: string): Promise<Finding[]>
}
```

A Finding is the normalized result shared across providers:
```ts
interface Finding {
  name: string                // breach identifier, used as the Breach unique key
  breachDate: Date            // epoch (1970) when the provider does not expose it
  dataTypes: string[]         // exposed data types, normalized to snake_case
  artifacts?: ArtifactKind[]  // stolen artifact kinds (stealer logs); empty for dumps
  machineId?: string          // infected machine id, when the feed exposes it
  malwareFamily?: string      // stealer family (RedLine, Lumma, ...), when known
  capturedAt?: Date           // when the log was captured off the endpoint, when known
}
```

Wired providers are registered in src/lib/scan/registry.ts:
| Provider | ApiProvider id | Source |
|---|---|---|
| Have I Been Pwned | HIBP | HIBP |
| HIBP Stealer Logs | HIBP_STEALER | stealer log |
| LeakCheck | LEAKCHECK | dark web |
| DeHashed | DEHASHED | dark web |
| Intelligence X | INTELX | dark web |
| Snusbase | SNUSBASE | dark web |
To add a source, implement BreachProvider, add it to the PROVIDERS array in the registry, and add a value to the ApiProvider enum.
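As a rough illustration of those three steps, a new provider might look like the sketch below. The types are simplified local stand-ins for the ones in src/lib/scan/types.ts. The EXAMPLE_FEED id, the exampleFeedProvider object, the RawHit payload shape, the in-memory FAKE_UPSTREAM data, and the toSnakeCase helper are all hypothetical; a real provider would call its upstream API inside lookup().

```ts
// Simplified local stand-ins for the types in src/lib/scan/types.ts.
// "EXAMPLE_FEED" is a hypothetical enum value added for this sketch.
type ApiProvider =
  | "HIBP" | "HIBP_STEALER" | "DEHASHED" | "LEAKCHECK" | "INTELX" | "SNUSBASE"
  | "EXAMPLE_FEED"
type BreachSource = "HIBP" | "MANUAL" | "DARK_WEB" | "STEALER_LOG"

interface Finding {
  name: string
  breachDate: Date
  dataTypes: string[]
}

interface BreachProvider {
  id: ApiProvider
  source: BreachSource
  lookup(email: string, apiKey: string): Promise<Finding[]>
}

// Hypothetical helper: "Email Addresses" -> "email_addresses".
function toSnakeCase(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "")
}

// Hypothetical upstream payload shape and canned data (stands in for an HTTP call).
interface RawHit { title: string; date?: string; fields: string[] }
const FAKE_UPSTREAM: Record<string, RawHit[]> = {
  "alice@example.com": [
    { title: "ExampleDump2021", fields: ["Email Addresses", "Hashed Password"] },
  ],
}

const exampleFeedProvider: BreachProvider = {
  id: "EXAMPLE_FEED",
  source: "DARK_WEB",
  async lookup(email, apiKey) {
    if (!apiKey) throw new Error("missing API key")
    const hits = FAKE_UPSTREAM[email.toLowerCase()] ?? []
    return hits.map((hit) => ({
      name: hit.title,
      // Fall back to the epoch when the provider does not expose a date.
      breachDate: hit.date ? new Date(hit.date) : new Date(0),
      dataTypes: hit.fields.map(toSnakeCase),
    }))
  },
}

// Registration step: the provider joins the PROVIDERS array.
const PROVIDERS: BreachProvider[] = [exampleFeedProvider]
```

Normalizing inside lookup() keeps provider-specific payload shapes out of the rest of the scan engine, which only ever sees Finding objects.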
Beyond breach dumps, DataShield queries Have I Been Pwned's stealer-log feed
(HIBP_STEALER), which reports the website domains for which an address had
credentials captured by an infostealer. Findings carry an artifacts list
(ArtifactKind: PASSWORD, COOKIE, TOKEN, AUTOFILL) and, when the feed
exposes it, infection metadata (machineId, malwareFamily, capturedAt).
Captured session artifacts (COOKIE, TOKEN) are the most dangerous: they hand
an attacker a ready-to-replay session that bypasses MFA, which is why they carry
the heaviest weight in Risk Scoring.
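To make the distinction concrete, here is a minimal sketch of a stealer-log finding and a check for session artifacts. The StealerFinding type is a trimmed stand-in for Finding, the sample values are invented, and hasSessionArtifact is a hypothetical helper; it does not reproduce how Risk Scoring actually weights artifacts.

```ts
type ArtifactKind = "PASSWORD" | "COOKIE" | "TOKEN" | "AUTOFILL"

// Trimmed stand-in for Finding, keeping only the stealer-log fields.
interface StealerFinding {
  name: string
  artifacts?: ArtifactKind[]
  machineId?: string
  malwareFamily?: string
  capturedAt?: Date
}

// Artifact kinds that represent a replayable session.
const SESSION_ARTIFACTS: ReadonlySet<ArtifactKind> = new Set<ArtifactKind>(["COOKIE", "TOKEN"])

// Hypothetical helper: true when the finding includes a COOKIE or TOKEN artifact.
function hasSessionArtifact(finding: StealerFinding): boolean {
  return (finding.artifacts ?? []).some((kind) => SESSION_ARTIFACTS.has(kind))
}

// Invented sample data for illustration only.
const sample: StealerFinding = {
  name: "example.com",
  artifacts: ["PASSWORD", "COOKIE"],
  malwareFamily: "RedLine",
}

const passwordOnly: StealerFinding = { name: "example.org", artifacts: ["PASSWORD"] }
```

Here hasSessionArtifact(sample) is true because of the COOKIE entry, while hasSessionArtifact(passwordOnly) is false.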
runScan(companyId, providers) in src/lib/scan/runner.ts:
- Loads all employees for the company with their existing breachRecords.
- Resolves alert recipients (company admins, only if email is enabled) and active webhooks once, up front.
- For each employee, for each active provider, calls lookup(). Provider errors are isolated (caught and skipped) so one failing provider never aborts the scan.
- Each new finding is persisted by persistFinding: upsert the Breach, skip if the employee is already linked, otherwise create the BreachRecord + Alert, then send email and dispatch webhooks.
- Sleeps RATE_LIMIT_MS (1500 ms) between employees to stay within provider rate limits.
Returns { scanned, newRecords, newAlerts }.
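The control flow can be sketched as below. This is a simplified, in-memory approximation rather than the actual runner: runScanSketch takes employees directly instead of a companyId, a Set of known breach names stands in for the database and persistFinding, no emails or webhooks are sent, and skipping the sleep after the last employee is an assumption.

```ts
interface Finding { name: string }

// Stand-ins for the real models; apiKey would come from loadActiveProviders.
interface Provider {
  id: string
  apiKey: string
  lookup(email: string, apiKey: string): Promise<Finding[]>
}
interface Employee { email: string; knownBreaches: Set<string> }
interface ScanResult { scanned: number; newRecords: number; newAlerts: number }

const RATE_LIMIT_MS = 1500
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function runScanSketch(
  employees: Employee[],
  providers: Provider[],
  delayMs: number = RATE_LIMIT_MS,
): Promise<ScanResult> {
  const result: ScanResult = { scanned: 0, newRecords: 0, newAlerts: 0 }

  for (let i = 0; i < employees.length; i++) {
    const employee = employees[i]

    for (const provider of providers) {
      let findings: Finding[]
      try {
        findings = await provider.lookup(employee.email, provider.apiKey)
      } catch {
        // Isolate provider errors: skip this provider, keep scanning.
        continue
      }

      for (const finding of findings) {
        // Skip findings the employee is already linked to.
        if (employee.knownBreaches.has(finding.name)) continue
        employee.knownBreaches.add(finding.name)
        result.newRecords++ // stands in for creating the BreachRecord
        result.newAlerts++  // stands in for creating the Alert
      }
    }

    result.scanned++
    // Pause between employees to respect provider rate limits.
    if (i < employees.length - 1) await sleep(delayMs)
  }

  return result
}
```

Because the try/catch wraps only the lookup() call, a provider that throws costs one skipped lookup and the remaining providers and employees are still processed.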
Active providers are loaded by loadActiveProviders, which decrypts each
stored API key server-side and stamps lastUsedAt.
Severity is derived from exposed data types (severityFor in the runner).
The critical set is:
password, hashed_password, credit_card, ssn, bank_account
| Critical types in the finding | Severity |
|---|---|
| 2 or more | CRITICAL |
| exactly 1 | HIGH |
| 0 | MEDIUM |
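The mapping in the table can be expressed as a small function. This is a sketch of the rule as documented, not the runner's actual severityFor; the name severityForSketch is hypothetical, and counting each critical type once even if it appears twice in the input is an assumption.

```ts
type Severity = "CRITICAL" | "HIGH" | "MEDIUM"

const CRITICAL_TYPES: ReadonlySet<string> = new Set([
  "password",
  "hashed_password",
  "credit_card",
  "ssn",
  "bank_account",
])

function severityForSketch(dataTypes: string[]): Severity {
  // Count distinct critical types present in the finding.
  const hits = new Set(dataTypes.filter((t) => CRITICAL_TYPES.has(t))).size
  if (hits >= 2) return "CRITICAL"
  if (hits === 1) return "HIGH"
  return "MEDIUM"
}
```

For example, a finding exposing email and password yields HIGH, while password plus credit_card yields CRITICAL.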
Scans are started through POST /api/employees/scan, which is open to any authenticated user. The route enforces three guards:
- Rate limit: 5 scans per company per minute, else 429.
- No provider configured: 503 with a prompt to add a key in Data API.
- Concurrency: one running scan per company at a time, else 409.
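The three guards can be modeled as a pure function over the state the route would need to know. This is a sketch under assumptions: checkScanGuards, ScanRequestState, and the response messages are hypothetical, and the evaluation order (rate limit, then provider check, then concurrency) simply follows the order of the list above.

```ts
// Hypothetical snapshot of what the route needs to decide.
interface ScanRequestState {
  now: number                    // current time in ms
  recentScanTimestamps: number[] // start times (ms) of this company's recent scans
  activeProviderCount: number
  scanRunning: boolean
}

type GuardResult =
  | { ok: true }
  | { ok: false; status: 429 | 503 | 409; message: string }

const MAX_SCANS_PER_WINDOW = 5
const WINDOW_MS = 60000

function checkScanGuards(state: ScanRequestState): GuardResult {
  // Rate limit: at most 5 scans per company in the last minute.
  const recent = state.recentScanTimestamps.filter((t) => state.now - t < WINDOW_MS)
  if (recent.length >= MAX_SCANS_PER_WINDOW) {
    return { ok: false, status: 429, message: "Too many scans, try again shortly." }
  }
  // No provider configured.
  if (state.activeProviderCount === 0) {
    return { ok: false, status: 503, message: "No provider configured. Add a key in Data API." }
  }
  // Concurrency: one running scan per company at a time.
  if (state.scanRunning) {
    return { ok: false, status: 409, message: "A scan is already running." }
  }
  return { ok: true }
}
```

Keeping the decision separate from the HTTP handler makes each guard easy to test without standing up the route.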
See API Reference for the full endpoint list and Configuration for where keys come from.
DataShield is source-available software by Melvin PETIT (WhiteMuush). Work in progress, not production ready.