Data Redactor

A client-side data redaction tool for securing sensitive information before it is sent to AI systems or external services. It demonstrates that AI can be used securely with proper input sanitization.

Live Demo

https://data-redactor-ui.vercel.app/

Overview

Data Redactor is a monorepo containing three packages:

Package	Description	Published
`data-redactor-core`	Core redaction engine	npm v1.0.8
`ui`	Vanilla JS web interface	Vercel
`api`	REST API for community patterns	Vercel Serverless / Self-hosted

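In the UI and the core library, all redaction happens client-side; the text you redact is never sent to a server. The API's `/api/redact` endpoint is a separate, opt-in server-side option.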

Features

Redaction Strategies

Strategy	Description	Example
Token	Replace with typed placeholders	`john@email.com` → `[EMAIL_1]`
Mask	Replace with mask character, preserve structure	`john@email.com` → `****@*****.***`
Format-Preserving	Replace with realistic fake data	`john@email.com` → `user42@example.net`
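
To make the difference between the token and mask strategies concrete, here is a minimal standalone sketch. It does not use the library's own strategy classes; `tokenize` and `mask` are hypothetical helpers, and the assumption that masking replaces only letters and digits is ours.

```typescript
// Illustrative stand-ins for the token and mask strategies; not the library's code.

// Token strategy: replace a match with a typed, numbered placeholder.
function tokenize(type: string, index: number): string {
  return `[${type.toUpperCase()}_${index}]`;
}

// Mask strategy: swap every letter and digit for the mask character and
// leave punctuation such as "@", "." and "-" in place to preserve structure.
function mask(value: string, maskChar: string = "*"): string {
  return value.replace(/[A-Za-z0-9]/g, () => maskChar);
}

console.log(tokenize("email", 1));    // [EMAIL_1]
console.log(mask("john@email.com"));  // ****@*****.***
console.log(mask("555-1234", "#"));   // ###-####
```

Tokens keep the data type visible to the reader or model, while masks keep the original shape and length.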

Built-in Pattern Detection

Category	Patterns
Network	IPv4 (with CIDR), IPv6, MAC Address, Hostname/FQDN
Personal	Email, Phone (incl. vanity), SSN, Names (8,849+ name database)
Financial	Credit Card (13-19 digits), Credit Card Last 4
Business	Ticket/Case Numbers

Pattern Builder (New in v1.0.5)

Visual tool to create custom regex patterns from sample data:

Mark Selection - Highlight text in your sample to mark what should be matched
Multi-Sample Support - Add multiple samples to refine pattern accuracy
Auto-Generation - Automatically generates optimized regex from marked text
Pattern Explanation - Human-readable breakdown of what the pattern matches
Live Testing - Test generated patterns against sample data in real-time
One-Click Add - Add patterns directly to your configuration
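
One plausible way to auto-generate a regex from a marked sample is to collapse runs of same-class characters into quantified character classes. The sketch below illustrates that idea only; `sketchRegexFromSample` is a hypothetical helper and is not how the Pattern Builder is necessarily implemented.

```typescript
// Hypothetical helper: derive a regex source string from one sample by
// collapsing runs of same-class characters into quantified character classes.
function sketchRegexFromSample(sample: string): string {
  const classOf = (ch: string): string => {
    if (/[A-Z]/.test(ch)) return "[A-Z]";
    if (/[a-z]/.test(ch)) return "[a-z]";
    if (/[0-9]/.test(ch)) return "\\d";
    // Anything else is kept as a literal, escaped if it is a regex metacharacter.
    return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  };

  let out = "";
  let i = 0;
  while (i < sample.length) {
    const cls = classOf(sample.charAt(i));
    // Count how many following characters fall into the same class.
    let run = 1;
    while (i + run < sample.length && classOf(sample.charAt(i + run)) === cls) {
      run++;
    }
    out += run > 1 ? `${cls}{${run}}` : cls;
    i += run;
  }
  return out;
}

console.log(sketchRegexFromSample("ABC-12345")); // [A-Z]{3}-\d{5}
```

A real builder would also need to reconcile several samples (for example, widening `{3}` to `{2,4}`), which this sketch leaves out.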

Pattern Testing & Validation (New in v1.0.9)

Comprehensive testing system to ensure pattern accuracy and quality:

Test Samples - 60 curated test samples (5 per pattern × 12 patterns)
Quality Scoring - 0-100 quality scores based on test coverage, accuracy, and edge case handling
Automated Testing - Run patterns against test samples to identify false positives and false negatives
Pattern Fixing - Load failed tests into Pattern Builder to fix issues
Edge Case Reporting - Report pattern issues directly from Pattern Detection tab
Pre-load System - Pre-fill Pattern Builder with problematic samples for easy fixing
Before/After Comparison - See quality score improvements when saving improved patterns
Test Metadata View - View all test samples and quality scores from JSON Editor

Quality Score Breakdown:

🟢 80-100: High quality - Pattern works reliably across all test cases
🟡 60-79: Medium quality - Some issues detected, review recommended
🔴 0-59: Low quality - Significant issues, pattern needs improvement
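
The score bands above map directly onto a small lookup. This is a sketch of that mapping only; `qualityTier` is a hypothetical helper, and how the score itself is computed is not shown here.

```typescript
type QualityTier = "high" | "medium" | "low";

// Map a 0-100 quality score onto the three bands listed above.
function qualityTier(score: number): QualityTier {
  if (Number.isNaN(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  if (score >= 80) return "high";
  if (score >= 60) return "medium";
  return "low";
}

console.log(qualityTier(92)); // high
console.log(qualityTier(60)); // medium
console.log(qualityTier(59)); // low
```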

Community Patterns (New in v1.0.5)

Browse, share, and vote on community-contributed regex patterns:

Pattern Library - Discover patterns submitted by other users
Voting System - Upvote/downvote patterns to help surface the best ones
Category Filtering - Filter by identifier, financial, healthcare, infrastructure, personal
One-Click Use - Add community patterns to your configuration instantly
Submit Your Own - Share useful patterns with the community

Presets

Pre-configured pattern sets for common use cases:

Preset	Description
`strict-ai`	Maximum redaction for AI/LLM inputs
`minimal`	Light redaction, preserves readability
`logs`	Optimized for log file sanitization
`financial`	Focus on financial data (accounts, cards)
`healthcare`	HIPAA-focused (MRN, NPI, patient info)

Extensibility

Custom Patterns - Define your own regex patterns with configurable strategies
Custom Entities - Redact specific values (company names, project names, etc.)
Regex Builder - Programmatic pattern generation from samples

Engine Features

Deterministic redaction (same input → same output within session)
Overlap detection and resolution
Configurable token format per pattern type
Configurable mask character
Import/Export JSON configurations
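
The first two engine features can be sketched in a few lines. The code below is an illustration under our own assumptions, not the engine's source: `Match`, `resolveOverlaps`, `TokenSession`, and `redactMatches` are hypothetical names, and the overlap rule (earliest match wins, longer match wins on a tie) is one common choice among several.

```typescript
interface Match {
  type: string;
  start: number; // inclusive offset into the text
  end: number;   // exclusive offset into the text
}

// Keep the earliest match; when two start at the same offset, prefer the
// longer one. Any match that overlaps an already-kept match is dropped.
function resolveOverlaps(matches: Match[]): Match[] {
  const sorted = [...matches].sort(
    (a, b) => a.start - b.start || (b.end - b.start) - (a.end - a.start)
  );
  const kept: Match[] = [];
  let lastEnd = 0;
  for (const m of sorted) {
    if (m.start >= lastEnd) {
      kept.push(m);
      lastEnd = m.end;
    }
  }
  return kept;
}

// Remembers which token each value received, so the same input value
// always maps to the same token for the lifetime of the session.
class TokenSession {
  private tokens = new Map<string, string>();
  private counters = new Map<string, number>();

  tokenFor(type: string, value: string): string {
    const key = `${type}:${value}`;
    const existing = this.tokens.get(key);
    if (existing !== undefined) return existing;
    const next = (this.counters.get(type) ?? 0) + 1;
    this.counters.set(type, next);
    const token = `[${type.toUpperCase()}_${next}]`;
    this.tokens.set(key, token);
    return token;
  }
}

// Rebuild the text, replacing each surviving match with its session token.
function redactMatches(text: string, matches: Match[], session: TokenSession): string {
  let out = "";
  let cursor = 0;
  for (const m of resolveOverlaps(matches)) {
    out += text.slice(cursor, m.start) + session.tokenFor(m.type, text.slice(m.start, m.end));
    cursor = m.end;
  }
  return out + text.slice(cursor);
}

const sampleText = "a@b.co wrote to a@b.co";
const sampleMatches: Match[] = [
  { type: "hostname", start: 2, end: 6 }, // "b.co", overlaps the first email
  { type: "email", start: 0, end: 6 },
  { type: "email", start: 16, end: 22 },
];
console.log(redactMatches(sampleText, sampleMatches, new TokenSession()));
// [EMAIL_1] wrote to [EMAIL_1]
```

The hostname match is discarded because it sits inside the email match, and both occurrences of the address receive the same token.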

Packages

data-redactor-core

The core TypeScript redaction engine. Zero browser dependencies - works in Node.js and browser environments.

Key exports:

DataRedactor - Main redaction class
ConfigLoader - Configuration loading and validation
DEFAULT_CONFIG - Default configuration with all patterns enabled
getPreset() / hasPreset() - Preset configuration helpers
generateFromSample() / refineFromSamples() - Regex builder utilities
Testing & Validation (v1.0.9):
- PatternTestEngine - Execute patterns against test samples
- calculateQualityScore() - Calculate 0-100 quality scores
- getTestSample() / getTestSamplesForPattern() - Access 60 curated test samples
- ALL_TEST_SAMPLES - All test samples by ID
Pattern classes: IPv4Pattern, EmailPattern, NamePattern, etc.
Strategy classes: TokenStrategy, MaskStrategy, FormatPreservingStrategy

UI

Vanilla JavaScript web application (no framework dependencies) with four main tabs:

Pattern Detection - Toggle patterns on/off, select strategies per pattern, report issues
JSON Editor - Full configuration editing with validation, view test metadata
Output Format - Interactive per-pattern testing with live preview of all strategies
Pattern Validation (New in v1.0.9) - Four sub-tabs:
- Builder - Visual tool to create/fix custom regex patterns from sample data
- Test Samples - Run patterns against 60 curated test samples, view quality scores
- Community - Browse and use community-contributed patterns
- Edge Cases - View and vote on reported pattern issues

UI Features:

Mobile-responsive design with optimized touch targets
Collapsible accordion sections for better organization
Dark mode support
Keyboard shortcuts for common actions
Pattern testing with quality scoring (v1.0.9)
Issue reporting workflow (v1.0.9)
Pre-load system for fixing failed patterns (v1.0.9)

API Server

Bun-powered REST API for community patterns, edge cases, and feedback:

Endpoint	Method	Description
`/api/health`	GET	Health check
`/api/redact`	POST	Redact text (server-side option)
`/api/presets`	GET	List available presets
Community Patterns
`/api/patterns`	GET	List community patterns
`/api/patterns`	POST	Submit a new pattern
`/api/patterns/:id`	GET	Get pattern details
`/api/patterns/:id/vote`	POST	Vote on a pattern
`/api/patterns/:id/use`	POST	Mark pattern as used
Edge Cases (v1.0.9)
`/api/patterns/:name/edge-cases`	GET	List edge cases for a pattern
`/api/patterns/:name/edge-cases`	POST	Submit edge case report
`/api/edge-cases/:id`	GET	Get edge case details
`/api/edge-cases/:id/vote`	POST	Vote on edge case
`/api/edge-cases/:id`	PATCH	Update edge case status
`/api/edge-cases/:id`	DELETE	Delete edge case
Sample Submissions (v1.0.9)
`/api/patterns/:name/sample-submissions`	GET	List submitted samples
`/api/patterns/:name/sample-submissions`	POST	Submit test sample
Feedback
`/api/feedback`	GET/POST	Feedback collection

Database: MongoDB Atlas, used both for local runs and on Vercel. Set the MONGODB_URI environment variable to your connection string.
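
As a quick way to confirm a deployment is reachable, a client can call the health endpoint from the table above. The sketch below checks only the HTTP status, because the response body is not documented here; `isApiHealthy` and `FetchLike` are hypothetical names, and the fetch function is passed in so the helper can be tested without a network.

```typescript
// Minimal fetch-like signature so the helper can be exercised without a network.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Returns true when GET {baseUrl}/api/health answers with a 2xx status.
async function isApiHealthy(baseUrl: string, fetchImpl: FetchLike): Promise<boolean> {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const url = baseUrl.replace(/\/+$/, "") + "/api/health";
  try {
    const res = await fetchImpl(url);
    return res.ok;
  } catch {
    // Network failures are treated as "not healthy" rather than thrown.
    return false;
  }
}

// Usage against a self-hosted server (pass the runtime's global fetch):
//   const healthy = await isApiHealthy("http://localhost:3000", fetch);
```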

Installation

# Install the core package
npm install data-redactor-core

# Or use bun
bun add data-redactor-core

Usage

Basic Example

import { DataRedactor } from 'data-redactor-core';

const redactor = new DataRedactor();

const text = "Contact john.doe@email.com at 555-123-4567";
const result = redactor.redact(text);

console.log(result.redactedText);
// "Contact [EMAIL_1] at [PHONE_1]"

console.log(result.mapping);
// { "john.doe@email.com": "[EMAIL_1]", "555-123-4567": "[PHONE_1]" }

Using Presets

import { DataRedactor, getPreset } from 'data-redactor-core';

// Use a preset configuration
const redactor = new DataRedactor(getPreset('strict-ai'));

// Or for healthcare compliance
const hipaaRedactor = new DataRedactor(getPreset('healthcare'));

Custom Configuration

import { DataRedactor } from 'data-redactor-core';

const config = {
  patterns: {
    email: { enabled: true, strategy: 'mask' },
    phone: { enabled: true, strategy: 'token' },
    ipv4: { enabled: false }
  },
  formatOptions: {
    tokenFormat: '[{TYPE}_{INDEX}]',
    maskChar: '*',
    preserveStructure: true
  }
};

const redactor = new DataRedactor(config);

Custom Patterns

import { DataRedactor } from 'data-redactor-core';

const config = {
  patterns: {
    custom: [
      {
        name: 'caseId',
        regex: 'CASE-\\d{6}',
        strategy: 'token',
        flags: 'gi'
      }
    ]
  }
};

const redactor = new DataRedactor(config);
const text = "Please reference CASE-123456 in your response";
const result = redactor.redact(text);
// "Please reference [CASEID_1] in your response"

Regex Builder (Programmatic)

import { generateFromSample, refineFromSamples } from 'data-redactor-core';

// Generate pattern from a single sample
const result = generateFromSample('ABC-12345', {
  wordBoundaries: true,
  caseInsensitive: false
});

console.log(result.regex);
// "[A-Z]{3}-\\d{5}"

// Refine with multiple samples
const refined = refineFromSamples(
  ['ABC-12345', 'XYZ-67890', 'DEF-11111'],
  { wordBoundaries: true }
);

Pattern Testing & Quality Scoring (New in v1.0.9)

import {
  PatternTestEngine,
  calculateQualityScore,
  getTestSamplesForPattern
} from 'data-redactor-core';

// Get test samples for a pattern
const testSamples = getTestSamplesForPattern('ipv4');
console.log(testSamples.length); // 5 test samples

// Test a pattern against samples
const patternConfig = {
  enabled: true,
  strategy: 'token',
  regex: '\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b'
};

const results = testSamples.map(sample =>
  PatternTestEngine.executeTest('ipv4', patternConfig, sample)
);

// Calculate quality score
const qualityScore = calculateQualityScore(results, 0);
console.log(qualityScore); // 0-100

// Check results
results.forEach(result => {
  console.log(`Sample: ${result.sampleId}`);
  console.log(`Passed: ${result.passed}`);
  console.log(`Accuracy: ${result.accuracy}%`);
  console.log(`False Positives: ${result.falsePositives}`);
  console.log(`False Negatives: ${result.falseNegatives}`);
});

Custom Entities

Redact specific values like company names, project names, or customer names:

import { DataRedactor } from 'data-redactor-core';

const config = {
  customEntities: {
    companyNames: ["Acme Corp", "Globex Corporation"],
    projectNames: ["Project Phoenix", "Operation Sunrise"],
    customerNames: ["John Smith", "Jane Doe"]
  }
};

const redactor = new DataRedactor(config);
const text = "Acme Corp is working on Project Phoenix with John Smith";
const result = redactor.redact(text);
// "[COMPANYNAMES_1] is working on [PROJECTNAMES_1] with [CUSTOMERNAMES_1]"
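
Conceptually, custom entities are literal strings rather than regex patterns, so any special characters in them need escaping before matching. The following standalone sketch shows that idea; `escapeRegExp` and `redactEntities` are hypothetical helpers, and numbering tokens by list position is a simplification of ours, not the library's documented behavior.

```typescript
// Escape regex metacharacters so a literal entity such as "Acme (US)" is
// matched as plain text.
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every occurrence of each listed value with a token built from its
// category key and its position in the list (simplified numbering).
function redactEntities(text: string, entities: Record<string, string[]>): string {
  let out = text;
  for (const category of Object.keys(entities)) {
    const values = entities[category] ?? [];
    values.forEach((value, i) => {
      const token = `[${category.toUpperCase()}_${i + 1}]`;
      out = out.replace(new RegExp(escapeRegExp(value), "g"), () => token);
    });
  }
  return out;
}

console.log(
  redactEntities("Acme Corp is working on Project Phoenix with John Smith", {
    companyNames: ["Acme Corp", "Globex Corporation"],
    projectNames: ["Project Phoenix", "Operation Sunrise"],
    customerNames: ["John Smith", "Jane Doe"],
  })
);
// [COMPANYNAMES_1] is working on [PROJECTNAMES_1] with [CUSTOMERNAMES_1]
```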

Customizing Token Format

import { DataRedactor } from 'data-redactor-core';

const config = {
  formatOptions: {
    tokenFormat: '<{TYPE}:{INDEX}>',  // Default: '[{TYPE}_{INDEX}]'
    maskChar: '#',                      // Default: '*'
    preserveStructure: true             // Default: true
  },
  patterns: {
    email: { enabled: true, strategy: 'token' },
    phone: { enabled: true, strategy: 'mask' }
  }
};

const redactor = new DataRedactor(config);
const text = "Email: test@example.com Phone: 555-1234";
const result = redactor.redact(text);
// "Email: <EMAIL:1> Phone: ###-####"
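
The `tokenFormat` value reads as a template with `{TYPE}` and `{INDEX}` placeholders. A minimal sketch of how such a template could be expanded is shown here; `renderToken` is a hypothetical helper, not a library export.

```typescript
// Expand a token template by substituting the {TYPE} and {INDEX} placeholders.
function renderToken(template: string, type: string, index: number): string {
  return template
    .replace(/\{TYPE\}/g, () => type.toUpperCase())
    .replace(/\{INDEX\}/g, () => String(index));
}

console.log(renderToken("[{TYPE}_{INDEX}]", "phone", 2)); // [PHONE_2]
console.log(renderToken("<{TYPE}:{INDEX}>", "email", 1)); // <EMAIL:1>
```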

Loading Configuration from File (Node.js)

import { DataRedactor, ConfigLoader } from 'data-redactor-core';

// Load from JSON file
const config = ConfigLoader.loadFromFile('./my-config.json');
const redactor = new DataRedactor(config);

// Or get default config
const defaultConfig = ConfigLoader.getDefault();

// Validate config
const validation = ConfigLoader.validateConfig(config);
if (!validation.valid) {
  console.error('Config errors:', validation.errors);
}

Deployment

Option 1: Vercel (Recommended)

The project is configured to deploy both the UI and API to Vercel:

Push to GitHub
Import project in Vercel
Add environment variable: MONGODB_URI (your MongoDB Atlas connection string)
Deploy

Vercel will:

Build the UI using bun run build:ui
Deploy serverless API functions from the /api directory
Serve static files from /dist

Option 2: Self-Hosted (Bun)

Run the full application with a single Bun server:

# Install dependencies
bun install

# Set environment variable
export MONGODB_URI="mongodb+srv://..."

# Build and start production server
bun start

This starts a single server on port 3000 serving both the UI and API.

Development

bun install        # Install dependencies (also builds core)
bun dev            # Run both UI and API dev servers
bun dev:ui         # Run UI dev server with hot reload
bun dev:api        # Run API server only
bun start          # Build and run production server
bun run build      # Build everything (core + UI); plain `bun build` invokes Bun's bundler instead
bun build:core     # Build core library only
bun build:ui       # Build UI for static deployment
bun lint           # Run ESLint
bun typecheck      # Run TypeScript type checking
bun format         # Run Prettier

Project Structure

data-redactor/
├── package.json        # Root package config
├── tsconfig.json       # TypeScript config
├── build-ui.js         # UI bundler script (Bun.build)
├── dev.ts              # Combined dev server runner
├── vercel.json         # Vercel deployment config
├── dist/               # Built UI (static files for deployment)
├── packages/
│   ├── core/src/       # Redaction engine source (TypeScript)
│   │   ├── index.ts    # Main exports
│   │   ├── engine.ts   # Core redaction logic
│   │   ├── config.ts   # Configuration handling
│   │   ├── presets.ts  # Preset configurations
│   │   ├── patterns/   # Pattern implementations
│   │   ├── regex-builder/  # Pattern generation from samples
│   │   └── scenarios/  # Context-aware redaction scenarios
│   ├── ui/             # Vanilla JS UI source
│   │   ├── index.html
│   │   ├── main.js
│   │   └── styles.css
│   └── api/            # REST API server (self-hosted)
│       ├── server.ts   # Bun server entry
│       ├── routes/     # API route handlers
│       └── db/         # MongoDB database client
├── api/                # Vercel serverless functions
│   ├── health.ts
│   ├── presets.ts
│   ├── feedback.ts
│   ├── patterns/
│   └── lib/db.ts       # Shared MongoDB client
├── config-examples/
└── examples/
    └── tampermonkey-redactor.js  # Browser userscript example

Tech Stack

Latest versions as of 11/29/2025

Category	Package	Version
Runtime	Bun	1.3+
UI	Vanilla JavaScript	ES2022
Build	tsup (core), Bun.build (UI)	^8
Language	TypeScript (core)	^5
Database	MongoDB Atlas	^7
Name Data	common-last-names	^1
	datasets-male-first-names-en	^1
	datasets-female-first-names-en	^1
Deploy	Vercel (UI + Serverless API)	-

License

MIT

Author

Matthew Goluba - @goobz22

Contributing

Contributions welcome! See open issues for planned features.

Ways to Contribute

Submit Patterns - Use the Pattern Builder to create and submit useful patterns
Vote on Patterns - Help surface the best community patterns
Report Issues - Found a bug or false positive? Open an issue
Feature Requests - Ideas for new patterns or features? We'd love to hear them
