Feature: Add Custom Regex Builder UI with Sample-Based Generation #8

@goobz22

Summary

Add a regex builder tool to the UI that lets users generate custom regex patterns from sample data (log lines, UUIDs, etc.). Patterns are produced by a custom-built regex inference algorithm, with no external packages.

Use Case

Users often encounter unique data formats in their logs or systems that aren't covered by built-in patterns. Instead of requiring regex expertise, the tool should:

  1. Accept sample data (e.g., a log line, a custom UUID format)
  2. Generate a matching regex pattern using our custom inference engine
  3. Allow users to test/refine the pattern
  4. Submit both the regex AND the original sample data to the feedback system

Regex Inference Algorithm (Custom Implementation)

Build the regex generator from scratch, in the following stages:

  • Tokenization: Break sample into character classes (digits, letters, special chars, whitespace)
  • Pattern Detection: Identify repeating structures, delimiters, fixed vs variable length segments
  • Generalization: Convert specific values to regex character classes
  • Optimization: Simplify and combine adjacent similar patterns
  • Validation: Ensure generated regex matches the original sample

No external packages - pure JavaScript implementation.
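
A minimal sketch of how the stages above might fit together for a single sample. All function names here (classify, tokenize, generalize, inferRegex) are hypothetical, and the choice to emit exact run lengths is an assumption; deciding fixed vs. variable length properly would need more than one sample.

```javascript
// Hypothetical sketch of the inference stages; not an existing API.

// Classify one character into a coarse class.
function classify(ch) {
  if (/[0-9]/.test(ch)) return "digit";
  if (/[A-Za-z]/.test(ch)) return "letter";
  if (/\s/.test(ch)) return "space";
  return "special";
}

// Tokenization: group consecutive characters of the same class into runs.
// Special characters are kept one per token so each is escaped on its own.
function tokenize(sample) {
  const tokens = [];
  for (const ch of sample) {
    const type = classify(ch);
    const last = tokens[tokens.length - 1];
    if (last && last.type === type && type !== "special") {
      last.text += ch;
    } else {
      tokens.push({ type, text: ch });
    }
  }
  return tokens;
}

// Escape regex metacharacters so delimiters are matched literally.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const CLASS_SOURCE = { digit: "\\d", letter: "[A-Za-z]", space: "\\s" };

// Generalization: replace each run with a character class plus a length.
// Assumption: a single sample implies fixed-length segments.
function generalize(tokens) {
  return tokens
    .map((t) => {
      if (t.type === "special") return escapeLiteral(t.text);
      const n = t.text.length;
      return CLASS_SOURCE[t.type] + (n > 1 ? `{${n}}` : "");
    })
    .join("");
}

// Validation: the generated pattern must match the sample it came from.
function inferRegex(sample) {
  const source = generalize(tokenize(sample));
  if (!new RegExp("^" + source + "$").test(sample)) {
    throw new Error("Generated pattern does not match the sample");
  }
  return source;
}
```

With this sketch, inferRegex("id-42") returns the source [A-Za-z]{2}-\d{2}. The optimization stage is mostly implicit here, because tokenization already merges adjacent characters of the same class into one run.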

Requirements

Regex Builder UI

  • Input field for pasting sample data (single line or multi-line)
  • Auto-generate regex from the sample using custom algorithm
  • Live preview showing what the regex matches
  • Ability to highlight/select specific parts of the sample to target
  • Editable regex output for manual refinement
  • Test area to validate the regex against additional samples
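
One way the live preview and test area could compute what to highlight is sketched below. The function name previewMatches and the shape of its return value are assumptions for illustration; the actual UI component may work differently.

```javascript
// Hypothetical helper: returns the ranges a pattern matches in some text,
// or an error message if the (possibly hand-edited) pattern is invalid.
function previewMatches(source, text) {
  let re;
  try {
    re = new RegExp(source, "g");
  } catch (err) {
    // An invalid pattern is an expected state while the user is typing.
    return { ok: false, error: err.message, matches: [] };
  }
  const matches = [];
  for (const m of text.matchAll(re)) {
    if (m[0].length === 0) continue; // empty matches have nothing to highlight
    matches.push({ start: m.index, end: m.index + m[0].length, text: m[0] });
  }
  return { ok: true, error: null, matches };
}
```

Returning an error object instead of throwing lets the editable regex field show a validation message while keeping the previous highlights on screen.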

Integration with Feedback Loop

  • 'Submit Pattern' button that sends to POST /api/feedback
  • Payload should include:
    • regex: The generated/refined regex pattern
    • sampleData: The original data used to create the pattern
    • patternName: User-suggested name for the pattern
    • category: Suggested category (PII, System, Financial, Custom)
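
A sketch of building and sending that payload. The four field names and the endpoint come from this issue; the helper names, the JSON content type, and the error handling are assumptions, since the request and response format of POST /api/feedback is not specified here.

```javascript
// Categories listed in this issue.
const CATEGORIES = ["PII", "System", "Financial", "Custom"];

// Hypothetical helper: checks the fields before anything is sent.
function buildFeedbackPayload({ regex, sampleData, patternName, category }) {
  if (!CATEGORIES.includes(category)) {
    throw new Error(`Unknown category: ${category}`);
  }
  new RegExp(regex); // throws SyntaxError if the pattern is invalid
  return { regex, sampleData, patternName: patternName.trim(), category };
}

// Hypothetical submit call. Assumption: the endpoint accepts a JSON body.
async function submitPattern(payload) {
  const res = await fetch("/api/feedback", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    throw new Error(`Feedback submission failed: ${res.status}`);
  }
  return res;
}
```

The 'Submit Pattern' button handler could then call submitPattern(buildFeedbackPayload(formValues)), where formValues is whatever object the form produces.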

UI Location

  • New tab/section in the main UI: 'Pattern Builder'
  • Accessible from the main redaction view

Tasks

  • Create regex inference engine in packages/core/src/regex-builder/
  • Implement tokenization and pattern detection algorithms
  • Create regex builder component in packages/ui
  • Add live preview/testing functionality
  • Integrate with feedback API endpoint (Issue #5, "Feature: Implement Feedback Loop for Missed Redactions")
  • Add pattern submission flow
  • Include validation to prevent submitting overly broad patterns
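
The last task does not define "overly broad", so the following is only one possible heuristic: reject a pattern if it matches the empty string in full or fully matches at least half of a small set of unrelated probe strings. The probe list, the threshold, and the name isOverlyBroad are all assumptions.

```javascript
// Unrelated strings that a narrowly targeted pattern should mostly reject.
const PROBES = ["a", "hello world", "12345", "!@#$%", "The quick brown fox 42"];

// Hypothetical check. Throws SyntaxError if `source` is not a valid regex.
function isOverlyBroad(source) {
  // Anchor the pattern so only full-string matches count.
  const re = new RegExp(`^(?:${source})$`);
  if (re.test("")) return true; // a pattern that matches nothing-at-all is too loose
  const hits = PROBES.filter((probe) => re.test(probe)).length;
  return hits * 2 >= PROBES.length;
}
```

A check like this could run before the submission flow enables the 'Submit Pattern' button. It is deliberately coarse: patterns such as .* and \S+ are flagged, but a moderately loose pattern can still pass, so the probe set would need tuning against real submissions.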

Dependencies

Labels: core (Core library functionality), enhancement (New feature or request), ui (User interface related)
