CUIckScan — Application Specification

Version: 1.4.28 | Last Updated: 2026-02-28 | Owner: Eldrir Technologies LLC | License: BSL 1.1 (app), Apache 2.0 (detection rules)


Table of Contents

  1. Overview
  2. Architecture
  3. Application Startup Flow
  4. React UI Specification
  5. API Specification
  6. Scan Engine
  7. Pattern Detection System
  8. Persistence Layer
  9. Encryption System
  10. File Content Extraction
  11. Report Generation
  12. Session Management
  13. State Machine
  14. UI Component Hierarchy
  15. Theming System
  16. Error Handling & Logging
  17. Security Controls
  18. Configuration Files

1. Overview

CUIckScan is a native Windows desktop application that scans file systems for Controlled Unclassified Information (CUI), Cyber Threat Indicators (CTI), ITAR/export-control markers, and DFARS clause references. It produces risk-scored findings per file to help organizations achieve and maintain CMMC/NIST 800-171 compliance.

Key Capabilities

  • Multi-format scanning: PDF, DOCX, XLSX, PPTX, EML, XLS, DOC, PPT, VSDX, DWG, images (EXIF), plain text, code files
  • Configurable detection rules: JSON-defined term lists and regex patterns with three sensitivity tiers
  • Risk scoring: Per-finding and per-file aggregate scores with four severity levels
  • Encrypted findings at rest: AES-256-GCM with PBKDF2 key derivation (FIPS 140-2 compliant)
  • Smart rescan: XXHash64 checksums for incremental scanning; preserves user review statuses
  • Review workflow: Per-file Confirmed / False Positive / Pending status with user attribution
  • PDF executive reports: Multi-page reports with charts, severity breakdown, and methodology
  • Database portability: Save/open/transfer scan databases; path rebasing for cross-machine use

UI Parity Requirement

CUIckScan ships two UI implementations: the primary React SPA (rendered in WebView2) and a pure WinForms fallback (FallbackForm.cs) for systems where the Edge WebView2 runtime is not installed. These two UIs must be kept in feature parity. Any user-facing feature, workflow, or capability added to the React UI must have a functionally equivalent implementation in the FallbackForm, and vice versa. This includes but is not limited to:

  • Scan configuration (root path, rules selection, thread count, max file size)
  • Scan lifecycle (start, pause, resume, cancel, rescan, smart rescan)
  • Results display (filtering by text/category/score, sorting, file type icons)
  • Review workflow (Confirmed / False Positive / Pending with user attribution)
  • Encryption (set passphrase, enter passphrase, change passphrase, progress display)
  • Session management (resume, clear, new DB, open DB, save DB, close-with-save prompt)
  • File operations (open file, reveal in explorer, export CSV, copy/move files, generate report)
  • Path rebasing for opened databases from other machines
  • Theming (dark/light mode with matching color tokens)
  • Keyboard shortcuts (Ctrl+T theme toggle, Ctrl+O open DB, F5 scan/pause/resume, Escape)

The React UI communicates with the backend via the embedded Kestrel REST API, while the FallbackForm calls ScanService / ScanStateStore directly. Despite this architectural difference, the user experience and available functionality must remain equivalent. When modifying either UI, always check whether a parallel change is needed in the other.


2. Architecture

Technology Stack

| Layer | Technology |
| --- | --- |
| Runtime | .NET 8 (WinExe, self-contained, single-file publish) |
| Window shell | WinForms borderless window + custom title bar |
| UI rendering | WebView2 (Edge Chromium runtime) → React 18 SPA |
| Fallback UI | Pure WinForms (FallbackForm.cs) when WebView2 unavailable |
| Web server | Embedded ASP.NET Core Kestrel (localhost-only, random port) |
| Database | SQLite via Microsoft.Data.Sqlite (WAL mode) |
| Frontend build | Vite (dev server with HMR + production bundling) |

Communication Model

┌──────────────────────────────────────────────────────────────┐
│  WinForms MainForm (borderless, owns WebView2 control)       │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  WebView2 → React SPA (App.jsx)                        │  │
│  │  ┌──────────────────────────────────────────────────┐  │  │
│  │  │  REST API calls (fetch → localhost:PORT/api/*)    │  │  │
│  │  │  Status polling (GET /api/scan/status every 400ms)│  │  │
│  │  │  postMessage → window ops (drag, minimize, etc.)  │  │  │
│  │  └──────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────┘  │
│                          ↕ REST + postMessage                 │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Kestrel API Host (ScanApiHost.cs)                     │  │
│  │  ├─ ScanService (parallel file scanner)                │  │
│  │  ├─ PatternLibrary (regex compilation + matching)      │  │
│  │  ├─ ScanStateStore (SQLite persistence)                │  │
│  │  ├─ FieldCipher (AES-256-GCM encryption)               │  │
│  │  ├─ ContentExtractor (multi-format text extraction)    │  │
│  │  └─ ReportGenerator (QuestPDF report output)           │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

Three Communication Channels

  1. REST API (request/response): All scan operations, data queries, file management, encryption, and configuration. React calls fetch() to localhost:PORT/api/*.

  2. Status Polling (replaces SSE): The React UI polls GET /api/scan/status every 400ms during active scans (2s when idle) with exponential backoff on errors. Returns scan state, progress counters, and flagged file count.

  3. WebView2 postMessage (fire-and-forget): Window operations sent from React to C# via window.chrome.webview.postMessage() for minimize, maximize, close, force-close, drag, and title bar double-click. C# sends window state updates (maximized/restored) and title updates back.


3. Application Startup Flow

Boot Sequence (Program.cs → MainForm.cs)

1. Program.cs: Main()
   ├─ Parse --debug flag
   ├─ Initialize CrashLog
   ├─ Select random available TCP port
   ├─ Start Kestrel on http://localhost:{port}
   │   └─ Configure: static files (wwwroot/), API endpoints, CORS
   ├─ Create WinForms Application
   └─ Open MainForm(port, debugMode)
       ├─ Set borderless window (FormBorderStyle.None)
       ├─ Set minimum size 900×600
       ├─ Initialize WebView2
       │   ├─ Configure user data folder
       │   ├─ Navigate to http://localhost:{port}
       │   ├─ Listen for postMessage events
       │   └─ On NavigationCompleted: hide loading state
       └─ If WebView2 fails → open FallbackForm

React SPA Initialization (App.jsx: CUIckScanApp)

2. React SPA mounts → CUIckScanApp()
   ├─ Show splash screen (minimum 900ms display)
   ├─ Parallel initialization:
   │   ├─ GET /api/rules/list → load all detection rules, enable all by default
   │   ├─ GET /api/app/username → load Windows username for review attribution
   │   ├─ GET /api/app/version → load app version string
   │   └─ GET /api/scan/debug-info → check if --debug mode active
   │
   ├─ Start status polling loop (runs for app lifetime)
   │
   ├─ Session recovery check:
   │   └─ GET /api/scan/session
   │       ├─ Session found WITH content:
   │       │   └─ Show Resume Modal ("Resume Scan" / "Load Session" / "Start New")
   │       ├─ Session found WITHOUT content (blank from previous Clear):
   │       │   ├─ Restore enabled rules and file extensions
   │       │   └─ Show Passphrase Modal (enter existing or set new)
   │       └─ No session found:
   │           ├─ POST /api/scan/clear-db → create blank database
   │           └─ Show Passphrase Modal (set new)
   │
   ├─ Fade out splash screen (450ms animation)
   └─ App ready — user can interact

Session Load Flow (when user chooses "Resume" / "Load Session")

3. handleLoadSession()
   ├─ Restore scanRoot, enabledRules, fileExtensions from session data
   ├─ Step 1: Check if DB needs schema upgrade
   │   ├─ If needsUpgrade → Show Upgrade Modal
   │   │   └─ User chooses "Backup & Upgrade" or "Upgrade without backup"
   │   │       └─ POST /api/scan/upgrade-db → migrate schema
   │   └─ Continue to Step 2
   │
   ├─ Step 2: Check encryption state
   │   ├─ If encrypted → Show Passphrase Modal ("enter")
   │   │   └─ User enters passphrase → POST /api/app/verify-passphrase
   │   ├─ If not encrypted → Show Passphrase Modal ("set")
   │   │   └─ User sets passphrase → POST /api/app/set-passphrase
   │   │       ├─ If existing data: show "Encrypting Database" progress modal
   │   │       │   └─ Poll GET /api/app/encryption-progress until complete
   │   │       └─ Complete → continue
   │   └─ Continue to Step 3
   │
   └─ Step 3: finishLoadSession()
       ├─ POST /api/scan/load-session
       ├─ Fetch results from API
       ├─ Set scanRoot from decrypted session data
       ├─ If scan was interrupted (canResume=true):
       │   └─ POST /api/scan/resume → continue scanning
       └─ Check if scanRoot directory exists → offer path rebase if not

4. React UI Specification

4.1 Layout Structure

The application uses a fixed vertical layout with four zones:

┌──────────────────────────────────────────────────────────────┐
│  TitleBar (36px) — Custom drag region, app name, window btns │
├──────────────────────────────────────────────────────────────┤
│  ControlBar (~80px) — Scan root, rules, threads, action btns │
├──────────────────────────────────────────────────────────────┤
│  ┌────────────────────┬─────────────────────────────────────┐│
│  │  ResultsPanel      │  DetailsPanel                       ││
│  │  (53%, 440-700px)  │  (flex remaining)                   ││
│  │                    │                                     ││
│  │  Filter bar        │  File header + metadata             ││
│  │  Score/Category    │  Per-finding cards:                  ││
│  │  filters           │    ├─ Category tag + rule label      ││
│  │  Result cards      │    ├─ "Triggered by" badge           ││
│  │  with review btns  │    └─ Highlighted snippet            ││
│  │                    │                                     ││
│  └────────────────────┴─────────────────────────────────────┘│
├──────────────────────────────────────────────────────────────┤
│  StatusBar (~66px) — Action buttons, progress bar, status    │
└──────────────────────────────────────────────────────────────┘

4.2 Component Inventory

Core Layout Components

| Component | Lines | Purpose |
| --- | --- | --- |
| CUIckScanApp | 1863-3300 | Root component: all state management, lifecycle, event handlers |
| TitleBar | 587-634 | Custom borderless title bar with drag, theme toggle, window controls |
| ControlBar | 637-873 | Scan configuration: root path, rules popover, threads, action buttons |
| ResultsPanel | 877-1075 | Scrollable file list with filters, scores, review buttons |
| DetailsPanel | 1160-1237 | Per-file finding details with highlighted snippets |
| StatusBar | 1240-1292 | Bottom bar: export/copy/move/DB actions, progress bar, status text |

Modal Components

| Component | Lines | Purpose |
| --- | --- | --- |
| Modal | 268-285 | Generic modal container (overlay + card + close button) |
| PassphraseModal | 288-443 | Set/enter encryption passphrase with warnings + encryption progress |
| ChangePassphraseModal | 446-584 | Change existing passphrase with re-encryption progress |
| RuleEditor | 1360-1611 | Full-screen JSON editor for detection rules |
| FileTypesEditor | 1614-1861 | Full-screen file extension manager with add/remove/toggle |
| RebaseModalContent | 230-266 | Remap file paths when scan root directory has moved |

Utility Components

| Component | Lines | Purpose |
| --- | --- | --- |
| SeverityTag | 101-110 | Colored pill badge for finding category (CUI/CTI/ITAR/DFARS) |
| ScoreBadge | 136-144 | Score display with severity icon and label |
| FileTypeIcon | 200-213 | SVG file type icon (XLS, DOC, PDF, etc.) by extension |
| HighlightedSnippet | 1078-1158 | Renders text with matched terms highlighted; tabular mode for spreadsheets |
| JsonHighlight | 1295-1357 | JSON syntax highlighter for the rule editor |
| ToastContainer | 216-227 | Fixed-position notification stack (bottom-right) |

4.3 State Management

All state lives in CUIckScanApp via React useState hooks. Key state variables:

Scan State

| Variable | Type | Purpose |
| --- | --- | --- |
| scanRoot | string | Directory path to scan |
| threads | number (1-32) | Parallel scan thread count |
| maxFileSizeMB | number (1-500) | Max structured document parse size |
| scanState | string | Current scan state: "idle"/"running"/"paused"/"completed"/"canceled" |
| results | array | Array of file result objects with findings |
| selectedIndex | number | Currently selected result index (-1 = none) |
| progress | object | {done, total, flagged} counters |
| statusText | string | Human-readable status message |

Rule State

| Variable | Type | Purpose |
| --- | --- | --- |
| allRules | array | All available rules from backend [{label, type, score, tier, appliesTo}] |
| enabledRules | Set | Set of enabled rule label strings |
| sessionRules | Set/null | Rules used for current session (for change detection) |

Encryption State

| Variable | Type | Purpose |
| --- | --- | --- |
| passphraseReady | boolean | Whether cipher is active (passphrase has been set/entered) |
| showPassphraseModal | string/null | Modal mode: "set"/"enter"/"upgrading"/null |
| showChangePassModal | boolean/string/null | Change passphrase modal: true/"changing"/null |
| encryptProgress | object | {done, total} for encryption upgrade progress |

UI State

| Variable | Type | Purpose |
| --- | --- | --- |
| darkMode | boolean | Theme toggle (default: matches OS preference) |
| isMaximized | boolean | Window maximized state (from C# postMessage) |
| filter | string | Text filter for results panel |
| scoreFilter | string | Score severity filter: "all"/"low"/"medium"/"high"/"critical" |
| categoryFilter | string | Category filter: "all"/"CUI"/"CTI"/"ITAR"/"DFARS"/"Heuristic" |
| showFalsePositives | boolean | Whether to show false-positive results |
| openedDbName | string/null | Filename of opened DB (shown in title bar) |
| appVersion | string | Application version string |
| debugMode | boolean | Whether --debug flag is active |

Modal State

| Variable | Type | Purpose |
| --- | --- | --- |
| showResumeModal | boolean | Session resume prompt |
| showRescanModal | boolean | Rescan/fresh scan choice |
| showSaveWarning | string/null | Save-before-destructive-action: "newScanDb"/"clearResults"/"openDb" |
| showSaveChoiceModal | boolean | Save-to-source vs save-as when DB was opened |
| showCloseModal | object/null | Close-app confirmation with save options |
| showUpgradeModal | object/null | Database schema upgrade prompt |
| showRebaseModal | object/null | Path remapping when root doesn't exist |
| showRuleEditor | boolean | Full-screen rule editor |
| showFileTypesEditor | boolean | Full-screen file types editor |
| showCopyMoveModal | string/null | Copy/move confirmation: "copy"/"move" |

4.4 Polling Architecture

The status polling loop runs for the application's lifetime:

// Poll frequency:
//   Active scan (running/paused): 400ms
//   Idle/completed/canceled: 2000ms
//   On error: exponential backoff up to 30s

while (active) {
  GET /api/scan/status → {state, progressDone, progressTotal, flaggedCount, statusText}

  if (justFinished) {
    // Full reload: gets final sorted results from DB
    GET /api/scan/results → setResults(items)
  } else if (scanning && serverHasNew) {
    // Incremental fetch: only new results since last count
    GET /api/scan/results?since=N → append unique new items
  }

  await delay(isActive ? 400ms : 2000ms, with backoff on errors)
}
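
The timing and merge rules of the polling loop can be sketched as two small pure functions. This is an illustrative Python sketch, not the app's React code: the function names, the doubling factor for backoff, and the use of `filePath` as the uniqueness key are assumptions; only the 400 ms and 2000 ms intervals and the 30 s cap come from this specification.

```python
ACTIVE_STATES = {"running", "paused"}
MAX_BACKOFF_MS = 30_000  # "exponential backoff up to 30s"


def next_poll_delay_ms(state: str, consecutive_errors: int = 0) -> int:
    """Delay before the next GET /api/scan/status call, in milliseconds."""
    base = 400 if state in ACTIVE_STATES else 2000
    # Assumed policy: double the base delay for each consecutive error.
    return min(base * 2 ** max(consecutive_errors, 0), MAX_BACKOFF_MS)


def append_unique(results: list, incoming: list) -> list:
    """Append incremental results, skipping files that are already present."""
    seen = {item["filePath"] for item in results}  # assumed unique key
    for item in incoming:
        if item["filePath"] not in seen:
            seen.add(item["filePath"])
            results.append(item)
    return results
```

Keeping the delay calculation separate from the fetch logic makes the backoff behavior easy to test without a running backend.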

4.5 Key User Flows

Start Scan Flow

User clicks "Start Scan"
├─ Validate: scanRoot is non-empty
├─ If no previous session → doStartScan({})
├─ If previous session exists:
│   ├─ Check if same root path → if different: fresh scan
│   ├─ Check if rules changed → if yes: full scan with rulesChanged flag
│   ├─ Check if new file extensions added → if yes: show Rescan Modal
│   │   └─ Options: "Scan new extensions only" / "Rescan unchanged" / "Fresh scan"
│   └─ If same rules + same extensions → show Rescan Modal
│       └─ Options: "Rescan — skip unchanged" / "Fresh scan"
│
└─ doStartScan(options):
    ├─ Check encryption status
    │   ├─ Encrypted but no cipher → prompt passphrase ("enter")
    │   └─ Not encrypted → prompt passphrase ("set")
    ├─ POST /api/scan/start with payload:
    │   {rootPath, mode, threads, maxFileSizeMB, rescan, enabledLabels,
    │    rulesChanged, newExtensionsOnly}
    └─ Track session rules and extensions for change detection
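
The branch order in the Start Scan flow can be expressed as a single decision function. The sketch below is a hypothetical Python rendering: `ScanSettings`, `decide_start_action`, and the returned action names are invented for illustration, and comparing root paths as exact strings is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanSettings:
    root: str
    rules: frozenset
    extensions: frozenset


def decide_start_action(previous: Optional[ScanSettings], current: ScanSettings) -> str:
    """Mirror the decision tree: root path, then rules, then extensions."""
    if previous is None:
        return "start"                        # no previous session: doStartScan({})
    if previous.root != current.root:
        return "fresh_scan"                   # different root path
    if previous.rules != current.rules:
        return "full_scan_rules_changed"      # rulesChanged flag set
    if current.extensions - previous.extensions:
        return "rescan_modal_new_extensions"  # offers "Scan new extensions only"
    return "rescan_modal"                     # offers skip-unchanged or fresh scan
```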

Review Workflow

User clicks "Confirmed" or "False Positive" on a result card
├─ POST /api/scan/review {filePath, status: 0|1|2}
│   (0=Pending, 1=Confirmed, 2=FalsePositive)
├─ Update local state with reviewedBy (Windows username) and reviewedUtc
├─ If marking as False Positive with "Show False Positives" unchecked:
│   └─ 350ms dismissal animation → item slides out of view
└─ Review status persisted to SQLite, survives rescan if file unchanged

Close App Flow

User clicks close button (X)
├─ GET /api/scan/close-info
│   └─ Returns: {needsPrompt, hasOpenedDb, sourceName, sourceExists,
│                completedFiles, flaggedFiles}
├─ If needsPrompt = false → forceClose()
├─ If hasOpenedDb = true:
│   └─ Show modal: "Save to [source]" / "Save As..." / "Discard" / "Cancel"
└─ If hasOpenedDb = false:
    └─ Show modal: "Backup..." / "Don't Backup" / "Cancel"

5. API Specification

5.1 Scan Operations

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /api/scan/start | Start a new scan or rescan |
| POST | /api/scan/pause | Pause a running scan |
| POST | /api/scan/unpause | Resume a paused scan |
| POST | /api/scan/cancel | Cancel a running/paused scan |
| GET | /api/scan/status | Poll scan state and progress counters |
| GET | /api/scan/results | Fetch all results (or incremental via ?since=N) |
| POST | /api/scan/review | Set review status for a file |
| POST | /api/scan/export | Export file paths to CSV |
| POST | /api/scan/copy-files | Copy flagged files to target directory |
| POST | /api/scan/move-files | Move flagged files to target directory |
| POST | /api/scan/generate-report | Generate PDF executive report |
| POST | /api/scan/rebase-paths | Remap file paths (old prefix → new prefix) |

5.2 Session Management

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /api/scan/session | Check for existing session (boot-time) |
| POST | /api/scan/load-session | Load session results into memory |
| POST | /api/scan/resume | Resume an interrupted scan |
| POST | /api/scan/save-db | Save database copy (save-as or overwrite source) |
| POST | /api/scan/open-db | Open an external .db file |
| POST | /api/scan/clear-results | Clear Files + Findings, preserve Session row |
| POST | /api/scan/clear-db | Full reset: drop and recreate all tables |
| POST | /api/scan/upgrade-db | Upgrade database schema to current version |
| GET | /api/scan/close-info | Pre-close check (has unsaved data?) |
| POST | /api/scan/close-discard | Mark clean shutdown and allow close |
| POST | /api/scan/update-session-settings | Persist enabledRules / fileExtensions to session |
| POST | /api/scan/check-extension-changes | Check how many DB files would be affected by extension removal |
| POST | /api/scan/remove-by-extensions | Remove files with specific extensions from DB |

5.3 Rules & File Types

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /api/rules/list | Get all rule metadata (label, type, score, tier, appliesTo) |
| GET | /api/rules | Get raw rules.json content |
| POST | /api/rules | Save updated rules.json |
| POST | /api/rules/verify | Validate rules JSON (syntax + regex compilation) |
| POST | /api/rules/revert | Revert rules.json to built-in defaults |
| GET | /api/filetypes | Get raw filetypes.json content |
| POST | /api/filetypes | Save updated filetypes.json |
| POST | /api/filetypes/revert | Revert filetypes.json to built-in defaults |
| POST | /api/filetypes/apply-session | Apply session-stored file extension config |

5.4 Encryption & Security

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /api/app/encryption-status | Check if DB is encrypted and cipher is active |
| POST | /api/app/set-passphrase | Set new encryption passphrase (may trigger upgrade) |
| POST | /api/app/verify-passphrase | Verify passphrase for encrypted DB |
| POST | /api/app/change-passphrase | Change passphrase (re-encrypts all data) |
| GET | /api/app/encryption-progress | Poll encryption/re-encryption progress |
| POST | /api/app/reload-results | Reload results after cipher change |

5.5 Native Dialogs

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /api/dialogs/browse-folder | Open native folder picker (FolderBrowserDialog) |
| POST | /api/dialogs/open-file | Open native file picker (OpenFileDialog) |
| POST | /api/dialogs/save-file | Open native save picker (SaveFileDialog) |

5.6 Application

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /api/app/version | Get application version string |
| GET | /api/app/username | Get Windows username for review attribution |
| GET | /api/app/scans-folder | Get path to scans directory |
| POST | /api/app/open-file | Open a file with default system application |
| POST | /api/app/reveal-in-explorer | Reveal a file in Windows Explorer |
| POST | /api/app/directory-exists | Check if a directory path exists |
| GET | /api/scan/debug-info | Check if debug mode is active |
| POST | /api/app/debug-log | Write a log message from frontend |

5.7 Request/Response Headers

All API requests include X-CUIckScan: 1 header for request identification. API responses return JSON with Content-Type: application/json.


6. Scan Engine

6.1 ScanService Architecture

The ScanService (sealed class) orchestrates multi-threaded file scanning:

ScanService.ScanAsync(rootPath, enabledLabels, options, progress, cancel)
├─ Take immutable pattern snapshot (thread-safe copy of PatternLibrary)
├─ Enumerate candidate files (filtered by extension, skipping symlinks)
├─ Seed files into SQLite (bulk insert in 50K batches)
├─ Parallel.ForEachAsync (configurable 1-32 threads):
│   ├─ Check pause state (250ms poll loop)
│   ├─ Per-file timeout (30 seconds via linked CancellationTokenSource)
│   ├─ ContentExtractor.Extract(filePath, extension)
│   ├─ PatternLibrary.FindMatchesWithSnapshot(text, snapshot)
│   ├─ ScanStateStore.MarkFileResult(filePath, findings)
│   └─ Interlocked.Increment(ref completedCount) → progress callback
├─ Periodic WAL checkpoint (every 2500 files)
└─ Final checkpoint + mark session complete/canceled

6.2 Key Constants

| Constant | Value | Purpose |
| --- | --- | --- |
| PausePollIntervalMs | 250 | Milliseconds between pause checks |
| PerFileTimeoutSeconds | 30 | Max time per file extraction |
| MaxFileSizeBytes | 500 MB | Skip files larger than this |
| CheckpointIntervalFiles | 2500 | WAL checkpoint frequency |

6.3 File Enumeration

Files are enumerated using a stack-based DFS traversal:

  • Filters by enabled extensions from FileTypesConfig
  • Skips symbolic links and junctions (prevents loops and boundary escape)
  • Ignores system folders: $Recycle.Bin, System Volume Information
  • Graceful degradation on access-denied (logs warning, continues)
  • Returns file paths sorted for deterministic ordering
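
The enumeration rules above can be illustrated with a short stack-based traversal. This is a Python sketch of the technique, not the ScanService code: `enumerate_files` and `SYSTEM_FOLDERS` are hypothetical names, and the junction check relies on `DirEntry.is_junction`, which only exists on Python 3.12 and later (older versions skip symlinks only).

```python
import os

SYSTEM_FOLDERS = {"$recycle.bin", "system volume information"}


def enumerate_files(root: str, extensions: set) -> list:
    """Stack-based DFS returning sorted paths whose extension is enabled."""
    found, stack = [], [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    is_junction = getattr(entry, "is_junction", None)
                    if entry.is_symlink() or (is_junction and is_junction()):
                        continue  # never follow links: avoids loops and boundary escape
                    if entry.is_dir(follow_symlinks=False):
                        if entry.name.lower() not in SYSTEM_FOLDERS:
                            stack.append(entry.path)
                    elif os.path.splitext(entry.name)[1].lower() in extensions:
                        found.append(entry.path)
        except OSError as exc:  # includes PermissionError (access denied)
            print(f"warning: skipping {current}: {exc}")
    return sorted(found)  # deterministic ordering
```

An explicit stack avoids recursion limits on deep directory trees, and sorting at the end keeps the order independent of how the operating system lists entries.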

6.4 Scan Modes

| Option | Behavior |
| --- | --- |
| Fresh scan | Seed all matching files, scan everything |
| Rescan | Compare XXHash64 checksums, skip unchanged files |
| Rules changed | Full rescan but preserve review statuses where file+pattern unchanged |
| New extensions only | Only seed/scan files matching newly added extensions |

6.5 Error Recovery

  • Per-file exceptions: Caught and logged; file marked as error; scan continues
  • OutOfMemoryException: Aggressive GC (full collection + finalize), then retry
  • DB write failures: Retry with linear backoff (delay = 50ms × attempt number, up to 3 retries)
  • Unclean shutdown: Session marked CleanShutdown=false; next launch offers recovery
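
The DB write retry rule (a delay of 50ms × attempt, up to 3 retries) can be sketched as a small wrapper. This is a hedged Python illustration: `write_with_retry` is a hypothetical helper, and treating `sqlite3.OperationalError` as the retryable failure is an assumption about which errors the real store retries.

```python
import sqlite3
import time


def write_with_retry(write, retries: int = 3, base_delay_ms: int = 50, sleep=time.sleep):
    """Call write(); on failure wait base_delay_ms * attempt, then try again."""
    attempt = 0
    while True:
        try:
            return write()
        except sqlite3.OperationalError:
            attempt += 1
            if attempt > retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay_ms * attempt / 1000)  # 50 ms, 100 ms, 150 ms
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.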

7. Pattern Detection System

7.1 Rule Schema

Each detection rule has:

{
  "label": "CUI formal markings",        // Human-readable name (unique identifier)
  "type": "CUI",                          // Category: CUI, CTI, ITAR, DFARS, Heuristic
  "score": 15,                            // Confidence weight (0-100)
  "tier": "low",                          // Aggressiveness: low, medium, high
  "appliesTo": ["CUI", "CTI"],            // Categories this rule applies to
  "terms": ["CONTROLLED UNCLASSIFIED"],   // OR: Array of term strings
  "regex": "(?i)\\bECCN\\s*\\d{1}[A-E]"  // OR: Custom regex pattern
}

Constraint: A rule has either terms or regex, never both.

7.2 Pattern Compilation

  • Terms: Auto-wrapped in word-boundary regex: (?<![A-Za-z0-9_])(?:term1|term2|...)(?![A-Za-z0-9_])
  • Regex: Compiled with RegexOptions.IgnoreCase | RegexOptions.Compiled and 5-second timeout
  • All patterns precompiled on load for performance
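
To make the term wrapping concrete, the sketch below builds the same lookaround pattern with Python's `re` module. It is an illustration only: `compile_terms` is a hypothetical name, and both escaping each term and matching terms case-insensitively are assumptions, since the specification states IgnoreCase only for custom regex rules.

```python
import re


def compile_terms(terms):
    """Wrap a term list in the word-boundary lookarounds shown above."""
    body = "|".join(re.escape(term) for term in terms)  # escaping is an assumption
    return re.compile(
        rf"(?<![A-Za-z0-9_])(?:{body})(?![A-Za-z0-9_])",
        re.IGNORECASE,  # assumed to apply to terms as well as custom regexes
    )


pattern = compile_terms(["CUI", "CONTROLLED UNCLASSIFIED"])
print(bool(pattern.search("Marked CUI//SP-CTI")))  # a standalone term matches
print(bool(pattern.search("CUISINE")))             # embedded in a longer word: no match
```

The explicit character classes behave like `\b` for ASCII text but do not treat non-ASCII letters as word characters.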

7.3 Matching Algorithm

PatternLibrary.FindMatches(text, enabledLabels):
  results = {}  // keyed by match position for deduplication

  for each enabled pattern:
    for each regex match in text:
      position = match.Index
      if results[position] exists AND existing.score >= this.score:
        skip  // higher-score pattern wins at same position
      else:
        results[position] = Finding{
          pattern: rule.label,
          type: rule.type,
          score: rule.score,
          matchedText: match.Value,
          snippet: extractContext(text, position, radius=130 chars)
        }

  return results.values
    .OrderByDescending(score)
    .Take(50)  // max 50 findings per file
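
The pseudocode above translates fairly directly into runnable form. The Python sketch below is an approximation under stated assumptions: rules are passed as already-enabled `(label, type, score, compiled_regex)` tuples, and the snippet is taken as `radius` characters on either side of the match, which may differ from the real `extractContext`.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    pattern: str
    type: str
    score: int
    matched_text: str
    snippet: str


def find_matches(text, rules, max_findings=50, radius=130):
    """Deduplicate by match position (higher score wins), then keep the top findings."""
    by_position = {}
    for label, rule_type, score, regex in rules:
        for match in regex.finditer(text):
            existing = by_position.get(match.start())
            if existing is not None and existing.score >= score:
                continue  # an equal or higher-scoring rule already owns this position
            start = max(0, match.start() - radius)
            by_position[match.start()] = Finding(
                label, rule_type, score, match.group(0), text[start:match.end() + radius]
            )
    ranked = sorted(by_position.values(), key=lambda f: f.score, reverse=True)
    return ranked[:max_findings]
```

Because deduplication is keyed on the start index only, two rules whose matches overlap but start at different offsets both produce findings.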

7.4 Sensitivity Tiers (Cumulative)

| Tier | Selection | Contents |
| --- | --- | --- |
| Low | Low only | Formal markings only (CUI/CONTROLLED, ITAR headers, DFARS clause numbers) |
| Medium | Low + Medium | Adds ECCN codes, CTI indicators, CMMC/SPRS references, strong abbreviations |
| High | Low + Medium + High | Adds broad catch-all terms (SECRET, CONFIDENTIAL, generic export terms) |
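
Since the tiers are cumulative, selecting a tier means keeping every rule at or below it. A minimal Python sketch, assuming rules are dictionaries with a `tier` field as in the rule schema (`rules_for_tier` is a hypothetical helper name):

```python
TIER_ORDER = ["low", "medium", "high"]


def rules_for_tier(rules, selected):
    """Keep rules whose tier is at or below the selected sensitivity tier."""
    allowed = set(TIER_ORDER[: TIER_ORDER.index(selected) + 1])
    return [rule for rule in rules if rule["tier"] in allowed]
```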

7.5 Scoring System

Finding-Level Severity

| Score | Level | Description |
| --- | --- | --- |
| 0-7 | Low | Minimal confidence; possible false positive |
| 8-12 | Medium | Moderate confidence; warrants review |
| 13-17 | High | Strong indicator; likely genuine |
| 18+ | Critical | Very high confidence; formal marking detected |

File-Level Severity (aggregate score across all findings)

| Score | Level | Description |
| --- | --- | --- |
| 0-12 | Low | Minor or incidental matches |
| 13-24 | Medium | Multiple moderate findings |
| 25-39 | High | Significant concentration of findings |
| 40+ | Critical | Dense cluster of high-confidence findings |
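
Both severity scales are simple threshold lookups. The Python sketch below encodes the two tables; the function names are hypothetical, and how the file-level aggregate is computed from individual findings is not specified here, so the function takes the aggregate score as its input.

```python
from bisect import bisect_right

LEVELS = ["Low", "Medium", "High", "Critical"]
FINDING_THRESHOLDS = [8, 13, 18]   # lower bounds of Medium, High, Critical
FILE_THRESHOLDS = [13, 25, 40]     # lower bounds of Medium, High, Critical


def finding_severity(score: int) -> str:
    """Severity of a single finding from its rule score."""
    return LEVELS[bisect_right(FINDING_THRESHOLDS, score)]


def file_severity(aggregate_score: int) -> str:
    """Severity of a file from its aggregate score."""
    return LEVELS[bisect_right(FILE_THRESHOLDS, aggregate_score)]
```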

8. Persistence Layer

8.1 Database Schema

-- Core session metadata
CREATE TABLE Session (
  Id INTEGER PRIMARY KEY,
  RootPath TEXT,              -- Encrypted when cipher active
  Mode TEXT,                  -- Scan mode identifier
  StartedUtc TEXT,
  LastUpdatedUtc TEXT,
  AppVersion TEXT,
  EnabledRules TEXT,          -- JSON array of enabled rule labels
  FileExtensions TEXT,        -- JSON array of enabled file extensions
  CleanShutdown INTEGER,      -- 1=clean, 0=crash/force-close
  EncryptionSalt TEXT,        -- Base64-encoded 32-byte salt
  EncryptionVerifier TEXT,    -- Encrypted sentinel value
  OpenedFromPath TEXT         -- Source path if DB was opened from file
);

-- Scanned files with status
CREATE TABLE Files (
  FilePath TEXT PRIMARY KEY,  -- Full path to file
  Status INTEGER,             -- 0=Pending, 1=Scanned, 2=Error, 3=Skipped
  LastError TEXT,              -- Error message if Status=Error
  Checksum TEXT,              -- XXHash64 hex string
  ReviewStatus INTEGER,       -- 0=Pending, 1=Confirmed, 2=FalsePositive
  ReviewedBy TEXT,            -- Windows username who reviewed
  ReviewedUtc TEXT            -- ISO 8601 timestamp of review
);

-- Individual findings within files
CREATE TABLE Findings (
  Id INTEGER PRIMARY KEY AUTOINCREMENT,
  FilePath TEXT,              -- Foreign key to Files
  Pattern TEXT,               -- Rule label that matched
  Snippet TEXT,               -- Encrypted: context text around match
  Type TEXT,                  -- CUI, CTI, ITAR, DFARS, Heuristic
  Score INTEGER,              -- Rule score value
  Json TEXT                   -- Encrypted: full finding JSON
);

8.2 SQLite Configuration

| Pragma | Value | Purpose |
| --- | --- | --- |
| journal_mode | WAL | Write-ahead logging for concurrent reads + crash safety |
| synchronous | FULL during normal operation, OFF during bulk seeding | Durability guarantee |
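
The pragma settings can be demonstrated with any SQLite binding. The sketch below uses Python's `sqlite3` module purely for illustration (the app itself uses Microsoft.Data.Sqlite); `open_scan_db` and `bulk_seed` are hypothetical helpers, and the insert statement is a simplified stand-in for the real seeding code.

```python
import sqlite3


def open_scan_db(path: str) -> sqlite3.Connection:
    """Open the database with WAL journaling and full durability."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=FULL")
    return conn


def bulk_seed(conn: sqlite3.Connection, paths) -> None:
    """Insert pending files with synchronous=OFF, then restore FULL."""
    conn.execute("PRAGMA synchronous=OFF")  # must be set outside a transaction
    try:
        with conn:  # one transaction for the whole batch
            conn.executemany(
                "INSERT OR IGNORE INTO Files (FilePath, Status) VALUES (?, 0)",
                ((path,) for path in paths),
            )
    finally:
        conn.execute("PRAGMA synchronous=FULL")
```

Restoring FULL in a `finally` block means a failed batch cannot leave the connection in the less durable mode.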

8.3 Smart Rescan Logic

SeedFilesForRescan():
  existing = load all (FilePath → Checksum, Status) from Files table
  current = enumerate files on disk with checksums

  for each file on disk:
    if file in existing:
      if checksum unchanged AND status != Pending:
        skip (no rescan needed)
      else:
        mark as Pending (needs rescan)
    else:
      insert as new Pending file

  for each file in DB but not on disk:
    delete from Files + Findings

  return (totalFiles, skippedUnchanged)
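
The same logic can be written over plain dictionaries to show the three outcomes (skip, re-queue, delete). This Python sketch is an in-memory illustration, not the ScanStateStore code: checksums are treated as opaque strings, and the function returns the new table contents rather than writing to SQLite.

```python
PENDING = 0  # Files.Status value for "needs scanning"


def seed_files_for_rescan(existing: dict, current: dict):
    """existing: {path: (checksum, status)} from the Files table.
    current: {path: checksum} computed from disk.
    Returns (updated_files, removed_paths, total_files, skipped_unchanged)."""
    updated, skipped = {}, 0
    for path, checksum in current.items():
        old = existing.get(path)
        if old is not None and old[0] == checksum and old[1] != PENDING:
            updated[path] = old                  # unchanged and already scanned
            skipped += 1
        else:
            updated[path] = (checksum, PENDING)  # new, changed, or never scanned
    removed = sorted(set(existing) - set(current))  # rows to delete with their findings
    return updated, removed, len(current), skipped
```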

8.4 Review Status Preservation

After rescan, review statuses are preserved when:

  1. The file's XXHash64 checksum is unchanged, AND
  2. The same patterns matched (same rule labels triggered)

If the file content changed or different rules were triggered, the review status resets to Pending.


9. Encryption System

9.1 Algorithm Specification

| Parameter | Value |
| --- | --- |
| Cipher | AES-256-GCM |
| Key derivation | PBKDF2-SHA256 |
| Iterations | 100,000 |
| Key size | 256 bits (32 bytes) |
| Salt size | 256 bits (32 bytes, random) |
| Nonce | 96 bits (12 bytes, random per encryption) |
| Auth tag | 128 bits (16 bytes) |
| Compliance | FIPS 140-2 (via Windows CNG/BCrypt) |

9.2 Wire Format

Encrypted values are stored as:

"ENC:" + Base64(nonce[12] ‖ ciphertext[variable] ‖ tag[16])

The ENC: prefix distinguishes encrypted from plaintext values (for migration support).
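
As an illustration of the layout, the sketch below packs and unpacks the envelope without performing any encryption. The helper names are hypothetical, and the standard Base64 alphabet is assumed; the field lengths come from the algorithm parameters in this section.

```python
import base64

PREFIX = "ENC:"
NONCE_LEN, TAG_LEN = 12, 16


def pack(nonce: bytes, ciphertext: bytes, tag: bytes) -> str:
    """Build the stored string from the three AES-GCM outputs."""
    return PREFIX + base64.b64encode(nonce + ciphertext + tag).decode("ascii")


def unpack(value: str):
    """Split a stored value into (nonce, ciphertext, tag); None means plaintext."""
    if not value.startswith(PREFIX):
        return None  # unencrypted value, e.g. from a database awaiting migration
    raw = base64.b64decode(value[len(PREFIX):], validate=True)
    if len(raw) < NONCE_LEN + TAG_LEN:
        raise ValueError("encrypted value too short")
    return raw[:NONCE_LEN], raw[NONCE_LEN:-TAG_LEN], raw[-TAG_LEN:]
```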

9.3 Encrypted Fields

| Table.Column | When Encrypted |
| --- | --- |
| Session.RootPath | Always (contains directory path) |
| Findings.Snippet | Always (contains matched text context) |
| Findings.Json | Always (contains full finding details) |

9.4 Passphrase Verification

A sentinel value "CUICKSCAN_VERIFY_OK" is encrypted with the passphrase and stored in Session.EncryptionVerifier. On unlock, the sentinel is decrypted and compared using fixed-time comparison to prevent timing attacks.
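
The key derivation parameters and the fixed-time sentinel check can be sketched with Python's standard library alone. This is not the FieldCipher implementation: the function names are hypothetical, UTF-8 encoding of the passphrase is an assumption, and the AES-256-GCM step that produces the decrypted sentinel is omitted.

```python
import hashlib
import hmac

SENTINEL = b"CUICKSCAN_VERIFY_OK"


def derive_key(passphrase: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """PBKDF2-SHA256 producing a 256-bit key from the passphrase and salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def sentinel_matches(decrypted: bytes) -> bool:
    """Compare the decrypted verifier to the sentinel in constant time."""
    return hmac.compare_digest(decrypted, SENTINEL)
```

With an authenticated cipher such as AES-GCM, a wrong key normally fails tag verification before any comparison happens, so the sentinel check acts as a second confirmation.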

9.5 Memory Protection

  • Key material stored in private byte[] _key
  • Dispose() zeros the key via CryptographicOperations.ZeroMemory()
  • Plaintext buffers zeroed after encryption/decryption
  • Passphrase cleared from React state immediately before async API call
  • Key never logged (CrashLog sanitizes "ENC:" values to "[ENCRYPTED]")

9.6 Encryption Upgrade

When setting a passphrase on an unencrypted database:

  1. Generate random 32-byte salt
  2. Derive key via PBKDF2
  3. Create verifier (encrypt sentinel)
  4. Encrypt Session.RootPath
  5. Encrypt all Findings (Snippet + Json) in 1000-item batches
  6. Store salt + verifier in Session table
  7. Progress reported via polling endpoint

9.7 Passphrase Change

When changing an existing passphrase:

  1. Verify old passphrase against stored verifier
  2. Derive new key from new passphrase + new salt
  3. In a single SQLite transaction (atomic):
    • Decrypt each field with old cipher
    • Re-encrypt with new cipher
    • Update salt + verifier
  4. If any step fails, transaction rolls back (old passphrase remains valid)
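
The atomic re-encryption in step 3 can be sketched with a single transaction that rolls back on any failure. The Python example below is an illustration under assumptions: `reencrypt_all` is a hypothetical helper, the old and new ciphers are passed in as plain callables, and only the columns listed as encrypted in this section are touched.

```python
import sqlite3


def reencrypt_all(conn, decrypt_old, encrypt_new, new_salt, new_verifier):
    """Re-encrypt every protected field and swap salt + verifier atomically."""
    with conn:  # commits on success; rolls back if any statement or callback raises
        rows = conn.execute("SELECT Id, Snippet, Json FROM Findings").fetchall()
        for row_id, snippet, blob in rows:
            conn.execute(
                "UPDATE Findings SET Snippet = ?, Json = ? WHERE Id = ?",
                (encrypt_new(decrypt_old(snippet)), encrypt_new(decrypt_old(blob)), row_id),
            )
        root = conn.execute("SELECT RootPath FROM Session").fetchone()[0]
        conn.execute(
            "UPDATE Session SET RootPath = ?, EncryptionSalt = ?, EncryptionVerifier = ?",
            (encrypt_new(decrypt_old(root)), new_salt, new_verifier),
        )
```

Because the salt and verifier are updated inside the same transaction as the data, a rollback leaves both the ciphertext and the old verifier in place, which is what keeps the old passphrase valid.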

10. File Content Extraction

10.1 Supported Formats

| Extension(s) | Method | Library | Max Size |
| --- | --- | --- | --- |
| .txt, .csv, .log, .json, .xml, .md, .ini, .cfg, .yaml, .yml, .rtf, .dxf, .cs, .vb, .cpp, .h | ReadText | System.IO | 10 MB |
| .docx | ExtractDocx | DocumentFormat.OpenXml | Configurable (default 10 MB) |
| .xlsx | ExtractXlsx | DocumentFormat.OpenXml | Configurable |
| .pptx | ExtractPptx | DocumentFormat.OpenXml | Configurable |
| .pdf | ExtractPdf | UglyToad.PdfPig | Configurable |
| .eml | ExtractEml | MimeKit | Configurable |
| .xls | ExtractXls | NPOI (HSSF) | Configurable |
| .doc, .ppt | ExtractOle2Strings | Binary heuristic | Configurable |
| .vsdx | ExtractZipXml | ZipFile | Configurable |
| .jpg, .jpeg, .png, .tif, .tiff | ExtractImageMetadata | System.Drawing.Image | N/A |
| .dwg (and unknown) | ExtractBinaryHeuristic | Filename + binary strings | N/A |

10.2 Text Limits

| Constant | Value | Purpose |
| --- | --- | --- |
| MaxFileSizeBytes | 10 MB | Plain text file size cap |
| MaxStructuredFileSizeMB | Configurable (1-500, default 10) | Size cap for structured documents (DOCX, XLSX, PDF, etc.) |
| MaxExtractedTextLength | 1,000,000 chars | Cap text passed to regex engine |

10.3 Safety Mechanisms

  • Zip bomb protection: Entries with compression ratio > 100:1 are skipped
  • OutOfMemoryException handling: Aggressive GC (Gen2 + wait finalizers), return empty
  • Per-file timeout: 30 seconds enforced by ScanService
  • All exceptions caught: No extraction error propagates; returns empty string
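
The zip bomb rule can be illustrated with the sizes recorded in the archive's directory. This Python sketch uses the standard `zipfile` module and a hypothetical `safe_entries` helper; it trusts the declared sizes in the archive headers, which is an assumption, and a hardened extractor would also cap the bytes actually read.

```python
import io
import zipfile

MAX_RATIO = 100  # entries above 100:1 are skipped


def safe_entries(archive: zipfile.ZipFile) -> list:
    """Return names of entries whose compression ratio is at most 100:1."""
    safe = []
    for info in archive.infolist():
        if info.compress_size > 0 and info.file_size / info.compress_size > MAX_RATIO:
            continue  # suspicious expansion ratio: skip this entry
        safe.append(info.filename)
    return safe
```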

11. Report Generation

11.1 Report Structure

Generated via QuestPDF. Report sections:

  1. Cover Page
    • Application logo (embedded PNG)
    • Title: "CUI / Export-Controlled Data Scan Report"
    • Metadata: scan date, directory, file count, version
    • "CONTROLLED DOCUMENT" footer

  2. Executive Summary
    • Statistics table: files scanned, flagged, confirmed, pending review
    • Category distribution bar chart (proportional columns for CUI/CTI/ITAR/DFARS)
    • Severity distribution pills (Low/Medium/High/Critical with counts)
    • List of enabled detection rules

  3. File-Level Findings Table
    • Columns: #, File path, Category tags, Rules triggered, Score, Severity, Review status
    • Alternating row colors; false positives excluded
    • Color-coded severity and category tags

  4. Methodology & Disclaimer
    • Technical explanation of scan approach
    • Legal disclaimer about automated screening
    • Severity scale reference table

11.2 Report Colors

| Element | Color |
| --- | --- |
| CUI | #4a90e2 (blue) |
| CTI | #9b59b6 (purple) |
| ITAR | #e74c3c (red) |
| DFARS | #1abc9c (teal) |
| Heuristic | #f5a623 (amber) |
| Critical severity | #ef4444 |
| High severity | #f97316 |
| Medium severity | #eab308 |
| Low severity | #6b7280 |

12. Session Management

12.1 Session Lifecycle

┌───────────┐    Clear/New DB    ┌───────────┐
│  No       │ ←───────────────── │  Content  │
│  Session  │                    │  Session  │
└─────┬─────┘                    └─────┬─────┘
      │ Set passphrase +               │
      │ scan start                     │ Save DB
      ▼                                ▼
┌───────────┐    Scan complete   ┌───────────┐
│  Active   │ ─────────────────→ │  Complete │
│  Scan     │                    │  Session  │
└─────┬─────┘                    └───────────┘
      │ Cancel                         ▲
      ▼                                │ Open DB
┌───────────┐                    ┌─────┴─────┐
│  Canceled │                    │  Loaded   │
│  Session  │                    │  From File│
└───────────┘                    └───────────┘

12.2 Session Persistence

Enabled rules and file extensions are debounce-persisted (500ms) to the session table whenever they change. This ensures settings survive application crashes.
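
A debounced persist of this kind can be sketched as below. This is an illustration rather than the code in App.jsx: `debounce`, `makeSettingsPersister`, and the `save` callback are hypothetical names, and the optional `timers` parameter exists only so the behavior can be exercised without real delays.

```javascript
// Returns a wrapper that delays `fn` until `delayMs` has passed without another call.
// `timers` defaults to the global timer functions; it can be replaced in tests.
function debounce(fn, delayMs, timers = globalThis) {
  let handle;
  return (...args) => {
    timers.clearTimeout(handle); // drop any write that has not fired yet
    handle = timers.setTimeout(() => fn(...args), delayMs);
  };
}

// `save` is a hypothetical callback that writes settings to the session table.
function makeSettingsPersister(save, timers = globalThis) {
  return debounce(save, 500, timers); // 500 ms, as specified above
}
```

With this shape, a burst of rule toggles produces a single write carrying the most recent settings, 500 ms after the last change.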

12.3 Database Portability

  • Save: Copies working scan-state.db to user-chosen location; records source path
  • Open: Copies external .db file to working location; records source path for overwrite
  • Overwrite vs Save As: When working with an opened DB, user can overwrite source or save to new file
  • Path Rebasing: When opening a DB from a different machine, file paths can be remapped from old root to new root
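
Path rebasing, as described in the last bullet, is a prefix substitution on stored file paths. The sketch below shows one way to do it; the function names are hypothetical, and the case-insensitive comparison and backslash normalization are assumptions based on Windows path conventions rather than documented behavior.

```javascript
// Normalizes separators to backslashes and strips trailing separators.
function normalizeRoot(root) {
  return root.replace(/\//g, "\\").replace(/\\+$/, "");
}

// Remaps `filePath` from `oldRoot` to `newRoot`.
// Paths that do not live under `oldRoot` are returned unchanged.
function rebasePath(filePath, oldRoot, newRoot) {
  const path = filePath.replace(/\//g, "\\");
  const from = normalizeRoot(oldRoot);
  const to = normalizeRoot(newRoot);
  const lowerPath = path.toLowerCase();
  const lowerFrom = from.toLowerCase();
  // Match the root itself or a path below it; "C:\Data" must not match "C:\Database".
  const underRoot = lowerPath === lowerFrom || lowerPath.startsWith(lowerFrom + "\\");
  return underRoot ? to + path.slice(from.length) : filePath;
}
```

For example, `rebasePath("C:\\Data\\proj\\a.pdf", "C:\\Data", "D:\\Scans\\")` yields `D:\Scans\proj\a.pdf`.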

12.4 Clean Shutdown Tracking

  • CleanShutdown flag in Session table set to 0 when scan starts
  • Set to 1 when app closes normally via close-discard or save-then-close
  • On next launch, if CleanShutdown=0, Recovery modal is shown
  • WAL checkpoint attempted during recovery to salvage data

13. State Machine

Scan State Transitions

         ┌────────────────────────────────────┐
         │                                    │
         ▼                                    │
      ┌──────┐  start   ┌─────────┐  done  ┌──────────┐
      │ idle │ ────────→ │ running │ ──────→│ completed│
      └──┬───┘          └────┬────┘        └──────────┘
         │                   │ pause              ▲
         │                   ▼                    │
         │              ┌────────┐  resume        │
         │              │ paused │ ───────────────┘
         │              └───┬────┘                │
         │                  │ cancel              │
         │                  ▼                     │
         │              ┌──────────┐              │
         └──────────────│ canceled │──────────────┘
            start       └──────────┘   start

UI State → Available Actions

| State | Available Actions |
| --- | --- |
| idle | Start Scan, Browse, Edit Rules, Edit File Types, Open DB, New Scan DB |
| running | Pause, Stop, (rules/file types read-only) |
| paused | Resume, Stop, Edit Rules, Edit File Types (shows "settings changed" warning) |
| completed | Start Scan (rescan/fresh), Clear Results, Generate Report, Export, Copy, Move, Save DB |
| canceled | Start Scan, Clear Results, Save DB |
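
The states and actions above can be expressed as two lookup tables. This is a sketch under stated assumptions, not the logic in App.jsx: it assumes that resume returns a paused scan to running, that Stop maps to the cancel event from both running and paused, and that starting from completed or canceled begins a new run. `TRANSITIONS`, `nextState`, and `canPerform` are hypothetical names.

```javascript
// Assumed transition table: state -> event -> next state.
const TRANSITIONS = {
  idle: { start: "running" },
  running: { pause: "paused", done: "completed", cancel: "canceled" },
  paused: { resume: "running", cancel: "canceled" },
  completed: { start: "running" },
  canceled: { start: "running" },
};

// Actions per state, copied from the table above.
const AVAILABLE_ACTIONS = {
  idle: ["Start Scan", "Browse", "Edit Rules", "Edit File Types", "Open DB", "New Scan DB"],
  running: ["Pause", "Stop"],
  paused: ["Resume", "Stop", "Edit Rules", "Edit File Types"],
  completed: ["Start Scan", "Clear Results", "Generate Report", "Export", "Copy", "Move", "Save DB"],
  canceled: ["Start Scan", "Clear Results", "Save DB"],
};

// Returns the next state or throws when the event is not valid in `state`.
function nextState(state, event) {
  const next = TRANSITIONS[state]?.[event];
  if (!next) throw new Error(`Invalid transition: ${state} + ${event}`);
  return next;
}

// True when the UI should offer `action` in `state`.
function canPerform(state, action) {
  return (AVAILABLE_ACTIONS[state] || []).includes(action);
}
```

Keeping both tables as plain data makes it straightforward to disable buttons from `canPerform` and to reject out-of-order events in one place.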

14. UI Component Hierarchy

<ThemeCtx.Provider>
  <SplashScreen />                    // Fixed overlay, fades out after init

  <div "main-container">
    <TitleBar />                      // 36px fixed height
    <ControlBar />                    // ~80px, scan config + actions
      └─ Rule Popover                // Dropdown with tier/category/individual toggles

    <div "content-split">
      <ResultsPanel (53%, 440-700px)>
        ├─ Filter Input
        ├─ Category + Score Dropdowns
        ├─ "Show False Positives" Checkbox
        └─ Result Cards (scrollable list)
            ├─ File name + reveal button
            ├─ Directory path
            ├─ Category tags + finding count + score badge
            ├─ File type icon + open button
            └─ Review buttons (Confirmed / False Positive)

      <DetailsPanel (flex remaining)>
        ├─ File Header (name, path, categories, review status)
        └─ Finding Cards (scrollable)
            ├─ Category tag + rule label + score badge
            ├─ "Triggered by" badge (matched text)
            └─ Highlighted snippet (with tabular mode for spreadsheets)
    </div>

    <StatusBar />                     // ~66px fixed height
      ├─ Action buttons row (Export, Copy, Move, Open/Save/New DB, Change Passphrase)
      └─ Progress row (bar, status text, encryption indicator, flagged count)
  </div>

  // Modal overlays (z-index layered)
  <PassphraseModal />                 // z:1100 — set/enter/upgrading passphrase
  <ChangePassphraseModal />           // z:1100 — change passphrase with progress
  <Modal "Resume" />                  // z:1000 — session recovery
  <Modal "Rebase" />                  // z:1000 — path remapping
  <Modal "SaveWarning" />             // z:1000 — save before destructive action
  <Modal "SaveChoice" />              // z:1000 — overwrite vs save-as
  <Modal "Close" />                   // z:1000 — close app confirmation
  <Modal "CopyMove" />                // z:1000 — copy/move confirmation
  <Modal "Upgrade" />                 // z:1000 — database schema upgrade
  <Modal "Rescan" />                  // z:1000 — rescan/fresh/new-ext choice
  <RuleEditor />                      // z:2000 — full-screen JSON editor
  <FileTypesEditor />                 // z:2000 — full-screen extension manager
  <ToastContainer />                  // z:9999 — notification stack (bottom-right)
</ThemeCtx.Provider>

15. Theming System

15.1 Theme Context

Themes are provided via React Context (ThemeCtx). There are two built-in themes, dark and light. The default follows the OS preference via the prefers-color-scheme media query, and the user can toggle the theme with Ctrl+T or the title bar button.

15.2 Color Tokens

| Token | Dark | Light | Purpose |
| --- | --- | --- | --- |
| primaryBg | #13141a | #f0f1f5 | Main background |
| secondaryBg | #1c1d25 | #ffffff | Cards, panels |
| tertiaryBg | #24252e | #f8f9fb | Hover states, details |
| cardBg | #2a2b35 | #ffffff | Result cards |
| border | #33343e | #dde0e6 | All borders |
| textPrimary | #e8eaed | #1a1c24 | Main text |
| textSecondary | #8b8d97 | #6b6e78 | Secondary text |
| textMuted | #5c5e68 | #9a9da7 | Tertiary text |
| accent | #4a90e2 | #0066cc | Primary accent (blue) |
| success | #4caf50 | #2e7d32 | Success states |
| warning | #f5a623 | #e6960b | Warning states |
| danger | #e74c3c | #c0392b | Error/danger states |
| tagCUI | #4a90e2 | #0066cc | CUI category |
| tagCTI | #9b59b6 | #7b1fa2 | CTI category |
| tagITAR | #e74c3c | #c0392b | ITAR category |
| tagDFARS | #1abc9c | #00897b | DFARS category |
| tagHeuristic | #f5a623 | #e6960b | Heuristic category |

15.3 Typography

| Usage | Font | Fallback |
| --- | --- | --- |
| Body text | IBM Plex Sans | -apple-system, sans-serif |
| Code/paths | IBM Plex Mono | Cascadia Code, monospace |
| Inputs | IBM Plex Mono | Cascadia Code, monospace |

15.4 Interactive Feedback

  • Button hover: box-shadow: 0 0 8px 1px {color}44 (accent/danger/warning glow)
  • Card hover: Border color transitions to borderHover
  • Animations: fadeIn (0.15s), scaleIn (0.2s), slideIn (0.25s), splashFadeOut (0.45s)
  • False positive dismiss: 350ms opacity + height collapse animation
  • Scrollbar: Custom styled, 7px width, rounded thumb

16. Error Handling & Logging

16.1 CrashLog Service

  • Location: {AppDir}/Logs/cuickscan-YYYY-MM-DD.log
  • Rotation: Daily files + size-based (10 MB max per file, up to 3 rotated)
  • Retention: 14 days automatic cleanup
  • Thread safety: All writes locked via lock (_writeLock)
  • Log levels: DEBUG (only with --debug), INFO, WARN, ERROR, FATAL
  • Security: Encrypted values sanitized to [ENCRYPTED] in log output
  • Resilience: Logger itself never throws exceptions

16.2 Frontend Error Handling

  • Global error handler: Catches uncaught JS errors and unhandled promise rejections
  • API wrapper: All api.get()/api.post() calls throw on non-200 responses
  • Debug logging: When --debug active, frontend logs sent to backend via POST /api/app/debug-log
  • Toast notifications: User-visible errors displayed as red toast (bottom-right, 3.5s duration)
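
An API wrapper with the behavior described above might look like the following. This is a sketch, not the `api` object defined in App.jsx: `createApi` and the injectable `fetchImpl` parameter are hypothetical, the JSON response body is an assumption, and the `X-CUIckScan: 1` header is taken from section 17.3.

```javascript
// Builds an API client around a fetch-compatible function.
// `fetchImpl` is injectable so the wrapper can be exercised without a server.
function createApi(fetchImpl = globalThis.fetch) {
  async function request(url, init = {}) {
    const res = await fetchImpl(url, {
      ...init,
      headers: { "X-CUIckScan": "1", ...(init.headers || {}) },
    });
    // Any status other than 200 is surfaced as an exception.
    if (res.status !== 200) {
      throw new Error(`${init.method || "GET"} ${url} failed with status ${res.status}`);
    }
    return res.json(); // assumption: endpoints respond with JSON
  }

  return {
    get: (url) => request(url),
    post: (url, body) =>
      request(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
      }),
  };
}
```

Callers can then wrap each call in `try`/`catch` and route the error message to a toast, which keeps the non-200 handling in one place.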

16.3 Log Format

[HH:mm:ss.fff] [LEVEL] [T{ThreadId}] Message
  Exception: ExceptionType: Message
  Stack: (full stack trace)
  Inner: InnerExceptionType: Message
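
As an illustration of the first line of this format, the sketch below builds a log line from its parts. The real logger lives in the backend; `formatLogLine` is a hypothetical helper and uses the local time zone of the `Date` it is given.

```javascript
// Formats "[HH:mm:ss.fff] [LEVEL] [T{ThreadId}] Message" from its components.
function formatLogLine(date, level, threadId, message) {
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const time =
    `${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}` +
    `.${pad(date.getMilliseconds(), 3)}`;
  return `[${time}] [${level}] [T${threadId}] ${message}`;
}
```

For a local time of 09:05:07.042, `formatLogLine(date, "INFO", 1, "Scan started")` produces `[09:05:07.042] [INFO] [T1] Scan started`.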

17. Security Controls

17.1 Attack Surface

| Surface | Mitigation |
| --- | --- |
| Localhost web server | Bound to localhost only; not exposed to network |
| SQL injection | All queries use parameterized SQL via Microsoft.Data.Sqlite |
| File path injection | Paths validated before shell execution (explorer, open-file) |
| CSV formula injection | Export escapes =, +, -, @ prefix characters |
| Regex denial of service | 5-second timeout per pattern; max 50 findings per file |
| Zip bombs | 100:1 compression ratio check before extraction |
| Memory exhaustion | Per-file text cap (1,000,000 chars), per-file timeout (30s), aggressive GC |
| Credential exposure | Passphrase never stored; cleared from React state before the API call |
| Log leakage | Encrypted values sanitized in logs; no finding content logged |
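
The CSV formula injection mitigation listed above can be illustrated with a small escaping function. The spec states only that the four prefix characters are escaped; prefixing the cell with an apostrophe is a common convention and an assumption of this sketch, as is the name `escapeCsvCell`.

```javascript
// Escapes one cell value for CSV output.
function escapeCsvCell(value) {
  let text = String(value);
  // Neutralize cells that a spreadsheet would evaluate as a formula.
  // Assumption: an apostrophe prefix is used as the escape.
  if (/^[=+\-@]/.test(text)) {
    text = "'" + text;
  }
  // Standard CSV quoting for delimiters, quotes, and line breaks.
  if (/[",\r\n]/.test(text)) {
    text = '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}
```

A value such as `=SUM(A1)` is written as `'=SUM(A1)`, so it is displayed as text instead of being evaluated when the export is opened in a spreadsheet.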

17.2 Encryption Guarantees

  • AES-256-GCM with authenticated encryption (integrity + confidentiality)
  • PBKDF2-SHA256 with 100,000 iterations (brute-force resistance)
  • Random nonce per encryption operation (no nonce reuse)
  • Key material zeroed on disposal
  • No backdoor — passphrase loss means permanent data loss

17.3 Request Identification

All API requests include X-CUIckScan: 1 header. The server can use this to distinguish application requests from potential unauthorized access, though the primary security boundary is localhost binding.


18. Configuration Files

18.1 rules.json

Location: {AppDir}/rules.json

Contains detection patterns organized by category and sensitivity tier. Each rule requires:

  • label (string, unique) — Human-readable name
  • type (enum) — CUI | CTI | ITAR | DFARS | Heuristic
  • score (integer, 0-100) — Confidence weight
  • Either terms (string array) or regex (string) — Detection pattern
  • Optional tier (enum) — low | medium | high (default: medium)
  • Optional appliesTo (string array) — Cross-category applicability

Editable via in-app Rule Editor with syntax verification and revert-to-defaults.
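
The field requirements above translate into a validation pass along these lines. This is a sketch of the kind of check the Rule Editor could run, not its actual implementation: it assumes the rules are available as a plain array, accepts a rule that has at least one of `terms` or `regex`, and does not try to compile the pattern. `validateRules` is a hypothetical name.

```javascript
const RULE_TYPES = ["CUI", "CTI", "ITAR", "DFARS", "Heuristic"];
const RULE_TIERS = ["low", "medium", "high"];

const isStringArray = (v) => Array.isArray(v) && v.every((s) => typeof s === "string");

// Returns a list of human-readable problems; an empty list means the rules are valid.
function validateRules(rules) {
  const errors = [];
  const labels = new Set();
  rules.forEach((rule, i) => {
    const where = `rule[${i}]`;
    if (typeof rule.label !== "string" || rule.label === "") {
      errors.push(`${where}: label must be a non-empty string`);
    } else if (labels.has(rule.label)) {
      errors.push(`${where}: duplicate label "${rule.label}"`);
    } else {
      labels.add(rule.label);
    }
    if (!RULE_TYPES.includes(rule.type)) {
      errors.push(`${where}: type must be one of ${RULE_TYPES.join(", ")}`);
    }
    if (!Number.isInteger(rule.score) || rule.score < 0 || rule.score > 100) {
      errors.push(`${where}: score must be an integer from 0 to 100`);
    }
    if (!isStringArray(rule.terms) && typeof rule.regex !== "string") {
      errors.push(`${where}: either terms (string array) or regex (string) is required`);
    }
    if (rule.tier !== undefined && !RULE_TIERS.includes(rule.tier)) {
      errors.push(`${where}: tier must be one of ${RULE_TIERS.join(", ")}`);
    }
    if (rule.appliesTo !== undefined && !isStringArray(rule.appliesTo)) {
      errors.push(`${where}: appliesTo must be a string array`);
    }
  });
  return errors;
}
```

Collecting every problem instead of stopping at the first one lets an editor show all issues in a single pass.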

18.2 filetypes.json

Location: {AppDir}/filetypes.json

Contains file extension list with enabled/disabled state:

{
  "extensions": [
    {"ext": ".pdf", "enabled": true},
    {"ext": ".docx", "enabled": true},
    {"ext": ".custom", "enabled": true, "custom": true}
  ]
}

The file ships with 28 built-in extensions, and users can add custom ones. Disabling an extension that was part of the current scan session triggers a warning that the affected results will be removed.
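
Working with this structure can be sketched as follows. The helper names are hypothetical, and lower-casing extensions and adding a missing leading dot are assumptions of the example rather than documented behavior.

```javascript
// Sample config in the shape shown above.
const config = {
  extensions: [
    { ext: ".pdf", enabled: true },
    { ext: ".docx", enabled: false },
    { ext: ".custom", enabled: true, custom: true },
  ],
};

// Returns the lower-cased extensions that a scan should include.
function enabledExtensions(cfg) {
  return cfg.extensions.filter((e) => e.enabled).map((e) => e.ext.toLowerCase());
}

// Returns a new config with a user-defined extension appended.
// An extension that is already listed leaves the config unchanged.
function addCustomExtension(cfg, ext) {
  const normalized = (ext.startsWith(".") ? ext : "." + ext).toLowerCase();
  if (cfg.extensions.some((e) => e.ext.toLowerCase() === normalized)) return cfg;
  return {
    extensions: [...cfg.extensions, { ext: normalized, enabled: true, custom: true }],
  };
}
```

Here `enabledExtensions(config)` returns `[".pdf", ".custom"]`, and `addCustomExtension(config, "STEP")` returns a copy whose last entry is `.step` with `custom` set to `true`.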

18.3 release_version.txt

Single-line file containing the current version string (e.g., 1.4.27). Read by the backend and served to the UI via /api/app/version. Incremented with every change.


Appendix A: Default Detection Rules Summary

Low Tier (Formal Markings)

| Rule | Type | Score | Key Patterns |
| --- | --- | --- | --- |
| CUI formal markings | CUI | 15 | "CONTROLLED UNCLASSIFIED INFORMATION", "CUI//SP-" |
| Classification markings | CUI | 18 | "TOP SECRET", "SECRET//NOFORN", "CLASSIFIED" |
| ITAR/export-control formal | ITAR | 15 | "INTERNATIONAL TRAFFIC IN ARMS", "ITAR CONTROLLED" |
| DFARS clause references | DFARS | 15 | "DFARS 252.204-7012", "DFARS 252.204-7020" |

Medium Tier (Strong Indicators)

| Rule | Type | Score | Key Patterns |
| --- | --- | --- | --- |
| ECCN code pattern | ITAR | 12 | ECCN followed by alphanumeric codes |
| ITAR abbreviations | ITAR | 10 | "USML", "DDTC", "EAR99", "AECA" |
| CTI indicators | CTI | 10 | "INDICATOR OF COMPROMISE", "THREAT INTELLIGENCE" |
| CMMC/SPRS references | DFARS | 8 | "CMMC LEVEL", "SPRS SCORE", "SSP" |

High Tier (Broad Catch-All)

| Rule | Type | Score | Key Patterns |
| --- | --- | --- | --- |
| Short abbreviations | CUI | 5 | "SBU", "OUO", "FOUO", "LES" |
| Ambiguous markers | CUI | 4 | "SECRET", "CONFIDENTIAL" (prone to false positives) |
| Generic export terms | ITAR | 6 | "EXPORT CONTROLLED", "RESTRICTED DATA" |

Appendix B: Function Index (App.jsx)

Top-Level Functions

| Function | Line | Purpose |
| --- | --- | --- |
| setsEqual(a, b) | 4-8 | Compare two Sets for equality |
| api.get(url) | 11-15 | HTTP GET with error handling |
| api.post(url, body) | 16-25 | HTTP POST with error handling |
| debugLog(msg, level) | 30-34 | Frontend debug logging to backend |
| nativeWindow.* | 55-63 | WebView2 postMessage window operations |
| nativeDialogs.* | 66-73 | Native Windows dialog API wrappers |
| getFileSeverity(score) | 121-126 | Map aggregate score to severity level |
| getFindingSeverity(score) | 129-134 | Map finding score to severity level |

CUIckScanApp Handler Functions

| Handler | Line | Purpose |
| --- | --- | --- |
| toast(message, color, onClick) | 1931-1939 | Show notification toast |
| fetchResults() | 1949-1957 | Load all results from API |
| handleReview(filePath, status) | 1959-1971 | Set file review status |
| reloadRules() | 1974-1996 | Reload rules after editing |
| handleBrowse() | 2176-2181 | Open folder browser dialog |
| doStartScan(options) | 2196-2244 | Execute scan with given options |
| handleStartScan() | 2246-2297 | Determine scan type and start |
| handlePause() | 2299-2301 | Pause running scan |
| handleResume() | 2555-2557 | Resume paused scan |
| handleCancel() | 2559-2561 | Cancel running/paused scan |
| handlePassphraseConfirmed(pass) | 2396-2493 | Process passphrase entry/setup |
| handleChangePassConfirmed(cur,new) | 2508-2549 | Process passphrase change |
| handleLoadSession() | 2366-2393 | Begin session load flow |
| finishLoadSession() | 2315-2342 | Complete session load (after passphrase) |
| continueAfterUpgrade() | 2345-2364 | Continue load flow after DB upgrade |
| handleCloseApp() | 2574-2587 | Initiate close flow |
| handleExport() | 2827-2833 | Export file paths to CSV |
| handleGenerateReport() | 2835-2848 | Generate PDF report |
| handleCopy() / handleMove() | 2850-2875 | Copy/move flagged files |
| performOpenDb() | 2651-2705 | Open external scan database |
| handleSaveDb() | 2896-2914 | Save scan database |
| handleNewScanDb() | 2799-2815 | Create new scan database |
| handleClearResults() | 2817-2825 | Clear scan results |
| executeDestructiveAction(action) | 2708-2766 | Execute clear/new/open after save check |
| handleUpgradeDb(backup) | 2946-2965 | Upgrade database schema |

End of specification.