
DaZZLeD: Privacy-Preserving Content Detection

A Clean-Room Implementation of Apple's PSI Protocol with Post-Quantum Improvements



🎯 Overview

DaZZLeD is a research project that reconstructs and improves upon the Apple CSAM Detection Protocol.

It implements the core "Sandwich" privacy architecture:

  1. Client-Side AI: a perceptual hash of each image is computed on-device.
  2. Blind PSI: the blinded hash is checked against the server's database without revealing the user's data.

Our Improvements

We address specific weaknesses in the original design using modern techniques:

| Feature | Apple NeuralHash (Original) | DaZZLeD (This Project) |
| --- | --- | --- |
| Hash Robustness | Vulnerable to collision attacks | ResNetHashNet: contrastive learning + adversarial training for robust per-image hashing |
| Cryptography | Elliptic-curve PSI (pre-quantum) | Lattice PSI: post-quantum ML-DSA + OPRF (module lattices) |
| Auditability | Opaque database | Signed commitment: the server signs the database state (Bloom filter) to prevent split-view attacks |
| Runtime | iOS CoreML only | Cross-platform ONNX Runtime (Windows/Linux/macOS) |

⚠️ Note on Database: This is a clean-room implementation. We do NOT possess or distribute real CSAM hashes. The system is designed to verify the protocol, and users must ingest their own dummy hashes for testing.


πŸ— Architecture

```mermaid
graph LR
    subgraph Client [Client Device]
        direction TB
        Img(Image) --> Model(HashNet)
        Model --> Hash(128-bit Hash) --> Lat(Lattice R_q)
        Lat --> Blind(Blinded P')
    end

    Blind -->|gRPC Req| Srv

    subgraph Server [Authority Node]
        direction TB
        Srv(Server Process)
        Srv -->|Sign| Sig(Blinded Sig S')
        Srv -->|Proof| Proof(Commitment)
    end

    Sig & Proof -->|gRPC Resp| Verif

    subgraph Check [Verification]
        direction TB
        Verif{Valid DB?}
        Verif -->|Yes| Unblind(Unblind Sig) --> Match{Match?}
    end

    %% Styling for a polished look
    classDef plain fill:#fff,stroke:#333,stroke-width:1px;
    classDef node fill:#ececff,stroke:#555,stroke-width:1px;

    class Img,Model,Hash,Lat,Blind,Srv,Sig,Proof,Unblind plain
    class Verif,Match node

    style Client fill:#ffeefc,stroke:#d470a2
    style Server fill:#e6f0ff,stroke:#4d88ff
    style Check fill:#e8ffe8,stroke:#4caf50
```

🚀 Quick Start

Prerequisites

  • Go 1.24+
  • Python 3.10+ (for training)
  • ONNX Runtime (for inference)

1. Clone & Build

```bash
git clone https://github.com/D13ya/DaZZLeD.git
cd DaZZLeD
go mod tidy
go build ./...
```

2. Test the Hash Generator

```bash
# Build the hash test tool
go build -o hashtest.exe ./cmd/hashtest

# Hash an image
./hashtest.exe path/to/image.jpg
```

Output:

```text
Image: test.jpg
Hash (first 10 floats): [0.1234 0.8765 0.3456 ...]
Binary hash (hex): a1b2c3d4e5f6789012345678...
Binary hash (bits): 10100001101100101100...
```

🧠 ML Core: HashNet Training

The perceptual hasher is a ResNet50-based contrastive model trained to produce a robust, discriminative 128-bit hash for each image.

Training Features

| Component | Description |
| --- | --- |
| Backbone | ResNet50 (ImageNet-pretrained) |
| Hash Dim | 128 bits |
| Losses | NT-Xent contrastive + DHD + quantization |
| Augmentations | Random crop, flip, color jitter, blur |
| Memory Optimization | Gradient checkpointing (~50% savings) |
| Training Time | ~2 hours on a T4 GPU (55k images) |

Train Your Own Model

```bash
# Google Colab (T4 GPU recommended)
!python training/train_hashnet.py \
  --data-list /path/to/manifest.txt \
  --backbone resnet50 \
  --epochs 10 \
  --batch-size 256 \
  --grad-checkpoint \
  --label-mode none \
  --hash-contrastive-weight 1.0 \
  --dhd-weight 0.5 \
  --quant-weight 0.1 \
  --counterfactual-mode aug \
  --lr 5e-4 \
  --amp
```

Export to ONNX

```python
import torch
from training.train_hashnet import ResNetHashNet
import safetensors.torch

model = ResNetHashNet("resnet50", hash_dim=128, proj_dim=512, pretrained=False)
safetensors.torch.load_model(model, "student_final.safetensors")
model.eval()

torch.onnx.export(
    model,
    torch.randn(1, 3, 224, 224),
    "hashnet.onnx",
    input_names=["image"],
    output_names=["hash"],
    dynamic_axes={"image": {0: "batch"}, "hash": {0: "batch"}},
    opset_version=14,
)
```

⚙️ Go Integration

ONNX Runtime Setup

  1. Download ONNX Runtime from GitHub Releases
  2. Place files in configs/models/:
    • hashnet.onnx (your trained model)
    • hashnet.onnx.data (model weights)
    • onnxruntime.dll (runtime library)

Using the Hasher API

```go
package main

import (
    "fmt"

    "github.com/D13ya/DaZZLeD/internal/bridge"
)

func main() {
    // Initialize ONNX Runtime
    bridge.InitONNXEnvironment("configs/models/onnxruntime.dll")
    defer bridge.DestroyONNXEnvironment()

    // Create hasher
    cfg := bridge.HasherConfig{
        ModelPath: "configs/models/hashnet.onnx",
        ImageSize: 224,
        HashDim:   128,
    }
    hasher, _ := bridge.NewONNXHasher(cfg)
    defer hasher.Close()

    // Hash an image
    imgBytes, _ := bridge.LoadImage("photo.jpg")
    hash, _ := hasher.Hash(imgBytes)

    // Binarize for comparison
    binaryHash := bridge.BinarizeHashToBytes(hash)
    fmt.Printf("Hash: %x\n", binaryHash)

    // Compare two images
    otherImgBytes, _ := bridge.LoadImage("other.jpg")
    hash2, _ := hasher.Hash(otherImgBytes)
    distance := bridge.HammingDistance(
        bridge.BinarizeHashToBytes(hash),
        bridge.BinarizeHashToBytes(hash2),
    )
    fmt.Printf("Hamming distance: %d bits\n", distance)
}
```
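The Hamming-distance comparison used above can be reproduced with the standard library alone; a self-contained sketch (the function name is hypothetical, not the project's API):

```go
package main

import (
	"fmt"
	"math/bits"
)

// hamming counts differing bits between two equal-length binary hashes
// by XOR-ing byte pairs and popcounting the result.
func hamming(a, b []byte) int {
	d := 0
	for i := range a {
		d += bits.OnesCount8(a[i] ^ b[i])
	}
	return d
}

func main() {
	h1 := []byte{0xA1, 0xB2, 0xC3, 0xD4}
	h2 := []byte{0xA1, 0xB0, 0xC3, 0x54}
	fmt.Println(hamming(h1, h2)) // 2
}
```

A small distance (e.g. a few bits out of 128) indicates perceptually similar images; the exact match threshold is a tunable protocol parameter.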

🔐 Crypto Core: Post-Quantum Security

Lattice-Based OPRF

The hash is mapped to a lattice ring element before cryptographic operations:

```go
// Map float hash to lattice point
latticePoint := bridge.MapToLattice(hashVec)

// Blind for OPRF
state, blindedRequest := oprfClient.Blind(latticePoint.Marshal())

// Server signs blindly (doesn't see the hash)
// Client unblinds to verify membership
```

ML-DSA Signatures

All proofs are signed with ML-DSA (Dilithium), a post-quantum digital signature algorithm.
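To make the "signed commitment" idea concrete, here is a minimal, self-contained sketch (not the project's actual API; all names are illustrative): the server stores its database in a Bloom filter and publishes a digest of the filter state, which in the real protocol would be signed with ML-DSA so that every client provably sees the same database.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// bloom is a toy Bloom filter standing in for the server database.
type bloom struct {
	bits []byte
	k    int // number of hash probes per item
}

func newBloom(nbits, k int) *bloom {
	return &bloom{bits: make([]byte, nbits/8), k: k}
}

// idx derives the i-th probe position for item via SHA-256.
func (b *bloom) idx(item []byte, i int) int {
	buf := make([]byte, len(item)+1)
	copy(buf, item)
	buf[len(item)] = byte(i)
	h := sha256.Sum256(buf)
	return int(binary.BigEndian.Uint32(h[:4]) % uint32(len(b.bits)*8))
}

func (b *bloom) Add(item []byte) {
	for i := 0; i < b.k; i++ {
		j := b.idx(item, i)
		b.bits[j/8] |= 1 << (j % 8)
	}
}

func (b *bloom) Has(item []byte) bool {
	for i := 0; i < b.k; i++ {
		j := b.idx(item, i)
		if b.bits[j/8]&(1<<(j%8)) == 0 {
			return false
		}
	}
	return true
}

// Commit digests the filter state. Signing this digest (e.g. with ML-DSA)
// prevents the server from showing different databases to different clients.
func (b *bloom) Commit() [32]byte { return sha256.Sum256(b.bits) }

func main() {
	db := newBloom(1024, 3)
	db.Add([]byte("known-hash-1"))
	db.Add([]byte("known-hash-2"))

	commit := db.Commit()
	fmt.Printf("commitment: %x\n", commit[:8])
	fmt.Println(db.Has([]byte("known-hash-1"))) // true
	fmt.Println(db.Has([]byte("unrelated")))    // almost certainly false
}
```

Bloom filters admit false positives but never false negatives, which is why the protocol pairs membership checks with the unblinded signature rather than trusting the filter alone.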


📊 Performance

| Metric | Value |
| --- | --- |
| Hash generation | ~15 ms (GPU) / ~100 ms (CPU) |
| Model size (ONNX) | ~95 MB |
| Hash size | 128 bits (16 bytes) |
| Collision resistance | 2^64 (birthday bound) |

📁 Project Structure

```text
DaZZLeD/
├── cmd/
│   ├── client/          # Client binary
│   ├── server/          # Server binary
│   ├── hashtest/        # Hash testing tool
│   └── setup/           # Key generation
├── configs/
│   └── models/          # ONNX model + runtime
├── internal/
│   ├── bridge/          # ONNX Runtime wrapper
│   │   ├── onnx_runtime.go
│   │   └── lsq.go       # Lattice quantization
│   ├── crypto/          # Post-quantum crypto
│   └── app/             # Client/server logic
├── ml-core/
│   ├── training/
│   │   └── train_hashnet.py  # HashNet training
│   └── notebooks/       # Colab notebooks
└── api/
    └── proto/           # gRPC definitions
```

🔬 Research Background

This project implements concepts from:

  1. Black-box Collision Attacks on NeuralHash - Why we need adversarial robustness
  2. Split Accumulation for Relations - Our ZK verification approach
  3. Contrastive Learning for Perceptual Hashes - NT-Xent loss for per-image discrimination

⚠️ Legal & Ethical Notice

Research Only: This is an educational implementation for studying privacy-preserving AI systems.

  • No real illegal content is used for training or testing
  • All datasets are public and non-sensitive (FFHQ, OpenImages)
  • This is a clean-room implementation based on public papers

📜 License

MIT License - See LICENSE for details.


🤝 Contributing

Contributions welcome!

  1. Open an issue to discuss changes.
  2. Fork and create a PR.
  3. Ensure tests pass: go test ./...
