DaZZLeD is a research project that reconstructs and improves upon the Apple CSAM Detection Protocol.
It implements the core "Sandwich" privacy architecture:
- Client-Side AI: Perceptual hashing to detect content.
- Blind PSI: Checking hashes against a server without revealing user data.
We address specific weaknesses in the original design using modern techniques:
| Feature | Apple NeuralHash (Original) | DaZZLeD (This Project) |
|---|---|---|
| Hash Robustness | Vulnerable to collision attacks | ResNetHashNet: Contrastive learning + adversarial training for robust per-image hashing. |
| Cryptography | Elliptic Curve PSI (Pre-Quantum) | Lattice PSI: Post-Quantum ML-DSA + OPRF (Module-Lattices). |
| Auditability | Opaque Database | Signed Commitment: The server signs the database state (Bloom Filter) to prevent split-view attacks. |
| Runtime | iOS CoreML only | Cross-platform ONNX Runtime (Windows/Linux/Mac). |
⚠️ Note on Database: This is a clean-room implementation. We do NOT possess or distribute real CSAM hashes. The system is designed to verify the protocol, and users must ingest their own dummy hashes for testing.
```mermaid
graph LR
    subgraph Client [Client Device]
        direction TB
        Img(Image) --> Model(HashNet)
        Model --> Hash(128-bit Hash) --> Lat(Lattice R_q)
        Lat --> Blind(Blinded P')
    end

    Blind -->|gRPC Req| Srv

    subgraph Server [Authority Node]
        direction TB
        Srv(Server Process)
        Srv -->|Sign| Sig(Blinded Sig S')
        Srv -->|Proof| Proof(Commitment)
    end

    Sig & Proof -->|gRPC Resp| Verif

    subgraph Check [Verification]
        direction TB
        Verif{Valid DB?}
        Verif -->|Yes| Unblind(Unblind Sig) --> Match{Match?}
    end

    %% Styling for a polished look
    classDef plain fill:#fff,stroke:#333,stroke-width:1px;
    classDef node fill:#ececff,stroke:#555,stroke-width:1px;
    class Img,Model,Hash,Lat,Blind,Srv,Sig,Proof,Unblind plain
    class Verif,Match node
    style Client fill:#ffeefc,stroke:#d470a2
    style Server fill:#e6f0ff,stroke:#4d88ff
    style Check fill:#e8ffe8,stroke:#4caf50
```
- Go 1.24+
- Python 3.10+ (for training)
- ONNX Runtime (for inference)
```shell
git clone https://github.com/D13ya/DaZZLeD.git
cd DaZZLeD
go mod tidy
go build ./...
```

```shell
# Build the hash test tool
go build -o hashtest.exe ./cmd/hashtest

# Hash an image
./hashtest.exe path/to/image.jpg
```

Output:

```
Image: test.jpg
Hash (first 10 floats): [0.1234 0.8765 0.3456 ...]
Binary hash (hex): a1b2c3d4e5f6789012345678...
Binary hash (bits): 10100001101100101100...
```
The perceptual hasher is a ResNet50-based contrastive model trained to produce a highly discriminative 128-bit hash for each image.
| Component | Description |
|---|---|
| Backbone | ResNet50 (ImageNet pretrained) |
| Hash Dim | 128 bits |
| Losses | NT-Xent contrastive + DHD + Quantization |
| Augmentations | Random crop, flip, color jitter, blur |
| Memory Optimization | Gradient checkpointing (~50% savings) |
| Training Time | ~2 hours on T4 GPU (55k images) |
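To make per-image hash comparison concrete, here is a minimal stdlib sketch of how two binarized 128-bit hashes are compared. The sign threshold, bit-packing order, and function names are illustrative assumptions, not this project's actual constants or API:

```go
package main

import (
	"fmt"
	"math/bits"
)

// binarize packs a 128-float embedding into 16 bytes by thresholding at 0.
func binarize(embedding []float32) []byte {
	out := make([]byte, len(embedding)/8)
	for i, v := range embedding {
		if v > 0 {
			out[i/8] |= 1 << (7 - i%8)
		}
	}
	return out
}

// hammingDistance counts differing bits between two equal-length hashes.
func hammingDistance(a, b []byte) int {
	d := 0
	for i := range a {
		d += bits.OnesCount8(a[i] ^ b[i])
	}
	return d
}

func main() {
	h1 := binarize(make([]float32, 128)) // all-zero embedding -> all-zero hash

	e2 := make([]float32, 128)
	e2[0], e2[5] = 1, 1 // two positive coordinates -> two set bits
	h2 := binarize(e2)

	fmt.Println(hammingDistance(h1, h2)) // prints 2
}
```

A near-duplicate image should land within a small Hamming radius of the original, while unrelated images scatter around 64 bits apart on average.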
```shell
# Google Colab (T4 GPU recommended)
!python training/train_hashnet.py \
    --data-list /path/to/manifest.txt \
    --backbone resnet50 \
    --epochs 10 \
    --batch-size 256 \
    --grad-checkpoint \
    --label-mode none \
    --hash-contrastive-weight 1.0 \
    --dhd-weight 0.5 \
    --quant-weight 0.1 \
    --counterfactual-mode aug \
    --lr 5e-4 \
    --amp
```

Export the trained checkpoint to ONNX for use with the Go runtime:

```python
import torch
import safetensors.torch

from training.train_hashnet import ResNetHashNet

model = ResNetHashNet("resnet50", hash_dim=128, proj_dim=512, pretrained=False)
safetensors.torch.load_model(model, "student_final.safetensors")
model.eval()

torch.onnx.export(
    model,
    torch.randn(1, 3, 224, 224),
    "hashnet.onnx",
    input_names=["image"],
    output_names=["hash"],
    dynamic_axes={"image": {0: "batch"}, "hash": {0: "batch"}},
    opset_version=14,
)
```

- Download ONNX Runtime from GitHub Releases
- Place files in `configs/models/`:
  - `hashnet.onnx` (your trained model)
  - `hashnet.onnx.data` (model weights)
  - `onnxruntime.dll` (runtime library)
```go
package main

import (
	"fmt"

	"github.com/D13ya/DaZZLeD/internal/bridge"
)

func main() {
	// Initialize ONNX Runtime
	bridge.InitONNXEnvironment("configs/models/onnxruntime.dll")
	defer bridge.DestroyONNXEnvironment()

	// Create hasher
	cfg := bridge.HasherConfig{
		ModelPath: "configs/models/hashnet.onnx",
		ImageSize: 224,
		HashDim:   128,
	}
	hasher, _ := bridge.NewONNXHasher(cfg)
	defer hasher.Close()

	// Hash an image
	imgBytes, _ := bridge.LoadImage("photo.jpg")
	hash, _ := hasher.Hash(imgBytes)

	// Binarize for comparison
	binaryHash := bridge.BinarizeHashToBytes(hash)
	fmt.Printf("Hash: %x\n", binaryHash)

	// Compare two images
	otherImageBytes, _ := bridge.LoadImage("other.jpg")
	hash2, _ := hasher.Hash(otherImageBytes)
	distance := bridge.HammingDistance(
		bridge.BinarizeHashToBytes(hash),
		bridge.BinarizeHashToBytes(hash2),
	)
	fmt.Printf("Hamming distance: %d bits\n", distance)
}
```

The hash is mapped to a lattice ring element before cryptographic operations:
```go
// Map float hash to lattice point
latticePoint := bridge.MapToLattice(hashVec)

// Blind for OPRF
state, blindedRequest := oprfClient.Blind(latticePoint.Marshal())

// Server signs blindly (doesn't see the hash)
// Client unblinds to verify membership
```

All proofs are signed with ML-DSA (Dilithium), a post-quantum digital signature algorithm.
| Metric | Value |
|---|---|
| Hash generation | ~15ms (GPU) / ~100ms (CPU) |
| Model size (ONNX) | ~95 MB |
| Hash size | 128 bits (16 bytes) |
| Collision resistance | 2^64 (birthday bound) |
```
DaZZLeD/
├── cmd/
│   ├── client/               # Client binary
│   ├── server/               # Server binary
│   ├── hashtest/             # Hash testing tool
│   └── setup/                # Key generation
├── configs/
│   └── models/               # ONNX model + runtime
├── internal/
│   ├── bridge/               # ONNX Runtime wrapper
│   │   ├── onnx_runtime.go
│   │   └── lsq.go            # Lattice quantization
│   ├── crypto/               # Post-quantum crypto
│   └── app/                  # Client/server logic
├── ml-core/
│   ├── training/
│   │   └── train_hashnet.py  # HashNet training
│   └── notebooks/            # Colab notebooks
└── api/
    └── proto/                # gRPC definitions
```
This project implements concepts from:
- Black-box Collision Attacks on NeuralHash - Why we need adversarial robustness
- Split Accumulation for Relations - Our ZK verification approach
- Contrastive Learning for Perceptual Hashes - NT-Xent loss for per-image discrimination
Research Only: This is an educational implementation for studying privacy-preserving AI systems.
- No real illegal content is used for training or testing
- All datasets are public and non-sensitive (FFHQ, OpenImages)
- This is a clean-room implementation based on public papers
MIT License - See LICENSE for details.
Contributions welcome! To contribute:
- Open an issue to discuss changes
- Fork and create a PR
- Ensure tests pass: `go test ./...`