DaZZLeD is a research project that reconstructs and improves upon the Apple CSAM Detection Protocol.
It implements the core "Sandwich" privacy architecture:
- Client-Side AI: Perceptual hashing to detect content.
- Blind PSI: Checking hashes against a server without revealing user data.
We address specific weaknesses in the original design using modern techniques:
| Feature | Apple NeuralHash (Original) | DaZZLeD (This Project) |
|---|---|---|
| Hash Robustness | Vulnerable to collision attacks | ResNetHashNet: Contrastive learning + adversarial training for robust per-image hashing. |
| Cryptography | Elliptic Curve PSI (Pre-Quantum) | Lattice PSI: Post-Quantum ML-DSA + OPRF (Module-Lattices). |
| Auditability | Opaque Database | Signed Commitment: The server signs the database state (Bloom Filter) to prevent split-view attacks. |
| Runtime | iOS CoreML only | Cross-platform ONNX Runtime (Windows/Linux/Mac). |
⚠️ Note on Database: This is a clean-room implementation. We do NOT possess or distribute real CSAM hashes. The system is designed to verify the protocol, and users must ingest their own dummy hashes for testing.
```mermaid
graph LR
    subgraph Client [Client Device]
        direction TB
        Img(Image) --> Model(HashNet)
        Model --> Hash(128-bit Hash) --> Lat(Lattice R_q)
        Lat --> Blind(Blinded P')
    end

    Blind -->|gRPC Req| Srv

    subgraph Server [Authority Node]
        direction TB
        Srv(Server Process)
        Srv -->|Sign| Sig(Blinded Sig S')
        Srv -->|Proof| Proof(Commitment)
    end

    Sig & Proof -->|gRPC Resp| Verif

    subgraph Check [Verification]
        direction TB
        Verif{Valid DB?}
        Verif -->|Yes| Unblind(Unblind Sig) --> Match{Match?}
    end

    %% Styling for a polished look
    classDef plain fill:#fff,stroke:#333,stroke-width:1px;
    classDef node fill:#ececff,stroke:#555,stroke-width:1px;
    class Img,Model,Hash,Lat,Blind,Srv,Sig,Proof,Unblind plain
    class Verif,Match node
    style Client fill:#ffeefc,stroke:#d470a2
    style Server fill:#e6f0ff,stroke:#4d88ff
    style Check fill:#e8ffe8,stroke:#4caf50
```
- Go 1.24+
- Python 3.10+ (for training)
- ONNX Runtime (for inference)
```shell
git clone https://github.com/D13ya/DaZZLeD.git
cd DaZZLeD
go mod tidy
go build ./...
```

```shell
# Build the hash test tool
go build -o hashtest.exe ./cmd/hashtest

# Hash an image
./hashtest.exe path/to/image.jpg
```

Output:

```
Image: test.jpg
Hash (first 10 floats): [0.1234 0.8765 0.3456 ...]
Binary hash (hex): a1b2c3d4e5f6789012345678...
Binary hash (bits): 10100001101100101100...
```
The perceptual hasher is a ResNet50-based contrastive model trained to produce a highly discriminative 128-bit hash for each image.
| Component | Description |
|---|---|
| Backbone | ResNet50 (ImageNet pretrained) |
| Hash Dim | 128 bits |
| Losses | NT-Xent contrastive + DHD + Quantization |
| Augmentations | Random crop, flip, color jitter, blur |
| Memory Optimization | Gradient checkpointing (~50% savings) |
| Training Time | ~2 hours on T4 GPU (55k images) |
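To make per-image hash comparison concrete, here is a minimal stdlib sketch of how two binarized 128-bit hashes are compared. The sign threshold, bit-packing order, and function names are illustrative assumptions, not this project's actual constants or API:

```go
package main

import (
	"fmt"
	"math/bits"
)

// binarize packs a 128-float embedding into 16 bytes by thresholding at 0.
func binarize(embedding []float32) []byte {
	out := make([]byte, len(embedding)/8)
	for i, v := range embedding {
		if v > 0 {
			out[i/8] |= 1 << (7 - i%8)
		}
	}
	return out
}

// hammingDistance counts differing bits between two equal-length hashes.
func hammingDistance(a, b []byte) int {
	d := 0
	for i := range a {
		d += bits.OnesCount8(a[i] ^ b[i])
	}
	return d
}

func main() {
	h1 := binarize(make([]float32, 128)) // all-zero embedding -> all-zero hash

	e2 := make([]float32, 128)
	e2[0], e2[5] = 1, 1 // two positive coordinates -> two set bits
	h2 := binarize(e2)

	fmt.Println(hammingDistance(h1, h2)) // prints 2
}
```

A near-duplicate image should land within a small Hamming radius of the original, while unrelated images scatter around 64 bits apart on average.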
```shell
# Google Colab (T4 GPU recommended)
!python training/train_hashnet.py \
    --data-list /path/to/manifest.txt \
    --backbone resnet50 \
    --epochs 10 \
    --batch-size 256 \
    --grad-checkpoint \
    --label-mode none \
    --hash-contrastive-weight 1.0 \
    --dhd-weight 0.5 \
    --quant-weight 0.1 \
    --counterfactual-mode aug \
    --lr 5e-4 \
    --amp
```

Export the trained checkpoint to ONNX for use with the Go runtime:

```python
import torch
import safetensors.torch

from training.train_hashnet import ResNetHashNet

model = ResNetHashNet("resnet50", hash_dim=128, proj_dim=512, pretrained=False)
safetensors.torch.load_model(model, "student_final.safetensors")
model.eval()

torch.onnx.export(
    model,
    torch.randn(1, 3, 224, 224),
    "hashnet.onnx",
    input_names=["image"],
    output_names=["hash"],
    dynamic_axes={"image": {0: "batch"}, "hash": {0: "batch"}},
    opset_version=14,
)
```

- Download ONNX Runtime from GitHub Releases
- Place files in `configs/models/`:
  - `hashnet.onnx` (your trained model)
  - `hashnet.onnx.data` (model weights)
  - `onnxruntime.dll` (runtime library)
```go
package main

import (
	"fmt"

	"github.com/D13ya/DaZZLeD/internal/bridge"
)

func main() {
	// Initialize ONNX Runtime
	bridge.InitONNXEnvironment("configs/models/onnxruntime.dll")
	defer bridge.DestroyONNXEnvironment()

	// Create hasher
	cfg := bridge.HasherConfig{
		ModelPath: "configs/models/hashnet.onnx",
		ImageSize: 224,
		HashDim:   128,
	}
	hasher, _ := bridge.NewONNXHasher(cfg)
	defer hasher.Close()

	// Hash an image
	imgBytes, _ := bridge.LoadImage("photo.jpg")
	hash, _ := hasher.Hash(imgBytes)

	// Binarize for comparison
	binaryHash := bridge.BinarizeHashToBytes(hash)
	fmt.Printf("Hash: %x\n", binaryHash)

	// Compare two images
	otherImageBytes, _ := bridge.LoadImage("other.jpg")
	hash2, _ := hasher.Hash(otherImageBytes)
	distance := bridge.HammingDistance(
		bridge.BinarizeHashToBytes(hash),
		bridge.BinarizeHashToBytes(hash2),
	)
	fmt.Printf("Hamming distance: %d bits\n", distance)
}
```

The hash is mapped to a lattice ring element before cryptographic operations:
```go
// Map float hash to lattice point
latticePoint := bridge.MapToLattice(hashVec)

// Blind for OPRF
state, blindedRequest := oprfClient.Blind(latticePoint.Marshal())

// Server signs blindly (doesn't see the hash)
// Client unblinds to verify membership
```

All proofs are signed with ML-DSA (Dilithium), a post-quantum digital signature algorithm.
| Metric | Value |
|---|---|
| Hash generation | ~15ms (GPU) / ~100ms (CPU) |
| Model size (ONNX) | ~95 MB |
| Hash size | 128 bits (16 bytes) |
| Collision resistance | 2^64 (birthday bound) |
```
DaZZLeD/
├── cmd/
│   ├── client/               # Client binary
│   ├── server/               # Server binary
│   ├── hashtest/             # Hash testing tool
│   └── setup/                # Key generation
├── configs/
│   └── models/               # ONNX model + runtime
├── internal/
│   ├── bridge/               # ONNX Runtime wrapper
│   │   ├── onnx_runtime.go
│   │   └── lsq.go            # Lattice quantization
│   ├── crypto/               # Post-quantum crypto
│   └── app/                  # Client/server logic
├── ml-core/
│   ├── training/
│   │   └── train_hashnet.py  # HashNet training
│   └── notebooks/            # Colab notebooks
└── api/
    └── proto/                # gRPC definitions
```
This project implements concepts from:
- Black-box Collision Attacks on NeuralHash - Why we need adversarial robustness
- Split Accumulation for Relations - Our ZK verification approach
- Contrastive Learning for Perceptual Hashes - NT-Xent loss for per-image discrimination
Research Only: This is an educational implementation for studying privacy-preserving AI systems.
- No real illegal content is used for training or testing
- All datasets are public and non-sensitive (FFHQ, OpenImages)
- This is a clean-room implementation based on public papers
MIT License - See LICENSE for details.
Contributions welcome! To contribute:
- Open an issue to discuss changes
- Fork and create a PR
- Ensure tests pass: `go test ./...`