A standalone microservice for image similarity search and deduplication, powered by Milvus (Vector Database) and SSCD (Self-Supervised Copy Detection).
- Docker & Docker Compose
- GPU (Optional, but recommended for speed)
- Download the Model: The service requires the SSCD model weights. Download them to the models/ directory:
    mkdir -p models
    wget -O models/sscd_disc_mixup.torchscript.pt https://dl.fbaipublicfiles.com/sscd-copy-detection/sscd_disc_mixup.torchscript.pt
- Configure Environment: Copy the example environment file and customize it:
    cp .env.example .env
    # Edit .env to set your paths
  Key environment variables:
  | Variable | Default | Description |
  | --- | --- | --- |
  | WORKSPACE_PATH | ../../cbir-workspace | Path to shared image workspace |
  | DOCKER_VOLUME_DIRECTORY | . | Base path for persistent data |
  | MODEL_DEVICE | cpu | Device for inference (cpu or cuda) |
  | CBIR_PORT | 8001 | CBIR service port |
- Start the Service:
    docker-compose up -d
This starts:
- etcd - Distributed KV store for Milvus
- MinIO - Object storage for Milvus (port 9000, console 9001)
- Milvus Standalone - Vector database (port 19530)
- CBIR API - Image search service (port 8001)
- Attu - Milvus admin UI (port 3322)
- Verify:
    # Check all services are running
    docker-compose ps
    # Check CBIR health
    curl http://localhost:8001/health
  Visit http://localhost:8001/docs to see the API documentation.
- Visualization: Access the Attu interface at http://localhost:3322 to visualize and manage the Milvus database.
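The Verify step can also be scripted, which helps when other tooling has to wait for the stack to come up. The following is a minimal sketch that polls the health endpoint until it responds. It assumes that a healthy service answers `/health` with HTTP 200; the function names, retry count, and delay are illustrative and not part of the service.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503 while starting).
        return exc.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar transport errors.
        return 0


def wait_until_healthy(url, attempts=30, delay=2.0,
                       get_status=http_status, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        if get_status(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (assumes the default CBIR_PORT of 8001):
# wait_until_healthy("http://localhost:8001/health")
```

The `get_status` and `sleep` parameters exist only so the polling loop can be exercised without a running service.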
# Start all services
docker-compose up -d
# Stop all services
docker-compose down
# View logs
docker-compose logs -f cbir-service
# Restart CBIR service after code changes
docker restart cbir-service
# Remove all data and start fresh
docker-compose down -v

The service exposes a REST API for indexing and searching images.
Add an image to the database. The image_path must be accessible to the container (e.g., via a shared volume).
POST /index
{
  "user_id": "user_123",
  "image_path": "/workspace/data/image1.jpg",
  "labels": ["Western Blot", "Microscopy"]
}

The labels field is optional and allows you to tag images with class labels for filtered retrieval.
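An index request like the one above could be sent from Python along these lines. This is a sketch using only the standard library. The helper names are hypothetical, and the assumption that the service replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request


def build_index_payload(user_id, image_path, labels=None):
    """Build the JSON body for POST /index; `labels` is omitted when not given."""
    payload = {"user_id": user_id, "image_path": image_path}
    if labels:
        payload["labels"] = list(labels)
    return payload


def post_json(url, payload, timeout=30.0):
    """POST `payload` as JSON and decode the response (assumed to be JSON)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (the path must be valid inside the container, not on the host):
# post_json("http://localhost:8001/index",
#           build_index_payload("user_123", "/workspace/data/image1.jpg",
#                               ["Western Blot", "Microscopy"]))
```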
Find images similar to a query image. Results are strictly isolated by user_id.
POST /search
{
  "user_id": "user_123",
  "image_path": "/workspace/data/query.jpg",
  "top_k": 10,
  "labels": ["Western Blot"]
}

The labels field is optional. When provided, only images with any of the specified labels will be returned (OR logic). When omitted, all images are considered.
Upload an image directly to search for similar images.
POST /search/upload
Query parameters:
- user_id (required): User ID for isolation
- top_k (optional): Number of results (default: 10)
- labels (optional): Filter by labels (can be repeated for multiple labels)
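When calling this endpoint from code, the repeated labels parameter is the part that is easiest to get wrong. The sketch below builds the query string with the standard library. The function name is hypothetical, and the base URL assumes the default port.

```python
from urllib.parse import quote, urlencode


def build_upload_url(base_url, user_id, top_k=10, labels=()):
    """Build the /search/upload URL, repeating `labels` once per label."""
    params = [("user_id", user_id), ("top_k", top_k)]
    params.extend(("labels", label) for label in labels)
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/search/upload?{query}"


url = build_upload_url(
    "http://localhost:8001", "user_123", top_k=5,
    labels=["Western Blot", "Microscopy"],
)
```

The image itself still has to be sent as a multipart form field named `file`. With the third-party `requests` package, that would typically be `requests.post(url, files={"file": open("query_image.jpg", "rb")})`.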
Example:
curl -X POST "http://localhost:8001/search/upload?user_id=user_123&top_k=5&labels=Western%20Blot&labels=Microscopy" \
  -F "file=@query_image.jpg"

Remove an image vector from the index.
POST /delete
{
  "user_id": "user_123",
  "image_path": "/workspace/data/image1.jpg"
}

Scientific images can have different classes such as Western Blots, Fluorescent Microscopy, X-Ray, Graphs, etc. The CBIR system supports label-based filtering to retrieve only images of desired classes.
- Indexing: When adding an image, you can optionally provide a list of labels.
- Searching: When querying, you can filter results to only include images with specific labels.
- OR Logic: If multiple labels are specified, images matching any of the labels are returned.
- No Filter: If no labels are provided during search, all images are considered.
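The filtering rules above can be illustrated with a small in-memory sketch. It only models the semantics (OR logic, no filter when labels are omitted) and is not the service's actual implementation. It assumes labels are compared as exact, case-sensitive strings, and the sample data is made up.

```python
def matches_labels(image_labels, query_labels=None):
    """Return True if an image passes the label filter (OR logic)."""
    if not query_labels:
        # No filter: every image is considered.
        return True
    # OR logic: one shared label is enough.
    return not set(query_labels).isdisjoint(image_labels)


def filter_images(images, query_labels=None):
    """Return the paths of all images that pass the label filter."""
    return [path for path, labels in images.items()
            if matches_labels(labels, query_labels)]


# Hypothetical index contents: image path -> labels given at indexing time.
images = {
    "image1.jpg": ["Western Blot"],
    "image2.png": ["Microscopy", "Fluorescent"],
    "image3.png": [],
}
```

Here `filter_images(images, ["Western Blot", "Fluorescent"])` selects the first two images, while `filter_images(images)` selects all three, including the unlabeled one.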
python dataset/add_images_to_index.py --images-dir ./images --labels "Western Blot" "Microscopy"

Or using a JSON mapping file:
python dataset/add_images_to_index.py --images-dir ./images --labels-file labels.json

Where labels.json maps image paths to their labels:
{
  "image1.jpg": ["Western Blot"],
  "image2.png": ["Microscopy", "Fluorescent"]
}

This system is designed for multi-user environments (like ELIS).
- Isolation Strategy: Every vector is tagged with a user_id.
- Indexing: You must provide a user_id when indexing.
- Searching: Searches are always filtered by user_id. A user can only find matches within their own uploaded images. Cross-user search is disabled by design to ensure privacy.
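One way such isolation can be enforced in Milvus is a boolean filter expression attached to every vector search. The sketch below builds one. It is an illustration under assumptions, not the service's code: the scalar field names `user_id` and `labels` are hypothetical, and `ARRAY_CONTAINS_ANY` only applies if labels are stored in a Milvus ARRAY field.

```python
import json


def build_filter_expression(user_id, labels=None):
    """Build a Milvus-style filter that always pins the search to one user."""
    if not user_id:
        raise ValueError("user_id is required for every search")
    # json.dumps produces a double-quoted, escaped string literal.
    expression = f"user_id == {json.dumps(user_id, ensure_ascii=False)}"
    if labels:
        # Optional label filter with OR semantics (assumed ARRAY field).
        label_list = json.dumps(list(labels), ensure_ascii=False)
        expression += f" and ARRAY_CONTAINS_ANY(labels, {label_list})"
    return expression
```

Because the `user_id` clause is added unconditionally and a missing `user_id` raises an error, no code path in this sketch can produce a cross-user query.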
Configuration is managed in config/config.yaml.
- Model: Change device to "cuda" to enable GPU acceleration.
- Milvus: Configure host, port, and index parameters (IVF_FLAT, HNSW, etc.).
- Source Code: Located in src/.
- Hot Reload: The docker-compose.yml mounts the src/ directory, so code changes are visible inside the container without rebuilding the image (restart the container to apply them).