Haifan Gong1,2† · Xuanye Zhang1,2† · Ruifei Zhang1,2 · Yun Su3 · Zhuo Li1,2 · Yuhao Du1,2 · Anningzhe Gao2 · Xiang Wan1,2* · Haofeng Li4,2*
1 The Chinese University of Hong Kong, Shenzhen · 2 Shenzhen Research Institute of Big Data · 3 University of Waterloo · 4 Sun Yat-sen University
† Equal contribution · * Corresponding authors
| Resource | Link |
|---|---|
| Paper (NeurIPS) | Abstract · PDF |
| OpenReview | vE98S8BmzP |
| Dataset & models | huggingface.co/haifan-gong/IDAMA |
| Project page | docs/index.html |
| Code docs | code/README.md |
Patent-Product Image Retrieval (PPIR) retrieves patent images from product images to support infringement analysis. It is challenging because (1) both modalities contain diverse artificial objects, so standard pre-training generalizes poorly to unseen categories in an open-set setting, and (2) binary patent line drawings and colorful RGB product photos lie in very different visual domains.
We introduce IDAMA (Intermediate Domain Alignment and Morphology Analogy) and the benchmark PPIRD (Patent-Product Image Retrieval Dataset):
- Intermediate Domain Mapping (IDM): maps both patent and product images into a shared sketch/edge domain via edge detection to reduce the cross-domain gap.
- Morphology Analogy Filter (MAF): selects discriminative patent images by high classification confidence (label-agnostic), inspired by analogical reasoning over visual morphology.
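
To make the two components concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the repository's implementation: a Sobel filter stands in for the learned UAED edge detector, and the function names (`rgb_to_gray`, `sobel_edge_map`, `confidence_filter`), the 0.5 edge threshold, and the 0.9 confidence cutoff are hypothetical choices.

```python
import math


def rgb_to_gray(rgb):
    """Convert rows of (r, g, b) tuples in [0, 255] to luminance in [0, 1]."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in row]
            for row in rgb]


def sobel_edge_map(gray, threshold=0.5):
    """Map a grayscale image (rows of floats) to a binary edge map.

    Sobel is only a stand-in for a learned edge detector (assumption).
    Gradient magnitudes are normalised by their peak; border pixels stay 0.
    """
    h, w = len(gray), len(gray[0])
    mags = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            mags[y][x] = math.hypot(gx, gy)
    peak = max(max(row) for row in mags)
    if peak == 0.0:
        return [[0] * w for _ in range(h)]  # flat image: no edges
    return [[1 if m / peak >= threshold else 0 for m in row] for row in mags]


def confidence_filter(probabilities, min_confidence=0.9):
    """Return indices of images whose top class probability is high.

    Only the maximum probability is inspected, never which class it
    belongs to, so the selection is label-agnostic.
    """
    return [i for i, p in enumerate(probabilities) if max(p) >= min_confidence]
```

In this toy setting, a black-and-white drawing and a red/blue photo that share the same vertical boundary produce the same edge image, which is the intuition behind aligning both modalities in an intermediate domain.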
On PPIRD, IDAMA improves over strong baselines by +7.58 mAR and offers insights for open-set retrieval in PPIR.
Full figures, dataset protocol, and citation: see the project page (content aligned with idama-project).
| Split / component | Description | Scale |
|---|---|---|
| Test queries | Product–patent pairs with infringement labels and product metadata | 439 pairs |
| Retrieval gallery | Open-set patent pool at test time | 727,921 images |
| Unlabeled pre-training | Product & patent images (+ edge-domain variants for IDAMA) | 3,799,695 images |
Protocol: Given a product query, rank gallery patents; ground truth is annotated infringing patents. Gallery patents are not assumed seen during training.
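
The protocol can be sketched as follows. This is a hedged illustration rather than the evaluation code in code/inference/: cosine similarity is an assumed choice of similarity, Recall@K is shown as one common top-K metric (the exact definition of mAR is given in the paper, not here), and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_gallery(query_vec, gallery):
    """Return gallery patent ids sorted by descending similarity to the query.

    `gallery` maps patent id -> embedding; it may contain patents that
    were never seen during training (open-set).
    """
    return sorted(gallery, key=lambda pid: cosine(query_vec, gallery[pid]),
                  reverse=True)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the annotated infringing patents found in the top k."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(queries, gallery, k):
    """Average Recall@k over (query_embedding, relevant_ids) pairs."""
    scores = [recall_at_k(rank_gallery(vec, gallery), relevant, k)
              for vec, relevant in queries]
    return sum(scores) / len(scores)
```

Each query is ranked against the full gallery independently, so adding unseen patents to the gallery changes the ranking but not the procedure.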
Download PPIRD splits, checkpoints, and edge-detector weights from Hugging Face. Large files are not in this git repo—clone weights into data/ and model/ locally.
| Path | Description |
|---|---|
| code/preprocessing/ | Edge extraction (UAED), index building, tar packing |
| code/feature_extraction/ | Multi-backbone embeddings (EVA02, Swin, MAE, iBOT, …) |
| code/inference/ | Product–patent similarity & Top-K evaluation |
| code/pretrain/ | MAE / Swin / iBOT pretraining |
| docs/ | Project homepage (index.html) |
| data/, model/ | Local data & checkpoints (gitignored; from Hugging Face) |
Set the project root:

    export IDAMA_ROOT=/path/to/IDAMA
    cd "${IDAMA_ROOT}"

1. Download dataset and model weights from haifan-gong/IDAMA into data/ and model/.
2. Intermediate-domain edges (if not using precomputed edge images):

       bash code/preprocessing/edge_feature/run_uaed_edge_inference.sh \
         /path/to/raw_images /path/to/edge_output 0.5

3. Feature extraction (example: EVA02):

       CUDA_VISIBLE_DEVICES=0 bash code/feature_extraction/run_feature_extraction.sh eva02

4. Retrieval evaluation:

       bash code/inference/run_inference.sh eva02

5. (Optional) MAE pretraining on edge images:

       cd code/pretrain
       NPROC_PER_NODE=8 bash run_pretrain.sh mae-pretrain \
         "${IDAMA_ROOT}/data/unlabeled_train/data_edge/goods_edge_0.5" \
         "${IDAMA_ROOT}/output/pretrain_mae"

Step-by-step module reference: code/README.md.

    git clone https://github.com/haifangong/IDAMA.git
    cd IDAMA

If you find this work useful, please cite:

    @inproceedings{gong2025intermediate,
      title     = {Intermediate Domain Alignment and Morphology Analogy for Patent-Product Image Retrieval},
      author    = {Gong, Haifan and Zhang, Xuanye and Zhang, Ruifei and Su, Yun and Li, Zhuo and Du, Yuhao and Gao, Anningzhe and Wan, Xiang and Li, Haofeng},
      booktitle = {Advances in Neural Information Processing Systems},
      volume    = {38},
      year      = {2025},
      url       = {https://papers.nips.cc/paper_files/paper/2025/hash/154743e7e9688cf77db5ee75807bda82-Abstract-Conference.html}
    }

Third-party code (UAED, MAE, Swin, etc.) follows upstream licenses. Dataset and model terms are described on Hugging Face.