Intermediate Domain Alignment and Morphology Analogy for Patent-Product Image Retrieval

English | 中文

Haifan Gong¹,²,† · Xuanye Zhang¹,²,† · Ruifei Zhang¹,² · Yun Su³ · Zhuo Li¹,² · Yuhao Du¹,² · Anningzhe Gao² · Xiang Wan¹,²,* · Haofeng Li⁴,²,*

¹ The Chinese University of Hong Kong, Shenzhen · ² Shenzhen Research Institute of Big Data · ³ University of Waterloo · ⁴ Sun Yat-sen University

† Equal contribution · * Corresponding authors

| Resource | Link |
| --- | --- |
| Paper (NeurIPS) | Abstract · PDF |
| OpenReview | vE98S8BmzP |
| Dataset & models | huggingface.co/haifan-gong/IDAMA |
| Project page | docs/index.html |
| Code docs | code/README.md |

Overview

Patent-Product Image Retrieval (PPIR) takes a product image as the query and retrieves matching patent images to support infringement analysis. It is challenging for two reasons: (1) both modalities contain diverse artificial objects, so standard pre-training generalizes poorly to unseen categories in an open-set setting, and (2) binary patent line drawings and colorful RGB product photos lie in very different visual domains.

We introduce IDAMA (Intermediate Domain Alignment and Morphology Analogy) and the benchmark PPIRD (Patent-Product Image Retrieval Dataset):

- Intermediate Domain Mapping (IDM): map both patent and product images into a shared sketch / edge domain via edge detection, which reduces the cross-domain gap.
- Morphology Analogy Filter (MAF): select discriminative patent images by keeping those with high classification confidence (label-agnostic), inspired by analogical reasoning over visual morphology.
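
The confidence-based selection that MAF describes can be pictured with a small sketch. This is an illustration under assumptions, not the filter used in the paper: the function names, the softmax-based confidence score, and the 0.9 threshold are all hypothetical, and the logits are made-up values. It only shows what "label-agnostic, high classification confidence" can mean in practice: images are kept based on how peaked the classifier's prediction is, without consulting any ground-truth label.

```python
import math


def max_softmax(logits):
    """Top softmax probability for one row of classifier logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    return max(exps) / sum(exps)


def select_confident(all_logits, threshold=0.9):
    """Indices of images whose top-class confidence reaches the threshold.

    No ground-truth label is consulted, only the classifier's own confidence.
    The threshold value is an assumption made for this sketch.
    """
    return [i for i, row in enumerate(all_logits) if max_softmax(row) >= threshold]


# Hypothetical logits for three patent images over three pseudo-classes.
patent_logits = [[8.0, 0.0, 0.0], [1.0, 1.1, 0.9], [0.0, 0.0, 6.0]]
kept = select_confident(patent_logits, threshold=0.9)
print(kept)  # the ambiguous second image is dropped
```

The second image has nearly uniform logits, so its top probability is far below the threshold and it is filtered out.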

On PPIRD, IDAMA improves over strong baselines by +7.58 mAR and offers insights for open-set retrieval in PPIR.

Full figures, dataset protocol, and citation: see the project page (content aligned with idama-project).

PPIRD at a glance

| Split / component | Description | Scale |
| --- | --- | --- |
| Test queries | Product–patent pairs with infringement labels and product metadata | 439 pairs |
| Retrieval gallery | Open-set patent pool at test time | 727,921 images |
| Unlabeled pre-training | Product & patent images (+ edge-domain variants for IDAMA) | 3,799,695 images |

Protocol: Given a product query, rank the gallery patents; the ground truth is the set of annotated infringing patents. Gallery patents are not assumed to have been seen during training.
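
The protocol can be sketched as rank-then-score. The example below is a minimal illustration, assuming cosine similarity over feature vectors and a plain Recall@K averaged over queries; the exact mAR definition is given in the paper and may differ. All function names and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_gallery(query, gallery):
    """Gallery indices sorted by decreasing similarity to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranking, relevant, k):
    """Fraction of the annotated patents that appear in the top-k results."""
    return len(set(ranking[:k]) & set(relevant)) / len(relevant)


def mean_recall_at_k(queries, gallery, ground_truth, k):
    """Recall@k averaged over all product queries."""
    recalls = [
        recall_at_k(rank_gallery(q, gallery), rel, k)
        for q, rel in zip(queries, ground_truth)
    ]
    return sum(recalls) / len(recalls)


# Toy 2-D features: four gallery patents, two product queries.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
queries = [[0.9, 0.1], [0.1, 0.9]]
ground_truth = [[0, 2], [3]]  # annotated infringing patents per query
print(mean_recall_at_k(queries, gallery, ground_truth, k=2))
```

In this toy case the first query recovers both of its annotated patents in the top 2 and the second recovers none, so the mean is 0.5.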

Download the PPIRD splits, checkpoints, and edge-detector weights from Hugging Face. Large files are not stored in this git repository; download them into data/ and model/ locally.

Repository layout

| Path | Description |
| --- | --- |
| `code/preprocessing/` | Edge extraction (UAED), index building, tar packing |
| `code/feature_extraction/` | Multi-backbone embeddings (EVA02, Swin, MAE, iBOT, …) |
| `code/inference/` | Product–patent similarity & Top-K evaluation |
| `code/pretrain/` | MAE / Swin / iBOT pretraining |
| `docs/` | Project homepage (`index.html`) |
| `data/`, `model/` | Local data & checkpoints (gitignored; from Hugging Face) |
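
To give a sense of what a similarity and Top-K step involves at the scale of a 727,921-image gallery, the sketch below streams the gallery in chunks and keeps only the current best K matches in a heap, so the full score list never has to be held in memory. It is not taken from `code/inference/`; the function names are hypothetical, and it assumes the feature vectors are already L2-normalized so that a dot product equals cosine similarity.

```python
import heapq


def dot(a, b):
    """Dot product; equals cosine similarity for L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_chunked(query, gallery_chunks, k):
    """Top-k (score, gallery_index) pairs over a gallery streamed in chunks."""
    best = []  # min-heap holding at most k (score, global_index) pairs
    offset = 0  # global index of the first vector in the current chunk
    for chunk in gallery_chunks:
        for local_index, vector in enumerate(chunk):
            item = (dot(query, vector), offset + local_index)
            if len(best) < k:
                heapq.heappush(best, item)
            elif item > best[0]:
                # Better than the weakest kept match: swap it in.
                heapq.heapreplace(best, item)
        offset += len(chunk)
    return sorted(best, reverse=True)  # best match first


# Toy gallery of four unit vectors, delivered as two chunks.
chunks = [[[1.0, 0.0], [0.0, 1.0]], [[0.6, 0.8], [-1.0, 0.0]]]
result = top_k_chunked([1.0, 0.0], chunks, k=2)
print([index for _, index in result])
```

The heap keeps memory proportional to K rather than to the gallery size, which is the main reason to prefer it over sorting every score.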

Set the project root:

export IDAMA_ROOT=/path/to/IDAMA
cd "${IDAMA_ROOT}"

Quick start (reproduction)

1. Download dataset and model weights from haifan-gong/IDAMA into data/ and model/.

2. Intermediate-domain edges (if not using precomputed edge images):

bash code/preprocessing/edge_feature/run_uaed_edge_inference.sh \
  /path/to/raw_images /path/to/edge_output 0.5

3. Feature extraction (example: EVA02):

CUDA_VISIBLE_DEVICES=0 bash code/feature_extraction/run_feature_extraction.sh eva02

4. Retrieval evaluation:

bash code/inference/run_inference.sh eva02

5. (Optional) MAE pretraining on edge images:

cd code/pretrain
NPROC_PER_NODE=8 bash run_pretrain.sh mae-pretrain \
  "${IDAMA_ROOT}/data/unlabeled_train/data_edge/goods_edge_0.5" \
  "${IDAMA_ROOT}/output/pretrain_mae"

Step-by-step module reference: code/README.md.
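
As a rough illustration of the intermediate-domain idea behind step 2, the sketch below turns a grayscale image into a binary edge map. It uses a hand-written Sobel filter as a stand-in; UAED, the detector used in this repository, is a learned model and will behave differently. The function name is hypothetical, and the 0.5 threshold mirrors the trailing argument of the step 2 command only on the assumption that it is a binarization threshold, which this README does not state.

```python
def sobel_edge_map(image, threshold=0.5):
    """Binary edge map from a grayscale image (rows of floats in [0, 1]).

    Gradient magnitudes are normalized by their maximum and then thresholded.
    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    magnitudes = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            magnitudes[y][x] = (gx * gx + gy * gy) ** 0.5
    peak = max(max(row) for row in magnitudes) or 1.0  # avoid dividing by zero
    return [[1 if m / peak >= threshold else 0 for m in row] for row in magnitudes]


# Toy 5x6 image: dark left half, bright right half (a vertical step edge).
step_image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edges = sobel_edge_map(step_image, threshold=0.5)
for row in edges:
    print(row)
```

Applying the same mapping to both product photos and patent drawings is what places the two modalities in a shared edge domain before feature extraction.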

Clone

git clone https://github.com/haifangong/IDAMA.git
cd IDAMA

Citation

If you find this work useful, please cite:

@inproceedings{gong2025intermediate,
  title     = {Intermediate Domain Alignment and Morphology Analogy for Patent-Product Image Retrieval},
  author    = {Gong, Haifan and Zhang, Xuanye and Zhang, Ruifei and Su, Yun and Li, Zhuo and Du, Yuhao and Gao, Anningzhe and Wan, Xiang and Li, Haofeng},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {38},
  year      = {2025},
  url       = {https://papers.nips.cc/paper_files/paper/2025/hash/154743e7e9688cf77db5ee75807bda82-Abstract-Conference.html}
}

License

Third-party code (UAED, MAE, Swin, etc.) follows upstream licenses. Dataset and model terms are described on Hugging Face.
