Official PyTorch implementation of paper: Transformer based deep learning for digital image correlation
A DIC network built on GMFlow for high-accuracy deformation measurement.
Unlike previous models that directly relate grayscale value changes to displacements, DICTr reformulates the problem as image registration driven by feature matching, which has a clearer physical meaning.
System: Ubuntu 22.04.2 LTS
Datasets generation:
- MATLAB ≥ R2020b
DICTr network:
- Conda ≥ 22.9.0
- PyTorch ≥ 1.13.1
- CUDA ≥ 11.6
- Python ≥ 3.8.11
We recommend creating a Conda environment through the YAML file provided in the repository:
conda env create -f environment.yaml
conda activate dictr
When generating datasets and training on a remote server, we recommend using tmux to prevent accidental session interruptions.
The dataset required for DICTr training can be generated through the MATLAB script provided in the repository:
cd ./dataset/DICTrDatasetGenerator
matlab -nodisplay -nosplash
>> main
Execute the following command in the root directory of the repository:
sh ./scripts/train.sh
Detailed explanation of the parameters in the train script:
# name of dataset used for training
# you can create your own dataset in the dataset.py file
--stage speckle
# number of image pairs used to update model parameters in each training step
# the upper limit depends on your VRAM size
--batch_size 12
# name of dataset used for validation
# you can create your own dataset in the dataset.py and evaluate.py files
--val_dataset speckle
# learning rate
--lr 2e-4
# DICTr uses 12 transformer layers (6 blocks) to enhance image features
--num_transformer_layers 12
# DICTr obtains the full-resolution result by convex upsampling from 1/2 resolution
--upsample_factor 2
# DICTr uses 2 feature scales: 1/4 for global matching and 1/2 for refinement
--num_scales 2
# number of splits along each feature map edge to form the window layout for the Swin transformer
# first parameter is for 1/4 scale feature map
# second parameter is for 1/2 scale feature map
--attn_splits_list 2 8
# radius for feature matching, -1 indicates global matching
# first parameter is for 1/4 scale feature map
# second parameter is for 1/2 scale feature map
--corr_radius_list -1 4
# frequency (in steps) of validation
--val_freq 5000
# frequency (in steps) of saving the model
--save_ckpt_freq 5000
# total number of training steps, for automatic stopping during UNATTENDED TRAINING
--num_steps 100000
Due to differences in VRAM across GPU devices, you may need to adjust both batch_size and num_steps to complete the training.
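As a rough illustration of how the two settings interact, the sketch below rescales num_steps so that the total number of image pairs seen during training stays the same when batch_size changes. This constant-budget rule is an assumption made for illustration, not a recommendation from the paper, and the helper name `scaled_num_steps` is hypothetical. The learning rate may also need retuning when the batch size changes.

```python
def scaled_num_steps(ref_batch_size, ref_num_steps, new_batch_size):
    """Return the step count that keeps batch_size * num_steps roughly constant.

    Assumption (not from the DICTr paper): the training budget is measured in
    total image pairs seen, so a smaller batch needs proportionally more steps.
    """
    if new_batch_size <= 0:
        raise ValueError("new_batch_size must be positive")
    total_pairs = ref_batch_size * ref_num_steps
    # Integer ceiling division, so the budget is never undershot.
    return -(-total_pairs // new_batch_size)


# Defaults from the train script: --batch_size 12, --num_steps 100000
print(scaled_num_steps(12, 100000, 6))  # 200000
print(scaled_num_steps(12, 100000, 8))  # 150000
```

The resulting value would be passed as --num_steps together with the reduced --batch_size.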
We use early stopping as a regularization approach to decide when to stop updating the model. Specifically, the network is trained on the training set, and the AEE (average endpoint error) on the validation set is evaluated periodically. To prevent overfitting, training should be halted once the validation performance no longer improves. The final model is then used to run inference on the test set. This approach means you do not need to complete all training steps (num_steps).
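The stopping rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code: the `EarlyStopper` class, the `patience` criterion (stop after a fixed number of validation checks without improvement), and the AEE values in the example are all assumptions. AEE is computed here in its usual form, the mean Euclidean distance between predicted and ground-truth displacement vectors.

```python
def average_endpoint_error(pred, truth):
    """Mean Euclidean distance between predicted and true (u, v) displacements."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    total = 0.0
    for (pu, pv), (tu, tv) in zip(pred, truth):
        total += ((pu - tu) ** 2 + (pv - tv) ** 2) ** 0.5
    return total / len(pred)


class EarlyStopper:
    """Hypothetical helper: stop after `patience` checks without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_aee = float("inf")
        self.best_step = None
        self.bad_checks = 0

    def update(self, step, val_aee):
        """Record one validation result; return True when training should stop."""
        if val_aee < self.best_aee - self.min_delta:
            self.best_aee = val_aee
            self.best_step = step
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Made-up validation AEE values, one per --val_freq interval (5000 steps).
history = [(5000, 0.120), (10000, 0.080), (15000, 0.060),
           (20000, 0.061), (25000, 0.063), (30000, 0.062)]

stopper = EarlyStopper(patience=3)
for step, aee in history:
    if stopper.update(step, aee):
        break

print(stopper.best_step, step)  # 15000 30000
```

In this example the checkpoint saved at step 15000 would be the one to keep, since the validation AEE did not improve over the next three checks.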
The training, validation, and test sets should not overlap to prevent data leakage. For further details, please refer to Wikipedia.
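One simple way to guarantee disjoint sets is to shuffle the sample indices once and slice the shuffled list. The sketch below assumes the samples are indexed 0 to N-1 and uses hypothetical split fractions; the function name `split_indices` is not part of the repository, which may instead generate each set separately.

```python
import random


def split_indices(num_samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle sample indices once and cut them into three disjoint lists."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_val = int(num_samples * val_fraction)
    n_test = int(num_samples * test_fraction)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


train, val, test = split_indices(1000)

# No index may appear in more than one set.
assert not set(train) & set(val)
assert not set(train) & set(test)
assert not set(val) & set(test)

print(len(train), len(val), len(test))  # 800 100 100
```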
For reference, DICTr was trained on a system equipped with an AMD Ryzen 7 5700X @ 3.40 GHz CPU, 128 GB RAM, and dual NVIDIA GeForce RTX 3090 Ti GPUs (each with 24 GB VRAM). With the default batch size of 12, training took 8 hours.
Execute the following command in the root directory of the repository to run inference:
sh ./scripts/experiment.sh
Detailed explanation of the parameters in the experiment script:
# path to resume model
# you can replace with newly trained result
--resume checkpoints/step_080000.pth
# name of experiment for running inference
# you can create custom tests in the experiment.py file
--exp_type rotation tension star5 mei realcrack
# DICTr uses 12 transformer layers (6 blocks) to enhance image features
--num_transformer_layers 12
# DICTr obtains the full-resolution result by convex upsampling from 1/2 resolution
--upsample_factor 2
# DICTr uses 2 feature scales: 1/4 for global matching and 1/2 for refinement
--num_scales 2
# number of splits along each feature map edge to form the window layout for the Swin transformer
# first parameter is for 1/4 scale feature map
# second parameter is for 1/2 scale feature map
--attn_splits_list 2 8
# radius for feature matching, -1 indicates global matching
# first parameter is for 1/4 scale feature map
# second parameter is for 1/2 scale feature map
--corr_radius_list -1 4
The results will be saved in the ./test folder in .csv format, which store the full-field displacement information of
By default, all tests in the paper will be performed.
The REF and TAR images can be found in the ./test folder.
You can add custom tests in the ./experiment.py file.
The pretrained DICTr model used in the paper is provided in the repository as ./checkpoints/step_080000.pth. It is loaded by the default experiment script.
@article{ZHOU2025108568,
title = {Transformer based deep learning for digital image correlation},
journal = {Optics and Lasers in Engineering},
volume = {184},
pages = {108568},
year = {2025},
issn = {0143-8166},
doi = {https://doi.org/10.1016/j.optlaseng.2024.108568},
url = {https://www.sciencedirect.com/science/article/pii/S0143816624005463},
author = {Yifei Zhou and Qianjiang Zuo and Nan Chen and Licheng Zhou and Bao Yang and Zejia Liu and Yiping Liu and Liqun Tang and Shoubin Dong and Zhenyu Jiang}
}
This project owes its existence to the indispensable contribution of GMFlow.

