Thanks for the great work. I ran into two problems when running the ViT experiment on VisDA-2017.
- The ViT backbone does not seem to match the bottleneck when no_pool is set. The backbone outputs a sequence of tokens instead of a single class token, so the BatchNorm1d layer complains about the input dimension.
- I fixed the first problem by adding a pool layer that extracts the class token:
pool_layer = lambda _x: _x[:, 0] if args.no_pool else None
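One thing to note about the one-liner above: the conditional expression is part of the lambda body, so `pool_layer` is always a callable, and it returns None when `args.no_pool` is false. That does not affect this run, since `--no-pool` is passed, but any code that checks `pool_layer is None` would not see None. Below is a minimal sketch of the difference. It uses a NumPy array as a stand-in for the backbone output, assumes a token shape of (batch, 197, 768), and `make_pool_layer` is a hypothetical helper, not something from the repository.

```python
from types import SimpleNamespace

import numpy as np


def make_pool_layer(no_pool):
    """Hypothetical helper: class-token pooling if no_pool is set, else None."""
    if no_pool:
        # Keep only the first token (assumed to be the class token).
        return lambda x: x[:, 0]
    return None


# Stand-in for the ViT backbone output: (batch, tokens, embed_dim) is assumed.
tokens = np.zeros((2, 197, 768), dtype=np.float32)

# The one-liner parses as: lambda _x: (_x[:, 0] if args.no_pool else None)
args = SimpleNamespace(no_pool=False)
one_liner = lambda _x: _x[:, 0] if args.no_pool else None
one_liner_is_callable = callable(one_liner)  # a lambda even without no_pool
one_liner_output = one_liner(tokens)         # the lambda itself returns None

# Explicit version: pooling callable with no_pool, a real None without it.
pooled_shape = make_pool_layer(True)(tokens).shape
default_layer = make_pool_layer(False)
```

With no_pool set, both versions select the first token and give a (batch, embed_dim) array, which is the 2D input that BatchNorm1d accepts.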
I then used the exact command from examples/run_visda.sh to run CDAN_MCC_SDAT:
python cdan_mcc_sdat.py data/visda-2017 -d VisDA2017 -s Synthetic -t Real -a vit_base_patch16_224 --epochs 15 --seed 0 --lr 0.002 --per-class-eval --train-resizing cen.crop --log logs/cdan_mcc_sdat_vit/VisDA2017 --log_name visda_cdan_mcc_sdat_vit --gpu 0 --no-pool --rho 0.02 --log_results
In the end I get a slightly lower accuracy than reported, as shown below:
global correct: 86.0
mean correct:88.3
mean IoU: 78.5
+------------+-------------------+--------------------+
| class | acc | iou |
+------------+-------------------+--------------------+
| aeroplane | 97.83323669433594 | 96.3012924194336 |
| bicycle | 88.43165588378906 | 81.25331115722656 |
| bus | 81.79104614257812 | 72.69281768798828 |
| car | 78.06941986083984 | 67.53160095214844 |
| horse | 97.31400299072266 | 92.78455352783203 |
| knife | 96.91566467285156 | 82.31681823730469 |
| motorcycle | 94.9102783203125 | 83.37374877929688 |
| person | 81.3499984741211 | 58.12790298461914 |
| plant | 94.04264831542969 | 89.68553161621094 |
| skateboard | 95.87899780273438 | 81.48286437988281 |
| train | 94.05099487304688 | 87.69535064697266 |
| truck | 59.04830551147461 | 48.311458587646484 |
+------------+-------------------+--------------------+
test_acc1 = 86.0
I noticed that epochs is set to 15 in the script. Is this experiment setting correct? How can I reproduce the reported accuracy? Many thanks.