Different behaviours on VisDA-2017 using different pretrained models from timm #5

swift-n-brutal · 2022-09-15T03:39:25Z

Thanks for the great work. I meet two problems when conducting the experiment using ViT on VisDA-2017.

It seems that the ViT backbone doesn't match with the bottleneck when setting no_pool. The output of ViT backbone is a sequence of tokens instead of a single class token. Thus, it makes the BatchNorm1d layer complains about the dimension.
I fix the previous problem by adding a pool layer to extract the class token:
pool_layer = lambda _x: _x[:, 0] if args.no_pool else None
Then use the exact command in examples/run_visda.sh to run CDAN_MCC_SDAT:
python cdan_mcc_sdat.py data/visda-2017 -d VisDA2017 -s Synthetic -t Real -a vit_base_patch16_224 --epochs 15 --seed 0 --lr 0.002 --per-class-eval --train-resizing cen.crop --log logs/cdan_mcc_sdat_vit/VisDA2017 --log_name visda_cdan_mcc_sdat_vit --gpu 0 --no-pool --rho 0.02 --log_results
Finally I get a slightly lower accuracy as below:
global correct: 86.0
mean correct:88.3
mean IoU: 78.5
+------------+-------------------+--------------------+
| class | acc | iou |
+------------+-------------------+--------------------+
| aeroplane | 97.83323669433594 | 96.3012924194336 |
| bicycle | 88.43165588378906 | 81.25331115722656 |
| bus | 81.79104614257812 | 72.69281768798828 |
| car | 78.06941986083984 | 67.53160095214844 |
| horse | 97.31400299072266 | 92.78455352783203 |
| knife | 96.91566467285156 | 82.31681823730469 |
| motorcycle | 94.9102783203125 | 83.37374877929688 |
| person | 81.3499984741211 | 58.12790298461914 |
| plant | 94.04264831542969 | 89.68553161621094 |
| skateboard | 95.87899780273438 | 81.48286437988281 |
| train | 94.05099487304688 | 87.69535064697266 |
| truck | 59.04830551147461 | 48.311458587646484 |
+------------+-------------------+--------------------+
test_acc1 = 86.0
I notice that the epochs is 15 in the scripts. Is the experiment setting correct? How to get the reported accuracy? Many thank.

The text was updated successfully, but these errors were encountered:

swift-n-brutal · 2022-09-16T07:48:06Z

After some research, the first problem has been resolved. The default behaviour of ViT.forward() changed in different version of timm. When global_pool='', the backbone returns x[:, 0] in timm=v0.5.x, while it returns x in timm=v0.6.7.

rangwani-harsh · 2022-09-24T05:43:31Z

Hi @swift-n-brutal are you now able to get the correct accuracy for VisDA dataset?

swift-n-brutal · 2022-09-27T07:28:20Z

I can get a close result (89.8%) of CDAN+MCC+SDAT on VisDA, but met a strange behaviour. As shown in the image below, the validation accuracy (not the mAP) keeps going down as the training proceeds, and the best result (mAP 89.9%) is achieved only at the first epoch. Then I wondered whether the pretrained model was problematic. I tested two models: vit_g ('https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz') from timm=0.5.x and vit_jx ('https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth') from timm=0.4.9. For vit_g the acc goes down, while for vit_jx the acc increases but the final mAP is (88.6%) much lower than the former one.

Wangzs0228 · 2023-10-18T13:34:16Z

I can get a close result (89.8%) of CDAN+MCC+SDAT on VisDA, but met a strange behaviour. As shown in the image below, the validation accuracy (not the mAP) keeps going down as the training proceeds, and the best result (mAP 89.9%) is achieved only at the first epoch. Then I wondered whether the pretrained model was problematic. I tested two models: vit_g ('https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz') from timm=0.5.x and vit_jx ('https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth') from timm=0.4.9. For vit_g the acc goes down, while for vit_jx the acc increases but the final mAP is (88.6%) much lower than the former one.

Have you found the reason? Is it a flaw in the model or a problem with our operation?

swift-n-brutal · 2023-10-20T06:46:15Z

@Wangzs0228 It is almost sure that the smoothness regularization is beneficial to transferability, robustness, generalization ability, etc. For a specific task, the results may vary. I am not working on this task recently. You can try the experiments to see if the results match your expectations.

swift-n-brutal changed the title ~~ViT on VisDA-2017~~ Different behaviours on VisDA-2017 using different pretrained models from timm Oct 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Different behaviours on VisDA-2017 using different pretrained models from timm #5

Different behaviours on VisDA-2017 using different pretrained models from timm #5

swift-n-brutal commented Sep 15, 2022

swift-n-brutal commented Sep 16, 2022

rangwani-harsh commented Sep 24, 2022

swift-n-brutal commented Sep 27, 2022

Wangzs0228 commented Oct 18, 2023

swift-n-brutal commented Oct 20, 2023

Different behaviours on VisDA-2017 using different pretrained models from timm #5

Different behaviours on VisDA-2017 using different pretrained models from timm #5

Comments

swift-n-brutal commented Sep 15, 2022

swift-n-brutal commented Sep 16, 2022

rangwani-harsh commented Sep 24, 2022

swift-n-brutal commented Sep 27, 2022

Wangzs0228 commented Oct 18, 2023

swift-n-brutal commented Oct 20, 2023