In this project, we explore three distinct Swin Transformer variants: without augmentation, with augmentation, and without pre-trained weights (i.e., trained from scratch). The augmentation is performed with RandAugment, CutMix, and MixUp. The goal is to see the effects of augmentation and of pre-trained weights (transfer learning) on an imbalanced dataset, Caltech-256. The dataset is split per category with a ratio of 81:9:10 into training, validation, and testing sets. For the from-scratch model, each category is truncated to 100 instances. Applying augmentation and pre-trained weights clearly boosts the model's performance; the pre-trained weights in particular push the model to predict the right label far more reliably at both top-1 and top-5.
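To make this concrete, below is a minimal sketch of how such a setup can be assembled with TorchVision. This is an illustration rather than the notebook's actual code: it assumes the Swin-T variant, torchvision ≥ 0.16 for the `transforms.v2` CutMix/MixUp API, and standard ImageNet preprocessing.

```python
import torch
from torch import nn
from torch.utils.data import default_collate
from torchvision import models
from torchvision.transforms import v2

NUM_CLASSES = 257  # Caltech-256 has 256 object categories plus a clutter class

# Swin-T with ImageNet-1k pre-trained weights; pass weights=None instead
# to train from scratch. (The exact variant used here is an assumption.)
model = models.swin_t(weights=models.Swin_T_Weights.IMAGENET1K_V1)
model.head = nn.Linear(model.head.in_features, NUM_CLASSES)

# Per-image augmentation: RandAugment applied before tensor conversion
# and normalization.
train_transform = v2.Compose([
    v2.RandomResizedCrop(224),
    v2.RandAugment(),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Batch-level augmentation: randomly apply CutMix or MixUp to each batch
# inside the DataLoader's collate function.
cutmix_or_mixup = v2.RandomChoice([
    v2.CutMix(num_classes=NUM_CLASSES),
    v2.MixUp(num_classes=NUM_CLASSES),
])

def collate_fn(batch):
    return cutmix_or_mixup(*default_collate(batch))
```

Note that CutMix and MixUp produce soft (mixed) targets, so the training loss must accept class probabilities; `nn.CrossEntropyLoss()` does so in recent PyTorch versions.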
Check out this notebook to see the full implementation.
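As for the dataset, the per-category 81:9:10 split (and the 100-instance cap used for the from-scratch model) can be sketched as follows; the seed and the bookkeeping here are assumptions for illustration, not the notebook's actual code:

```python
import random
from collections import defaultdict
from torch.utils.data import Subset
from torchvision.datasets import Caltech256

dataset = Caltech256(root="data", download=True)

# Group sample indices by category (this pass loads each image once;
# the actual implementation may read labels more directly).
by_class = defaultdict(list)
for idx, (_, label) in enumerate(dataset):
    by_class[label].append(idx)

rng = random.Random(42)  # assumed seed
train_idx, val_idx, test_idx = [], [], []
for indices in by_class.values():
    rng.shuffle(indices)
    # For the from-scratch model, cap each category at 100 instances:
    # indices = indices[:100]
    n_train = int(0.81 * len(indices))
    n_val = int(0.09 * len(indices))
    train_idx += indices[:n_train]
    val_idx += indices[n_train:n_train + n_val]
    test_idx += indices[n_train + n_val:]

train_set = Subset(dataset, train_idx)
val_set = Subset(dataset, val_idx)
test_set = Subset(dataset, test_idx)
```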
The table below quantitatively compares the performance of the three Swin Transformer models: without augmentation, with augmentation, and from scratch.
| Model | Loss | Top-1 Acc. | Top-5 Acc. |
|---|---|---|---|
| No Augmentation | 0.369 | 90.17% | 97.68% |
| Augmentation | 0.347 | 91.57% | 98.75% |
| From Scratch | 4.544 | 11.58% | 27.09% |
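For reference, top-1 and top-5 accuracy can be computed directly from the model's logits; here is a minimal plain-PyTorch sketch (the notebook may compute these differently, e.g., with a metrics library):

```python
import torch

@torch.no_grad()
def topk_accuracy(logits, targets, ks=(1, 5)):
    """Fraction of samples whose true label appears among the k
    highest-scoring classes, for each k in `ks`."""
    max_k = max(ks)
    _, pred = logits.topk(max_k, dim=1)      # (batch, max_k) class indices
    correct = pred.eq(targets.unsqueeze(1))  # (batch, max_k) booleans
    return [correct[:, :k].any(dim=1).float().mean().item() for k in ks]

# Example with random data: 8 samples, 257 classes.
logits = torch.randn(8, 257)
targets = torch.randint(0, 257, (8,))
top1, top5 = topk_accuracy(logits, targets)
```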
Accuracy curves of the models on the validation set.
Loss curves of the models on the validation set.
The following collated images visually illustrate the prediction quality of the three models.
The prediction result of Swin Transformer without augmentation.
The prediction result of Swin Transformer with augmentation.
The prediction result of Swin Transformer trained from scratch (no pre-trained weights).
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- TorchVision's Swin Transformer
- Image classification with Swin Transformers
- Caltech-256 Object Category Dataset
- TorchVision's Caltech256 Dataset
- RandAugment: Practical automated data augmentation with a reduced search space
- RandAugment for Image Classification for Improved Robustness
- CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
- CutMix data augmentation for image classification
- mixup: Beyond Empirical Risk Minimization
- MixUp augmentation for image classification
- Multi-head or Single-head? An Empirical Comparison for Transformer Training
- Getting 95% Accuracy on the Caltech101 Dataset using Deep Learning
- How to use CutMix and MixUp
- PyTorch Lightning