lee-gwang/A-ColViT

AAAI'23 Workshop - A-ColViT: Real-time Interactive Colorization by Adaptive Vision Transformer

The official code release for the paper A-ColViT: Real-time Interactive Colorization by Adaptive Vision Transformer.

Abstract

Recently, the vision transformer (ViT) has achieved remarkable performance in computer vision tasks and has been actively utilized in colorization. Specifically, in point-interactive image colorization, previous approaches based on convolutional layers are limited when colorizing only part of an image, which produces inconsistent colors within the image. Vision transformers have therefore been used to alleviate this problem, using multi-head self-attention to propagate user hints to distant but relevant areas of the image. However, despite the success of vision transformers in colorizing images and in selectively colorizing regions by propagating user hints, the heavy underlying ViT architecture and the large number of required parameters hinder real-time user interaction in colorization applications. In this work, we propose A-ColViT, a novel and efficient ViT architecture for real-time interactive colorization that adaptively prunes the layers of the vision transformer for every input sample. This method flexibly allocates computational resources across input samples, effectively achieving actual acceleration. In addition, we demonstrate through extensive experiments on the ImageNet-ctest10k, Oxford 102flower, and CUB-200 datasets that our method outperforms the state-of-the-art approach while achieving actual acceleration.
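
The idea of pruning layers per input sample can be illustrated with a toy sketch. This is not the A-ColViT implementation: `residual_layer`, `mean_abs_gate`, `adaptive_forward`, and the threshold value are hypothetical names and choices introduced only for illustration, and the paper's actual pruning criterion may differ. The sketch only shows the general pattern of a gate deciding, for each sample, whether a layer runs or is skipped.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]
Gate = Callable[[Vector], float]


def residual_layer(scale: float) -> Layer:
    """Toy stand-in for a transformer block: x -> x + scale * x."""
    def layer(x: Vector) -> Vector:
        return [v + scale * v for v in x]
    return layer


def mean_abs_gate(x: Vector) -> float:
    """Hypothetical gate: score a sample by its mean absolute activation."""
    return sum(abs(v) for v in x) / len(x)


def adaptive_forward(
    x: Vector,
    layers: Sequence[Layer],
    gates: Sequence[Gate],
    threshold: float,
) -> Tuple[Vector, int]:
    """Run only the layers whose gate score reaches the threshold.

    Returns the output and the number of layers actually executed.
    """
    executed = 0
    for layer, gate in zip(layers, gates):
        if gate(x) >= threshold:
            x = layer(x)
            executed += 1
        # Otherwise the layer is skipped and x passes through unchanged.
    return x, executed


layers = [residual_layer(0.5) for _ in range(4)]
gates = [mean_abs_gate for _ in range(4)]

easy_out, easy_depth = adaptive_forward([0.1, 0.1], layers, gates, threshold=1.0)
hard_out, hard_depth = adaptive_forward([2.0, -2.0], layers, gates, threshold=1.0)
print(easy_depth, hard_depth)
```

In this toy setup the low-activation sample skips every layer while the high-activation sample runs all four, so the compute spent depends on the input rather than being fixed by the architecture.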

Experiments

Installation

conda create -n colorization python=3.9 -y
conda activate colorization
pip install -r requirements.txt
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Preprocess

python preparation/make_mask.py --img_dir /home/data/imagenet/ctest10k/ --hint_dir ./data/ctest10k

Training

bash scripts/train_pruned.sh

Inference

bash scripts/infer_pruned.sh

Demo

Coming soon.

Acknowledgements


If you use this repo, please cite our paper:

BibTeX:

@article{lee2023colvit,
  title={A-ColViT: Real-time Interactive Colorization by Adaptive Vision Transformer},
  author={Lee, Gwanghan and Shin, Saebyeol and Ko, Donggeun and Jung, Jiyeon and Woo, Simon S},
  year={2023}
}

Plain text:

Lee, Gwanghan, et al. "A-ColViT: Real-time Interactive Colorization by Adaptive Vision Transformer." (2023).
