AAAI'23 Workshop - A-ColViT: Real-time Interactive Colorization by Adaptive Vision Transformer

The official code release for the paper A-ColViT: Real-time Interactive Colorization by Adaptive Vision Transformer.

Abstract

Recently, the vision transformer (ViT) has achieved remarkable performance in computer vision tasks and has been actively applied to colorization. In point-interactive image colorization specifically, previous methods based on convolutional layers are limited when colorizing only part of an image, which produces inconsistent colors within the image. Vision transformers alleviate this problem by using multi-head self-attention to propagate user hints to distant but relevant regions of the image. However, although vision transformers succeed at colorizing images and at selectively colorizing regions by propagating user hints, the heavy underlying ViT architecture and its large number of parameters hinder real-time user interaction in colorization applications. In this work, we therefore propose A-ColViT, a novel and efficient ViT architecture for real-time interactive colorization that adaptively prunes the layers of the vision transformer for every input sample. This method flexibly allocates computational resources to input samples, achieving actual acceleration. Extensive experiments on the ImageNet-ctest10k, Oxford 102flower, and CUB-200 datasets show that our method outperforms the state-of-the-art approach while achieving this acceleration.
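To make the idea of per-sample layer pruning concrete, here is a minimal sketch in plain Python. It is not the A-ColViT implementation: the gate function, the threshold, and the toy residual layers are all hypothetical stand-ins, chosen only to show how a gate can decide, for each input, which layers run and which are skipped.

```python
import math


def make_layer(scale):
    """Return a toy residual 'layer': x -> x + scale * tanh(x), elementwise."""
    def layer(tokens):
        return [t + scale * math.tanh(t) for t in tokens]
    return layer


def gate_scores(tokens, num_layers):
    """Hypothetical gate: one score in (0, 1) per layer, computed from the input.

    A real model would learn this gate; here it is a fixed function of the
    mean activation so that the example stays deterministic.
    """
    mean = sum(tokens) / len(tokens)
    return [1.0 / (1.0 + math.exp(-(mean - 0.5 * i))) for i in range(num_layers)]


def adaptive_forward(tokens, layers, threshold=0.5):
    """Run only the layers whose gate score exceeds the threshold."""
    scores = gate_scores(tokens, len(layers))
    executed = []
    for index, (layer, score) in enumerate(zip(layers, scores)):
        if score > threshold:
            tokens = layer(tokens)
            executed.append(index)
        # Otherwise the layer is skipped and costs no compute for this sample.
    return tokens, executed


if __name__ == "__main__":
    layers = [make_layer(0.1) for _ in range(4)]
    for sample in ([0.2, 0.2], [2.0, 2.0]):
        _, executed = adaptive_forward(sample, layers)
        print(f"input {sample}: executed layers {executed}")
```

In this toy setup the first sample runs a single layer and the second runs all four, which is the sense in which compute is allocated per input rather than fixed for the whole model.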

Experiments

Installation

conda create -n colorization python=3.9 -y
conda activate colorization
pip install -r requirements.txt
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
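After installation, a quick check along the following lines can confirm that the environment is usable. This is a sketch and not part of the repository: `check_environment` is a hypothetical helper name, and it only reports whether `torch` and `torchvision` can be found and whether PyTorch sees a CUDA device.

```python
import importlib.util


def check_environment():
    """Report whether torch/torchvision are installed and whether CUDA is visible."""
    report = {}
    for name in ("torch", "torchvision"):
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    report["cuda"] = False
    if report["torch"]:
        import torch
        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda` is reported as `False` on a GPU machine, the installed `cudatoolkit` version may not match the local driver.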

Preprocess

python preparation/make_mask.py --img_dir /home/data/imagenet/ctest10k/ --hint_dir ./data/ctest10k

Training

bash scripts/train_pruned.sh

Inference

bash scripts/infer_pruned.sh

Demo

Coming soon.

Acknowledgements

If you use this repo, please cite our paper:

BibTeX:

@article{lee2023colvit,
  title={A-ColViT: Real-time Interactive Colorization by Adaptive Vision Transformer},
  author={Lee, Gwanghan and Shin, Saebyeol and Ko, Donggeun and Jung, Jiyeon and Woo, Simon S},
  year={2023}
}

Plain text:

Lee, Gwanghan, et al. "A-ColViT: Real-time Interactive Colorization by Adaptive Vision Transformer." (2023).
