- 2024.07.02 ScaleDreamer is accepted by ECCV 2024
- 2024.06.23 Created this repo.
Follow threestudio to set up the conda environment, or use the instructions below.
- Create a virtual environment:

```sh
conda create -n scaledreamer python=3.10
conda activate scaledreamer
```
- Install PyTorch:

```sh
# Prefer the latest compatible versions of CUDA and PyTorch
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
```
- (Optional, recommended) Install xFormers for attention acceleration:

```sh
conda install xformers -c xformers
```
- (Optional, recommended) Install ninja to speed up the compilation of CUDA extensions:

```sh
pip install ninja
```
- Install the major dependencies:

```sh
pip install -r requirements.txt
```

- Install iNGP (tiny-cuda-nn) and NerfAcc:

```sh
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/KAIR-BAIR/[email protected]
```
If you encounter errors while installing iNGP, first check your gcc version. The commands below change the gcc version inside your conda environment; afterwards, return to the repository directory and install iNGP and NerfAcc ⬆️ again.

```sh
conda install -c conda-forge gxx=9.5.0
cd $CONDA_PREFIX/lib
ln -s /usr/lib/x86_64-linux-gnu/libcuda.so ./
cd <your repo directory>
```
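Before retrying the build, it can help to confirm which gcc the environment actually picks up. The helper below is a hypothetical convenience (not part of this repo) that parses the version triple out of a `gcc --version` banner so you can compare it against the `gxx=9.5.0` install above:

```python
import re

def parse_gcc_version(banner: str) -> tuple:
    """Extract (major, minor, patch) from `gcc --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no version number found in banner")
    return tuple(int(part) for part in match.groups())

# Example banner, as printed by the conda-forge gxx package
banner = "g++ (conda-forge gcc 9.5.0-19) 9.5.0"
version = parse_gcc_version(banner)
print(version)            # (9, 5, 0)
print(version >= (9, 5))  # True
```

Feed it the real banner via `subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout` if you want to automate the check.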
Download the 2D diffusion priors.

- Save SD-v2.1-base and MVDream to the local `pretrained` directory:

```sh
python scripts/download_pretrained_models.py
```
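After the download finishes, it is worth confirming that the expected model folders actually landed in `pretrained`. The folder names below are illustrative assumptions; adjust them to whatever `download_pretrained_models.py` actually creates:

```python
from pathlib import Path

def check_pretrained(root: str, expected: list) -> list:
    """Return the expected subdirectories of `root` that are missing."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_dir()]

# Hypothetical layout: one folder per 2D diffusion prior
missing = check_pretrained("pretrained", ["stable-diffusion-2-1-base", "MVDream"])
if missing:
    print(f"missing priors: {missing} -- re-run the download script")
else:
    print("all 2D diffusion priors are in place")
```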
- ASD with `SD` (Stable Diffusion). Feel free to change the prompt accordingly:

```sh
sh scripts/single-prompt-benchmark/asd_sd_nerf.sh
```
- ASD with `MV` (MVDream). Feel free to change the prompt accordingly:

```sh
sh scripts/single-prompt-benchmark/asd_mv_nerf.sh
```
The following 3D generator architectures are available:

| Network | Description | File |
| --- | --- | --- |
| Hyper-iNGP | iNGP with text-conditioned linear layers, adopted from ATT3D | geometry, background |
| 3DConv-net | A StyleGAN generator that outputs voxels via 3D convolution, code adopted from CC3D | geometry, architecture |
| Triplane-Transformer | Transformer-based 3D generator with a triplane output structure, adopted from LRM | geometry, architecture |
The following corpus datasets are available:

| Corpus | Description | File |
| --- | --- | --- |
| MG15 | 15 text prompts from the Magic3D project page | json |
| DF415 | 415 text prompts from the DreamFusion project page | json |
| AT2520 | 2,520 text prompts from the ATT3D experiments | json |
| DL17k | 17k text prompts from the Instant3D release | json |
| CP100k | 110k text prompts from the Cap3D dataset | json |
Run the following scripts to start training.

- `Hyper-iNGP` with `SD` on `MG15`:

```sh
sh scripts/multi-prompt-benchmark/asd_sd_hyper_iNGP_MG15.sh
```

- `3DConv-net` with `SD` on `DF415`:

```sh
sh scripts/multi-prompt-benchmark/asd_sd_3dconv_net_DF415.sh
```

- `3DConv-net` with `SD` on `AT2520`:

```sh
sh scripts/multi-prompt-benchmark/asd_sd_3dconv_net_AT2520.sh
```

- `Triplane-Transformer` with `MV` on `DL17k`:

```sh
sh scripts/multi-prompt-benchmark/asd_mv_triplane_transformer_DL17k.sh
```

- `3DConv-net` with `SD` on `CP100k`:

```sh
sh scripts/multi-prompt-benchmark/asd_sd_3dconv_net_CP100k.sh
```
Create a directory to save the checkpoints:

```sh
mkdir pretrained/3d_checkpoints
```

Checkpoints for the experiments above ⬆️ are available. Save the corresponding `.pth` file to `pretrained/3d_checkpoints`, then run the scripts below.
- `Hyper-iNGP` with `SD` on `MG15`. The checkpoint is available on Google Drive:

```sh
sh scripts/multi_prompts_benchmark_evaluation/asd_sd_hyper_iNGP_MG15.sh
```

- `3DConv-net` with `SD` on `DF415`. The checkpoint is available on Google Drive:

```sh
sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_DF415.sh
```

- `3DConv-net` with `SD` on `AT2520`. The checkpoint is available on Google Drive:

```sh
sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_AT2520.sh
```

- `Triplane-Transformer` with `MV` on `DL17k`. The checkpoint is available on Google Drive:

```sh
sh scripts/multi_prompts_benchmark_evaluation/asd_mv_triplane_transformer_DL17k.sh
```

- `3DConv-net` with `SD` on `CP100k`. The checkpoint is available on Google Drive:

```sh
sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_CP100k.sh
```
The rendered images and videos are saved in the `outputs/<experiment_name>/save/<num_iter>` directory. Compute the metrics with CLIP via:

```sh
python evaluation/CLIP/evaluation_amortized.py --result_dir <video_dir>
```
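At its core, CLIP-based evaluation reduces to a cosine similarity between the CLIP embedding of each rendered frame and the embedding of its text prompt, averaged over frames. A minimal sketch of that metric, with plain Python lists standing in for the CLIP features that the real script extracts with a CLIP model:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def clip_score(frame_features, text_features):
    """Average image-text similarity over all rendered frames."""
    scores = [cosine_similarity(f, text_features) for f in frame_features]
    return sum(scores) / len(scores)

# Toy 3-dimensional "embeddings" for two frames and one prompt
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text = [1.0, 1.0, 0.0]
print(round(clip_score(frames, text), 4))  # 0.7071
```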
- Place your code in `custom/amortized/models/geometry`; check the other code in that directory for reference.
- Register your <name_of_file> in `custom/amortized/models/geometry/__init__.py`.
- Create your own config file and enter your registered module name in the `system.geometry_type` argument; check the other config files in the `configs/multi-prompt_benchmark` directory for reference.
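The framework resolves the `system.geometry_type` string by looking up the registered module name in a registry. The sketch below imitates that pattern with a plain dict so the mapping from config string to class is visible; the names are illustrative, not threestudio's actual internals:

```python
# Minimal imitation of a name-based module registry (illustrative only)
REGISTRY = {}

def register(name):
    """Decorator that maps a config string to the decorated class."""
    def wrapper(cls):
        REGISTRY[name] = cls
        return cls
    return wrapper

@register("my-custom-geometry")
class MyCustomGeometry:
    def __init__(self, cfg=None):
        self.cfg = cfg

# What the framework does with `system.geometry_type: my-custom-geometry`
geometry_cls = REGISTRY["my-custom-geometry"]
geometry = geometry_cls(cfg={"n_feature_dims": 3})
print(type(geometry).__name__)  # MyCustomGeometry
```

In the real codebase the registration is performed by the decorator applied to your geometry class, which is why importing your file in `__init__.py` is required: the import is what executes the decorator.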
- Put your code in `threestudio/models/guidance`; take a look at the other code in that directory, or other guidance modules, for reference.
- Register your <name_of_file> in `threestudio/models/guidance/__init__.py`.
- Create your own config file and enter your registered module name in the `system.guidance_type` argument; take a look at the other config files in the `configs/multi-prompt_benchmark` directory for reference.
- Create a JSON file in the `load` directory that lists the training, validation, and test text prompts.
- Enter the name of this JSON file in the `system.prompt_processor.prompt_library` argument to set up the corpus; take the other commands in the `scripts` directory for reference.
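The prompt library is plain JSON. Assuming the simplest layout, a dict with train/val/test prompt lists (copy the exact schema from the corpus files shipped in the `load` directory), you could generate one like this:

```python
import json
from pathlib import Path

# Hypothetical corpus; the exact schema should be copied from the
# JSON files shipped in the `load` directory.
corpus = {
    "train": ["a DSLR photo of a hamburger", "a zoomed out photo of a castle"],
    "val": ["a DSLR photo of a hamburger"],
    "test": ["a zoomed out photo of a castle"],
}

path = Path("load/my_corpus.json")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(corpus, indent=2))

# Reload to confirm the file round-trips
loaded = json.loads(path.read_text())
print(len(loaded["train"]))  # 2
```

You would then point `system.prompt_processor.prompt_library` at `my_corpus` in your config.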
You can also add your own modules for `data`, `renderer`, `prompt_processor`, etc.
If you find this paper helpful, please cite:

```bibtex
@article{ma2024scaledreamer,
  title={ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation},
  author={Ma, Zhiyuan and Wei, Yuxiang and Zhang, Yabin and Zhu, Xiangyu and Lei, Zhen and Zhang, Lei},
  journal={arXiv preprint arXiv:2407.02040},
  year={2024}
}
```
- threestudio, a clean and extensible codebase for text-to-3D.
- MVDream-threestudio, the implementation of MVDream for text-to-3D.
- OpenLRM, the implementation of LRM. We develop the 3D generator of Triplane-Transformer on top of it.
- Cap3D, which provides the text caption of Objaverse. We develop the corpus of CP100k on top of it.