PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation
Lukas Meyer*, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae
*This work was conducted during an internship at the National Institute
of Advanced Industrial Science and Technology.
| Webpage | Full Paper | Ramen Dataset (~50 GB) | PEGASET (~50 GB) |
We introduce the Physically Enhanced Gaussian Splatting Simulation System (PEGASUS) for 6DoF object pose dataset generation, a versatile dataset generator based on 3D Gaussian Splatting. Preparation starts with scanning environments and objects separately. PEGASUS allows the composition of new scenes by merging the underlying Gaussian Splatting point cloud of an environment with one or multiple objects. Leveraging a physics engine enables the simulation of natural object placement within a scene through interaction with the objects' extracted meshes. Consequently, an extensive number of new scenes - static or dynamic - can be created by combining different environments and objects. By rendering scenes from various perspectives, diverse data points such as RGB images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data generated by PEGASUS enables pose estimation networks to successfully transfer from synthetic data to real-world data. Moreover, we introduce the Ramen dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical scans that capture images of both object hemispheres as well as the Gaussian Splatting reconstructions, making them compatible with PEGASUS.
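As a rough illustration of the composition step described above, the following sketch shows the idea of physics-based placement followed by merging of Gaussian point clouds. It is a conceptual example, not the repository's API: the URDF files, the .npy inputs, and the use of PyBullet here are assumptions.

# Conceptual sketch (not the repository's API): drop an object into an
# environment with a physics engine, then apply the settled 6DoF pose to the
# object's Gaussian centers before merging them with the environment's.
import numpy as np
import pybullet as p

p.connect(p.DIRECT)                       # headless physics simulation
p.setGravity(0, 0, -9.81)
env_id = p.loadURDF("environment.urdf")   # collision mesh of the scanned environment (assumed file)
obj_id = p.loadURDF("object.urdf", basePosition=[0, 0, 0.3])  # assumed file

for _ in range(240):                      # let the object settle naturally
    p.stepSimulation()

pos, quat = p.getBasePositionAndOrientation(obj_id)  # resulting 6DoF object pose
R = np.array(p.getMatrixFromQuaternion(quat)).reshape(3, 3)

# Transform the object's Gaussian centers by the pose and merge with the
# environment (in the full system, covariances and SH features move as well).
obj_xyz = np.load("object_gaussian_centers.npy")       # placeholder input
env_xyz = np.load("environment_gaussian_centers.npy")  # placeholder input
merged_xyz = np.vstack([env_xyz, obj_xyz @ R.T + np.asarray(pos)])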
This paper is one of the achievements of joint research with, and is copyrighted material owned by, the ROBOT Industrial Basic Technology Collaborative Innovation Partnership. This research has been supported by the New Energy and Industrial Technology Development Organization (NEDO) under project ID JPNP20016.
The repository contains submodules, so please check it out recursively:
git clone https://github.com/meyerls/PEGASUS.git --recursive # HTTPS
git submodule update --init --recursive
The code has been tested with the following dependencies:
- Python 3.8
- CUDA 11.6
- PyTorch 1.12.1
Our default install method is based on Conda and is provided by the following script, which has to be executed in the top level of the repository. Currently, the setup script has only been tested on Ubuntu 20. An installation on Windows should be possible but is not provided in this repo.
./setup.sh
PEGASUS consists of three main components:
- GS Base Environment reconstruction
- GS Object Reconstruction
- PEGASUS Dataset Extraction
Will be updated soon
Will be updated soon! Not yet complete
For object reconstruction we provide two different processing workflows. The first is scanning objects in the wild by taking videos from both sides of the object; the second is using a camera rig to scan the object on a turntable. The in-the-wild approach uses XMem to create a segmentation mask of the selected object. For scanning, one only has to place an ArUco marker into the scene to obtain the correct scale. The turntable approach uses an arbitrary calibration object (we used a texture-rich paper with an ArUco marker) to reuse its precomputed camera poses. A detailed workflow is provided in the following section.
The workflow for scanning objects in the wild is:
Currently this does not work for texture-poor objects; for those, the camera rig is more suitable. The reason is that computing the poses and registering images from the bottom view simply does not work with COLMAP. Place the object onto a planar surface such as a table and make sure to move all around the object.
Print out an ArUco marker and place it next to the object. To scale the object, measure and note down the side length of the square ArUco marker. A website to create ArUco markers can be found here.
Record two videos with your phone camera or a DSLR camera (we used an iPhone 12 in our example). The first video contains a hemispherical scan of the top view of the object; try to cover a 360 degree view at 2-3 different height levels. For the second video, repeat this process with the object flipped.
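The noted marker size is what later recovers the metric scale of the reconstruction, which is otherwise only defined up to an unknown factor. A minimal sketch of that idea follows; it is an illustration using OpenCV's aruco module, not the repository's code, and the dictionary choice and the triangulate_corners helper are hypothetical.

# Sketch: recover a metric scale factor from a printed ArUco marker.
# The dictionary and triangulate_corners() are hypothetical placeholders.
import cv2
import numpy as np

ARUCO_SIZE = 0.037  # measured edge length of the printed marker in meters

img = cv2.imread("frame_0001.png")
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
corners, ids, _ = detector.detectMarkers(img)  # 2D marker corners in the frame

# Lift the four corners into the (scale-free) SfM reconstruction, then compare
# one edge length against the known physical size of the marker.
corners_3d = triangulate_corners(corners[0])           # hypothetical helper
edge_sfm = np.linalg.norm(corners_3d[0] - corners_3d[1])
scale = ARUCO_SIZE / edge_sfm                           # meters per SfM unit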
For extracting the semantic masks from the video we used XMem.
XMem can be started from the root directory of PEGASUS:
python submodules/XMem/interactive_demo.py --video [path to the video] --num_objects 1 --size -1
In the XMem GUI, select the object you want to extract (the object should be highlighted in red). Afterwards, click the button Forward Propagate (1) to extract the masks; depending on the video length this takes around 1-2 min. To save the detected masks, click Export Overlays as Video (2), which saves the binary masks as images. More info on how to use XMem can be found here.
Note: please select the image size according to your GPU memory or the quality you want to obtain. -1 uses the original image size. If you set a value, the image is resized according to its shorter side.
First, both the extracted images and the masks have to be put into a common folder. This folder should be placed in a dataset folder where multiple reconstructed objects can be stored:
.
└── bouillon
    ├── down
    │   ├── images
    │   └── masks
    └── up
        ├── images
        └── masks
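A small helper can arrange the XMem outputs into this layout; the source paths below are assumptions and should be adapted to wherever XMem stored the frames and masks of your two runs.

# Sketch: copy the images and masks of the two XMem runs into the layout above.
import shutil
from pathlib import Path

dataset_root = Path("dataset/bouillon")
sources = {
    "up": Path("workspace/bouillon_up"),      # run on the upright-object video (assumed path)
    "down": Path("workspace/bouillon_down"),  # run on the flipped-object video (assumed path)
}

for side, src in sources.items():
    for kind in ("images", "masks"):
        dst = dataset_root / side / kind
        dst.mkdir(parents=True, exist_ok=True)
        for f in sorted((src / kind).glob("*")):
            shutil.copy(f, dst / f.name)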
To use the scanned object and include it in PEGASUS, one has to define the object as a Dataset-Object in in_the_wild_dataset.py. The class name (here Bouillon) takes the name of the object.
class Bouillon(InTheWild):
    OBJECT_NAME = 'bouillon'
    ID = 201
    TYPE = 'object'
    RECORDING_TYPE = 'spherical'  # 'spherical' or 'hemispherical'
    ALPHA = 0.3
    DATASET_TYPE = 'wild'
    ARUCO_SIZE = 0.037  # in meter

    def __init__(self, dataset_path):
        super().__init__(dataset_path=Path(dataset_path))
- `OBJECT_NAME`: folder name of the object. By default it is the video name in the ./workspace folder (this folder gets generated by XMem). Please rename it to the object name.
- `ID`: unique object ID
- `TYPE`: default type is object. Differs for environments (default: object)
- `RECORDING_TYPE`: 'spherical' or 'hemispherical', depending on whether you also scanned the bottom; scanning the bottom is recommended for texture-less objects. 'spherical': 2 videos (top & bottom). 'hemispherical': 1 video (top only)
- `ALPHA`: value for alpha shape reconstruction (default: 0.3)
- `DATASET_TYPE`: name for your own dataset (default: wild)
- `ARUCO_SIZE`: size of the ArUco marker in meters(!)
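Following the same pattern, a second object that was only scanned from the top could be defined as follows (a hypothetical example; name, ID, and marker size are placeholders):

class Miso(InTheWild):
    OBJECT_NAME = 'miso'              # folder name of the object in the dataset folder
    ID = 202                          # unique object ID
    TYPE = 'object'
    RECORDING_TYPE = 'hemispherical'  # only a top-view video was recorded
    ALPHA = 0.3
    DATASET_TYPE = 'wild'
    ARUCO_SIZE = 0.037                # measured marker size in meters

    def __init__(self, dataset_path):
        super().__init__(dataset_path=Path(dataset_path))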
python src/reconstruction/in_the_wild_object_reconstruction.py
- Tbd
We provide two different datasets. The IDs for the Ramen dataset are between 101 and 130. The YCB-V IDs are identical to the original YCB-V IDs.
The Ramen Dataset consists of 30 cup noodle objects and 9 environments.
.
└── Dataset
    ├── calibration
    │   └── ...
    ├── environment
    │   └── ...
    ├── object
    │   └── ...
    └── urdf
        └── ...
PEGASET consists of the well-known 21 YCB-V objects and 9 environments.
.
└── Dataset
    ├── calibration
    │   └── ...
    ├── environment
    │   └── ...
    ├── object
    │   └── ...
    └── urdf
        └── ...
Before rendering a dataset, the data provided for PEGASUS must be downloaded from the Ramen Dataset or PEGASET. If you use both datasets, you should merge them into one folder structure.
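Since both downloads share the folder layout shown above, merging them is a matter of copying the per-category subfolders into a single root, for example (the download paths are placeholders):

# Sketch: merge the Ramen and PEGASET downloads into one dataset root.
import shutil
from pathlib import Path

target = Path("Dataset")
for source in (Path("RamenDataset"), Path("PEGASET")):   # placeholder download folders
    for category in ("calibration", "environment", "object", "urdf"):
        for entry in (source / category).iterdir():
            dest = target / category / entry.name
            dest.parent.mkdir(parents=True, exist_ok=True)
            if entry.is_dir():
                shutil.copytree(entry, dest, dirs_exist_ok=True)
            else:
                shutil.copy(entry, dest)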
All objects and environments that are relevant for dataset generation should be added to `obj_list` and `env_list`.
Parameters:
- `mode: str`: Either `"dynamic"` or `"static"` rendering of the scene
- `num_scenes: int`: Number of scenes
- `num_objects: int`: Maximum number of objects placed in the scene. A random number between 1 and `num_objects` is chosen.
- `image_height`: Height of the rendered images
- `image_width`: Width of the rendered images
- `render_data_points: list`: Types of renderings and data points saved to the output, e.g. `['rgb', 'depth', 'seg_vis', 'seg_sil', 'sem_seg']`
- `convert_from_scenewise2imagewise: bool`: By default the data is saved scene-wise. Set to `True` if you need the data in image-wise BOP format.
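Put together, a generation run could be configured roughly as below. This is a hedged sketch: only the parameter names come from the list above, while the entry point generate_dataset and the environment class are placeholders.

# Hedged sketch of a generation run; generate_dataset and KitchenEnvironment
# are placeholders, only the parameter names come from the documentation above.
obj_list = [Bouillon]                # objects to drop into the scenes
env_list = [KitchenEnvironment]      # placeholder environment class

generate_dataset(
    obj_list=obj_list,
    env_list=env_list,
    mode="dynamic",                  # or "static"
    num_scenes=50,
    num_objects=5,                   # 1 to 5 objects are sampled per scene
    image_height=480,
    image_width=640,
    render_data_points=['rgb', 'depth', 'seg_vis', 'seg_sil', 'sem_seg'],
    convert_from_scenewise2imagewise=True,  # write image-wise BOP format
)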
@Article{PEGASUS2024,
  author  = {Meyer, Lukas and Erich, Floris and Yoshiyasu, Yusuke and Stamminger, Marc and Ando, Noriaki and Domae, Yukiyasu},
  title   = {PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation},
  journal = {IROS},
  month   = {October},
  year    = {2024},
  url     = {https://meyerls.github.io/pegasus_web}
}
Thanks to the authors of 3D Gaussian Splatting for their excellent code; please consider also citing their repository:
@Article{kerbl3Dgaussians,
  author  = {Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
  title   = {3D Gaussian Splatting for Real-Time Radiance Field Rendering},
  journal = {ACM Transactions on Graphics},
  number  = {4},
  volume  = {42},
  month   = {July},
  year    = {2023},
  url     = {https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/}
}
And thanks to the authors of the BOP Toolkit for their interface to the benchmark for 6D object pose estimation.