- Stanford Bunny: A Volumetric Method for Building Complex Models from Range Images [SIGGRAPH 1996]
- KITTI: Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite [CVPR 2012]
- NYUV2: Indoor Segmentation and Support Inference from RGBD Images [ECCV 2012]
- FAUST: FAUST: Dataset and evaluation for 3D mesh registration [CVPR 2014]
- ICL-NUIM: A Benchmark for RGB-D Visual Odometry, 3D Reconstruction and SLAM [ICRA 2014]
- Augmented ICL-NUIM: Robust Reconstruction of Indoor Scenes [CVPR 2015]
- ModelNet: 3D ShapeNets: A Deep Representation for Volumetric Shapes [`cls`; CVPR 2015]
- SUN RGB-D: SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite [`det`; CVPR 2015]
- SHREC15: SHREC’15 Track: Non-rigid 3D Shape Retrieval [Eurographics 2015]
- ShapeNetCore: ShapeNet: An Information-Rich 3D Model Repository [`cls`; arXiv 2015]
- ShapeNet Part: A Scalable Active Framework for Region Annotation in 3D Shape Collections [`seg`; SIGGRAPH Asia 2016]
- SceneNN: SceneNN: A Scene Meshes Dataset with aNNotations [3DV 2016]
- Oxford RobotCar: 1 Year, 1000km: The Oxford RobotCar Dataset [IJRR 2016]
- Redwood: A large dataset of object scans [arXiv 2016]
- S3DIS: 3D Semantic Parsing of Large-Scale Indoor Spaces [CVPR 2016], Joint 2D-3D-Semantic Data for Indoor Scene Understanding [`seg`; arXiv 2017]
- 3DMatch: 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions [CVPR 2017]
- SUNCG: Semantic Scene Completion from a Single Depth Image [CVPR 2017]
- ScanNet: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes [`seg`, `det`; CVPR 2017]
- Semantic3D: Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark [arXiv 2017]
- SemanticKITTI: SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences [`seg`; ICCV 2019]
- ScanObjectNN: Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data [ICCV 2019]
- PartNet: PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding [CVPR 2019]
- Completion3D: TopNet: Structural Point Cloud Decoder [`completion`; CVPR 2019]
- Argoverse: Argoverse: 3D Tracking and Forecasting with Rich Maps [CVPR 2019]
- Waymo Open Dataset: Scalability in Perception for Autonomous Driving: Waymo Open Dataset [CVPR 2020]
- nuScenes: nuScenes: A multimodal dataset for autonomous driving [`det`, `tracking`; CVPR 2020]
- SensatUrban: Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [CVPR 2021], SensatUrban: Learning Semantics from Urban-Scale Photogrammetric Point Clouds [IJCV 2022]
- WAYMO OPEN MOTION DATASET: Large Scale Interactive Motion Forecasting for Autonomous Driving: The WAYMO OPEN MOTION DATASET [arXiv 2104]
- Panoptic nuScenes: Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking [arXiv 2109]
- BuildingNet: BuildingNet: Learning to Label 3D Buildings [ICCV 2021 Oral]
- ARKitScenes: ARKitScenes - A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data [NeurIPS 2021]
- CODA: CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving [arXiv 2203]
- STPLS3D: STPLS3D: A Large-Scale Synthetic and Real Aerial Photogrammetry 3D Point Cloud Dataset [arXiv 2203]
- TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes [arXiv 2203]
- Omni3D: Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild [arXiv 2207]
- Rope3D: Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [CVPR 2022]
- DAIR-V2X: DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection [CVPR 2022]
- ONCE-3DLanes: ONCE-3DLanes: Building Monocular 3D Lane Detection [CVPR 2022]
- Ithaca365: Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions [CVPR 2022]
- OpenLane: PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark [ECCV 2022]
- HM3DSEM: Habitat-Matterport 3D Semantics Dataset [arXiv 2210]
- Objaverse: Objaverse: A Universe of Annotated 3D Objects [arXiv 2212]
- OmniObject3D: OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [arXiv 2301]
- OpenOccupancy: OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception [arXiv 2303]
- V2V4Real: V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [CVPR 2023]
- CVPR2023-Occupancy-Prediction-Challenge: https://github.com/CVPR2023-3D-Occupancy-Prediction/CVPR2023-3D-Occupancy-Prediction [CVPR2023 Challenge]
- WOMD-LiDAR: WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting [arXiv 2304]
- Occ3D: Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving [arXiv 2304]
- SSCBench: SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving [arXiv 2306]
- UniG3D: UniG3D: A Unified 3D Object Generation Dataset [arXiv 2306]
- WaterScenes: WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset for Autonomous Driving on Water Surfaces [arXiv 2307]
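The Princeton ModelNet project aims to provide researchers in computer vision, computer graphics, robotics and cognitive science with a comprehensive, clean collection of 3D CAD models. The dataset homepage is https://modelnet.cs.princeton.edu, and the data was first released with the paper 3D ShapeNets: A Deep Representation for Volumetric Shapes [CVPR 2015].
The maintainers selected 40 and 10 common categories to form two subsets, ModelNet40 and ModelNet10 respectively; both are also available in orientation-aligned versions. ModelNet40 is the one used most often in experiments, and it is distributed in the following three forms: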
Dataset | modelnet40_normal_resampled.zip | modelnet40_ply_hdf5_2048.zip | ModelNet40.zip |
---|---|---|---|
File size | 1.71 GB | 435 MB | 2.04 GB |
Contents | point: x, y, z, normal_x, normal_y, normal_z; shape: 10k points | point: x, y, z, normal_x, normal_y, normal_z; shape: 2048 points | OFF format (see the linked description) |
Train / test | 9843 / 2468 | 9840 / 2468 | 9844 / 2468 |
Download | modelnet40_normal_resampled.zip | modelnet40_ply_hdf5_2048.zip | ModelNet40.zip |
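
For the `ModelNet40.zip` variant, reading an OFF mesh in plain Python might look roughly like the sketch below. It assumes the plain-text OFF layout (the word `OFF`, a line with vertex, face and edge counts, then vertex rows and face rows) with no comment lines and no per-face colors. `parse_off` and `SAMPLE` are illustrative names, not part of any ModelNet tooling. Tolerating a header in which `OFF` is fused with the first count is also an assumption about how some files are written.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, ...]


def parse_off(text: str) -> Tuple[List[Vertex], List[Face]]:
    """Parse the text of an OFF mesh file into vertices and faces."""
    tokens = text.split()
    if not tokens or not tokens[0].startswith("OFF"):
        raise ValueError("not an OFF file")
    # Assumption: a header such as "OFF490 518 0" may occur, so whatever
    # trails the magic word is split back into its own token.
    rest = tokens[0][3:]
    tokens = ([rest] if rest else []) + tokens[1:]

    # tokens[2] is the edge count, which is not needed here.
    n_vertices, n_faces = int(tokens[0]), int(tokens[1])
    pos = 3

    vertices: List[Vertex] = []
    for _ in range(n_vertices):
        x, y, z = (float(t) for t in tokens[pos:pos + 3])
        vertices.append((x, y, z))
        pos += 3

    faces: List[Face] = []
    for _ in range(n_faces):
        k = int(tokens[pos])  # number of vertex indices in this face
        faces.append(tuple(int(t) for t in tokens[pos + 1:pos + 1 + k]))
        pos += 1 + k
    return vertices, faces


# A hand-written unit square made of two triangles (not ModelNet data).
SAMPLE = """OFF
4 2 0
0 0 0
1 0 0
1 1 0
0 1 0
3 0 1 2
3 0 2 3
"""

vertices, faces = parse_off(SAMPLE)
print(len(vertices), "vertices,", len(faces), "faces")
```

To parse a real file, pass the result of `open(path).read()` to `parse_off`. Sampling points from the mesh surface, as the two resampled archives do, is a separate step that this sketch does not cover.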
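ShapeNet is a richly annotated, large-scale dataset of 3D models, released with ShapeNet: An Information-Rich 3D Model Repository [arXiv 2015]. It is a joint effort by researchers at Princeton, Stanford and TTIC, and its official homepage is shapenet.org. ShapeNet includes the ShapeNetCore and ShapeNetSem subsets.
ShapeNet Part selects 16 categories from ShapeNetCore and annotates them with semantic part labels; it is used for part segmentation of point clouds. The dataset was published in A Scalable Active Framework for Region Annotation in 3D Shape Collections [SIGGRAPH Asia 2016], and its official page is ShapeNet Part. The data comes in several versions, available as shapenetcore_partanno_v0.zip (1.08 GB) and shapenetcore_partanno_segmentation_benchmark_v0.zip (635 MB). The second one, the segmentation benchmark, is described below:
As the table below shows, ShapeNet Part has 16 categories and 50 parts, with 16846 samples in total. The samples are imbalanced across categories: Table has 5263 samples, whereas Earphone has only 69. Each sample has a little over 2000 points, so this is a small dataset. The splits contain 12137 training, 1870 validation and 2874 test samples, 16881 in total. [Note: this differs from the 16846 counted in the table below; it turned out that 35 samples are duplicated across the training, validation and test splits.]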
Category | nparts/shape | nsamples | Mean npoints/shape |
---|---|---|---|
Airplane | 4 | 2690 | 2577 |
Bag | 2 | 76 | 2749 |
Cap | 2 | 55 | 2631 |
Car | 4 | 898 | 2763 |
Chair | 4 | 3746 | 2705 |
Earphone | 3 | 69 | 2496 |
Guitar | 3 | 787 | 2353 |
Knife | 2 | 392 | 2156 |
Lamp | 4 | 1546 | 2198 |
Laptop | 2 | 445 | 2757 |
Motorbike | 6 | 202 | 2735 |
Mug | 2 | 184 | 2816 |
Pistol | 3 | 275 | 2654 |
Rocket | 3 | 66 | 2358 |
Skateboard | 3 | 152 | 2529 |
Table | 3 | 5263 | 2722 |
Total | 50 | 16846 | 2616 |
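
The totals and the 35-sample discrepancy noted above can be re-derived from the table with a few lines of Python. This is only an arithmetic check on the numbers listed here; `STATS` and `SPLITS` are illustrative names, and nothing is read from the dataset files.

```python
# category: (parts per shape, number of samples), copied from the table
STATS = {
    "Airplane": (4, 2690), "Bag": (2, 76), "Cap": (2, 55),
    "Car": (4, 898), "Chair": (4, 3746), "Earphone": (3, 69),
    "Guitar": (3, 787), "Knife": (2, 392), "Lamp": (4, 1546),
    "Laptop": (2, 445), "Motorbike": (6, 202), "Mug": (2, 184),
    "Pistol": (3, 275), "Rocket": (3, 66), "Skateboard": (3, 152),
    "Table": (3, 5263),
}
# Split sizes quoted in the text
SPLITS = {"train": 12137, "val": 1870, "test": 2874}

total_parts = sum(parts for parts, _ in STATS.values())
total_samples = sum(count for _, count in STATS.values())
split_total = sum(SPLITS.values())
# Samples counted more than once across the three splits
duplicates = split_total - total_samples

largest = max(STATS, key=lambda name: STATS[name][1])
smallest = min(STATS, key=lambda name: STATS[name][1])

print("parts:", total_parts)
print("samples in table:", total_samples)
print("samples across splits:", split_total)
print("duplicates:", duplicates)
print("largest / smallest category:", largest, "/", smallest)
```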
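S3DIS is a dataset of 3D indoor scenes, used mainly for semantic segmentation of point clouds. Its homepage is http://buildingparser.stanford.edu/dataset.html. (The official page was unreachable at the time of writing, so background on the dataset is omitted here.) The S3DIS papers are Joint 2D-3D-Semantic Data for Indoor Scene Understanding [arXiv 2017] and 3D Semantic Parsing of Large-Scale Indoor Spaces [CVPR 2016]. S3DIS was collected from 6 Areas in 3 buildings: Area1, Area3 and Area6 belong to building 1, Area2 and Area4 to building 2, and Area5 to building 3. The data is commonly downloaded in the following three forms:
- Stanford3dDataset_v1.2_Aligned_Version.zip, used by, e.g., RandLA-Net
- Stanford3dDataset_v1.2.zip, used by, e.g., CloserLook3D
- indoor3d_sem_seg_hdf5_data.zip, used by, e.g., PointNet

Stanford3dDataset_v1.2_Aligned_Version.zip and Stanford3dDataset_v1.2.zip both contain the complete scenes, with 6 values per point (x, y, z, r, g, b). indoor3d_sem_seg_hdf5_data.zip instead cuts the original scenes into 1m x 1m blocks: the full dataset is split into 23585 blocks of 4096 points each, with 9 values per point. Besides x, y, z, r, g, b, the remaining 3 values give the point's position relative to the full scene it came from (normalized coordinates).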
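
The block cutting described above can be illustrated with a small standard-library sketch. It is a simplified illustration, not the preprocessing script behind indoor3d_sem_seg_hdf5_data.zip: it uses non-overlapping 1m x 1m cells on the xy plane, pads sparse cells by resampling, and normalizes coordinates by the room's extent. `split_into_blocks`, its parameters and the toy `room` are hypothetical names and data introduced for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_into_blocks(
    points: Sequence[Sequence[float]],
    block_size: float = 1.0,
    num_points: int = 4096,
    seed: int = 0,
) -> List[List[List[float]]]:
    """Cut one room of (x, y, z, r, g, b) points into fixed-size blocks.

    Each returned block holds `num_points` rows of 9 values:
    x, y, z, r, g, b plus the point's position normalized to [0, 1]
    along each axis of the room.
    """
    if not points:
        return []
    rng = random.Random(seed)
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # Avoid dividing by zero when the room is flat along an axis.
    extents = [(hi - lo) or 1.0 for lo, hi in zip(mins, maxs)]

    # Group points by the cell they fall into on the xy plane.
    cells: Dict[Tuple[int, int], List[Sequence[float]]] = defaultdict(list)
    for p in points:
        key = (int((p[0] - mins[0]) // block_size),
               int((p[1] - mins[1]) // block_size))
        cells[key].append(p)

    blocks = []
    for key in sorted(cells):
        members = cells[key]
        if len(members) >= num_points:
            sampled = rng.sample(members, num_points)
        else:
            # Too few points: keep them all and pad by resampling.
            extra = rng.choices(members, k=num_points - len(members))
            sampled = list(members) + extra
        blocks.append([
            [float(v) for v in p[:6]]
            + [(p[i] - mins[i]) / extents[i] for i in range(3)]
            for p in sampled
        ])
    return blocks


# Toy room with three points spanning two cells along x (not S3DIS data).
room = [
    (0.25, 0.5, 0.0, 255, 0, 0),
    (0.75, 0.5, 1.0, 0, 255, 0),
    (1.50, 0.5, 2.0, 0, 0, 255),
]
blocks = split_into_blocks(room, num_points=8)
print(len(blocks), "blocks of", len(blocks[0]), "points,",
      len(blocks[0][0]), "values per point")
```

With real rooms, NumPy arrays would be the more practical representation; plain lists are used here only to keep the sketch dependency-free.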
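The statistics below were computed from Stanford3dDataset_v1.2.zip and may differ slightly from some figures in the papers. S3DIS was collected from the 6 Areas above and contains 272 scenes of 11 types (in parentheses: number of scenes, scene size in points): office (156, 870k), conference room (11, 1.42M), hallway (61, 1.22M), auditorium (2, 8.17M), open space (1, 1.97M), lobby (3, 2.42M), lounge (3, 1.46M), pantry (3, 580k), copy room (2, 520k), storage (19, 350k) and WC (11, 700k). By semantic label, the contents of these scenes fall into 14 classes, as shown in the table below. The classes are also imbalanced: wall has 1547 instances, for example, while sofa has only 55.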
Total | column | clutter | chair | window | beam | floor | wall | ceiling | door | bookcase | board | table | sofa | stairs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9833 | 254 | 3882 | 1363 | 168 | 159 | 284 | 1547 | 385 | 543 | 584 | 137 | 455 | 55 | 17 |
See the 3DMatch folder for details.