Detection 3D
Reference | Sensors | Object Type | Sensing Modality Representations and Processing | Network Pipeline | How to generate Region Proposals (RP) | When to fuse | Fusion Operation and Method | Fusion Level | Dataset(s) used |
---|---|---|---|---|---|---|---|---|---|
Meyer and Kuschk, 2019 [pdf][ref] | Radar, visual camera | 3D Vehicle | Radar point cloud, RGB image | Faster R-CNN | Region proposals from fused features extracted from CNN | Before and after RP | Average mean | Early, Middle | Astyx HiRes2019
Liang et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV maps, RGB image. Each processed by a ResNet with auxiliary tasks: depth estimation and ground segmentation | Faster R-CNN | Predictions with fused features | Before RP | Addition, continuous fusion layer | Middle | KITTI, self-recorded |
Wang et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR voxelized frustum (each frustum processed by PointNet), RGB image (using a pre-trained detector) | R-CNN | Pre-trained RGB image detector | After RP | Using RP from RGB image detector to build LiDAR frustums | Late | KITTI, SUN-RGBD
Dou et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by VoxelNet), RGB image (processed by an FCN to get semantic features) | Two stage detector | Predictions with fused features | Before RP | Feature concatenation | Middle | KITTI
Sindagi et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by VoxelNet), RGB image (processed by a pre-trained 2D image detector). | One stage detector | Predictions with fused features | Before RP | Feature concatenation | Early, Middle | KITTI |
Liang et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV maps, RGB image. Each processed by ResNet | One stage detector | Predictions with fused features | Before RP | Addition, continuous fusion layer | Middle | KITTI, self-recorded
Du et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet) | R-CNN | Pre-trained RGB image detector produces 2D bounding boxes to crop LiDAR points, which are then clustered | Before and at RP | Ensemble: use RGB image detector to regress car dimensions for a model fitting algorithm | Late | KITTI, self-recorded
Yang et al., 2018 [pdf][ref] | LiDAR, HD-map | 3D Car | LiDAR BEV maps, road mask image from HD map. Inputs processed by PIXOR++ [ref] with a backbone similar to FPN | One stage detector | Detector predictions | Before RP | Feature concatenation | Early | KITTI, TOR4D Dataset [ref]
Casas et al., 2018 [pdf][ref] | LiDAR, HD-map | 3D Car | Sequential LiDAR BEV maps, several sequential road topology mask images from HD map. Each input processed by a base network with residual blocks | One stage detector | Detector predictions | Before RP | Feature concatenation | Middle | Self-recorded
Shin et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR point clouds (processed by PointNet [ref]), RGB image (processed by a 2D CNN) | R-CNN | A 3D object detector for RGB image | After RP | Using RP from RGB image detector to search LiDAR point clouds | Late | KITTI
Chen et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR BEV and spherical maps, RGB image. Each processed by a base network built on VGG16 | Faster R-CNN | An RPN from LiDAR BEV map | After RP | Average mean, deep fusion | Early, Middle, Late | KITTI
Wang et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian | LiDAR BEV map, RGB image. Each processed by a RetinaNet [ref] | One stage detector | Fused LiDAR and RGB image features extracted from CNN | Before RP | Sparse mean manipulation | Middle | KITTI
Ku et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV map, RGB image. Each processed by VGG16 | Faster R-CNN | Fused LiDAR and RGB image features extracted from CNN | Before and after RP | Average mean | Early, Middle, Late | KITTI
Xu et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR points (processed by PointNet), RGB image (processed by ResNet) | R-CNN | Pre-trained RGB image detector | After RP | Feature concatenation for local and global features | Middle | KITTI, SUN-RGBD
Qi et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR points (processed by PointNet), RGB image (using a pre-trained detector) | R-CNN | Pre-trained RGB image detector | After RP | Feature concatenation | Middle, Late | KITTI, SUN-RGBD
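
Three of the fusion operations named in the table (addition, average mean, feature concatenation) are simple element-wise or channel-wise combinations of two feature maps. The sketch below illustrates them on toy data. It is not taken from any of the listed papers: the `[channel][position]` list layout, the function names, and the sample values are all assumptions made for illustration, and real networks apply these operations to CNN feature tensors. Learned schemes such as the continuous fusion layer, deep fusion, and sparse mean manipulation are not covered.

```python
from typing import List

# Hypothetical layout for illustration: feature_map[channel][position]
FeatureMap = List[List[float]]


def _check_same_shape(a: FeatureMap, b: FeatureMap) -> None:
    """Element-wise fusion requires both maps to have identical shape."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        raise ValueError("element-wise fusion needs feature maps of equal shape")


def fuse_addition(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Addition: sum the two maps element by element."""
    _check_same_shape(a, b)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_average_mean(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Average mean: element-wise mean of the two maps."""
    _check_same_shape(a, b)
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_concatenation(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Feature concatenation: stack the channels of b after those of a.

    Channel counts may differ, but every channel must cover the same
    number of positions.
    """
    widths = {len(row) for row in a + b}
    if len(widths) > 1:
        raise ValueError("concatenation needs the same number of positions")
    return [list(row) for row in a] + [list(row) for row in b]


# Toy feature maps: 2 channels x 2 positions each (made-up values).
lidar_feat = [[1.0, 2.0], [3.0, 4.0]]
image_feat = [[3.0, 2.0], [1.0, 0.0]]

print(fuse_addition(lidar_feat, image_feat))
print(fuse_average_mean(lidar_feat, image_feat))
print(fuse_concatenation(lidar_feat, image_feat))
```

Addition and average mean keep the channel count unchanged, so both inputs must already be aligned to the same shape, while concatenation grows the channel dimension and leaves it to later layers to mix the modalities.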