Detection 3D

| Reference | Sensors | Object Type | Sensing Modality Representations and Processing | Network Pipeline | How to generate Region Proposals (RP) | When to fuse | Fusion Operation and Method | Fusion Level | Dataset(s) used |
|---|---|---|---|---|---|---|---|---|---|
| Meyer and Kuschk, 2019 [pdf][ref] | Radar, visual camera | 3D Vehicle | Radar point cloud, RGB image | Faster R-CNN | Fused features extracted from CNN | Before and after RP | Average mean; region proposal | Early, Middle | Astyx HiRes2019 |
| Liang et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV maps, RGB image. Each processed by a ResNet with auxiliary tasks: depth estimation and ground segmentation | Faster R-CNN | Predictions with fused features | Before RP | Addition, continuous fusion layer | Middle | KITTI, self-recorded data |
| Wang et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR voxelized frustum (each frustum processed by PointNet), RGB image (using a pre-trained detector) | R-CNN | Pre-trained RGB image detector | After RP | Using RP from RGB image detector to build LiDAR frustums | Late | KITTI, SUN-RGBD |
| Dou et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by VoxelNet), RGB image (processed by an FCN to get semantic features) | Two stage detector | Predictions with fused features | Before RP | Feature concatenation | Middle | KITTI |
| Sindagi et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by VoxelNet), RGB image (processed by a pre-trained 2D image detector) | One stage detector | Predictions with fused features | Before RP | Feature concatenation | Early, Middle | KITTI |
| Liang et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV maps, RGB image. Each processed by a ResNet | One stage detector | Predictions with fused features | Before RP | Addition, continuous fusion layer | Middle | KITTI, self-recorded data |
| Du et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet) | R-CNN | Pre-trained RGB image detector produces 2D bounding boxes to crop LiDAR points, which are then clustered | Before and at RP | Ensemble: use RGB image detector to regress car dimensions for a model fitting algorithm | Late | KITTI, self-recorded data |
| Yang et al., 2018 [pdf][ref] | LiDAR, HD map | 3D Car | LiDAR BEV maps, road mask image from HD map. Inputs processed by PIXOR++ [ref] with a backbone similar to FPN | One stage detector | Detector predictions | Before RP | Feature concatenation | Early | KITTI, TOR4D dataset [ref] |
| Casas et al., 2018 [pdf][ref] | LiDAR, HD map | 3D Car | Sequential LiDAR BEV maps, several sequential road topology mask images from HD map. Each input processed by a base network with residual blocks | One stage detector | Detector predictions | Before RP | Feature concatenation | Middle | Self-recorded data |
| Shin et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR point clouds (processed by PointNet [ref]), RGB image (processed by a 2D CNN) | R-CNN | A 3D object detector for RGB image | After RP | Using RP from RGB image detector to search LiDAR point clouds | Late | KITTI |
| Chen et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR BEV and spherical maps, RGB image. Each processed by a base network built on VGG16 | Faster R-CNN | An RPN from LiDAR BEV map | After RP | Average mean, deep fusion | Early, Middle, Late | KITTI |
| Wang et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian | LiDAR BEV map, RGB image. Each processed by a RetinaNet [ref] | One stage detector | Fused LiDAR and RGB image features extracted from CNN | Before RP | Sparse mean manipulation | Middle | KITTI |
| Ku et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV map, RGB image. Each processed by VGG16 | Faster R-CNN | Fused LiDAR and RGB image features extracted from CNN | Before and after RP | Average mean | Early, Middle, Late | KITTI |
| Xu et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR points (processed by PointNet), RGB image (processed by ResNet) | R-CNN | Pre-trained RGB image detector | After RP | Feature concatenation for local and global features | Middle | KITTI, SUN-RGBD |
| Qi et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR points (processed by PointNet), RGB image (using a pre-trained detector) | R-CNN | Pre-trained RGB image detector | After RP | Feature concatenation | Middle, Late | KITTI, SUN-RGBD |
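
Three of the fusion operations named in the table (feature concatenation, addition, and average mean) can be illustrated on toy feature vectors. The following is a minimal sketch, not code from any of the listed papers: the function names `fuse_concat`, `fuse_add`, and `fuse_mean` and the example values are hypothetical, and real detectors apply these operations to learned feature maps rather than short Python lists.

```python
from typing import List

Vector = List[float]


def _check_same_length(a: Vector, b: Vector) -> None:
    # Element-wise fusion only makes sense when both features share a shape.
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def fuse_concat(a: Vector, b: Vector) -> Vector:
    """Feature concatenation: place the two feature vectors end to end."""
    return list(a) + list(b)


def fuse_add(a: Vector, b: Vector) -> Vector:
    """Addition: element-wise sum of two equally sized feature vectors."""
    _check_same_length(a, b)
    return [x + y for x, y in zip(a, b)]


def fuse_mean(a: Vector, b: Vector) -> Vector:
    """Average mean: element-wise mean of two equally sized feature vectors."""
    _check_same_length(a, b)
    return [(x + y) / 2.0 for x, y in zip(a, b)]


# Hypothetical features from a LiDAR branch and an RGB image branch.
lidar_feat = [0.5, 1.0, 2.0]
image_feat = [1.5, 3.0, 0.0]

print(fuse_concat(lidar_feat, image_feat))  # [0.5, 1.0, 2.0, 1.5, 3.0, 0.0]
print(fuse_add(lidar_feat, image_feat))     # [2.0, 4.0, 2.0]
print(fuse_mean(lidar_feat, image_feat))    # [1.0, 2.0, 1.0]
```

Concatenation keeps both inputs intact and doubles the feature length, so later layers decide how to weigh each modality. Addition and average mean keep the feature length unchanged but require both branches to produce features of the same shape.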