Detection LiDAR

| Reference | Sensors | Object Type | Sensing Modality Representations and Processing | Network Pipeline | How to generate Region Proposals (RP) | When to fuse | Fusion Operation and Method | Fusion Level | Dataset(s) used |
|---|---|---|---|---|---|---|---|---|---|
| Liang et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV maps, RGB image. Each processed by a ResNet with auxiliary tasks: depth estimation and ground segmentation | Faster R-CNN | Predictions with fused features | Before RP | Addition, continuous fusion layer | Middle | KITTI, self-recorded |
| Wang et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR voxelized frustum (each frustum processed by PointNet), RGB image (using a pre-trained detector) | R-CNN | Pre-trained RGB image detector | After RP | Using RP from RGB image detector to build LiDAR frustums | Late | KITTI, SUN-RGBD |
| Dou et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by VoxelNet), RGB image (processed by an FCN to get semantic features) | Two stage detector | Predictions with fused features | Before RP | Feature concatenation | Middle | KITTI |
| Sindagi et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by VoxelNet), RGB image (processed by a pre-trained 2D image detector) | One stage detector | Predictions with fused features | Before RP | Feature concatenation | Early, Middle | KITTI |
| Bijelic et al., 2019 [pdf][ref] | LiDAR, visual camera | 2D Car in foggy weather | LiDAR front-view images (depth, intensity, height), RGB image. Each processed by VGG16 | SSD | Predictions with fused features | Before RP | Feature concatenation | From early to middle layers | Self-recorded datasets focused on foggy weather, simulated foggy images from KITTI |
| Pfeuffer et al., 2018 [pdf][ref] | LiDAR, vision camera | Multiple 2D objects | LiDAR spherical and front-view sparse depth, dense depth image, RGB image. Each processed by VGG16 | Faster-RCNN | RPN from fused features | Before RP | Feature concatenation | Early, Middle, Late | KITTI |
| Liang et al., 2018 [pdf][ref] | LiDAR, vision camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV maps, RGB image. Each processed by ResNet | One stage detector | Predictions with fused features | Before RP | Addition, continuous fusion layer | Middle | KITTI, self-recorded |
| Du et al., 2018 [pdf][ref] | LiDAR, vision camera | 3D Car | LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet) | R-CNN | Pre-trained RGB image detector produces 2D bounding boxes to crop LiDAR points, which are then clustered | Before and at RP | Ensemble: use RGB image detector to regress car dimensions for a model fitting algorithm | Late | KITTI, self-recorded data |
| Kim et al., 2018 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR front-view depth image, RGB image. Each input processed by VGG16 | SSD | SSD with fused features | Before RP | Feature concatenation, Mixture of Experts | Middle | KITTI |
| Yang et al., 2018 [pdf][ref] | LiDAR, HD-map | 3D Car | LiDAR BEV maps, road mask image from HD map. Inputs processed by PIXOR++ [ref] with a backbone similar to FPN | One stage detector | Detector predictions | Before RP | Feature concatenation | Early | KITTI, TOR4D dataset [ref] |
| Casas et al., 2018 [pdf][ref] | LiDAR, HD-map | 3D Car | Sequential LiDAR BEV maps, several sequential road topology mask images from HD map. Each input processed by a base network with residual blocks | One stage detector | Detector predictions | Before RP | Feature concatenation | Middle | Self-recorded data |
| Shin et al., 2018 [pdf][ref] | LiDAR, vision camera | 3D Car | LiDAR point clouds (processed by PointNet [ref]), RGB image (processed by a 2D CNN) | R-CNN | A 3D object detector for RGB image | After RP | Using RP from RGB image detector to search LiDAR point clouds | Late | KITTI |
| Chen et al., 2017 [pdf][ref] | LiDAR, vision camera | 3D Car | LiDAR BEV and spherical maps, RGB image. Each processed by a base network built on VGG16 | Faster-RCNN | An RPN from LiDAR BEV map | After RP | Average mean, deep fusion | Early, Middle, Late | KITTI |
| Asvadi et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR front-view dense-depth (DM) and reflectance maps (RM), RGB image. Each processed through a YOLO net | YOLO | YOLO outputs for LiDAR DM and RM maps, and RGB image | After RP | Ensemble: feed engineered features from ensembled bounding boxes to a network to predict scores for NMS | Late | KITTI |
| Oh et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car, Pedestrian, Cyclist | LiDAR front-view dense-depth map (for fusion: processed by VGG16), LiDAR voxel (for ROIs: segmentation and region growing), RGB image (for fusion: processed by VGG16; for ROIs: segmentation and grouping) | R-CNN | LiDAR voxel and RGB image separately | After RP | Association matrix using basic belief assignment | Late | KITTI |
| Wang et al., 2017 [pdf][ref] | LiDAR, vision camera | 3D Car, Pedestrian | LiDAR BEV map, RGB image. Each processed by a RetinaNet [ref] | One stage detector | Fused LiDAR and RGB image features extracted from CNN | Before RP | Sparse mean manipulation | Middle | KITTI |
| Ku et al., 2017 [pdf][ref] | LiDAR, vision camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV map, RGB image. Each processed by VGG16 | Faster-RCNN | Fused LiDAR and RGB image features extracted from CNN | Before and after RP | Average mean | Early, Middle, Late | KITTI |
| Xu et al., 2017 [pdf][ref] | LiDAR, vision camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR points (processed by PointNet), RGB image (processed by ResNet) | R-CNN | Pre-trained RGB image detector | After RP | Feature concatenation for local and global features | Middle | KITTI, SUN-RGBD |
| Qi et al., 2017 [pdf][ref] | LiDAR, vision camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR points (processed by PointNet), RGB image (using a pre-trained detector) | R-CNN | Pre-trained RGB image detector | After RP | Feature concatenation | Middle, Late | KITTI, SUN-RGBD |
| Du et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet) | Faster-RCNN | First clustered by LiDAR point clouds, then fine-tuned by an RPN of RGB image | Before RP | Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction | Late | KITTI |
| Matti et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian | LiDAR points (clustering with DBSCAN) and RGB image (processed by ResNet) | R-CNN | Clustered by LiDAR point clouds, then size and ratio corrected on RGB image | Before and at RP | Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction | Late | KITTI |
| Schlosser et al., 2016 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian | LiDAR HHA image, RGB image. Each processed by a small ConvNet | R-CNN | Deformable Parts Model with RGB image | After RP | Feature concatenation | Early, Middle, Late | KITTI |
| Kim et al., 2016 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian, Cyclist | LiDAR front-view depth image, RGB image. Each processed by Fast-RCNN network [ref] | Fast-RCNN | Selective search for LiDAR and RGB image separately | At RP | Ensemble: joint RP are fed to RGB image-based CNN | Late | KITTI |
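
The fusion operations named most often in the table (feature concatenation, addition, and average mean) can be illustrated with a minimal NumPy sketch. It is not taken from any of the listed papers: the function names are hypothetical, and it assumes the LiDAR and image feature maps have already been brought onto the same spatial grid with shape `(channels, height, width)`, which is usually the hard part in practice.

```python
import numpy as np


def fuse_concat(lidar_feat, image_feat):
    """Feature concatenation: stack both maps along the channel axis."""
    return np.concatenate([lidar_feat, image_feat], axis=0)


def fuse_add(lidar_feat, image_feat):
    """Addition: element-wise sum; both maps need identical shapes."""
    return lidar_feat + image_feat


def fuse_mean(*feats):
    """Average mean: element-wise mean over any number of aligned maps."""
    return np.mean(np.stack(feats, axis=0), axis=0)


# Toy feature maps (assumed already aligned): 4 channels on a 2x3 grid.
lidar_feat = np.ones((4, 2, 3))
image_feat = np.full((4, 2, 3), 3.0)

concat = fuse_concat(lidar_feat, image_feat)  # channels double: shape (8, 2, 3)
added = fuse_add(lidar_feat, image_feat)      # shape unchanged: (4, 2, 3)
mean = fuse_mean(lidar_feat, image_feat)      # shape unchanged: (4, 2, 3)
print(concat.shape, added.shape, mean.shape)
```

Concatenation keeps both modalities intact and lets later layers learn how to weight them, at the cost of doubling the channel count. Addition and averaging keep the channel count fixed but require both branches to produce the same number of channels.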