
Detection


| Reference | Sensors | Object Type | Sensing Modality Representations and Processing | Network Pipeline | How to Generate Region Proposals (RP) | When to Fuse | Fusion Operation and Method | Fusion Level | Dataset(s) Used |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Meyer and Kuschk, 2019 [pdf][ref] | Radar, visual camera | 3D Vehicle | Radar point cloud, RGB image | Faster R-CNN | Fused features extracted from CNN | Before and after RP | Average mean, region proposal | Early, Middle | Astyx HiRes2019 |
| Nabati et al., 2019 [pdf][ref] | Radar, visual camera | 2D Vehicle | Radar object, RGB image. Radar projected to image frame | Fast R-CNN | Radar used to generate region proposal | Implicit at RP | Region proposal | Middle | nuScenes |
| Liang et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV maps, RGB image. Each processed by a ResNet with auxiliary tasks: depth estimation and ground segmentation | Faster R-CNN | Predictions with fused features | Before RP | Addition, continuous fusion layer | Middle | KITTI, self-recorded |
| Wang et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR voxelized frustum (each frustum processed by PointNet), RGB image (using a pre-trained detector) | R-CNN | Pre-trained RGB image detector | After RP | Using RP from RGB image detector to build LiDAR frustums | Late | KITTI, SUN-RGBD |
| Dou et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by VoxelNet), RGB image (processed by an FCN to get semantic features) | Two-stage detector | Predictions with fused features | Before RP | Feature concatenation | Middle | KITTI |
| Sindagi et al., 2019 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by VoxelNet), RGB image (processed by a pre-trained 2D image detector) | One-stage detector | Predictions with fused features | Before RP | Feature concatenation | Early, Middle | KITTI |
| Bijelic et al., 2019 [pdf][ref] | LiDAR, visual camera | 2D Car in foggy weather | LiDAR front-view images (depth, intensity, height), RGB image. Each processed by VGG16 | SSD | Predictions with fused features | Before RP | Feature concatenation | From early to middle layers | Self-recorded datasets focused on foggy weather, simulated foggy images from KITTI |
| Chadwick et al., 2019 [pdf][ref] | Radar, visual camera | 2D Vehicle | Radar range and velocity maps, RGB image. Each processed by ResNet | One-stage detector | Predictions with fused features | Before RP | Addition, feature concatenation | Middle | Self-recorded |
| Pfeuffer et al., 2018 [pdf][ref] | LiDAR, visual camera | Multiple 2D objects | LiDAR spherical and front-view sparse depth, dense depth image, RGB image. Each processed by VGG16 | Faster R-CNN | RPN from fused features | Before RP | Feature concatenation | Early, Middle, Late | KITTI |
| Liang et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV maps, RGB image. Each processed by ResNet | One-stage detector | Predictions with fused features | Before RP | Addition, continuous fusion layer | Middle | KITTI, self-recorded |
| Du et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet) | R-CNN | Pre-trained RGB image detector produces 2D bounding boxes to crop LiDAR points, which are then clustered | Before and at RP | Ensemble: use RGB image detector to regress car dimensions for a model fitting algorithm | Late | KITTI, self-recorded data |
| Kim et al., 2018 [pdf][ref] | LiDAR, visual camera | 2D Car | LiDAR front-view depth image, RGB image. Each input processed by VGG16 | SSD | SSD with fused features | Before RP | Feature concatenation, Mixture of Experts | Middle | KITTI |
| Yang et al., 2018 [pdf][ref] | LiDAR, HD map | 3D Car | LiDAR BEV maps, road mask image from HD map. Inputs processed by PIXOR++ [ref] with a backbone similar to FPN | One-stage detector | Detector predictions | Before RP | Feature concatenation | Early | KITTI, TOR4D Dataset [ref] |
| Casas et al., 2018 [pdf][ref] | LiDAR, HD map | 3D Car | Sequential LiDAR BEV maps, several sequential road topology mask images from HD map. Each input processed by a base network with residual blocks | One-stage detector | Detector predictions | Before RP | Feature concatenation | Middle | Self-recorded data |
| Guan et al., 2018 [pdf][ref] | Visual camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by a base network built on VGG16 | Faster R-CNN | RPN with fused features | Before and after RP | Feature concatenation, Mixture of Experts | Early, Middle, Late | KAIST Pedestrian Dataset |
| Shin et al., 2018 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR point clouds (processed by PointNet [ref]), RGB image (processed by a 2D CNN) | R-CNN | A 3D object detector for RGB image | After RP | Using RP from RGB image detector to search LiDAR point clouds | Late | KITTI |
| Chen et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car | LiDAR BEV and spherical maps, RGB image. Each processed by a base network built on VGG16 | Faster R-CNN | An RPN from LiDAR BEV map | After RP | Average mean, deep fusion | Early, Middle, Late | KITTI |
| Asvadi et al., 2017 [pdf][ref] | LiDAR, visual camera | 2D Car | LiDAR front-view dense-depth maps (DM) and reflectance maps (RM), RGB image. Each processed through a YOLO net | YOLO | YOLO outputs for LiDAR DM and RM maps, and RGB image | After RP | Ensemble: feed engineered features from ensembled bounding boxes to a network to predict scores for NMS | Late | KITTI |
| Oh et al., 2017 [pdf][ref] | LiDAR, visual camera | 2D Car, Pedestrian, Cyclist | LiDAR front-view dense-depth map (for fusion: processed by VGG16), LiDAR voxel (for ROIs: segmentation and region growing), RGB image (for fusion: processed by VGG16; for ROIs: segmentation and grouping) | R-CNN | LiDAR voxel and RGB image separately | After RP | Association matrix using basic belief assignment | Late | KITTI |
| Wang et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian | LiDAR BEV map, RGB image. Each processed by a RetinaNet [ref] | One-stage detector | Fused LiDAR and RGB image features extracted from CNN | Before RP | Sparse mean manipulation | Middle | KITTI |
| Ku et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist | LiDAR BEV map, RGB image. Each processed by VGG16 | Faster R-CNN | Fused LiDAR and RGB image features extracted from CNN | Before and after RP | Average mean | Early, Middle, Late | KITTI |
| Xu et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR points (processed by PointNet), RGB image (processed by ResNet) | R-CNN | Pre-trained RGB image detector | After RP | Feature concatenation for local and global features | Middle | KITTI, SUN-RGBD |
| Qi et al., 2017 [pdf][ref] | LiDAR, visual camera | 3D Car, Pedestrian, Cyclist, Indoor objects | LiDAR points (processed by PointNet), RGB image (using a pre-trained detector) | R-CNN | Pre-trained RGB image detector | After RP | Feature concatenation | Middle, Late | KITTI, SUN-RGBD |
| Du et al., 2017 [pdf][ref] | LiDAR, visual camera | 2D Car | LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet) | Faster R-CNN | First clustered by LiDAR point clouds, then fine-tuned by an RPN of RGB image | Before RP | Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction | Late | KITTI |
| Schneider et al., 2017 [pdf][ref] | Visual camera | Multiple 2D objects | RGB image (processed by GoogLeNet), depth image from stereo camera (processed by NiN net) | SSD | SSD predictions | Before RP | Feature concatenation | Early, Middle, Late | Cityscapes |
| Takumi et al., 2017 [pdf][ref] | Visual camera, thermal camera | Multiple 2D objects | RGB image, NIR, MIR, FIR images. Each processed by YOLO | YOLO | YOLO predictions for each spectral image | After RP | Ensemble: ensemble final predictions for each YOLO detector | Late | Self-recorded data |
| Matti et al., 2017 [pdf][ref] | LiDAR, visual camera | 2D Pedestrian | LiDAR points (clustering with DBSCAN) and RGB image (processed by ResNet) | R-CNN | Clustered by LiDAR point clouds, then size and ratio corrected on RGB image | Before and at RP | Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction | Late | KITTI |
| Schlosser et al., 2016 [pdf][ref] | LiDAR, visual camera | 2D Pedestrian | LiDAR HHA image, RGB image. Each processed by a small ConvNet | R-CNN | Deformable Parts Model with RGB image | After RP | Feature concatenation | Early, Middle, Late | KITTI |
| Kim et al., 2016 [pdf][ref] | LiDAR, visual camera | 2D Pedestrian, Cyclist | LiDAR front-view depth image, RGB image. Each processed by Fast R-CNN network [ref] | Fast R-CNN | Selective search for LiDAR and RGB image separately | At RP | Ensemble: joint RP are fed to RGB image-based CNN | Late | KITTI |
| Mees et al., 2016 [pdf][ref] | RGB-D camera | 2D Pedestrian | RGB image, depth image from depth camera, optical flow. Each processed by GoogLeNet | Fast R-CNN | Dense multi-scale sliding window for RGB image | After RP | Mixture of Experts | Late | RGB-D People Unihall Dataset, InOutDoor RGB-D People Dataset |
| Wagner et al., 2016 [pdf][ref] | Visual camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by CaffeNet | R-CNN | ACF+T+THOG detector | After RP | Feature concatenation | Early, Late | KAIST Pedestrian Dataset |
| Liu et al., 2016 [pdf][ref] | Visual camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by NiN network | Faster R-CNN | RPN with fused (or separate) features | Before and after RP | Feature concatenation, average mean, score fusion (Cascaded CNN) | Early, Middle, Late | KAIST Pedestrian Dataset |
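
The "Fusion Operation and Method" column keeps returning to a few basic operations: feature concatenation, addition, average mean, and Mixture of Experts. As a rough illustration only, the sketch below applies each of them to two toy feature vectors. It is not taken from any of the listed papers; the function names, the vectors, and the fixed gate values are all assumptions made for the example, and real detectors apply these operations to learned CNN feature maps rather than short lists.

```python
from typing import List, Sequence

Vector = List[float]


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    # Element-wise operations only make sense for equally sized features.
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def fuse_concat(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Feature concatenation: place the two feature vectors end to end."""
    return list(a) + list(b)


def fuse_add(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Addition: element-wise sum of the two feature vectors."""
    _check_same_length(a, b)
    return [x + y for x, y in zip(a, b)]


def fuse_mean(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Average mean: element-wise average of the two feature vectors."""
    return [s / 2.0 for s in fuse_add(a, b)]


def fuse_mixture_of_experts(
    a: Sequence[float], b: Sequence[float], gate_a: float, gate_b: float
) -> Vector:
    """Mixture of Experts: weighted sum using normalized gate scores.

    Assumption: the gates are plain numbers passed in by the caller. In a
    learned system they would typically be predicted by a gating network.
    """
    _check_same_length(a, b)
    total = gate_a + gate_b
    if total <= 0:
        raise ValueError("gate scores must sum to a positive value")
    weight_a, weight_b = gate_a / total, gate_b / total
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Hypothetical per-modality features, e.g. from a camera and a LiDAR branch.
camera_features = [1.0, 2.0, 3.0]
lidar_features = [3.0, 2.0, 1.0]

concatenated = fuse_concat(camera_features, lidar_features)
added = fuse_add(camera_features, lidar_features)
averaged = fuse_mean(camera_features, lidar_features)
gated = fuse_mixture_of_experts(camera_features, lidar_features, 3.0, 1.0)
```

Concatenation doubles the feature length and leaves the weighting to later layers, whereas addition and averaging keep the length fixed but require the two branches to produce features of the same shape. The gated variant lets one modality dominate when the other is judged less reliable.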