Detection 2D

| Reference | Sensors | Object Type | Sensing Modality Representations and Processing | Network Pipeline | How to generate Region Proposals (RP) | When to fuse | Fusion Operation and Method | Fusion Level | Dataset(s) used |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Nabati et al., 2019 [pdf][ref] | Radar, visual camera | 2D Vehicle | Radar object, RGB image. Radar projected to image frame. | Fast R-CNN | Radar used to generate region proposal | Implicit at RP | Region proposal | Middle | nuScenes |
| Bijelic et al., 2019 [pdf][ref] | LiDAR, visual camera | 2D Car in foggy weather | LiDAR front-view images (depth, intensity, height), RGB image. Each processed by VGG16 | SSD | Predictions with fused features | Before RP | Feature concatenation | From early to middle layers | Self-recorded datasets focused on foggy weather, simulated foggy images from KITTI |
| Chadwick et al., 2019 [pdf][ref] | Radar, visual camera | 2D Vehicle | Radar range and velocity maps, RGB image. Each processed by ResNet | One-stage detector | Predictions with fused features | Before RP | Addition, feature concatenation | Middle | Self-recorded |
| Pfeuffer et al., 2018 [pdf][ref] | LiDAR, vision camera | Multiple 2D objects | LiDAR spherical and front-view sparse depth, dense depth image, RGB image. Each processed by VGG16 | Faster-RCNN | RPN from fused features | Before RP | Feature concatenation | Early, Middle, Late | KITTI |
| Kim et al., 2018 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR front-view depth image, RGB image. Each input processed by VGG16 | SSD | SSD with fused features | Before RP | Feature concatenation, Mixture of Experts | Middle | KITTI |
| Guan et al., 2018 [pdf][ref] | Vision camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by a base network built on VGG16 | Faster-RCNN | RPN with fused features | Before and after RP | Feature concatenation, Mixture of Experts | Early, Middle, Late | KAIST Pedestrian Dataset |
| Asvadi et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR front-view dense-depth (DM) and reflectance maps (RM), RGB image. Each processed through a YOLO net | YOLO | YOLO outputs for LiDAR DM and RM maps, and RGB image | After RP | Ensemble: feed engineered features from ensembled bounding boxes to a network to predict scores for NMS | Late | KITTI |
| Oh et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car, Pedestrian, Cyclist | LiDAR front-view dense-depth map (for fusion: processed by VGG16), LiDAR voxel (for ROIs: segmentation and region growing), RGB image (for fusion: processed by VGG16; for ROIs: segmentation and grouping) | R-CNN | LiDAR voxel and RGB image separately | After RP | Association matrix using basic belief assignment | Late | KITTI |
| Du et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet) | Faster-RCNN | First clustered by LiDAR point clouds, then fine-tuned by an RPN on the RGB image | Before RP | Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction | Late | KITTI |
| Schneider et al., 2017 [pdf][ref] | Vision camera | Multiple 2D objects | RGB image (processed by GoogLeNet), depth image from stereo camera (processed by NiN net) | SSD | SSD predictions | Before RP | Feature concatenation | Early, Middle, Late | Cityscapes |
| Takumi et al., 2017 [pdf][ref] | Vision camera, thermal camera | Multiple 2D objects | RGB image, NIR, MIR, FIR image. Each processed by YOLO | YOLO | YOLO predictions for each spectral image | After RP | Ensemble: ensemble final predictions for each YOLO detector | Late | Self-recorded data |
| Matti et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian | LiDAR points (clustering with DBSCAN) and RGB image (processed by ResNet) | R-CNN | Clustered by LiDAR point clouds, then size and ratio corrected on RGB image | Before and at RP | Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction | Late | KITTI |
| Schlosser et al., 2016 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian | LiDAR HHA image, RGB image. Each processed by a small ConvNet | R-CNN | Deformable Parts Model with RGB image | After RP | Feature concatenation | Early, Middle, Late | KITTI |
| Kim et al., 2016 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian, Cyclist | LiDAR front-view depth image, RGB image. Each processed by Fast-RCNN network [ref] | Fast-RCNN | Selective search for LiDAR and RGB image separately | At RP | Ensemble: joint RPs are fed to RGB image-based CNN | Late | KITTI |
| Mees et al., 2016 [pdf][ref] | RGB-D camera | 2D Pedestrian | RGB image, depth image from depth camera, optical flow. Each processed by GoogLeNet | Fast-RCNN | Dense multi-scale sliding window for RGB image | After RP | Mixture of Experts | Late | RGB-D People Unihall Dataset, InOutDoor RGB-D People Dataset |
| Wagner et al., 2016 [pdf][ref] | Vision camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by CaffeNet | R-CNN | ACF+T+THOG detector | After RP | Feature concatenation | Early, Late | KAIST Pedestrian Dataset |
| Liu et al., 2016 [pdf][ref] | Vision camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by NiN network | Faster-RCNN | RPN with fused (or separate) features | Before and after RP | Feature concatenation, average mean, Score fusion (Cascaded CNN) | Early, Middle, Late | KAIST Pedestrian Dataset |
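
The fusion operations named in the "Fusion Operation and Method" column (feature concatenation, addition, average mean, Mixture of Experts) can be illustrated with a minimal sketch on toy feature vectors. This is not taken from any of the listed papers: the function names, the example vectors `rgb` and `lidar`, and the fixed gate logits are all hypothetical. In a real network the features would be multi-channel feature maps, and the Mixture of Experts weights would typically come from a learned gating network rather than being hard-coded.

```python
import math


def concat_fusion(a, b):
    """Feature concatenation: join the two feature vectors end to end."""
    return list(a) + list(b)


def add_fusion(a, b):
    """Element-wise addition; both vectors must have the same length."""
    if len(a) != len(b):
        raise ValueError("addition requires equal-length features")
    return [x + y for x, y in zip(a, b)]


def mean_fusion(a, b):
    """Average mean: element-wise average of the two feature vectors."""
    return [0.5 * s for s in add_fusion(a, b)]


def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts_fusion(features, gate_logits):
    """Weighted sum of per-modality features.

    `gate_logits` stands in for the output of a gating network
    (an assumption of this sketch); softmax turns it into weights
    that sum to one, one weight per modality.
    """
    if len(features) != len(gate_logits):
        raise ValueError("need one gate logit per modality")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all modality features must have equal length")
    weights = softmax(gate_logits)
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(length)
    ]


# Hypothetical per-modality feature vectors.
rgb = [1.0, 2.0, 3.0]
lidar = [3.0, 2.0, 1.0]

fused_concat = concat_fusion(rgb, lidar)
fused_add = add_fusion(rgb, lidar)
fused_mean = mean_fusion(rgb, lidar)
# Equal gate logits give each modality a weight of 0.5.
fused_moe = mixture_of_experts_fusion([rgb, lidar], [0.0, 0.0])
```

Concatenation keeps both modalities' features intact at the cost of a wider representation, while addition, averaging, and Mixture of Experts keep the feature size fixed but require the modalities to share a feature shape.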