Detection 2D
Reference | Sensors | Object Type | Sensing Modality Representations and Processing | Network Pipeline | How to generate Region Proposals (RP) | When to fuse | Fusion Operation and Method | Fusion Level | Dataset(s) used |
---|---|---|---|---|---|---|---|---|---|
Nabati et al., 2019 [pdf][ref] | Radar, visual camera | 2D Vehicle | Radar object, RGB image. Radar projected to image frame. | Fast R-CNN | Radar used to generate region proposal | Implicit at RP | Region proposal | Middle | nuScenes |
Bijelic et al., 2019 [pdf][ref] | LiDAR, visual camera | 2D Car in foggy weather | LiDAR front-view images (depth, intensity, height), RGB image. Each processed by VGG16 | SSD | Predictions with fused features | Before RP | Feature concatenation | From early to middle layers | Self-recorded datasets focused on foggy weather, simulated foggy images from KITTI |
Chadwick et al., 2019 [pdf][ref] | Radar, visual camera | 2D Vehicle | Radar range and velocity maps, RGB image. Each processed by ResNet | One stage detector | Predictions with fused features | Before RP | Addition, feature concatenation | Middle | Self-recorded |
Pfeuffer et al., 2018 [pdf][ref] | LiDAR, vision camera | Multiple 2D objects | LiDAR spherical, and front-view sparse depth, dense depth image, RGB image. Each processed by VGG16 | Faster-RCNN | RPN from fused features | Before RP | Feature concatenation | Early, Middle, Late | KITTI |
Kim et al., 2018 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR front-view depth image, RGB image. Each input processed by VGG16 | SSD | SSD with fused features | Before RP | Feature concatenation, Mixture of Experts | Middle | KITTI |
Guan et al., 2018 [pdf][ref] | Vision camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by a base network built on VGG16 | Faster-RCNN | RPN with fused features | Before and after RP | Feature concatenation, Mixture of Experts | Early, Middle, Late | KAIST Pedestrian Dataset |
Asvadi et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR front-view dense-depth (DM) and reflectance maps (RM), RGB image. Each processed through a YOLO net | YOLO | YOLO outputs for LiDAR DM and RM maps, and RGB image | After RP | Ensemble: feed engineered features from ensembled bounding boxes to a network to predict scores for NMS | Late | KITTI |
Oh et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car, Pedestrian, Cyclist | LiDAR front-view dense-depth map (for fusion: processed by VGG16), LiDAR voxel (for ROIs: segmentation and region growing), RGB image (for fusion: processed by VGG16; for ROIs: segmentation and grouping) | R-CNN | LiDAR voxel and RGB image separately | After RP | Association matrix using basic belief assignment | Late | KITTI |
Du et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Car | LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet) | Faster-RCNN | First clustered by LiDAR point clouds, then fine-tuned by an RPN of RGB image | Before RP | Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction | Late | KITTI |
Schneider et al., 2017 [pdf][ref] | Vision camera | Multiple 2D objects | RGB image (processed by GoogLeNet), depth image from stereo camera (processed by NiN net) | SSD | SSD predictions | Before RP | Feature concatenation | Early, Middle, Late | Cityscapes |
Takumi et al., 2017 [pdf][ref] | Vision camera, thermal camera | Multiple 2D objects | RGB image, NIR, MIR, FIR images. Each processed by YOLO | YOLO | YOLO predictions for each spectral image | After RP | Ensemble: ensemble final predictions for each YOLO detector | Late | Self-recorded data |
Matti et al., 2017 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian | LiDAR points (clustering with DBSCAN) and RGB image (processed by ResNet) | R-CNN | Clustered by LiDAR point clouds, then size and ratio corrected on RGB image. | Before and at RP | Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction | Late | KITTI |
Schlosser et al., 2016 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian | LiDAR HHA image, RGB image. Each processed by a small ConvNet | R-CNN | Deformable Parts Model with RGB image | After RP | Feature concatenation | Early, Middle, Late | KITTI |
Kim et al., 2016 [pdf][ref] | LiDAR, vision camera | 2D Pedestrian, Cyclist | LiDAR front-view depth image, RGB image. Each processed by Fast-RCNN network [ref] | Fast-RCNN | Selective search for LiDAR and RGB image separately. | At RP | Ensemble: joint RP are fed to RGB image based CNN. | Late | KITTI |
Mees et al., 2016 [pdf][ref] | RGB-D camera | 2D Pedestrian | RGB image, depth image from depth camera, optical flow. Each processed by GoogLeNet | Fast-RCNN | Dense multi-scale sliding window for RGB image | After RP | Mixture of Experts | Late | RGB-D People Unihall Dataset, InOutDoor RGB-D People Dataset. |
Wagner et al., 2016 [pdf][ref] | Vision camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by CaffeNet | R-CNN | ACF+T+THOG detector | After RP | Feature concatenation | Early, Late | KAIST Pedestrian Dataset |
Liu et al., 2016 [pdf][ref] | Vision camera, thermal camera | 2D Pedestrian | RGB image, thermal image. Each processed by NiN network | Faster-RCNN | RPN with fused (or separate) features | Before and after RP | Feature concatenation, averaging, score fusion (Cascaded CNN) | Early, Middle, Late | KAIST Pedestrian Dataset |
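
The "Fusion Operation and Method" column names several feature-level operations: feature concatenation, addition, averaging, and Mixture of Experts. The following is a minimal sketch of what these operations do to two per-sensor feature vectors. It is not taken from any of the listed papers. The function names, the flat-list feature representation, and the `gate_logits` argument are all assumptions made for illustration; in the cited networks the features are multi-channel maps and the Mixture of Experts weights are predicted by a learned gating network.

```python
import math

def fuse_concat(a, b):
    """Feature concatenation: keep both feature vectors side by side."""
    return list(a) + list(b)

def fuse_add(a, b):
    """Element-wise addition: both features must have the same length."""
    if len(a) != len(b):
        raise ValueError("addition requires features of equal length")
    return [x + y for x, y in zip(a, b)]

def fuse_mean(a, b):
    """Averaging: element-wise mean of the two features."""
    return [v / 2.0 for v in fuse_add(a, b)]

def fuse_moe(a, b, gate_logits):
    """Mixture of Experts: softmax-weighted sum of the two features.

    gate_logits is a hypothetical pair of scores, one per sensor. Here it
    is passed in by hand; a real model would predict it from the inputs.
    """
    if len(a) != len(b):
        raise ValueError("weighted sum requires features of equal length")
    top = max(gate_logits)
    exps = [math.exp(g - top) for g in gate_logits]  # numerically stable softmax
    total = sum(exps)
    w_a, w_b = exps[0] / total, exps[1] / total
    return [w_a * x + w_b * y for x, y in zip(a, b)]

# Toy features standing in for camera and LiDAR branch outputs.
camera_feat = [1.0, 2.0]
lidar_feat = [3.0, 4.0]

concat_out = fuse_concat(camera_feat, lidar_feat)
add_out = fuse_add(camera_feat, lidar_feat)
mean_out = fuse_mean(camera_feat, lidar_feat)
moe_out = fuse_moe(camera_feat, lidar_feat, gate_logits=(0.0, 0.0))
```

Concatenation doubles the feature size and leaves it to later layers to learn how to combine the sensors, whereas addition, averaging, and the Mixture of Experts sum keep the size fixed. With equal gate scores the Mixture of Experts output reduces to the plain average.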