Detection 2D

Reference	Sensors	Object Type	Sensing Modality Representations and Processing	Network Pipeline	How to generate Region Proposals (RP)	When to fuse	Fusion Operation and Method	Fusion Level	Dataset(s) used
Nabati et al., 2019 [pdf][ref]	Radar, visual camera	2D Vehicle	Radar object, RGB image. Radar projected to image frame.	Fast R-CNN	Radar used to generate region proposal	Implicit at RP	Region proposal	Middle	nuScenes
Bijelic et al., 2019 [pdf][ref]	LiDAR, visual camera	2D Car in foggy weather	Lidar front view images (depth, intensity, height), RGB image. Each processed by VGG16	SSD	Predictions with fused features	Before RP	Feature concatenation	From early to middle layers	Self-recorded datasets focused on foggy weather, simulated foggy images from KITTI
Chadwick et al., 2019 [pdf][ref]	Radar, visual camera	2D Vehicle	Radar range and velocity maps, RGB image. Each processed by ResNet	One stage detector	Predictions with fused features	Before RP	Addition, feature concatenation	Middle	Self-recorded
Pfeuffer et al., 2018 [pdf][ref]	LiDAR, vision camera	Multiple 2D objects	LiDAR spherical, and front-view sparse depth, dense depth image, RGB image. Each processed by VGG16	Faster-RCNN	RPN from fused features	Before RP	Feature concatenation	Early, Middle, Late	KITTI
Kim et al., 2018 [pdf][ref]	LiDAR, vision camera	2D Car	LiDAR front-view depth image, RGB image. Each input processed by VGG16	SSD	SSD with fused features	Before RP	Feature concatenation, Mixture of Experts	Middle	KITTI
Guan et al., 2018 [pdf][ref]	Vision camera, thermal camera	2D Pedestrian	RGB image, thermal image. Each processed by a base network built on VGG16	Faster-RCNN	RPN with fused features	Before and after RP	Feature concatenation, Mixture of Experts	Early, Middle, Late	KAIST Pedestrian Dataset
Asvadi et al., 2017 [pdf][ref]	LiDAR, vision camera	2D Car	LiDAR front-view dense-depth (DM) and reflectance maps (RM), RGB image. Each processed through a YOLO net	YOLO	YOLO outputs for LiDAR DM and RM maps, and RGB image	After RP	Ensemble: feed engineered features from ensembled bounding boxes to a network to predict scores for NMS	Late	KITTI
Oh et al., 2017 [pdf][ref]	LiDAR, vision camera	2D Car, Pedestrian, Cyclist	LiDAR front-view dense-depth map (for fusion: processed by VGG16), LiDAR voxel (for ROIs: segmentation and region growing), RGB image (for fusion: processed by VGG16; for ROIs: segmentation and grouping)	R-CNN	LiDAR voxel and RGB image separately	After RP	Association matrix using basic belief assignment	Late	KITTI
Du et al., 2017 [pdf][ref]	LiDAR, vision camera	2D Car	LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet)	Faster-RCNN	First clustered by LiDAR point clouds, then fine-tuned by an RPN on the RGB image	Before RP	Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction	Late	KITTI
Schneider et al., 2017 [pdf][ref]	Vision camera	Multiple 2D objects	RGB image (processed by GoogLeNet), depth image from stereo camera (processed by NiN net)	SSD	SSD predictions	Before RP	Feature concatenation	Early, Middle, Late	Cityscapes
Takumi et al., 2017 [pdf][ref]	Vision camera, thermal camera	Multiple 2D objects	RGB image, NIR, MIR, FIR image. Each processed by YOLO	YOLO	YOLO predictions for each spectral image	After RP	Ensemble: ensemble final predictions for each YOLO detector	Late	Self-recorded data
Matti et al., 2017 [pdf][ref]	LiDAR, vision camera	2D Pedestrian	LiDAR points (clustering with DBSCAN) and RGB image (processed by ResNet)	R-CNN	Clustered by LiDAR point clouds, then size and ratio corrected on RGB image.	Before and at RP	Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction	Late	KITTI
Schlosser et al., 2016 [pdf][ref]	LiDAR, vision camera	2D Pedestrian	LiDAR HHA image, RGB image. Each processed by a small ConvNet	R-CNN	Deformable Parts Model with RGB image	After RP	Feature concatenation	Early, Middle, Late	KITTI
Kim et al., 2016 [pdf][ref]	LiDAR, vision camera	2D Pedestrian, Cyclist	LiDAR front-view depth image, RGB image. Each processed by Fast-RCNN network [ref]	Fast-RCNN	Selective search for LiDAR and RGB image separately	At RP	Ensemble: joint RP are fed to RGB image-based CNN	Late	KITTI
Mees et al., 2016 [pdf][ref]	RGB-D camera	2D Pedestrian	RGB image, depth image from depth camera, optical flow. Each processed by GoogLeNet	Fast-RCNN	Dense multi-scale sliding window for RGB image	After RP	Mixture of Experts	Late	RGB-D People Unihall Dataset, InOutDoor RGB-D People Dataset.
Wagner et al., 2016 [pdf][ref]	Vision camera, thermal camera	2D Pedestrian	RGB image, thermal image. Each processed by CaffeNet	R-CNN	ACF+T+THOG detector	After RP	Feature concatenation	Early, Late	KAIST Pedestrian Dataset
Liu et al., 2016 [pdf][ref]	Vision camera, thermal camera	2D Pedestrian	RGB image, thermal image. Each processed by NiN network	Faster-RCNN	RPN with fused (or separate) features	Before and after RP	Feature concatenation, average mean, Score fusion (Cascaded CNN)	Early, Middle, Late	KAIST Pedestrian Dataset
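
The "Fusion Operation and Method" column names a few recurring operations: feature concatenation, addition, and Mixture of Experts. The sketch below illustrates them on toy one-dimensional feature vectors. It is not taken from any of the listed papers; the function names, the plain-list feature representation, and the softmax gating are assumptions made for illustration, and real systems apply these operations to multi-channel feature maps inside a network.

```python
import math


def fuse_concat(feat_a, feat_b):
    """Feature concatenation: join the two feature vectors end to end."""
    return list(feat_a) + list(feat_b)


def fuse_add(feat_a, feat_b):
    """Element-wise addition: both vectors must have the same length."""
    if len(feat_a) != len(feat_b):
        raise ValueError("addition needs equal-length features")
    return [a + b for a, b in zip(feat_a, feat_b)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_mixture_of_experts(expert_outputs, gate_scores):
    """Weight each expert's output by a softmax over gate scores, then sum.

    In a trained model the gate scores would come from a gating network;
    here they are passed in directly (an illustrative simplification).
    """
    weights = softmax(gate_scores)
    length = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Hypothetical per-modality features, e.g. from an RGB and a LiDAR branch.
    rgb_feat = [0.2, 0.8, 0.5]
    lidar_feat = [0.6, 0.1, 0.9]

    print(fuse_concat(rgb_feat, lidar_feat))
    print(fuse_add(rgb_feat, lidar_feat))
    # Equal gate scores give each modality the same weight.
    print(fuse_mixture_of_experts([rgb_feat, lidar_feat], [0.0, 0.0]))
```

Concatenation keeps both modalities' features side by side and lets later layers learn how to combine them, addition requires matching shapes, and the Mixture of Experts variant lets the weighting change per input, for example to down-weight a degraded sensor.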