Detection 3D

Reference	Sensors	Object Type	Sensing Modality Representations and Processing	Network Pipeline	How to generate Region Proposals (RP)	When to fuse	Fusion Operation and Method	Fusion Level	Dataset(s) used
Meyer and Kuschk, 2019 [pdf][ref]	Radar, visual camera	3D Vehicle	Radar point cloud, RGB image. Fused features extracted from CNN.	Faster R-CNN	Region proposal	Before and after RP	Average mean	Early, Middle	Astyx HiRes2019
Liang et al., 2019 [pdf][ref]	LiDAR, visual camera	3D Car, Pedestrian, Cyclist	LiDAR BEV maps, RGB image. Each processed by a ResNet with auxiliary tasks: depth estimation and ground segmentation	Faster R-CNN	Predictions with fused features	Before RP	Addition, continuous fusion layer	Middle	KITTI, self-recorded
Wang et al., 2019 [pdf][ref]	LiDAR, visual camera	3D Car, Pedestrian, Cyclist, Indoor objects	LiDAR voxelized frustum (each frustum processed by the PointNet), RGB image (using a pre-trained detector).	R-CNN	Pre-trained RGB image detector	After RP	Using RP from RGB image detector to build LiDAR frustums	Late	KITTI, SUN-RGBD
Dou et al., 2019 [pdf][ref]	LiDAR, visual camera	3D Car	LiDAR voxel (processed by VoxelNet), RGB image (processed by an FCN to get semantic features)	Two stage detector	Predictions with fused features	Before RP	Feature concatenation	Middle	KITTI
Sindagi et al., 2019 [pdf][ref]	LiDAR, visual camera	3D Car	LiDAR voxel (processed by VoxelNet), RGB image (processed by a pre-trained 2D image detector).	One stage detector	Predictions with fused features	Before RP	Feature concatenation	Early, Middle	KITTI
Liang et al., 2018 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian, Cyclist	LiDAR BEV maps, RGB image. Each processed by ResNet	One stage detector	Predictions with fused features	Before RP	Addition, continuous fusion layer	Middle	KITTI, self-recorded
Du et al., 2018 [pdf][ref]	LiDAR, vision camera	3D Car	LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet)	R-CNN	Pre-trained RGB image detector produces 2D bounding boxes to crop LiDAR points, which are then clustered	Before and at RP	Ensemble: use RGB image detector to regress car dimensions for a model fitting algorithm.	Late	KITTI, self-recorded data
Yang et al., 2018 [pdf][ref]	LiDAR, HD-map	3D Car	LiDAR BEV maps, road mask image from HD map. Inputs processed by PIXOR++ [ref] with a backbone similar to FPN	One stage detector	Detector predictions	Before RP	Feature concatenation	Early	KITTI, TOR4D Dataset [ref]
Casas et al., 2018 [pdf][ref]	LiDAR, HD-map	3D Car	Sequential LiDAR BEV maps, several sequential road topology mask images from HD map. Each input processed by a base network with residual blocks	One stage detector	Detector predictions	Before RP	Feature concatenation	Middle	self-recorded data
Shin et al., 2018 [pdf][ref]	LiDAR, vision camera	3D Car	LiDAR point clouds (processed by PointNet [ref]), RGB image (processed by a 2D CNN)	R-CNN	A 3D object detector for RGB image	After RP	Using RP from RGB image detector to search LiDAR point clouds	Late	KITTI
Chen et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car	LiDAR BEV and spherical maps, RGB image. Each processed by a base network built on VGG16	Faster R-CNN	An RPN from LiDAR BEV map	After RP	Average mean, deep fusion	Early, Middle, Late	KITTI
Wang et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian	LiDAR BEV map, RGB image. Each processed by a RetinaNet [ref]	One stage detector	Fused LiDAR and RGB image features extracted from CNN	Before RP	Sparse mean manipulation	Middle	KITTI
Ku et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian, Cyclist	LiDAR BEV map, RGB image. Each processed by VGG16	Faster R-CNN	Fused LiDAR and RGB image features extracted from CNN	Before and after RP	Average mean	Early, Middle, Late	KITTI
Xu et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian, Cyclist, Indoor objects	LiDAR points (processed by PointNet), RGB image (processed by ResNet)	R-CNN	Pre-trained RGB image detector	After RP	Feature concatenation for local and global features	Middle	KITTI, SUN-RGBD
Qi et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian, Cyclist, Indoor objects	LiDAR points (processed by PointNet), RGB image (using a pre-trained detector)	R-CNN	Pre-trained RGB image detector	After RP	Feature concatenation	Middle, Late	KITTI, SUN-RGBD
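
The "Fusion Operation and Method" column repeatedly names three simple operations: addition, average mean, and feature concatenation. As a rough illustration only, and not the implementation of any paper listed above, the sketch below applies them to two toy feature vectors. The function names and the flat-list feature representation are assumptions made for this example; real detectors apply these operations to multi-channel feature maps.

```python
from typing import List

Vector = List[float]


def _check_same_length(a: Vector, b: Vector) -> None:
    # Addition and averaging are element-wise, so the sizes must match.
    if len(a) != len(b):
        raise ValueError("element-wise fusion needs equally sized features")


def fuse_addition(lidar_feat: Vector, image_feat: Vector) -> Vector:
    """Element-wise sum of two feature vectors (hypothetical helper)."""
    _check_same_length(lidar_feat, image_feat)
    return [a + b for a, b in zip(lidar_feat, image_feat)]


def fuse_average_mean(lidar_feat: Vector, image_feat: Vector) -> Vector:
    """Element-wise mean of two feature vectors (hypothetical helper)."""
    _check_same_length(lidar_feat, image_feat)
    return [(a + b) / 2.0 for a, b in zip(lidar_feat, image_feat)]


def fuse_concatenation(lidar_feat: Vector, image_feat: Vector) -> Vector:
    """Append image features to LiDAR features; sizes may differ."""
    return list(lidar_feat) + list(image_feat)


# Toy features standing in for per-location LiDAR and RGB image features.
lidar = [1.0, 2.0, 3.0]
image = [3.0, 0.0, 1.0]

print(fuse_addition(lidar, image))       # [4.0, 2.0, 4.0]
print(fuse_average_mean(lidar, image))   # [2.0, 1.0, 2.0]
print(fuse_concatenation(lidar, image))  # [1.0, 2.0, 3.0, 3.0, 0.0, 1.0]
```

Addition and average mean keep the feature size unchanged but require both modalities to share the same shape, whereas concatenation accepts different sizes and leaves the weighting of each modality to the layers that follow.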