Detection LiDAR

Reference	Sensors	Object Type	Sensing Modality Representations and Processing	Network Pipeline	How to generate Region Proposals (RP)	When to fuse	Fusion Operation and Method	Fusion Level	Dataset(s) used
Liang et al., 2019 [pdf][ref]	LiDAR, visual camera	3D Car, Pedestrian, Cyclist	LiDAR BEV maps, RGB image. Each processed by a ResNet with auxiliary tasks: depth estimation and ground segmentation	Faster R-CNN	Predictions with fused features	Before RP	Addition, continuous fusion layer	Middle	KITTI, self-recorded
Wang et al., 2019 [pdf][ref]	LiDAR, visual camera	3D Car, Pedestrian, Cyclist, Indoor objects	LiDAR voxelized frustum (each frustum processed by PointNet), RGB image (using a pre-trained detector)	R-CNN	Pre-trained RGB image detector	After RP	Using RP from RGB image detector to build LiDAR frustums	Late	KITTI, SUN-RGBD
Dou et al., 2019 [pdf][ref]	LiDAR, visual camera	3D Car	LiDAR voxel (processed by VoxelNet), RGB image (processed by an FCN to get semantic features)	Two stage detector	Predictions with fused features	Before RP	Feature concatenation	Middle	KITTI
Sindagi et al., 2019 [pdf][ref]	LiDAR, visual camera	3D Car	LiDAR voxel (processed by VoxelNet), RGB image (processed by a pre-trained 2D image detector).	One stage detector	Predictions with fused features	Before RP	Feature concatenation	Early, Middle	KITTI
Bijelic et al., 2019 [pdf][ref]	LiDAR, visual camera	2D Car in foggy weather	LiDAR front-view images (depth, intensity, height), RGB image. Each processed by VGG16	SSD	Predictions with fused features	Before RP	Feature concatenation	From early to middle layers	Self-recorded datasets focused on foggy weather, simulated foggy images from KITTI
Pfeuffer et al., 2018 [pdf][ref]	LiDAR, vision camera	Multiple 2D objects	LiDAR spherical, front-view sparse depth, and dense depth images, RGB image. Each processed by VGG16	Faster R-CNN	RPN from fused features	Before RP	Feature concatenation	Early, Middle, Late	KITTI
Liang et al., 2018 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian, Cyclist	LiDAR BEV maps, RGB image. Each processed by ResNet	One stage detector	Predictions with fused features	Before RP	Addition, continuous fusion layer	Middle	KITTI, self-recorded
Du et al., 2018 [pdf][ref]	LiDAR, vision camera	3D Car	LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet)	R-CNN	Pre-trained RGB image detector produces 2D bounding boxes to crop LiDAR points, which are then clustered	Before and at RP	Ensemble: use the RGB image detector to regress car dimensions for a model-fitting algorithm	Late	KITTI, self-recorded data
Kim et al., 2018 [pdf][ref]	LiDAR, vision camera	2D Car	LiDAR front-view depth image, RGB image. Each input processed by VGG16	SSD	SSD with fused features	Before RP	Feature concatenation, Mixture of Experts	Middle	KITTI
Yang et al., 2018 [pdf][ref]	LiDAR, HD-map	3D Car	LiDAR BEV maps, road mask image from HD map. Inputs processed by PIXOR++ [ref] with a backbone similar to FPN	One stage detector	Detector predictions	Before RP	Feature concatenation	Early	KITTI, TOR4D dataset [ref]
Casas et al., 2018 [pdf][ref]	LiDAR, HD-map	3D Car	Sequential LiDAR BEV maps, several sequential road topology mask images from HD map. Each input processed by a base network with residual blocks	One stage detector	Detector predictions	Before RP	Feature concatenation	Middle	Self-recorded data
Shin et al., 2018 [pdf][ref]	LiDAR, vision camera	3D Car	LiDAR point clouds (processed by PointNet [ref]), RGB image (processed by a 2D CNN)	R-CNN	A 3D object detector for RGB images	After RP	Using RP from RGB image detector to search LiDAR point clouds	Late	KITTI
Chen et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car	LiDAR BEV and spherical maps, RGB image. Each processed by a base network built on VGG16	Faster R-CNN	An RPN from LiDAR BEV map	After RP	Average mean, deep fusion	Early, Middle, Late	KITTI
Asvadi et al., 2017 [pdf][ref]	LiDAR, vision camera	2D Car	LiDAR front-view dense-depth (DM) and reflectance maps (RM), RGB image. Each processed through a YOLO net	YOLO	YOLO outputs for LiDAR DM and RM maps, and RGB image	After RP	Ensemble: feed engineered features from ensembled bounding boxes to a network to predict scores for NMS	Late	KITTI
Oh et al., 2017 [pdf][ref]	LiDAR, vision camera	2D Car, Pedestrian, Cyclist	LiDAR front-view dense-depth map (for fusion: processed by VGG16), LiDAR voxel (for ROIs: segmentation and region growing), RGB image (for fusion: processed by VGG16; for ROIs: segmentation and grouping)	R-CNN	LiDAR voxel and RGB image separately	After RP	Association matrix using basic belief assignment	Late	KITTI
Wang et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian	LiDAR BEV map, RGB image. Each processed by a RetinaNet [ref]	One stage detector	Fused LiDAR and RGB image features extracted from CNN	Before RP	Sparse mean manipulation	Middle	KITTI
Ku et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian, Cyclist	LiDAR BEV map, RGB image. Each processed by VGG16	Faster R-CNN	Fused LiDAR and RGB image features extracted from CNN	Before and after RP	Average mean	Early, Middle, Late	KITTI
Xu et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian, Cyclist, Indoor objects	LiDAR points (processed by PointNet), RGB image (processed by ResNet)	R-CNN	Pre-trained RGB image detector	After RP	Feature concatenation for local and global features	Middle	KITTI, SUN-RGBD
Qi et al., 2017 [pdf][ref]	LiDAR, vision camera	3D Car, Pedestrian, Cyclist, Indoor objects	LiDAR points (processed by PointNet), RGB image (using a pre-trained detector)	R-CNN	Pre-trained RGB image detector	After RP	Feature concatenation	Middle, Late	KITTI, SUN-RGBD
Du et al., 2017 [pdf][ref]	LiDAR, vision camera	2D Car	LiDAR voxel (processed by RANSAC and model fitting), RGB image (processed by VGG16 and GoogLeNet)	Faster R-CNN	First clustered from LiDAR point clouds, then refined by an RPN on the RGB image	Before RP	Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction	Late	KITTI
Matti et al., 2017 [pdf][ref]	LiDAR, vision camera	2D Pedestrian	LiDAR points (clustering with DBSCAN) and RGB image (processed by ResNet)	R-CNN	Clustered from LiDAR point clouds, then corrected in size and ratio on the RGB image	Before and at RP	Ensemble: feed LiDAR RP to RGB image-based CNN for final prediction	Late	KITTI
Schlosser et al., 2016 [pdf][ref]	LiDAR, vision camera	2D Pedestrian	LiDAR HHA image, RGB image. Each processed by a small ConvNet	R-CNN	Deformable Parts Model with RGB image	After RP	Feature concatenation	Early, Middle, Late	KITTI
Kim et al., 2016 [pdf][ref]	LiDAR, vision camera	2D Pedestrian, Cyclist	LiDAR front-view depth image, RGB image. Each processed by a Fast R-CNN network [ref]	Fast R-CNN	Selective search on LiDAR and RGB images separately	At RP	Ensemble: joint RPs are fed to an RGB image-based CNN	Late	KITTI
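
Most rows in the "Fusion Operation and Method" column name one of a few basic operators: feature concatenation, element-wise addition, average mean, or a mixture of experts. The following is a minimal NumPy sketch of those operators on two feature maps of shape (C, H, W). It assumes the LiDAR and RGB features already lie on the same spatial grid (the cited methods achieve this in different ways, e.g. with a continuous fusion layer or by projecting into a shared view). The function names are hypothetical, and the hand-set gate logits in `fuse_moe` stand in for what would normally be the output of a learned gating network.

```python
import numpy as np


def fuse_concat(lidar_feat: np.ndarray, rgb_feat: np.ndarray) -> np.ndarray:
    """Feature concatenation: stack two (C, H, W) maps along the channel axis."""
    if lidar_feat.shape[1:] != rgb_feat.shape[1:]:
        raise ValueError("feature maps must share the same H and W")
    return np.concatenate([lidar_feat, rgb_feat], axis=0)


def fuse_add(lidar_feat: np.ndarray, rgb_feat: np.ndarray) -> np.ndarray:
    """Element-wise addition: both maps must have identical (C, H, W) shapes."""
    if lidar_feat.shape != rgb_feat.shape:
        raise ValueError("feature maps must have identical shapes")
    return lidar_feat + rgb_feat


def fuse_mean(lidar_feat: np.ndarray, rgb_feat: np.ndarray) -> np.ndarray:
    """Element-wise average mean of two identically shaped maps."""
    return fuse_add(lidar_feat, rgb_feat) / 2.0


def fuse_moe(feats: list, gate_logits) -> np.ndarray:
    """Mixture-of-experts style fusion: softmax-weighted sum of M modality maps.

    gate_logits holds one score per modality; in a real network these would
    come from a learned gating network rather than being fixed by hand.
    """
    logits = np.asarray(gate_logits, dtype=float)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return sum(w * f for w, f in zip(weights, feats))


# Hypothetical 64-channel feature maps on a shared 4 x 4 grid.
lidar = np.ones((64, 4, 4))
rgb = np.zeros((64, 4, 4))
print(fuse_concat(lidar, rgb).shape)  # (128, 4, 4)
print(fuse_mean(lidar, rgb).max())    # 0.5
```

Concatenation keeps every channel from both branches and leaves it to later layers to learn how to combine them, at the cost of a larger channel count; addition and averaging keep the channel count fixed but require both branches to produce maps of the same shape.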
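
Many entries take "LiDAR BEV maps" as input. The sketch below shows, in simplified form, how a point cloud can be rasterized into a bird's-eye-view grid. It assumes the KITTI Velodyne convention (x forward, y left, z up), a hypothetical 0.5 m cell over a 40 m x 40 m region, and only two channels (maximum height and point count). The function name and parameters are illustrative; the published detectors use richer encodings, such as several height slices and intensity channels.

```python
import numpy as np


def lidar_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Rasterize an (N, 3) LiDAR point cloud into a 2-channel BEV map.

    Assumes x forward, y left, z up (KITTI Velodyne frame).
    Channel 0: maximum point height per cell (0.0 for empty cells).
    Channel 1: number of points per cell.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Keep only points inside the rectangular region of interest.
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]

    rows_n = int(round((x_range[1] - x_range[0]) / cell))
    cols_n = int(round((y_range[1] - y_range[0]) / cell))
    # Integer grid indices; clipping guards against floating-point edge cases.
    rows = np.clip(((x - x_range[0]) / cell).astype(int), 0, rows_n - 1)
    cols = np.clip(((y - y_range[0]) / cell).astype(int), 0, cols_n - 1)

    height = np.full((rows_n, cols_n), -np.inf)
    np.maximum.at(height, (rows, cols), z)  # unbuffered per-cell maximum
    count = np.zeros((rows_n, cols_n))
    np.add.at(count, (rows, cols), 1.0)     # unbuffered per-cell point count

    # Empty cells get height 0.0; the count channel tells them apart.
    return np.stack([np.where(count > 0, height, 0.0), count])


pts = np.array([[1.0, 0.1, -1.5], [1.1, 0.2, -0.5], [39.9, -19.9, 0.3], [50.0, 0.0, 1.0]])
bev = lidar_to_bev(pts)
print(bev.shape)  # (2, 80, 80)
```

The unbuffered `ufunc.at` calls matter here: several points can fall into the same cell, and plain fancy-index assignment would keep only one of them.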
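
Several late-fusion rows rely on 2D detections from the RGB image to select LiDAR points: Wang et al., 2019 build LiDAR frustums from them, and Du et al., 2018 crop LiDAR points with the 2D boxes. The shared geometric step is to keep only the points whose projection falls inside a 2D box. The sketch below shows that step alone. It assumes the points are already expressed in the camera frame and that `P` is a 3 x 4 projection matrix; the function name, camera parameters, and box are hypothetical. The later processing of the selected points (for example with PointNet) is not shown.

```python
import numpy as np


def frustum_points(points_cam: np.ndarray, P: np.ndarray, box2d) -> np.ndarray:
    """Return the points whose image projection falls inside a 2D box.

    points_cam: (N, 3) points already in the camera frame (z forward).
    P: (3, 4) camera projection matrix.
    box2d: (x_min, y_min, x_max, y_max) in pixels, e.g. a 2D detector output.
    """
    n = points_cam.shape[0]
    homo = np.hstack([points_cam, np.ones((n, 1))])  # (N, 4) homogeneous coords
    proj = homo @ P.T                                # (N, 3) image-plane coords
    depth = proj[:, 2]
    valid = depth > 1e-6                             # drop points behind the camera

    uv = np.zeros((n, 2))
    uv[valid] = proj[valid, :2] / depth[valid][:, None]  # perspective division

    x_min, y_min, x_max, y_max = box2d
    inside = (
        valid
        & (uv[:, 0] >= x_min) & (uv[:, 0] <= x_max)
        & (uv[:, 1] >= y_min) & (uv[:, 1] <= y_max)
    )
    return points_cam[inside]


# Hypothetical pinhole camera: focal length 100 px, principal point (50, 50).
P = np.array([[100.0, 0.0, 50.0, 0.0],
              [0.0, 100.0, 50.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
pts = np.array([[0.0, 0.0, 10.0], [5.0, 0.0, 10.0], [0.0, 0.0, -5.0]])
# Only the first point projects into the box; the third lies behind the camera.
print(frustum_points(pts, P, (40, 40, 60, 60)))
```

On KITTI, raw Velodyne points would first need to be transformed into the rectified camera frame with the calibration matrices shipped with the dataset; that step is omitted here.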

Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

Di Feng, Christian Haase-Schuetz, Lars Rosenbaum, Heinz Hertlein, Claudius Glaeser, Fabian Timm, Werner Wiesbeck and Klaus Dietmayer
Robert Bosch GmbH in cooperation with Ulm University and Karlsruhe Institute of Technology
* Contributed equally