Segmentation

Reference	Sensors	Semantics	Sensing Modality Representations	Fusion Operation and Method	Fusion Level	Dataset(s) used
Chen et al., 2019 [pdf][ref]	LiDAR, visual camera	Road segmentation	RGB image, altitude difference image. Each processed by a CNN	Feature adaptation module, modified concatenation.	Middle	KITTI
Valada et al., 2019 [pdf][ref]	Visual camera, depth camera, thermal camera	Multiple 2D objects	RGB image, thermal image, depth image. Each processed by an FCN with ResNet backbone (AdapNet++ architecture)	Extension of Mixture of Experts	Middle	Six datasets, including Cityscapes and SUN RGB-D
Sun et al., 2019 [pdf][ref]	Visual camera, thermal camera	Multiple 2D objects in campus environments	RGB image, thermal image. Each processed by a base network built on ResNet	Element-wise summation in the encoder networks	Middle	Datasets published by [ref]
Caltagirone et al., 2019 [pdf][ref]	LiDAR, vision camera	Road segmentation	LiDAR front-view depth images, RGB image. Each input processed by an FCN	Feature concatenation (for early and late fusion), weighted addition similar to a gating network (for middle-level cross fusion)	Early, Middle, Late	KITTI
Erkent et al., 2018 [pdf][ref]	LiDAR, visual camera	Multiple 2D objects	LiDAR BEV occupancy grids (processed based on Bayesian filtering and tracking), RGB image (processed by an FCN with VGG16 backbone)	Feature concatenation	Middle	KITTI, self-recorded
Lv et al., 2018 [pdf][ref]	LiDAR, vision camera	Road segmentation	LiDAR BEV maps, RGB image. Each input processed by an FCN with dilated convolution operator. RGB image features are also projected onto the LiDAR BEV plane before fusion	Feature concatenation	Middle	KITTI
Wulff et al., 2018 [pdf][ref]	LiDAR, vision camera	Road segmentation. Alternatives: freespace, ego-lane detection	LiDAR BEV maps, RGB image projected onto BEV plane. Inputs processed by an FCN with UNet	Feature concatenation	Early	KITTI
Kim et al., 2018 [pdf][ref]	LiDAR, vision camera	2D off-road terrains	LiDAR voxel (processed by 3D convolution), RGB image (processed by ENet)	Addition	Early, Middle, Late	self-recorded
Guan et al., 2018 [pdf][ref]	Vision camera, thermal camera	2D Pedestrian	RGB image, thermal image. Each processed by a base network built on VGG16	Feature concatenation, Mixture of Experts	Early, Middle, Late	KAIST Pedestrian Dataset
Yang et al., 2018 [pdf][ref]	LiDAR, vision camera	Road segmentation	LiDAR points (processed by PointNet++), RGB image (processed by an FCN with VGG16 backbone)	Optimizing Conditional Random Field (CRF)	Late	KITTI
Gu et al., 2018 [pdf][ref]	LiDAR, visual camera	Road segmentation	LiDAR front-view depth and height maps (processed by an inverse-depth histogram based line scanning strategy), RGB image (processed by an FCN)	Optimizing Conditional Random Field (CRF)	Late	KITTI
Cai et al., 2018 [pdf][ref]	Satellite map with route information, visual camera	Road segmentation	Route map image, RGB image. Images are fused and processed by an FCN	Overlaying the line and curve segments in the route map onto the RGB image to generate the Map Fusion Image (MFI)	Early	self-recorded data
Ha et al., 2017 [pdf][ref]	Vision camera, thermal camera	Multiple 2D objects in campus environments	RGB image, thermal image. Each processed by an FCN and mini-inception block	Feature concatenation, addition ("short-cut fusion")	Middle	self-recorded data
Valada et al., 2017 [pdf][ref]	Vision camera, thermal camera	Multiple 2D objects	RGB image, thermal image, depth image. Each processed by an FCN with ResNet backbone	Mixture of Experts	Late	Cityscapes, Freiburg Multispectral Dataset, SYNTHIA
Schneider et al., 2017 [pdf][ref]	Vision camera	Multiple 2D objects	RGB image, depth image	Feature concatenation	Early, Middle, Late	Cityscapes
Schneider et al., 2017 [pdf][ref]	Vision camera	Multiple 2D objects	RGB image (processed by GoogLeNet), depth image from stereo camera (processed by NiN net)	Feature concatenation	Early, Middle, Late	Cityscapes
Valada et al., 2016 [pdf][ref]	Vision camera, thermal camera	Multiple 2D objects in forested environments	RGB image, thermal image, depth image. Each processed by the UpNet (built on VGG16 and up-convolution)	Feature concatenation, addition	Early, Late	self-recorded data
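
The fusion operations that recur in the table (feature concatenation, element-wise addition, and Mixture-of-Experts-style weighting) can be illustrated with a minimal sketch. It is not taken from any of the listed papers: the function names, the toy feature vectors, and the fixed gate scores are all hypothetical, and in the actual methods the features come from CNN branches and the gate weights are learned.

```python
import math


def concat_fusion(feat_a, feat_b):
    """Feature concatenation: join two feature vectors along the channel axis."""
    return list(feat_a) + list(feat_b)


def additive_fusion(feat_a, feat_b):
    """Element-wise summation: both feature vectors must have the same length."""
    if len(feat_a) != len(feat_b):
        raise ValueError("element-wise addition needs equal-length features")
    return [a + b for a, b in zip(feat_a, feat_b)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_scores):
    """Mixture-of-Experts-style fusion: a convex combination of the
    per-modality features, weighted by a softmax over the gate scores.
    Here the gate scores are plain numbers (an assumption); in practice
    a small gating network would predict them from the inputs."""
    if len(features) != len(gate_scores):
        raise ValueError("one gate score is needed per modality")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all modality features must have the same length")
    weights = softmax(gate_scores)
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(length)
    ]


# Toy per-modality features (hypothetical values).
rgb = [0.2, 0.8, 0.5]
thermal = [0.6, 0.4, 0.1]

concatenated = concat_fusion(rgb, thermal)          # length 6
summed = additive_fusion(rgb, thermal)              # length 3
gated = gated_fusion([rgb, thermal], [0.0, 0.0])    # equal gates: plain average
```

Concatenation doubles the channel count and leaves the weighting to later layers, addition keeps the channel count but requires matching shapes, and gating lets the weighting vary per input, which is the motivation usually given for Mixture-of-Experts fusion.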