Segmentation

| Reference | Sensors | Semantics | Sensing Modality Representations | Fusion Operation and Method | Fusion Level | Dataset(s) used |
| --- | --- | --- | --- | --- | --- | --- |
| Chen et al., 2019 [pdf][ref] | LiDAR, visual camera | Road segmentation | RGB image, altitude difference image. Each processed by a CNN | Feature adaptation module, modified concatenation | Middle | KITTI |
| Valada et al., 2019 [pdf][ref] | Visual camera, depth camera, thermal camera | Multiple 2D objects | RGB image, thermal image, depth image. Each processed by an FCN with ResNet backbone (AdapNet++ architecture) | Extension of Mixture of Experts | Middle | Six datasets, including Cityscapes, SUN RGB-D, etc. |
| Sun et al., 2019 [pdf][ref] | Visual camera, thermal camera | Multiple 2D objects in campus environments | RGB image, thermal image. Each processed by a base network built on ResNet | Element-wise summation in the encoder networks | Middle | Datasets published by [ref] |
| Caltagirone et al., 2019 [pdf][ref] | LiDAR, vision camera | Road segmentation | LiDAR front-view depth images, RGB image. Each input processed by an FCN | Feature concatenation (for early and late fusion), weighted addition similar to a gating network (for middle-level cross fusion) | Early, Middle, Late | KITTI |
| Erkent et al., 2018 [pdf][ref] | LiDAR, visual camera | Multiple 2D objects | LiDAR BEV occupancy grids (processed based on Bayesian filtering and tracking), RGB image (processed by an FCN with VGG16 backbone) | Feature concatenation | Middle | KITTI, self-recorded |
| Lv et al., 2018 [pdf][ref] | LiDAR, vision camera | Road segmentation | LiDAR BEV maps, RGB image. Each input processed by an FCN with dilated convolution operator. RGB image features are also projected onto the LiDAR BEV plane before fusion | Feature concatenation | Middle | KITTI |
| Wulff et al., 2018 [pdf][ref] | LiDAR, vision camera | Road segmentation. Alternatives: freespace, ego-lane detection | LiDAR BEV maps, RGB image projected onto BEV plane. Inputs processed by an FCN with UNet | Feature concatenation | Early | KITTI |
| Kim et al., 2018 [pdf][ref] | LiDAR, vision camera | 2D off-road terrains | LiDAR voxels (processed by 3D convolution), RGB image (processed by ENet) | Addition | Early, Middle, Late | Self-recorded |
| Guan et al., 2018 [pdf][ref] | Vision camera, thermal camera | 2D pedestrian | RGB image, thermal image. Each processed by a base network built on VGG16 | Feature concatenation, Mixture of Experts | Early, Middle, Late | KAIST Pedestrian Dataset |
| Yang et al., 2018 [pdf][ref] | LiDAR, vision camera | Road segmentation | LiDAR points (processed by PointNet++), RGB image (processed by an FCN with VGG16 backbone) | Optimizing Conditional Random Field (CRF) | Late | KITTI |
| Gu et al., 2018 [pdf][ref] | LiDAR, visual camera | Road segmentation | LiDAR front-view depth and height maps (processed by an inverse-depth histogram based line scanning strategy), RGB image (processed by an FCN) | Optimizing Conditional Random Field (CRF) | Late | KITTI |
| Cai et al., 2018 [pdf][ref] | Satellite map with route information, visual camera | Road segmentation | Route map image, RGB image. Images are fused and processed by an FCN | Overlaying the line and curve segments in the route map onto the RGB image to generate the Map Fusion Image (MFI) | Early | Self-recorded data |
| Ha et al., 2017 [pdf][ref] | Vision camera, thermal camera | Multiple 2D objects in campus environments | RGB image, thermal image. Each processed by an FCN and mini-inception block | Feature concatenation, addition ("short-cut fusion") | Middle | Self-recorded data |
| Valada et al., 2017 [pdf][ref] | Vision camera, thermal camera | Multiple 2D objects | RGB image, thermal image, depth image. Each processed by an FCN with ResNet backbone | Mixture of Experts | Late | Cityscapes, Freiburg Multispectral Dataset, SYNTHIA |
| Schneider et al., 2017 [pdf][ref] | Vision camera | Multiple 2D objects | RGB image, depth image | Feature concatenation | Early, Middle, Late | Cityscapes |
| Schneider et al., 2017 [pdf][ref] | Vision camera | Multiple 2D objects | RGB image (processed by GoogLeNet), depth image from stereo camera (processed by NiN net) | Feature concatenation | Early, Middle, Late | Cityscapes |
| Valada et al., 2016 [pdf][ref] | Vision camera, thermal camera | Multiple 2D objects in forested environments | RGB image, thermal image, depth image. Each processed by the UpNet (built on VGG16 and up-convolution) | Feature concatenation, addition | Early, Late | Self-recorded data |
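
The fusion operations that recur in the table (feature concatenation, element-wise addition, and Mixture-of-Experts-style gating) can be illustrated with a minimal sketch. It is not taken from any of the listed papers. It assumes each modality has already been encoded into a per-pixel feature vector, represented here as a plain Python list, and the function names `concat_fusion`, `add_fusion`, and `gated_fusion` are hypothetical. In a real network the gate scores would come from a learned gating network; here they are simply passed in.

```python
import math
from typing import List, Sequence

Vector = List[float]


def concat_fusion(rgb: Sequence[float], other: Sequence[float]) -> Vector:
    """Feature concatenation: stack both feature vectors along the channel axis."""
    return list(rgb) + list(other)


def add_fusion(rgb: Sequence[float], other: Sequence[float]) -> Vector:
    """Element-wise addition: both modalities must have the same channel count."""
    if len(rgb) != len(other):
        raise ValueError("element-wise addition needs equal-length feature vectors")
    return [a + b for a, b in zip(rgb, other)]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax, used to turn gate scores into weights."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(experts: Sequence[Sequence[float]],
                 gate_scores: Sequence[float]) -> Vector:
    """Mixture-of-Experts-style fusion: a convex combination of the per-modality
    ("expert") features, weighted by the softmax of the gate scores.
    The gate scores are an assumption here; a gating network would predict them."""
    if len(experts) != len(gate_scores):
        raise ValueError("need exactly one gate score per expert")
    weights = softmax(gate_scores)
    channels = len(experts[0])
    return [sum(w * e[i] for w, e in zip(weights, experts))
            for i in range(channels)]


if __name__ == "__main__":
    rgb_feat = [1.0, 2.0]       # hypothetical RGB features at one pixel
    thermal_feat = [3.0, 4.0]   # hypothetical thermal features at the same pixel
    print(concat_fusion(rgb_feat, thermal_feat))
    print(add_fusion(rgb_feat, thermal_feat))
    print(gated_fusion([rgb_feat, thermal_feat], gate_scores=[0.0, 0.0]))
```

In this sketch the fusion level in the table corresponds to where such an operation is applied: on raw or lightly processed inputs (early), on intermediate feature maps (middle), or on per-modality outputs (late). Concatenation grows the channel dimension and leaves the weighting to later layers, whereas addition and gating keep the channel count fixed.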