Feature Map Transformation for Fusion of Multi-Sensor Object Detection Networks for Autonomous Driving

Abstract

We present a general framework for fusing pre-trained multi-sensor object detection networks for perception in autonomous cars at an intermediate stage using perspective-invariant features. The key innovation is an autoencoder-inspired Transformer module that transforms both the perspective and the feature activation layout from one sensor modality to another. The transformed feature maps can be combined with those of a modality-native object detector through a simple fusion scheme to enhance performance and reliability. Our approach is not limited to a specific object detection network architecture or even to specific sensor modalities. We show the effectiveness of the proposed scheme through experiments on our own dataset as well as on the KITTI dataset.
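To make the idea concrete, the following is a minimal PyTorch sketch of the two ingredients the abstract describes: an autoencoder-style module that maps a feature map from one modality's detector into the layout of another, and a simple fusion with the target detector's native feature map. All names, layer sizes, the bilinear resampling used as a stand-in for the perspective change, and the weighted-average fusion are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureTransformModule(nn.Module):
    # Autoencoder-style module (hypothetical layout): encodes a feature map from
    # the source modality's detector and decodes it into the channel space of
    # the target modality's detector.
    def __init__(self, src_channels=256, tgt_channels=256, latent_channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(src_channels, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, latent_channels, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(latent_channels, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, tgt_channels, 3, padding=1),
        )

    def forward(self, src_feat, tgt_size):
        z = self.encoder(src_feat)
        # Resample to the target detector's feature-map resolution; a crude
        # stand-in for the perspective change described in the abstract.
        z = F.interpolate(z, size=tgt_size, mode="bilinear", align_corners=False)
        return self.decoder(z)

def fuse(native_feat, transformed_feat, alpha=0.5):
    # Simple fusion scheme (assumed here to be a weighted average) of the target
    # detector's native feature map and the transformed source-modality map.
    return alpha * native_feat + (1.0 - alpha) * transformed_feat

# Example: map a lidar-branch feature map into the camera detector's feature
# space and fuse it with the camera detector's own activations (shapes invented).
lidar_feat = torch.randn(1, 256, 100, 100)
camera_feat = torch.randn(1, 256, 48, 156)
module = FeatureTransformModule()
fused = fuse(camera_feat, module(lidar_feat, tgt_size=camera_feat.shape[-2:]))
```

Because the transformation operates on intermediate feature maps rather than on detections, the fused map can be fed back into the target detector's remaining layers without retraining the pre-trained backbones.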

Publication
In Proceedings of the Computer Vision Conference (CVC), Las Vegas, Nevada