Recent advances in cross-modal 3D object detection rely heavily on anchor-based methods; however, intractable anchor parameter tuning and computationally expensive postprocessing severely impede embedded-system applications such as autonomous driving. In this work, we develop an anchor-free architecture for efficient camera-light detection and ranging (LiDAR) 3D object detection. To highlight the effect of foreground information from different modalities, we propose a dynamic fusion module (DFM) that adaptively interacts image features with point features via learnable filters. In addition, a 3D distance intersection-over-union (3D-DIoU) loss is explicitly formulated as a supervision signal for 3D-oriented box regression and optimization. We integrate these components into an end-to-end multimodal 3D detector termed 3D-DFM. Comprehensive experiments on the widely used KITTI dataset demonstrate the superiority and generality of the 3D-DFM architecture, which achieves competitive detection accuracy and real-time inference speed. To the best of our knowledge, this is the first work to incorporate an anchor-free pipeline into multimodal 3D object detection.
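To make the DFM idea concrete, the following is a minimal sketch of a learnable-filter fusion step, assuming a dynamic-filter/FiLM-style design in which image features sampled at each point's projection generate per-point filters that modulate the LiDAR point features. All class names, shapes, and the specific scale-and-shift form are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch of a dynamic fusion step in the spirit of the DFM:
# image features condition learnable filters applied to point features.
import torch
import torch.nn as nn

class DynamicFusionSketch(nn.Module):
    def __init__(self, img_dim: int, pts_dim: int):
        super().__init__()
        # Hypothetical filter generator: maps image features to a
        # multiplicative filter and an additive offset per point.
        self.filter_gen = nn.Linear(img_dim, pts_dim * 2)

    def forward(self, img_feat: torch.Tensor, pts_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (N, img_dim) image features sampled at each point's projection
        # pts_feat: (N, pts_dim) LiDAR point features
        scale, shift = self.filter_gen(img_feat).chunk(2, dim=-1)
        # Input-conditioned filtering: foreground image evidence
        # modulates each point feature adaptively.
        return pts_feat * torch.sigmoid(scale) + shift

# Usage: fuse 1024 points with 64-dim image and 128-dim point features.
fused = DynamicFusionSketch(img_dim=64, pts_dim=128)(
    torch.randn(1024, 64), torch.randn(1024, 128)
)
```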
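Likewise, a hedged sketch of a 3D DIoU loss is shown below, following the standard DIoU formulation (one minus IoU, plus squared center distance over the squared diagonal of the smallest enclosing box) lifted to volumes. It is restricted to axis-aligned boxes for brevity; the paper's 3D-DIoU additionally supervises oriented boxes, whose rotation handling is omitted here.

```python
# Hedged sketch: 3D DIoU loss for axis-aligned boxes (x, y, z, w, l, h).
import torch

def diou_loss_3d(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Corner coordinates from centers and sizes.
    p_min, p_max = pred[:, :3] - pred[:, 3:] / 2, pred[:, :3] + pred[:, 3:] / 2
    t_min, t_max = target[:, :3] - target[:, 3:] / 2, target[:, :3] + target[:, 3:] / 2

    # Intersection and union volumes.
    inter = (torch.minimum(p_max, t_max) - torch.maximum(p_min, t_min)).clamp(min=0)
    inter_vol = inter.prod(dim=-1)
    union_vol = pred[:, 3:].prod(-1) + target[:, 3:].prod(-1) - inter_vol
    iou = inter_vol / union_vol.clamp(min=1e-7)

    # Squared center distance normalized by the squared diagonal
    # of the smallest axis-aligned enclosing box.
    center_dist2 = ((pred[:, :3] - target[:, :3]) ** 2).sum(-1)
    enclose = torch.maximum(p_max, t_max) - torch.minimum(p_min, t_min)
    diag2 = (enclose ** 2).sum(-1).clamp(min=1e-7)

    return (1.0 - iou + center_dist2 / diag2).mean()
```

The distance term penalizes center misalignment even when boxes do not overlap, which is what makes DIoU a useful regression signal compared with plain IoU.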
DOI: http://dx.doi.org/10.1109/TNNLS.2022.3171553