3-D lane detection is a challenging task due to the diversity of lanes, occlusion, dazzling light, and other factors. Traditional methods usually rely on highly specialized handcrafted features and carefully designed postprocessing to detect lanes. However, these methods rest on strong assumptions and a single modality, so they scale poorly and deliver limited performance. In this article, a multimodal fusion network (MFNet) is proposed that uses multihead nonlocal attention and a feature pyramid for 3-D lane detection. It includes three parts: a multihead deformable transformation (MDT) module, a multidirectional attention feature pyramid fusion (MA-FPF) module, and a top-view lane prediction (TLP) module. First, MDT is presented to learn and mine multimodal features from RGB images, depth maps, and point cloud data (PCD) for optimal lane feature extraction. Then, MA-FPF is designed to fuse multiscale features and prevent lane features from vanishing as the network deepens. Finally, TLP is developed to estimate 3-D lanes and predict their positions. Experimental results on the 3-D lane synthetic and ONCE-3DLanes datasets demonstrate that the proposed MFNet outperforms state-of-the-art methods in both qualitative and quantitative analyses as well as visual comparisons.
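
The abstract only names the three stages, so the following PyTorch sketch illustrates one plausible way they could compose: per-modality encoders fused by multihead nonlocal attention (MDT), a top-down pyramid fusion that keeps shallow lane cues alive (MA-FPF), and an anchor-based 3-D regression head (TLP). Every internal detail here, including channel widths, the range-image projection of the PCD, and the anchor/point output format, is an illustrative assumption rather than the authors' implementation.

```python
# Hypothetical sketch of the MFNet three-stage pipeline described above.
# Module names follow the abstract; all layer choices and tensor shapes
# are assumptions, not the published architecture.
import torch
import torch.nn as nn

class MDT(nn.Module):
    """Multihead transformation (assumed): encode each modality, sum the
    features, then apply multihead nonlocal attention over positions."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        # One lightweight encoder per modality (PCD is assumed to be
        # projected to a 3-channel range image beforehand).
        self.rgb_enc = nn.Conv2d(3, channels, 3, stride=2, padding=1)
        self.depth_enc = nn.Conv2d(1, channels, 3, stride=2, padding=1)
        self.pcd_enc = nn.Conv2d(3, channels, 3, stride=2, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, rgb, depth, pcd):
        feats = self.rgb_enc(rgb) + self.depth_enc(depth) + self.pcd_enc(pcd)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)     # (B, H*W, C)
        fused, _ = self.attn(tokens, tokens, tokens)  # nonlocal attention
        return fused.transpose(1, 2).reshape(b, c, h, w)

class MAFPF(nn.Module):
    """Feature pyramid fusion (assumed): top-down upsampling merges a
    coarser level back in, so lane cues survive as the network deepens."""
    def __init__(self, channels=64):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.smooth = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        deep = self.down(x)                    # coarser pyramid level
        return self.smooth(x + self.up(deep))  # fuse multiscale features

class TLP(nn.Module):
    """Top-view lane prediction (assumed): regress (x, y, z) offsets for a
    fixed set of lane anchors from the fused feature map."""
    def __init__(self, channels=64, anchors=16, points=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, anchors * points * 3))
        self.anchors, self.points = anchors, points

    def forward(self, x):
        return self.head(x).view(-1, self.anchors, self.points, 3)

# Toy usage with random inputs in place of real sensor data.
rgb = torch.randn(1, 3, 128, 256)
depth = torch.randn(1, 1, 128, 256)
pcd = torch.randn(1, 3, 128, 256)
lanes = TLP()(MAFPF()(MDT()(rgb, depth, pcd)))
print(lanes.shape)  # torch.Size([1, 16, 10, 3]): anchors x points x (x,y,z)
```

The chained call at the end mirrors the First/Then/Finally ordering in the abstract: modality fusion, multiscale fusion, then 3-D lane regression.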

Source: http://dx.doi.org/10.1109/TNNLS.2024.3398654
