GeometryFormer: Semi-Convolutional Transformer Integrated with Geometric Perception for Depth Completion in Autonomous Driving Scenes.

Sensors (Basel)

National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130025, China.

Published: December 2024

Article Abstract

Depth completion is widely used in Simultaneous Localization and Mapping (SLAM) and Structure from Motion (SfM), both of which are important to the development of autonomous driving. Recently, methods that fuse a vision transformer (ViT) with convolution have raised accuracy to a new level, but two shortcomings remain. First, ViT performs poorly on fine details. To address this, this paper proposes a semi-convolutional vision transformer that improves local continuity, together with a geometric perception module that learns the positional correlation and geometric features of sparse points in three-dimensional space. By perceiving the geometric structures in depth maps, the module improves the recovery of edges and transparent areas. Second, previous methods use single-stage fusion, directly concatenating or adding the outputs of ViT and convolution. The two are therefore fused incompletely, which produces many outliers and ripples, especially in complex outdoor scenes. This paper proposes a novel double-stage fusion strategy that applies learnable confidence after self-attention to flexibly learn the weight of local features. Our network achieves state-of-the-art (SoTA) performance on the NYU-Depth-v2 Dataset and the KITTI Depth Completion Dataset. In particular, the root mean square error (RMSE) of our model on the NYU-Depth-v2 Dataset is 87.9 mm, the best reported among all algorithms at the time of writing. At the end of the article, we also verify the model's generalization ability in real road scenes.
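
For readers unfamiliar with the metric, RMSE in depth completion is conventionally computed only over pixels that have a ground-truth depth value. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes depths are given in metres and that 0 marks a pixel without ground truth; the function name masked_rmse_mm and the sample values are hypothetical.

```python
import math


def masked_rmse_mm(pred_m, gt_m):
    """Return RMSE in millimetres over pixels with valid ground truth.

    pred_m, gt_m: 2-D lists of depths in metres with the same shape.
    Assumption: a ground-truth value of 0 means "no measurement".
    """
    squared_errors = [
        (p - g) ** 2
        for pred_row, gt_row in zip(pred_m, gt_m)
        for p, g in zip(pred_row, gt_row)
        if g > 0  # skip pixels that have no ground-truth depth
    ]
    if not squared_errors:
        raise ValueError("no valid ground-truth pixels")
    mean_squared_error = sum(squared_errors) / len(squared_errors)
    return 1000.0 * math.sqrt(mean_squared_error)  # metres -> millimetres


# Hypothetical 2x2 depth maps; the bottom-left pixel has no ground truth.
pred = [[2.0, 3.1], [5.0, 4.0]]
gt = [[2.1, 3.0], [0.0, 4.0]]
rmse = masked_rmse_mm(pred, gt)  # about 81.65 mm
```

Masking matters because ground-truth depth maps are often incomplete, and counting the missing pixels as zero-depth targets would inflate the error.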

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11679245
DOI: http://dx.doi.org/10.3390/s24248066

Publication Analysis

Top Keywords

depth completion: 12
geometric perception: 8
autonomous driving: 8
vision transformer: 8
vit convolution: 8
paper proposes: 8
nyu-depth-v2 dataset: 8
geometryformer semi-convolutional: 4
semi-convolutional transformer: 4
transformer integrated: 4
