Estimating the 3D structure of the drivable surface and the surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either with 3D sensors such as LiDAR or by directly predicting per-point depth with deep learning. However, the former is expensive, and the latter fails to exploit the geometric structure of the scene. In this paper, instead of following existing methodologies, we propose the Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the road plane geometry omnipresent in driving scenes. RPANet takes as input a pair of images aligned by the homography of the road plane and outputs a γ map (the ratio of a point's height above the plane to its depth) for 3D reconstruction. The γ map defines a two-dimensional transformation between two consecutive frames: it encodes the planar parallax and, with the road plane as a reference, yields the 3D structure by warping one frame onto the other. Furthermore, we introduce a novel cross-attention module that helps the network perceive the displacements caused by planar parallax. To verify the effectiveness of our method, we sample data from the Waymo Open Dataset and construct planar-parallax annotations for it. Comprehensive experiments on the sampled dataset demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios.
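
The reconstruction step admits a compact closed form. Writing the road plane as n · X = d in the camera frame (n the unit normal, d the camera height above the road) and the pixel ray as r = K^-1 (u, v, 1)^T, a point with ratio γ = h/Z lies at X = λr with λ = d / (γ r_z + n · r), since h = d − n · X and Z = λ r_z. The numpy sketch below implements this back-projection under those assumed conventions; the frame axes, plane convention, and all names are illustrative, not the paper's implementation (which predicts γ with a network after aligning frames by the plane-induced homography H = K (R − t nᵀ / d) K^-1).

import numpy as np

def gamma_to_points(gamma, K, n, d):
    # Back-project a gamma map (h/Z per pixel) to 3D points in the camera
    # frame, assuming the road plane satisfies n . X = d (d = camera height).
    h_img, w_img = gamma.shape
    v, u = np.mgrid[0:h_img, 0:w_img]
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T           # r = K^-1 (u, v, 1)^T per pixel
    lam = d / (gamma.reshape(-1) * rays[:, 2] + rays @ n)
    return lam[:, None] * rays                # X = lam * r, shape (H*W, 3)

# Sanity check: gamma == 0 must place every pixel on the road plane itself.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
n = np.array([0.0, 1.0, 0.0])                 # assumed y-down camera axes
pts = gamma_to_points(np.zeros((720, 1280)), K, n, d=1.65)
assert np.allclose(pts @ n, 1.65)

Pixels with γ > 0 are lifted above the plane in proportion to their height-to-depth ratio; the cross-attention module operates on image features and is orthogonal to this geometry, so it is not sketched here.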

Source: http://dx.doi.org/10.1109/TIP.2023.3289323

Publication Analysis

Top Keywords (term: frequency)

planar parallax: 24
road plane: 12
road planar: 8
consecutive frames: 8
planar: 6
parallax: 6
monocular road: 4
parallax estimation: 4
estimation estimating: 4
estimating structure: 4

Similar Publications

Pictorial depth cues elicit the perception of tridimensionality in dogs.

Anim Cogn

July 2024

Department of Comparative Biomedicine and Food Science, Università degli Studi di Padova, Viale dell'Università 16, Legnaro, PD, 35020, Italy.

The perception of tridimensionality is elicited by binocular disparity, motion parallax, and monocular or pictorial cues. The perception of tridimensionality arising from pictorial cues has been investigated in several non-human animal species. Although dogs can use and discriminate two-dimensional images, to date there is no evidence of dogs' ability to perceive tridimensionality in pictures or through pictorial cues.

Current efforts with light field displays are mainly concentrated on achieving the widest possible viewing angle, even though a single viewer only looks at the display from a specific direction. To make light field displays practical, a super multi-view light field display is proposed that compresses the information into the viewing zone of a single user by reducing redundant viewpoints. A quasi-directional backlight is proposed, and a lenticular lens array is applied to achieve the restricted viewing zone.

Joint Calibration of a Multimodal Sensor System for Autonomous Vehicles.

Sensors (Basel)

June 2023

Faculty of Electrical Engineering, University of Ljubljana, Tržaška Cesta 25, SI-1000 Ljubljana, Slovenia.

Multimodal sensor systems require precise calibration if they are to be used in the field. Because obtaining corresponding features across different modalities is difficult, the calibration of such systems remains an open problem. We present a systematic approach for calibrating a set of cameras of different modalities (RGB, thermal, polarization, and dual-spectrum near infrared) with respect to a LiDAR sensor using a planar calibration target.
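
The plane-target idea reduces the camera-LiDAR extrinsics to aligning per-pose observations of a single physical plane. Below is a minimal numpy sketch of one standard formulation, not the paper's actual multimodal pipeline: fit the target plane to the LiDAR returns by SVD, solve the rotation by aligning the LiDAR-side and camera-side plane normals (Kabsch), then recover the translation from the plane offsets in least squares. The camera-side normals and offsets are assumed to come from elsewhere (e.g., checkerboard PnP), and all names are illustrative.

import numpy as np

def fit_plane(points):
    # Least-squares plane n . x = d through (N, 3) LiDAR returns on the target.
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    n = vt[-1]                            # least-variance direction = normal
    return n, float(n @ c)

def align_normals(n_lidar, n_cam):
    # Kabsch: rotation R with n_cam[i] ~= R @ n_lidar[i], given (M, 3) unit
    # normals of the target observed from M >= 3 non-parallel poses.
    h = n_lidar.T @ n_cam
    u, _, vt = np.linalg.svd(h)
    s = np.diag([1.0, 1.0, np.sign(np.linalg.det(vt.T @ u.T))])
    return vt.T @ s @ u.T

def solve_translation(n_cam, d_cam, d_lidar):
    # The same physical plane under x_c = R @ x_l + t gives, per pose,
    # n_cam . t = d_cam - d_lidar; stack the poses and solve least squares.
    t, *_ = np.linalg.lstsq(n_cam, d_cam - d_lidar, rcond=None)
    return t

Three or more target poses with linearly independent normals make both the rotation and the translation solves well posed.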

The A-Effect and Global Motion.

Vision (Basel)

March 2019

Centre for Vision Research, York University, Toronto, ON M3J 1P3, Canada.

When the head is tilted, an objectively vertical line viewed in isolation is typically perceived as tilted. We explored whether this shift also occurs when viewing global motion displays perceived as either object-motion or self-motion. Observers stood upright and lay left-side down while viewing (1) a static line, (2) a random-dot display of 2-D (planar) motion, or (3) a random-dot display of 3-D (volumetric) global motion.
