Improving Depth Estimation by Embedding Semantic Segmentation: A Hybrid CNN Model.

Sensors (Basel)

Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz s/n, Ciudad de México 07738, Mexico.

Published: February 2022

Single-image depth estimation methods often fail to separate foreground elements, because these can easily be confounded with the background. To alleviate this problem, we propose using semantic segmentation to add information to a depth estimator, in this case a 3D Convolutional Neural Network (CNN): the segmentation is encoded as one-hot planes, one per object category. We explore both 2D and 3D models and, in particular, propose a hybrid 2D-3D CNN architecture capable of producing semantic segmentation and depth estimation at the same time. We tested our procedure on the SYNTHIA-AL dataset and obtained σ3 = 0.95 using manual segmentation, an improvement of 0.14 points over the state of the art (σ3 = 0.81), and σ3 = 0.89 using automatic semantic segmentation, showing that depth estimation improves when the shape and position of objects in a scene are known.
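The two ingredients named in the abstract are easy to make concrete: encoding a segmentation map as one-hot category planes that can be stacked with the image input, and the σk threshold-accuracy metric commonly used to score depth maps (the fraction of pixels whose predicted/true depth ratio stays below 1.25^k). The sketch below is a minimal illustration with invented array shapes, not the paper's implementation.

```python
import numpy as np

def one_hot_planes(seg, num_classes):
    """Encode an H x W integer segmentation map as num_classes binary planes."""
    # Broadcasting (C,1,1) == (1,H,W) yields a (C,H,W) stack of 0/1 planes.
    return (np.arange(num_classes)[:, None, None] == seg[None]).astype(np.float32)

def threshold_accuracy(pred, gt, k=3, thr=1.25):
    """Sigma_k metric: fraction of pixels with max(pred/gt, gt/pred) < thr**k."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float((ratio < thr ** k).mean())

# Toy 2x2 segmentation map with 3 categories.
seg = np.array([[0, 1], [2, 1]])
planes = one_hot_planes(seg, 3)   # shape (3, 2, 2), one plane per category

# A perfect depth prediction scores sigma_3 = 1.0.
gt = np.array([[1.0, 2.0], [3.0, 4.0]])
sigma3 = threshold_accuracy(gt.copy(), gt, k=3)
```

The one-hot planes would simply be concatenated with the RGB channels (or fed as an extra input volume in the 3D case) so the network sees object identity alongside appearance.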

Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8875167
DOI: http://dx.doi.org/10.3390/s22041669


Similar Publications

Rail corrugation intensifies wheel-rail vibrations, often leading to damage in vehicle-track system components within affected sections. This paper proposes a novel method for identifying rail corrugation, which combines Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), permutation entropy (PE), and Smoothed Pseudo Wigner-Ville Distribution (SPWVD). Initially, vertical acceleration data from the axle box are decomposed using CEEMDAN to extract intrinsic mode functions (IMFs) with distinct frequencies.
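Of the three tools this method combines, permutation entropy (PE) is the simplest to state: slide a window of order m over the signal, record the ordinal pattern (ranking) of each window, and take the normalized Shannon entropy of the pattern distribution. The sketch below is a generic textbook implementation, not the cited paper's code; parameter names are assumptions.

```python
import math

def permutation_entropy(signal, m=3, tau=1):
    """Normalized permutation entropy of a 1-D signal (order m, delay tau)."""
    counts = {}
    n = len(signal) - (m - 1) * tau
    for i in range(n):
        window = signal[i:i + m * tau:tau]
        # Ordinal pattern: argsort of the window values.
        pattern = tuple(sorted(range(m), key=lambda j: window[j]))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = [c / n for c in counts.values()]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(math.factorial(m))  # normalize to [0, 1]

# A monotonic signal has a single ordinal pattern, hence PE = 0.
pe_flat = permutation_entropy([1, 2, 3, 4, 5])
```

In a pipeline like the one described, PE would be computed per IMF to select the components carrying corrugation-related irregularity before the time-frequency (SPWVD) step.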


Autonomous technologies have revolutionized transportation, military operations, and space exploration, necessitating precise localization in environments where traditional GPS-based systems are unreliable or unavailable. While widespread for outdoor localization, GPS systems face limitations in obstructed environments such as dense urban areas, forests, and indoor spaces. Moreover, GPS reliance introduces vulnerabilities to signal disruptions, which can lead to significant operational failures.


Current deep learning-based phase unwrapping techniques for iToF Lidar sensors focus mainly on static indoor scenarios, ignoring motion blur in dynamic outdoor scenarios. Our paper proposes a two-stage semi-supervised method to unwrap ambiguous depth maps affected by motion blur in dynamic outdoor scenes. The method trains on static datasets to learn unwrapped depth map prediction and then adapts to dynamic datasets using continuous learning methods.


Estimating a 3D Human Skeleton from a Single RGB Image by Fusing Predicted Depths from Multiple Virtual Viewpoints.

Sensors (Basel)

December 2024

Department of Electrical Engineering, Center for Innovative Research on Aging Society (CIRAS), Advanced Institute of Manufacturing with High-Tech Innovations (AIM-HI), National Chung Cheng University, Chia-Yi 621, Taiwan.

In computer vision, accurately estimating a 3D human skeleton from a single RGB image remains a challenging task. Inspired by the advantages of multi-view approaches, we propose a method of predicting enhanced 2D skeletons (specifically, predicting the joints' relative depths) from multiple virtual viewpoints based on a single real-view image. By fusing these virtual-viewpoint skeletons, we can then estimate the final 3D human skeleton more accurately.


Cascaded Feature Fusion Grasping Network for Real-Time Robotic Systems.

Sensors (Basel)

December 2024

College of Engineering, Huaqiao University, Quanzhou 362021, China.

Grasping objects of irregular shapes and various sizes remains a key challenge in the field of robotic grasping. This paper proposes a novel RGB-D data-based grasping pose prediction network, termed Cascaded Feature Fusion Grasping Network (CFFGN), designed for high-efficiency, lightweight, and rapid grasping pose estimation. The network employs innovative structural designs, including depth-wise separable convolutions to reduce parameters and enhance computational efficiency; convolutional block attention modules to augment the model's ability to focus on key features; multi-scale dilated convolution to expand the receptive field and capture multi-scale information; and bidirectional feature pyramid modules to achieve effective fusion and information flow of features at different levels.
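The parameter saving behind depth-wise separable convolutions, one of the design choices listed above, follows from simple counting: a standard k×k convolution needs k·k·Cin·Cout weights, while the separable version needs only k·k·Cin (depth-wise) plus Cin·Cout (1×1 point-wise). A quick arithmetic sketch, with illustrative channel counts not taken from the paper:

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depth-wise k x k conv per input channel, then a 1x1 point-wise conv."""
    return k * k * c_in + c_in * c_out

# Example layer: 3x3 kernel, 64 -> 128 channels.
std = conv_params(3, 64, 128)                  # 73728 weights
sep = depthwise_separable_params(3, 64, 128)   # 576 + 8192 = 8768 weights
ratio = std / sep                              # roughly 8.4x fewer parameters
```

This factor of roughly k² is what makes such layers attractive for the lightweight, real-time setting the network targets.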

