Single-image depth estimation methods often fail to separate foreground elements, which are easily confounded with the background. To alleviate this problem, we propose adding semantic segmentation information to a depth estimator, in this case a 3D Convolutional Neural Network (CNN): the segmentation is encoded as one-hot planes representing object categories. We explore both 2D and 3D models and, in particular, propose a hybrid 2D-3D CNN architecture that produces semantic segmentation and depth estimation at the same time. We tested our procedure on the SYNTHIA-AL dataset and obtained σ3 = 0.95 using manual segmentation, an improvement of 0.14 points over the state of the art (σ3 = 0.81), and σ3 = 0.89 using automatic semantic segmentation, showing that depth estimation improves when the shape and position of objects in a scene are known.
Download full-text PDF:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8875167
DOI: http://dx.doi.org/10.3390/s22041669
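The one-hot plane encoding described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the class count, class meanings, and image size are hypothetical:

```python
import numpy as np

NUM_CLASSES = 4  # hypothetical: e.g. road, car, pedestrian, building

def one_hot_planes(seg_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an (H, W) integer label map into (num_classes, H, W) binary planes."""
    h, w = seg_map.shape
    planes = np.zeros((num_classes, h, w), dtype=np.float32)
    for c in range(num_classes):
        planes[c] = (seg_map == c)
    return planes

# Toy example: a 2x2 label map and a matching RGB image.
seg = np.array([[0, 1], [2, 3]])
rgb = np.random.rand(3, 2, 2).astype(np.float32)

planes = one_hot_planes(seg, NUM_CLASSES)
# Stack the planes with the RGB channels to form the depth estimator's input.
cnn_input = np.concatenate([rgb, planes], axis=0)  # shape (3 + NUM_CLASSES, H, W)
print(cnn_input.shape)  # (7, 2, 2)
```

Each pixel activates exactly one plane, so the network receives the shape and position of every labeled object as an explicit input channel.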
Sensors (Basel)
December 2024
College of Railway Transportation, Hunan University of Technology, Zhuzhou 412007, China.
Rail corrugation intensifies wheel-rail vibrations, often leading to damage in vehicle-track system components within affected sections. This paper proposes a novel method for identifying rail corrugation, which combines Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), permutation entropy (PE), and Smoothed Pseudo Wigner-Ville Distribution (SPWVD). Initially, vertical acceleration data from the axle box are decomposed using CEEMDAN to extract intrinsic mode functions (IMFs) with distinct frequencies.
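Of the three components named above, permutation entropy (PE) is the one used to rank the extracted IMFs. A minimal PE sketch is shown below; the embedding dimension and delay are common defaults, not values taken from the paper:

```python
import numpy as np
from math import factorial, log

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy of a 1-D signal.

    m is the embedding dimension, tau the time delay. Each window of m
    samples is mapped to its ordinal pattern (the argsort order), and the
    Shannon entropy of the pattern distribution is normalized by log(m!).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    counts = {}
    for i in range(n):
        pattern = tuple(np.argsort(x[i:i + m * tau:tau]))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = np.array(list(counts.values()), dtype=float) / n
    h = -np.sum(probs * np.log(probs))
    return float(h / log(factorial(m)))  # normalize to [0, 1]

# A monotone ramp has one ordinal pattern (entropy ~0); white noise
# spreads over all m! patterns (entropy near 1).
print(permutation_entropy(np.arange(100)))
print(permutation_entropy(np.random.default_rng(0).standard_normal(2000)))
```

Low-complexity (e.g. strongly periodic) IMFs yield low PE values, which is what makes PE useful for selecting corrugation-related modes from the CEEMDAN output.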
Sensors (Basel)
December 2024
Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada.
Autonomous technologies have revolutionized transportation, military operations, and space exploration, necessitating precise localization in environments where traditional GPS-based systems are unreliable or unavailable. While widespread for outdoor localization, GPS systems face limitations in obstructed environments such as dense urban areas, forests, and indoor spaces. Moreover, GPS reliance introduces vulnerabilities to signal disruptions, which can lead to significant operational failures.
Sensors (Basel)
December 2024
Department of Electronic & Computer Engineering, University of Limerick, V94 T9PX Limerick, Ireland.
Current deep learning-based phase unwrapping techniques for iToF Lidar sensors focus mainly on static indoor scenarios, ignoring motion blur in dynamic outdoor scenarios. Our paper proposes a two-stage semi-supervised method to unwrap ambiguous depth maps affected by motion blur in dynamic outdoor scenes. The method trains on static datasets to learn unwrapped depth map prediction and then adapts to dynamic datasets using continuous learning methods.
Sensors (Basel)
December 2024
Department of Electrical Engineering, Center for Innovative Research on Aging Society (CIRAS), Advanced Institute of Manufacturing with High-Tech Innovations (AIM-HI), National Chung Cheng University, Chia-Yi 621, Taiwan.
In computer vision, accurately estimating a 3D human skeleton from a single RGB image remains a challenging task. Inspired by the advantages of multi-view approaches, we propose a method of predicting enhanced 2D skeletons (specifically, predicting the joints' relative depths) from multiple virtual viewpoints based on a single real-view image. By fusing these virtual-viewpoint skeletons, we can then estimate the final 3D human skeleton more accurately.
Sensors (Basel)
December 2024
College of Engineering, Huaqiao University, Quanzhou 362021, China.
Grasping objects of irregular shapes and various sizes remains a key challenge in the field of robotic grasping. This paper proposes a novel RGB-D data-based grasping pose prediction network, termed Cascaded Feature Fusion Grasping Network (CFFGN), designed for high-efficiency, lightweight, and rapid grasping pose estimation. The network employs innovative structural designs, including depth-wise separable convolutions to reduce parameters and enhance computational efficiency; convolutional block attention modules to augment the model's ability to focus on key features; multi-scale dilated convolution to expand the receptive field and capture multi-scale information; and bidirectional feature pyramid modules to achieve effective fusion and information flow of features at different levels.
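The parameter saving that motivates the depth-wise separable convolutions mentioned above is easy to quantify: a standard k×k convolution costs k·k·C_in·C_out weights, while the depth-wise plus point-wise factorization costs k·k·C_in + C_in·C_out. A small sketch, with kernel size and channel counts chosen for illustration rather than taken from the CFFGN paper:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (ignoring biases)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depth-wise (k x k per input channel) + point-wise (1 x 1) convolution."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128  # illustrative layer sizes
std = conv_params(k, c_in, c_out)                  # 73728
dws = depthwise_separable_params(k, c_in, c_out)   # 8768
print(std, dws, round(std / dws, 1))
```

For these sizes the factorized form uses roughly 8x fewer weights, which is the kind of reduction that makes a lightweight, rapid grasping-pose network feasible.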