Single-image 3-D reconstruction has long been a challenging problem. Deep learning approaches have recently been applied to it, but their ability to generate point clouds remains limited by inefficient and expensive 3-D representations, by the dependency between the output and the number of model parameters, or by the lack of a suitable computing operation. In this article, we present a novel deep-learning-based method to reconstruct a point cloud of an object from a single still image. The proposed method can be decomposed into two steps: feature fusion and deformation. The first step extracts both global and point-specific shape features from a 2-D object image and then injects them into a randomly generated point cloud. In the second step, deformation, we introduce a new layer termed GraphX, which considers the interrelationships between points as common graph convolutions do but operates on unordered sets. The framework is applicable to realistic image data with backgrounds, as we optionally learn a mask branch to segment objects from input images. To improve the quality of the point clouds, we further propose an objective function that controls point uniformity. In addition, we introduce variants of GraphX that range from best performance to smallest memory budget. Moreover, the proposed model can generate point clouds of arbitrary size, making it the first deep method to do so. Extensive experiments demonstrate that our method outperforms existing models and sets new best results on various performance metrics in single-image 3-D reconstruction.
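
The two steps described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify how GraphX is computed, so the layer below is one possible reading (a learned mixing across points followed by a shared linear map on features). The function names `fuse` and `mix_then_project`, all dimensions, and the random vector standing in for the image encoder output are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def fuse(points, global_feat):
    """Feature fusion (assumed form): append one global feature vector to every point."""
    return [p + global_feat for p in points]

def mix_then_project(feats, mix, proj):
    """Hypothetical GraphX-like layer, not the paper's definition.

    feats: N_in x D point features
    mix:   N_out x N_in weights that combine information across points
    proj:  D x D_out weights shared by all points
    """
    mixed = matmul(mix, feats)   # each output row is a weighted sum of all input points
    out = matmul(mixed, proj)    # shared linear map along the feature axis
    return [[max(0.0, v) for v in row] for row in out]  # ReLU activation

rng = random.Random(0)
n_in, n_out, d_img, d_out = 8, 8, 4, 3

# Randomly generated initial point cloud (N_in x 3).
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_in)]
# Stand-in for the global shape feature an image encoder would produce.
global_feat = [rng.uniform(-1, 1) for _ in range(d_img)]

fused = fuse(points, global_feat)
mix = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
proj = [[rng.gauss(0, 0.1) for _ in range(d_out)] for _ in range(3 + d_img)]
hidden = mix_then_project(fused, mix, proj)
```

In a trained model the weights in `mix` and `proj` would be learned, and several such layers would map the fused features to final point coordinates. The sketch only shows how information can flow across points and across feature channels in one layer.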

Source
http://dx.doi.org/10.1109/TNNLS.2022.3211929

Similar Publications

Although the Transformer architecture has established itself as the industry standard for natural language processing tasks, it still has few uses in computer vision. In vision, attention is used in conjunction with convolutional networks or to replace individual convolutional network elements while preserving the overall network design. Differences between the two domains, such as the large variation in the scale of visual entities and the much finer granularity of pixels in images compared to words in text, make it difficult to transfer the Transformer from language to vision.

Cross-Modal Collaboration and Robust Feature Classifier for Open-Vocabulary 3D Object Detection.

Sensors (Basel)

January 2025

The 54th Research Institute, China Electronics Technology Group Corporation, College of Signal and Information Processing, Shijiazhuang 050081, China.

Multi-sensor fusion, such as LiDAR- and camera-based 3D object detection, is a key technology in autonomous driving and robotics. However, traditional 3D detection models are limited to recognizing predefined categories and struggle with unknown or novel objects. Given the complexity of real-world environments, research into open-vocabulary 3D object detection is essential.

Segment Any Leaf 3D: A Zero-Shot 3D Leaf Instance Segmentation Method Based on Multi-View Images.

Sensors (Basel)

January 2025

School of Electronic and Communication Engineering, Sun Yat-sen University, Shenzhen 518000, China.

Exploring the relationships between plant phenotypes and genetic information requires advanced phenotypic analysis techniques for precise characterization. However, the diversity and variability of plant morphology challenge existing methods, which often fail to generalize across species and require extensive annotated data, especially for 3D datasets. This paper proposes a zero-shot 3D leaf instance segmentation method using RGB sensors.

Topography estimation is essential for autonomous off-road navigation. Common methods rely on point cloud data from, e.g.

Terrestrial laser scanners (TLSs) are portable dimensional measurement instruments used to obtain 3D point clouds of objects in a scene. While TLSs do not require the use of cooperative targets, such targets are sometimes placed in a scene to fuse or compare data from different instruments or data from the same instrument at different positions. A contrast target is one example; it consists of alternating black and white squares that can be printed using a laser printer.
