We study the problem of shape generation in 3D mesh representation from a small number of color images, with or without camera poses. While many previous works learn to hallucinate the shape directly from priors, we instead improve the shape quality by leveraging cross-view information with a graph convolution network. Rather than building a direct mapping function from images to 3D shape, our model learns to predict a series of deformations that iteratively refine a coarse shape. Inspired by traditional multiple view geometry methods, our network samples the area around each vertex of the initial mesh and reasons about an optimal deformation using perceptual feature statistics built from the multiple input images. Extensive experiments show that our model produces accurate 3D shapes that are not only visually plausible from the input perspectives but also well aligned to arbitrary viewpoints. Owing to its physically driven architecture, our model also generalizes across different semantic categories and numbers of input images. Model analysis experiments show that it is robust to the quality of the initial mesh and to errors in camera pose, and that it can be combined with a differentiable renderer for test-time optimization.
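The core step is the deformation reasoning: for each vertex of the coarse mesh, the network samples hypothesis positions nearby, projects them into every input view, pools perceptual feature statistics across views, and blends the hypotheses by their predicted scores. Below is a minimal PyTorch sketch of that idea; the offset sampling scheme, `project_to_image`, and all layer sizes are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of multi-view deformation reasoning, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

def project_to_image(points, camera):
    """Hypothetical pinhole projection of world points (V, 3) to image
    coordinates (V, 2); assumed to land in grid_sample's [-1, 1] range."""
    cam_pts = points @ camera["R"].T + camera["t"]        # world -> camera
    return cam_pts[:, :2] / cam_pts[:, 2:].clamp(min=1e-6)  # perspective divide

class DeformationReasoning(nn.Module):
    def __init__(self, feat_dim, num_hyp=43, radius=0.02):
        super().__init__()
        # Fixed set of hypothesis offsets sampled around each vertex.
        self.offsets = nn.Parameter(
            torch.randn(num_hyp, 3) * radius, requires_grad=False)
        # Scores a hypothesis from pooled cross-view statistics (mean, std).
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, verts, feat_maps, cameras):
        # verts: (V, 3); feat_maps: list of (C, H, W) perceptual features.
        hyp = verts[:, None, :] + self.offsets[None]      # (V, K, 3)
        per_view = []
        for fmap, cam in zip(feat_maps, cameras):
            uv = project_to_image(hyp.reshape(-1, 3), cam)  # (V*K, 2)
            grid = uv.view(1, 1, -1, 2)
            f = F.grid_sample(fmap[None], grid, align_corners=False)
            per_view.append(f.view(fmap.shape[0], -1).T)  # (V*K, C)
        stack = torch.stack(per_view)                     # (views, V*K, C)
        # Cross-view feature statistics: mean and std over input views.
        stats = torch.cat([stack.mean(0), stack.std(0)], dim=-1)
        scores = self.scorer(stats).view(-1, self.offsets.shape[0], 1)
        weights = torch.softmax(scores, dim=1)            # (V, K, 1)
        return (weights * hyp).sum(dim=1)                 # deformed vertices
```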

Source: http://dx.doi.org/10.1109/TPAMI.2022.3169735

Publication Analysis

Top Keywords

input images: 8
experiments model: 8
images: 5
shape: 5
model: 5
pixel2mesh++ mesh: 4
mesh generation: 4
generation refinement: 4
refinement multi-view: 4
multi-view images: 4

Similar Publications

Background: Food image recognition, a crucial step in computational gastronomy, has diverse applications across nutritional platforms. Convolutional neural networks (CNNs) are widely used for this task due to their ability to capture hierarchical features. However, they struggle with long-range dependencies and global feature extraction, which are vital for distinguishing visually similar foods or images where the context of the whole dish is crucial, thus necessitating a transformer architecture.
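To make the architectural argument concrete, the sketch below shows how patch-level self-attention gives every region of a food image access to the whole dish in a single layer, something a local convolution kernel cannot do. The model, its dimensions, and its names are illustrative assumptions, not the paper's architecture.

```python
# Minimal patch-attention classifier sketch (assumed shapes and names).
import torch
import torch.nn as nn

class PatchAttentionClassifier(nn.Module):
    def __init__(self, num_classes, dim=192, patch=16, img=224, heads=3):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_tokens = (img // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                     # x: (B, 3, 224, 224)
        tok = self.embed(x).flatten(2).transpose(1, 2) + self.pos
        ctx, _ = self.attn(tok, tok, tok)     # global, image-wide attention
        return self.head(ctx.mean(dim=1))     # pooled tokens -> food classes
```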


Due to the complex and uncertain physics of lightning strike on carbon fiber-reinforced polymer (CFRP) laminates, conventional numerical simulation methods for assessing the residual strength of lightning-damaged CFRP laminates are highly time-consuming and far from satisfactory. To overcome these challenges, this study proposes a new prediction method for the residual strength of CFRP laminates based on machine learning. A diverse dataset is acquired and augmented from photographs of lightning strike damage areas, C-scan images, mechanical performance data, layup details, and lightning current parameters.
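The snippet does not name the learning algorithm, so the following is only a hedged sketch of a pipeline with this general shape: tabular features derived from damage images and test parameters regressed onto residual strength. All feature names, value ranges, the toy target, and the choice of GradientBoostingRegressor are assumptions for illustration.

```python
# Toy regression sketch; features and target are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
damage_area = rng.uniform(0, 50, 400)     # from photos / C-scans (cm^2)
peak_current = rng.uniform(10, 200, 400)  # lightning current (kA)
ply_count = rng.integers(8, 33, 400)      # layup detail
X = np.column_stack([damage_area, peak_current, ply_count])
y = 600 - 4 * damage_area - 0.5 * peak_current + 3 * ply_count  # toy target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print(model.score(X_te, y_te))            # R^2 on held-out laminates
```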


Recognizing targets in infrared images is an important problem for defense and security applications. A deployed network must not only recognize the known classes, but must also reject any new or unknown objects without confusing them with one of the known classes. Our goal is to enhance the ability of existing (or pretrained) classifiers to detect and reject unknown classes.
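The snippet does not spell out the rejection mechanism, so here is a hedged sketch of one common baseline for adding unknown-class rejection to a pretrained classifier: thresholding the maximum softmax probability. The threshold value and the UNKNOWN sentinel are assumptions, not the paper's method.

```python
# Max-softmax rejection baseline; threshold is an assumed hyperparameter.
import torch
import torch.nn.functional as F

UNKNOWN = -1

def predict_with_rejection(logits: torch.Tensor, threshold: float = 0.9):
    """logits: (B, num_known_classes) from a pretrained network.
    Returns class indices, with UNKNOWN for low-confidence inputs."""
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    pred[conf < threshold] = UNKNOWN  # reject instead of forcing a known class
    return pred
```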


Assessing vines' vigour is essential for vineyard management and for the automation of viticulture machines, including shaking adjustments of berry harvesters during grape harvest or leaf pruning applications. To address these problems, growth classes of precisely located grapevines, labeled according to a standardized assessment, were predicted with specifically selected Machine Learning (ML) classifiers (Random Forest Classifier (RFC), Support Vector Machines (SVM)), utilizing multispectral UAV (Unmanned Aerial Vehicle) sensor data. The input features for ML model training comprise spectral, structural, and texture feature types generated from multispectral orthomosaics (spectral features), Digital Terrain and Surface Models (DTM/DSM; structural features), and Gray-Level Co-occurrence Matrix (GLCM) calculations (texture features).
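A minimal scikit-learn sketch of this kind of pipeline follows: per-vine feature vectors concatenating the three feature types, fed to a Random Forest Classifier. The feature dimensions, values, and class count are placeholders, not the study's data.

```python
# Hedged sketch: stacked spectral/structural/texture features -> RFC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Each row is one grapevine; columns concatenate the three feature types.
X_spectral = np.random.rand(500, 5)    # e.g., band reflectances / indices
X_structural = np.random.rand(500, 2)  # e.g., DSM-DTM canopy height stats
X_texture = np.random.rand(500, 8)     # e.g., GLCM contrast, homogeneity
X = np.hstack([X_spectral, X_structural, X_texture])
y = np.random.randint(0, 3, size=500)  # standardized growth classes

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # vigour-class accuracy
```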


Efficient Multi-Task Training with Adaptive Feature Alignment for Universal Image Segmentation.

Sensors (Basel)

January 2025

Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.

Universal image segmentation aims to handle all segmentation tasks within a single model architecture and ideally requires only one training phase. To achieve task-conditioned joint training, a task token needs to be used in the multi-task training to condition the model for specific tasks. Existing approaches generate the task token from a text input.
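For context, a task token in this setting is an embedding that conditions a shared model on the task being trained. Below is a minimal sketch of such conditioning, assuming a learned per-task embedding added to backbone features; the module, task names, and fusion by addition are illustrative assumptions, not the paper's method.

```python
# Hedged sketch of task-token conditioning for multi-task segmentation.
import torch
import torch.nn as nn

TASKS = {"semantic": 0, "instance": 1, "panoptic": 2}

class TaskConditionedSegmenter(nn.Module):
    def __init__(self, backbone: nn.Module, dim: int = 256):
        super().__init__()
        self.backbone = backbone                      # shared across tasks
        self.task_token = nn.Embedding(len(TASKS), dim)

    def forward(self, image: torch.Tensor, task: str):
        feats = self.backbone(image)                  # (B, dim, H, W) assumed
        tok = self.task_token(torch.tensor([TASKS[task]]))
        # Condition features on the task by adding the token channel-wise.
        return feats + tok.view(1, -1, 1, 1)
```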

