This paper proposes a new method for simultaneous 3D reconstruction and semantic segmentation of indoor scenes. Unlike existing methods that require a video recorded with a color camera and/or a depth camera, our method needs only a small number (e.g., 3 to 5) of color images from uncalibrated sparse views, which significantly simplifies data acquisition and broadens the range of applicable scenarios. To achieve promising 3D reconstruction from sparse views with limited overlap, our method first recovers the depth map and semantic information for each view and then fuses the depth maps into a 3D scene. To this end, we design an iterative deep architecture, named IterNet, that estimates the depth map and the semantic segmentation alternately. To align views accurately despite their limited overlap, we further propose a joint global and local registration method that reconstructs a 3D scene with semantic information. We also release a new synthetic indoor dataset containing photorealistic high-resolution RGB images, accurate depth maps, and pixel-level semantic labels for thousands of complex layouts. Experimental results on public datasets and on our dataset demonstrate that our method achieves more accurate depth estimation, smaller semantic segmentation errors, and better 3D reconstruction results than state-of-the-art methods.
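
The fusion step described in the abstract, in which per-view depth maps with semantic labels are merged into one 3D scene, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a known rigid pose (`R`, `t`) for each view, whereas the proposed method works from uncalibrated views and estimates the alignment with its joint global and local registration. The function names `backproject` and `fuse` are hypothetical.

```python
import numpy as np

def backproject(depth, labels, fx, fy, cx, cy):
    """Lift a depth map (H, W) to camera-frame 3D points, keeping per-pixel labels.

    Assumes a pinhole model; pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # column and row index per pixel
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1), labels[valid]

def fuse(views):
    """Merge several views into one labeled point cloud.

    Each view is (depth, labels, (fx, fy, cx, cy), R, t), where R (3x3) and
    t (3,) are an assumed, already-known camera-to-world pose.
    """
    points, point_labels = [], []
    for depth, labels, intrinsics, R, t in views:
        p, l = backproject(depth, labels, *intrinsics)
        points.append(p @ R.T + t)  # rotate, then translate into the world frame
        point_labels.append(l)
    return np.concatenate(points), np.concatenate(point_labels)
```

With 3 to 5 views, `fuse` would be called once with one tuple per image; the quality of the merged cloud then depends entirely on how accurate the assumed depth maps and poses are.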

Source
http://dx.doi.org/10.1109/TIP.2020.2986712
