In this paper, we propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. While multi-view images have been widely used for object-level inverse rendering, scene-level inverse rendering has primarily been studied using single-view images due to the lack of a dataset containing high dynamic range multi-view images with ground-truth geometry, material, and spatially-varying lighting. To improve the quality of scene-level inverse rendering, a novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced. MAIR performs scene-level multi-view inverse rendering by expanding the OpenRooms dataset, designing efficient pipelines to handle multi-view images, and splitting spatially-varying lighting. Although MAIR showed impressive results, its lighting representation is fixed to spherical Gaussians, which limits its ability to render images realistically. Consequently, MAIR cannot be directly used in applications such as material editing. Moreover, its multi-view aggregation networks have difficulties extracting rich features because they only focus on the mean and variance between multi-view features. In this paper, we propose its extended version, called MAIR++. MAIR++ addresses the aforementioned limitations by introducing an implicit lighting representation that accurately captures the lighting conditions of an image while facilitating realistic rendering. Furthermore, we design a directional attention-based multi-view aggregation network to infer more intricate relationships between views. Experimental results show that MAIR++ not only outperforms MAIR and single-view-based methods but also demonstrates robust performance on unseen real-world scenes.
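The contrast the abstract draws between mean/variance pooling and attention-based multi-view aggregation can be sketched in a few lines. This is an illustrative NumPy sketch, not the paper's actual network: the function names, the feature dimensions, and the use of a single query vector (standing in for a viewing direction) are all assumptions for the example.

```python
import numpy as np

def mean_var_aggregate(feats):
    """Pool per-view features using only their mean and variance across
    views (the style of aggregation the paper says limits MAIR).
    feats: (num_views, feat_dim)."""
    mean = feats.mean(axis=0)
    var = feats.var(axis=0)
    return np.concatenate([mean, var])  # (2 * feat_dim,)

def attention_aggregate(feats, query):
    """Illustrative attention-based aggregation: each view is weighted by
    the similarity of its feature to a query vector, so the pooled result
    can reflect inter-view relationships rather than summary statistics.
    feats: (num_views, feat_dim); query: (feat_dim,)."""
    scores = feats @ query / np.sqrt(feats.shape[1])  # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over views
    return weights @ feats                            # (feat_dim,)

rng = np.random.default_rng(0)
views = rng.normal(size=(4, 8))                       # 4 views, 8-dim features
pooled_stats = mean_var_aggregate(views)              # shape (16,)
pooled_attn = attention_aggregate(views, views[0])    # shape (8,)
```

Note that the mean/variance pool is permutation-invariant but discards which view contributed what, while the attention pool lets the output depend on a per-direction query.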
DOI: http://dx.doi.org/10.1109/TPAMI.2025.3548679
Sci Rep
March 2025
Viterbi Faculty of Electrical & Computer Engineering, Technion - Israel Institute of Technology, Technion City, 3200003, Haifa, Israel.
The prediction of climate has been a long-standing problem in contemporary science. One of the reasons stems from a gap in the ability to obtain 3D mapping of clouds, especially shallow scattered clouds. These clouds are strongly affected by mixing processes with their surroundings, rendering their internal volumetric structure highly heterogeneous.
Neural Netw
February 2025
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering at Beihang University, Beijing, China; Qingdao Research Institute of Beihang University, Qingdao, China. Electronic address:
Neural Radiance Fields (NeRF) have shown great potential for synthesizing novel views. Currently, despite the existence of some initial controllable and editable NeRF methods, they remain limited in their efficient and fine-grained editing capabilities, which hinders creative editing and potential applications of NeRF. In this paper, we present rotation-invariant neural point fields with interactive segmentation for fine-grained and efficient editing.
IEEE Trans Vis Comput Graph
March 2025
Reconstructing objects from posed images is a crucial and complex task in computer graphics and computer vision. While NeRF-based neural reconstruction methods have exhibited impressive reconstruction ability, they tend to be time-consuming. Recent strategies have adopted 3D Gaussian Splatting (3D-GS) for inverse rendering, which have led to quick and effective outcomes.
IEEE Trans Pattern Anal Mach Intell
February 2025
Learning object-centric representations from unsupervised videos is challenging. Unlike most previous approaches that focus on decomposing 2D images, we present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning within a differentiable volume rendering framework. The key idea is to perform object-centric voxelization to capture the 3D nature of the scene, which infers per-object occupancy probabilities at individual spatial locations.
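The "per-object occupancy probabilities at individual spatial locations" mentioned above can be illustrated with standard volume-rendering compositing. This is a minimal sketch under assumed inputs, not DynaVol-S itself: per-object densities at ray samples are normalized into occupancy probabilities, and a single constant color per object stands in for a learned appearance model.

```python
import numpy as np

def composite_object_occupancy(densities, colors, deltas):
    """Minimal object-centric volume-rendering sketch.
    densities: (num_samples, num_objects) non-negative per-object densities
    colors:    (num_objects, 3) one RGB color per object (simplification)
    deltas:    (num_samples,) distances between consecutive samples."""
    total = densities.sum(axis=1)                          # combined density per sample
    occ = densities / np.clip(total[:, None], 1e-8, None)  # per-object occupancy probs
    alpha = 1.0 - np.exp(-total * deltas)                  # opacity per sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    weights = trans * alpha                                # per-sample render weight
    sample_colors = occ @ colors                           # blend object colors per sample
    return weights @ sample_colors                         # final RGB along the ray
```

When one object's density dominates a ray, its occupancy probability approaches 1 at every sample and the rendered pixel converges to that object's color, which is what makes the decomposition differentiable end to end.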