AI Article Synopsis

  • Metric3D v2 is a new geometric foundation model that estimates metric depth and surface normals from single images, crucial for accurate 3D recovery.
  • The model addresses challenges in zero-shot generalization for both normal estimation and depth recovery, with innovative solutions like a camera space transformation module and a joint depth-normal optimization module.
  • Trained on over 16 million images, it outperforms existing methods in multiple benchmarks, offering improved accuracy in recovering 3D structures from diverse internet images.

Article Abstract

We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric depth and surface normal estimation from single images, which is critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve zero-shot generalization through affine-invariant depths, but fail to recover real-world metric scale. Conversely, current normal estimation techniques struggle with zero-shot performance due to insufficient labeled data. We propose targeted solutions for both metric depth and normal estimation. For metric depth, we present a canonical camera space transformation module that resolves metric ambiguity across various camera models and large-scale datasets, and that can be easily integrated into existing monocular models. For surface normal estimation, we introduce a joint depth-normal optimization module that leverages diverse data from metric depth, allowing normal estimators to improve beyond traditional labels. Our model, trained on over 16 million images from thousands of camera models with varied annotations, excels in zero-shot generalization to new camera settings. As shown in Fig. 1, it ranks first in multiple zero-shot and standard benchmarks for metric depth and surface normal prediction. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. Our model also mitigates the scale drift issues of monocular SLAM (Fig. 3), leading to high-quality metric-scale dense mapping. Such applications highlight the versatility of Metric3D v2 models as geometric foundation models.
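The metric ambiguity mentioned in the abstract can be illustrated with basic pinhole geometry: an object of a given size looks identical in two images if both its depth and the camera's focal length are scaled by the same factor. The sketch below is a minimal illustration under that pinhole assumption. The abstract does not give formulas, so the function names, the canonical focal length value, and the depth-rescaling rule are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: pinhole-camera depth rescaling to a shared
# "canonical" focal length. All names and constants here are hypothetical.

CANONICAL_FOCAL_PX = 1000.0  # assumed canonical focal length, in pixels


def projected_size_px(object_size_m, depth_m, focal_px):
    """Pinhole projection: image size (pixels) of an object at a given depth."""
    return focal_px * object_size_m / depth_m


def to_canonical_depth(depth_m, focal_px, canonical_focal_px=CANONICAL_FOCAL_PX):
    """Rescale a metric depth as if the image came from the canonical camera."""
    return depth_m * canonical_focal_px / focal_px


def from_canonical_depth(depth_canonical, focal_px,
                         canonical_focal_px=CANONICAL_FOCAL_PX):
    """Invert the rescaling to recover metric depth for the real camera."""
    return depth_canonical * focal_px / canonical_focal_px


# Two cameras see a 1 m object at the same pixel size, at different depths.
size_a = projected_size_px(1.0, depth_m=2.0, focal_px=500.0)
size_b = projected_size_px(1.0, depth_m=4.0, focal_px=1000.0)

# In the canonical space both observations map to the same depth value.
canon_a = to_canonical_depth(2.0, focal_px=500.0)
canon_b = to_canonical_depth(4.0, focal_px=1000.0)

# Knowing the real focal length recovers the metric depth again.
metric_a = from_canonical_depth(canon_a, focal_px=500.0)
```

In this toy setting, a network trained on canonical depths would see one consistent target for visually identical inputs, and the known focal length of the real camera would convert its prediction back to metric scale.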


Source
http://dx.doi.org/10.1109/TPAMI.2024.3444912

Publication Analysis

Top Keywords

metric depth: 24
normal estimation: 24
surface normal: 16
geometric foundation: 12
depth surface: 12
metric: 10
foundation model: 8
zero-shot metric: 8
depth: 8
normal: 8

Similar Publications

Background/objectives: For healthcare institutions developing a robotic programme, delivering value for patients, clinicians, and payers is key. However, the impact on the surgeon, training pathways, and logistics is often overlooked. We conducted a study on the impact of robotic surgery on surgeons, access to robotic surgical training, and factors associated with developing a successful robotic programme.


Purpose: In linac-based stereotactic radiosurgery (SRS) utilizing a multileaf collimator (MLC) for brain metastases (BMs), a volumetric-modulated arc (VMA) technique is indispensable for generating a suitable dose distribution with efficient planning and delivery. However, the optimal calculation grid spacing (GS) and statistical uncertainty (SU) of the Monte Carlo algorithm for VMA optimization have yet to be determined. This planning study aimed to examine the impacts of GS and SU settings on VMA-based SRS planning and to find the optimal combination for templating.


Background: Trochlear dysplasia is a consistent risk factor for recurrent patellofemoral instability (PFI), but there is limited understanding of how the trochlea develops during growth. The aim of this study was to evaluate serial magnetic resonance imaging (MRI) studies performed in skeletally immature patients with and without PFI to characterize changes in trochlear anatomy over time.

Hypothesis: PFI leads to progressive worsening of trochlear dysplasia over time.


A proof-of-concept study for precise mapping of pigmented basal cell carcinoma in Asian skin using multispectral optoacoustic tomography imaging with level set segmentation.

Eur J Nucl Med Mol Imaging

January 2025

A*STAR Skin Research Labs (A*SRL), Agency for Science, Technology and Research (A*STAR), 31 Biopolis Way, #07-01, Nanos, Singapore, 138669, Republic of Singapore.

Purpose: Basal cell carcinoma (BCC), the most common subtype of non-melanoma skin cancer (NMSC), is prevalent worldwide and poses significant challenges due to its increasing incidence and complex treatment considerations. Existing clinical approaches, such as Mohs micrographic surgery, are time-consuming and labour-intensive, requiring meticulous layer-by-layer excision and examination, which can significantly extend the duration of the procedure. Current optical imaging solutions also lack the necessary spatial resolution, penetration depth, and contrast for effective clinical use.


Background: Colorectal cancer (CRC) stands as the third most prevalent malignancy globally and is recognized as the second leading cause of cancer-related mortality. Notably, nearly 50% of individuals diagnosed with CRC ultimately develop metastatic disease, with the peritoneum emerging as the second most frequent site for metastatic spread. Recent advancements in therapeutic frameworks have enhanced both survival rates and quality of life metrics for patients afflicted with colorectal cancer peritoneal metastases (CRCPM).

