AI Article Synopsis

  • Masked autoencoding often emphasizes low-level details in data reconstruction, which can hinder its ability to transfer high-level semantics effectively.
  • This article proposes a unique jigsaw puzzle solver that predicts the positions of disordered point cloud patches to enhance semantic learning, akin to how children learn through puzzles.
  • By using a Transformer-based model that focuses on high-level semantics and applying a consistency constraint in the latent space, the proposed method shows significant performance improvements across six downstream vision tasks, achieving state-of-the-art results.

Article Abstract

Masked autoencoding has gained momentum for improving fine-tuning performance in many downstream tasks. However, it tends to focus on low-level reconstruction details, lacking high-level semantics and resulting in weak transfer capability. This article presents a novel jigsaw puzzle solver inspired by the idea that predicting the positions of disordered point cloud patches provides more semantic information, similar to how children learn by solving jigsaw puzzles. Our method adopts the mask-then-predict paradigm, erasing the positions of selected point patches rather than their contents. We first partition input point clouds into irregular patches and randomly erase the positions of some patches. Then, a Transformer-based model is used to learn high-level semantic features and regress the positions of the masked patches. This approach forces the model to focus on learning transfer-robust semantics while paying less attention to low-level details. To tie the predictions within the encoding space, we further introduce a consistency constraint on their latent representations to encourage the encoded features to contain more semantic cues. We demonstrate that a standard Transformer backbone with our pretraining scheme can capture discriminative point cloud semantic information. Furthermore, extensive experiments indicate that our method outperforms the previous best competitor across six popular downstream vision tasks, achieving new state-of-the-art performance. Codes will be available at https://git.openi.org.cn/OpenPointCloud/Point-MPP.
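To make the mask-then-predict idea concrete, the sketch below illustrates a masked position prediction pipeline in PyTorch. It is a minimal, illustrative reading of the abstract, not the authors' implementation (linked above): the patch grouping (random centers plus nearest neighbors), the names `group_patches` and `MaskedPositionPredictor`, the embedding dimensions, and the mask ratio are all assumptions, and the consistency constraint on latent representations is omitted for brevity.

```python
# Minimal sketch of masked position prediction for point cloud patches.
# All module names, dimensions, and the grouping strategy are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


def group_patches(points, num_patches=64, patch_size=32):
    """Partition a point cloud (B, N, 3) into patches around random centers.

    The paper uses irregular patches; here we simply pick random centers and
    take the nearest `patch_size` points to each, purely for illustration.
    """
    B, N, _ = points.shape
    idx = torch.randint(0, N, (B, num_patches), device=points.device)
    centers = torch.gather(points, 1, idx.unsqueeze(-1).expand(-1, -1, 3))   # (B, G, 3)
    dists = torch.cdist(centers, points)                                     # (B, G, N)
    knn = dists.topk(patch_size, largest=False).indices                      # (B, G, K)
    patches = torch.gather(
        points.unsqueeze(1).expand(-1, num_patches, -1, -1),
        2,
        knn.unsqueeze(-1).expand(-1, -1, -1, 3),
    )                                                                        # (B, G, K, 3)
    return patches - centers.unsqueeze(2), centers  # local coordinates, patch centers


class MaskedPositionPredictor(nn.Module):
    """Encode patch contents, hide some patch positions, and regress them."""

    def __init__(self, dim=256, depth=6, heads=8):
        super().__init__()
        self.patch_embed = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.pos_embed = nn.Linear(3, dim)           # embeds patch center positions
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.pos_head = nn.Linear(dim, 3)            # regresses masked patch centers

    def forward(self, patches, centers, mask_ratio=0.6):
        B, G, K, _ = patches.shape
        tokens = self.patch_embed(patches).max(dim=2).values         # (B, G, dim) content tokens
        pos = self.pos_embed(centers)                                 # (B, G, dim) position tokens

        # Erase the *positions* (not the contents) of a random subset of patches.
        mask = torch.rand(B, G, device=patches.device) < mask_ratio   # True = position erased
        pos = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, G, -1), pos)

        latent = self.encoder(tokens + pos)                           # (B, G, dim)
        pred_centers = self.pos_head(latent)                          # (B, G, 3)

        # Position regression loss on the masked patches only.
        loss_pos = F.mse_loss(pred_centers[mask], centers[mask])
        return loss_pos, latent, mask
```

The point the sketch tries to capture is that the patch contents remain visible while only the positional embeddings of the masked patches are replaced by a learned mask token, so the model must infer where each patch belongs from its local geometry and context rather than reconstructing low-level point coordinates.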

Source
http://dx.doi.org/10.1109/TNNLS.2024.3479309

Publication Analysis

Top Keywords

  • point cloud: 12
  • patches: 5
  • point-mpp point: 4
  • cloud self-supervised: 4
  • self-supervised learning: 4
  • learning masked: 4
  • masked position: 4
  • position prediction: 4
  • prediction masked: 4
  • masked autoencoding: 4
