Weakly supervised point cloud semantic segmentation methods, which use 1% or fewer of the labels and aim to approach the performance of fully supervised approaches, have recently attracted extensive research attention. A typical solution in this setting uses self-training or pseudo-labeling to mine supervision from the point cloud itself, ignoring the critical information available in images. In fact, cameras are commonly deployed alongside LiDAR sensors, and the complementary information they provide appears highly valuable for 3D applications. In this paper, we propose a novel cross-modal weakly supervised method for 3D segmentation that incorporates complementary information from unlabeled images. We design a dual-branch network equipped with an active labeling strategy to make the most of a tiny fraction of labels and to directly realize 2D-to-3D knowledge transfer. We then establish a cross-modal self-training framework that alternates between parameter updating and pseudo-label estimation. In the training phase, we propose cross-modal association learning, which mines complementary supervision from images by reinforcing cycle consistency between 3D points and 2D superpixels. In the pseudo-label estimation phase, a pseudo-label self-rectification mechanism filters noisy labels, providing more accurate labels so that the networks can be fully trained. Extensive experimental results demonstrate that our method even outperforms state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
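The abstract does not describe how the self-rectification mechanism works. Purely as an illustration of the general idea of filtering noisy pseudo-labels with a second modality, the sketch below keeps a point's pseudo-label only if the 3D branch is confident and, when the point projects into the image, the 2D branch predicts the same class at that pixel. The confidence threshold, the agreement rule, and the names `filter_pseudo_labels` and `IGNORE` are hypothetical assumptions for this example, not the paper's actual mechanism.

```python
from typing import List, Optional, Sequence

# Hypothetical sentinel: points labeled IGNORE are excluded from the loss.
IGNORE = -1


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability (first index wins on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def filter_pseudo_labels(
    point_probs: Sequence[Sequence[float]],
    pixel_probs: Sequence[Optional[Sequence[float]]],
    threshold: float = 0.9,
) -> List[int]:
    """Assumed filtering rule (illustrative only, not the paper's method).

    point_probs[i]: class probabilities from the 3D branch for point i.
    pixel_probs[i]: class probabilities from the 2D branch at the pixel
        point i projects to, or None if it falls outside the camera view.
    Returns one pseudo-label per point, or IGNORE if it is filtered out.
    """
    if len(point_probs) != len(pixel_probs):
        raise ValueError("point_probs and pixel_probs must have equal length")

    labels: List[int] = []
    for p3d, p2d in zip(point_probs, pixel_probs):
        c3d = argmax(p3d)
        # Rule 1 (assumed): drop low-confidence 3D predictions.
        if p3d[c3d] < threshold:
            labels.append(IGNORE)
            continue
        # Rule 2 (assumed): drop points whose 2D prediction disagrees.
        if p2d is not None and argmax(p2d) != c3d:
            labels.append(IGNORE)
            continue
        labels.append(c3d)
    return labels


if __name__ == "__main__":
    point_probs = [[0.95, 0.05], [0.6, 0.4], [0.97, 0.03], [0.02, 0.98]]
    pixel_probs = [[0.8, 0.2], [0.9, 0.1], [0.3, 0.7], None]
    print(filter_pseudo_labels(point_probs, pixel_probs))  # [0, -1, -1, 1]
```

In the usage example, the second point is dropped for low 3D confidence, the third for 2D/3D disagreement, and the fourth keeps its label because it has no image correspondence to check against.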
Source: http://dx.doi.org/10.1109/TIP.2024.3372449 (DOI listing)