Semantic segmentation has made substantial progress through the adoption of deep Fully Convolutional Networks (FCNs). However, the performance of FCN-based models relies heavily on the amount of pixel-level annotation available, and such annotation is expensive and time-consuming to produce. Because bounding boxes also carry abundant semantic and object information, an intuitive solution is to learn segmentation under weak supervision from bounding boxes. The critical challenge in this weakly supervised setting is how to make full use of the class-level and region-level supervision provided by bounding boxes to estimate the uncertain regions. In this paper, we propose a mixture model to address this problem. First, we introduce a box-driven class-wise masking model (BCM) to remove the irrelevant regions of each class. Moreover, based on the pixel-level segment proposals generated from the bounding box supervision, we calculate the mean filling rate of each class, which serves as an important prior cue that guides the model to ignore wrongly labeled pixels in the proposals. To achieve more fine-grained supervision at the instance level, we further propose an anchor-based filling rate shifting module. Unlike previous methods that train models directly on the generated noisy proposals, our method adjusts model learning dynamically through an adaptive segmentation loss, which helps reduce the negative impact of wrongly labeled proposals. In addition, based on the high-quality proposals learned with the above pipeline, we explore two-stage learning to further boost performance. The proposed method is evaluated on the challenging PASCAL VOC 2012 benchmark and achieves 74.9% and 76.4% mean IoU under the weakly supervised and semi-supervised modes, respectively. Extensive experimental results show that the proposed method is effective and is on par with, or even better than, current state-of-the-art methods.
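The filling rate prior described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes that a filling rate is the fraction of a bounding box covered by its segment proposal, and that the prior is applied by keeping only the most confident pixels inside a box when computing the loss. The function names `mean_filling_rates` and `select_confident_pixels`, the half-open `(x0, y0, x1, y1)` box format, and the top-k selection rule are all hypothetical choices made for illustration.

```python
import numpy as np


def mean_filling_rates(boxes, labels, proposals, num_classes):
    """Average, per class, the fraction of each box covered by its proposal.

    boxes:     list of (x0, y0, x1, y1) integer boxes, half-open (assumed format)
    labels:    list of integer class ids, one per box
    proposals: list of 2D boolean masks (full image size), one per box
    """
    sums = np.zeros(num_classes, dtype=float)
    counts = np.zeros(num_classes, dtype=float)
    for (x0, y0, x1, y1), cls, mask in zip(boxes, labels, proposals):
        area = (x1 - x0) * (y1 - y0)
        if area <= 0:
            continue  # skip degenerate boxes
        filled = mask[y0:y1, x0:x1].sum()
        sums[cls] += filled / area
        counts[cls] += 1
    # Classes with no boxes get a rate of 0 instead of a division error.
    return np.divide(sums, counts, out=np.zeros(num_classes), where=counts > 0)


def select_confident_pixels(prob, box, rate):
    """Keep roughly the top `rate` fraction of pixels in `box`, ranked by `prob`.

    Assumed use of the prior: only the returned pixels would contribute to
    the segmentation loss for this box; the rest are ignored.
    """
    x0, y0, x1, y1 = box
    region = prob[y0:y1, x0:x1]
    k = int(round(rate * region.size))
    keep = np.zeros(prob.shape, dtype=bool)
    if k == 0:
        return keep
    # The k-th largest probability in the box acts as the cut-off (ties are kept).
    threshold = np.sort(region, axis=None)[-k]
    keep[y0:y1, x0:x1] = region >= threshold
    return keep
```

Under these assumptions, a class whose proposals typically cover half of their boxes would contribute roughly the most confident half of each box's pixels to the loss, so pixels that the proposal probably mislabeled are left out.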

Source: http://dx.doi.org/10.1109/TPAMI.2023.3301302

Publication Analysis

Top Keywords

bounding boxes: 12
weakly supervised: 8
semantic segmentation: 8
filling rate: 8
rate shifting: 8
supervisions bounding: 8
wrongly labeled: 8
proposed method: 8
supervised semantic: 4
segmentation: 4

Similar Publications

YOLOSeg with applications to wafer die particle defect segmentation.

Sci Rep

January 2025

Department of Industrial Engineering and Management, Ming Chi University of Technology, New Taipei City, 243, Taiwan.

This study develops the you only look once segmentation (YOLOSeg), an end-to-end instance segmentation model, with applications to segment small particle defects embedded on a wafer die. YOLOSeg uses YOLOv5s as the basis and extends a UNet-like structure to form the segmentation head. YOLOSeg can predict not only bounding boxes of particle defects but also the corresponding bounding polygons.

Weakly-supervised thyroid ultrasound segmentation: Leveraging multi-scale consistency, contextual features, and bounding box supervision for accurate target delineation.

Comput Biol Med

January 2025

Department of Artificial Intelligence, Faculty of Artificial Intelligence, Egyptian Russian University, 11829, Badr City, Egypt.

Weakly-supervised learning (WSL) methods have gained significant attention in medical image segmentation, but they often face challenges in accurately delineating boundaries due to overfitting to weak annotations such as bounding boxes. This issue is particularly pronounced in thyroid ultrasound images, where low contrast and noisy backgrounds hinder precise segmentation. In this paper, we propose a novel weakly-supervised segmentation framework that addresses these challenges.

MEVDT: Multi-modal event-based vehicle detection and tracking dataset.

Data Brief

February 2025

Department of Electrical and Computer Engineering, University of Michigan-Dearborn, 4901 Evergreen Rd, Dearborn, 48128 MI, USA.

In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories.

Gripping Success Metric for Robotic Fruit Harvesting.

Sensors (Basel)

December 2024

Department of Computer Science & Artificial Intelligence, Jeonbuk National University, Jeonju-si 54896, Republic of Korea.

Recently, computer vision methods have been widely applied to agricultural tasks, such as robotic harvesting. In particular, fruit harvesting robots often rely on object detection or segmentation to identify and localize target fruits. During the model selection process for object detection, the average precision (AP) score typically provides the de facto standard.

Early identification of concrete cracks and multi-class detection can help to avoid future deformation or collapse in concrete structures. Traditional detection methodologies require enormous effort and time. To overcome these difficulties, current vision-based deep learning models can effectively detect and classify various concrete cracks.
