In this paper, we address the high annotation cost of acquiring training data for semantic segmentation. Most modern approaches to semantic segmentation are based on graphical models, such as conditional random fields, and rely on sufficient training data in the form of object contours. To reduce the manual effort of annotating contours pixel by pixel, we consider the setting in which the training set for semantic segmentation is a mixture of a few object contours and an abundant set of object bounding boxes. Our idea is to borrow the knowledge derived from the annotated object contours to infer the unknown object contours enclosed by the bounding boxes. The inferred contours can then serve as training data for semantic segmentation. To this end, we generate multiple contour hypotheses for each bounding box, under the assumption that at least one hypothesis is close to the ground truth. We propose an approach, called augmented multiple instance regression (AMIR), that formulates hypothesis selection as a multiple instance regression (MIR) problem and uses information derived from the annotated object contours to guide and regularize the training of MIR. In this formulation, a bounding box is treated as a bag with its contour hypotheses as instances, and the positive instances are the hypotheses close to the ground truth. The proposed approach has been evaluated on the Pascal VOC segmentation task. The promising results show that AMIR can accurately infer the object contours in the bounding boxes and hence provide effective alternatives to manually labeled contours for semantic segmentation.
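
The bag-and-instance view described in the abstract can be illustrated with a toy sketch. This is not the AMIR formulation itself, which the abstract does not spell out; it is a simplified alternating scheme under stated assumptions: each hypothesis is a small hypothetical feature vector, the few annotated contours supply hypotheses with known quality scores, and a linear scorer is refit after picking one hypothesis per bounding box. All names, features, and numbers below are invented for illustration.

```python
# Toy sketch of hypothesis selection as multiple instance regression.
# Assumptions (not from the paper): linear scorer, made-up features,
# quality targets in [0, 1], one selected instance per bag.

def predict(w, x):
    """Linear quality score of one contour hypothesis."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_linear(samples, dim, lr=0.1, epochs=500, l2=1e-3):
    """Regularized least squares by plain gradient descent."""
    w = [0.0] * dim
    n = len(samples)
    for _ in range(epochs):
        grad = [l2 * wi for wi in w]
        for x, y in samples:
            err = predict(w, x) - y
            for j in range(dim):
                grad[j] += err * x[j] / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def select_hypotheses(labeled, bags, dim, rounds=5):
    """Alternate between fitting the scorer and picking one instance per bag."""
    # Start from the hypotheses whose quality is known from annotated contours.
    w = fit_linear(labeled, dim)
    chosen = []
    for _ in range(rounds):
        # In each bag (bounding box), pick the highest-scoring hypothesis.
        chosen = [max(range(len(bag)), key=lambda i: predict(w, bag[i]))
                  for bag in bags]
        # Treat the picked instances as positives (target 1.0) and refit,
        # keeping the labeled samples in the fit as a regularizing anchor.
        pseudo = [(bag[i], 1.0) for bag, i in zip(bags, chosen)]
        w = fit_linear(labeled + pseudo, dim)
    return w, chosen

# Hypothetical features: [bias, agreement with a shape cue, boundary clutter].
labeled = [
    ([1.0, 0.9, 0.2], 0.9),
    ([1.0, 0.1, 0.8], 0.1),
    ([1.0, 0.5, 0.5], 0.5),
    ([1.0, 0.8, 0.1], 0.8),
    ([1.0, 0.2, 0.9], 0.2),
]
# Two bounding boxes, each a bag of contour hypotheses without labels.
bags = [
    [[1.0, 0.2, 0.7], [1.0, 0.85, 0.3], [1.0, 0.4, 0.6]],
    [[1.0, 0.95, 0.1], [1.0, 0.3, 0.3]],
]

w, chosen = select_hypotheses(labeled, bags, dim=3)
print("selected hypothesis per bounding box:", chosen)
```

The selected hypotheses would then stand in for manually drawn contours when training the segmentation model. Keeping the labeled samples in every refit is one simple way to let the annotated contours constrain the selection; the paper's actual regularization may differ.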

Source
http://dx.doi.org/10.1109/TIP.2014.2307436

Publication Analysis

Top Keywords

object contours: 28
semantic segmentation: 20
bounding boxes: 16
training data: 16
multiple instance: 12
instance regression: 12
contours: 10
augmented multiple: 8
contours bounding: 8
data semantic: 8

Similar Publications

Data-Efficient Bone Segmentation Using Feature Pyramid-Based SegFormer.

Sensors (Basel)

December 2024

Master's Program in Information and Computer Science, Doshisha University, Kyoto 610-0394, Japan.

The semantic segmentation of bone structures demands pixel-level classification accuracy to create reliable bone models for diagnosis. While Convolutional Neural Networks (CNNs) are commonly used for segmentation, they often struggle with complex shapes due to their focus on texture features and limited ability to incorporate positional information. As orthopedic surgery increasingly requires precise automatic diagnosis, we explored SegFormer, an enhanced Vision Transformer model that better handles spatial awareness in segmentation tasks.

Introduction: Global Visual Selective Attention (VSA) is the ability to integrate multiple visual elements of a scene to achieve visual overview. This is essential for navigating crowded environments and recognizing objects or faces. Clinical pediatric research on global VSA deficits primarily focuses on autism spectrum disorder (ASD).

With the advancement of service robot technology, the demand for higher boundary precision in indoor semantic segmentation has increased. Traditional methods of extracting Euclidean features using point cloud and voxel data often neglect geodesic information, reducing boundary accuracy for adjacent objects and consuming significant computational resources. This study proposes a novel network, the Euclidean-geodesic network (EGNet), which uses point cloud-voxel-mesh data to characterize detail, contour, and geodesic features, respectively.

Coronary artery stenosis detection remains a challenging task due to the complex vascular structure, poor image quality, poor vessel contouring caused by breathing artifacts, and stenotic lesions that often appear in only a small region of the image. To improve the accuracy and efficiency of detection, this paper proposes a new deep-learning coronary artery stenosis detection framework (DCA-YOLOv8). The framework consists of a histogram equalization and Canny edge detection preprocessing (HEC) enhancement module, a double coordinate attention (DCA) feature extraction module, and an output module that incorporates a newly designed loss function, named adaptive inner-CIoU (AICI).

Image segmentation is a crucial task in artificial intelligence fields such as computer vision and medical imaging. While convolutional neural networks (CNNs) have achieved notable success by learning representative features from large datasets, they often lack geometric priors and global object information, limiting their accuracy in complex scenarios. Variational methods like active contours provide geometric priors and theoretical interpretability but require manual initialization and are sensitive to hyper-parameters.
