In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
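As an illustration only (not the authors' released implementation), the two core operations described above can be sketched in plain Python for the 1-D case. The function names `atrous_conv1d` and `aspp1d` are hypothetical; the real DeepLab system applies these ideas to 2-D feature maps inside a DCNN.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution: kernel taps are spaced `rate`
    samples apart, enlarging the field of view from len(kernel) to
    rate*(len(kernel)-1)+1 without adding parameters or computation."""
    k = len(kernel)
    span = rate * (k - 1)  # extra context covered by dilation
    return [
        sum(kernel[j] * signal[i + j * rate] for j in range(k))
        for i in range(len(signal) - span)
    ]

def aspp1d(signal, kernel, rates):
    """ASPP sketch: probe the same features with the same kernel at
    several sampling rates ('same' zero-padding keeps each branch's
    output aligned with the input), then fuse by element-wise sum."""
    k = len(kernel)
    fused = [0.0] * len(signal)
    for r in rates:
        pad = r * (k - 1) // 2  # assumes an odd-length kernel
        padded = [0] * pad + list(signal) + [0] * pad
        branch = atrous_conv1d(padded, kernel, r)
        fused = [f + b for f, b in zip(fused, branch)]
    return fused
```

With `rate=1` the first function reduces to ordinary convolution; larger rates sample the input more sparsely, which is how the field of view grows at constant parameter count.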

Source
http://dx.doi.org/10.1109/TPAMI.2017.2699184

Publication Analysis

Top Keywords

semantic image (12), image segmentation (12), segmentation deep (8), deep convolutional (8), atrous convolution (8), fully connected (8), multiple scales (8), improve localization (8), deeplab semantic (4), image (4)

Similar Publications

Introduction: Weeds are a major factor affecting crop yield and quality. Accurate identification and localization of crops and weeds are essential for automated weed management in precision agriculture, especially given the challenges of recognition accuracy and real-time processing in complex field environments. To address these challenges, this paper proposes an efficient crop-weed segmentation model based on an improved UNet architecture and attention mechanisms, enhancing both recognition accuracy and processing speed.

ManiNeg: Manifestation-guided multimodal pretraining for mammography screening.

Comput Biol Med

January 2025

School of Automation Science and Engineering, South China University of Technology, Guangzhou, China. Electronic address:

Breast cancer poses a significant health threat worldwide. Contrastive learning has emerged as an effective method to extract critical lesion features from mammograms, thereby offering a potent tool for breast cancer screening and analysis. A crucial aspect of contrastive learning is negative sampling, where the selection of hard negative samples is essential for driving representations to retain detailed lesion information.
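To make the role of hard negative sampling concrete, here is a minimal, hypothetical sketch (not the ManiNeg method itself): given an anchor embedding, the negatives most similar to it are the "hard" ones that most constrain the learned representation.

```python
def hardest_negatives(anchor, candidates, k=1):
    """Rank candidate negative embeddings by dot-product similarity to
    the anchor and return the k most similar ('hardest') negatives."""
    sim = lambda a, b: sum(x * y for x, y in zip(a, b))
    ranked = sorted(candidates, key=lambda c: sim(anchor, c), reverse=True)
    return ranked[:k]
```

In practice contrastive pipelines compute such similarities in batches on normalized embeddings; this sketch only shows the selection criterion.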

Lightweight Retinal Layer Segmentation With Global Reasoning.

IEEE Trans Instrum Meas

May 2024

School of Mechanical Engineering, Shandong University, Jinan 250061, Shandong, China.

Automatic retinal layer segmentation of medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, achieving accurate segmentation is challenging due to the low contrast and blood-flow noise present in the images. In addition, the algorithm should be lightweight so it can be deployed in practical clinical applications.

Purpose: To investigate image quality and agreement of derived cardiac function parameters in a novel joint image reconstruction and segmentation approach based on disentangled representation learning, enabling real-time cardiac cine imaging during free-breathing.

Methods: A multi-tasking neural network architecture, incorporating disentangled representation learning, was trained using simulated examinations based on data from a public repository along with MR scans specifically acquired for model development. An exploratory feasibility study evaluated the method on undersampled real-time acquisitions using an in-house developed spiral bSSFP pulse sequence in eight healthy participants and five patients with intermittent atrial fibrillation.

Although the Transformer architecture has established itself as the standard for natural language processing tasks, its applications in computer vision remain limited. In vision, attention is either used in conjunction with convolutional networks or used to replace certain components of convolutional networks while keeping the overall network design intact. Differences between the two domains, such as large variations in the scale of visual entities and the much higher resolution of pixels in images compared to words in text, make it difficult to transfer Transformer from language to vision.
