In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
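As an illustration only (not the authors' released implementation), the two core operations described above can be sketched in plain Python for the 1-D case. The function names `atrous_conv1d` and `aspp1d` are hypothetical; the real DeepLab system applies these ideas to 2-D feature maps inside a DCNN.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution: kernel taps are spaced `rate`
    samples apart, enlarging the field of view from len(kernel) to
    rate*(len(kernel)-1)+1 without adding parameters or computation."""
    k = len(kernel)
    span = rate * (k - 1)  # extra context covered by dilation
    return [
        sum(kernel[j] * signal[i + j * rate] for j in range(k))
        for i in range(len(signal) - span)
    ]

def aspp1d(signal, kernel, rates):
    """ASPP sketch: probe the same features with the same kernel at
    several sampling rates ('same' zero-padding keeps each branch's
    output aligned with the input), then fuse by element-wise sum."""
    k = len(kernel)
    fused = [0.0] * len(signal)
    for r in rates:
        pad = r * (k - 1) // 2  # assumes an odd-length kernel
        padded = [0] * pad + list(signal) + [0] * pad
        branch = atrous_conv1d(padded, kernel, r)
        fused = [f + b for f, b in zip(fused, branch)]
    return fused
```

With `rate=1` the first function reduces to ordinary convolution; larger rates sample the input more sparsely, which is how the field of view grows at constant parameter count.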

Source
http://dx.doi.org/10.1109/TPAMI.2017.2699184

Publication Analysis

Top Keywords

semantic image (12), image segmentation (12), segmentation deep (8), deep convolutional (8), atrous convolution (8), fully connected (8), multiple scales (8), improve localization (8), deeplab semantic (4), image (4)

Similar Publications

Introduction: Weeds are a major factor affecting crop yield and quality. Accurate identification and localization of crops and weeds are essential for automated weed management in precision agriculture, especially given the challenges of recognition accuracy and real-time processing in complex field environments. To address these challenges, this paper proposes an efficient crop-weed segmentation model based on an improved UNet architecture and attention mechanisms, enhancing both recognition accuracy and processing speed.

ManiNeg: Manifestation-guided multimodal pretraining for mammography screening.

Comput Biol Med

January 2025

School of Automation Science and Engineering, South China University of Technology, Guangzhou, China. Electronic address:

Breast cancer poses a significant health threat worldwide. Contrastive learning has emerged as an effective method to extract critical lesion features from mammograms, thereby offering a potent tool for breast cancer screening and analysis. A crucial aspect of contrastive learning is negative sampling, where the selection of hard negative samples is essential for driving representations to retain detailed lesion information.
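To make the role of hard negative sampling concrete, here is a minimal, hypothetical sketch (not the ManiNeg method itself): given an anchor embedding, the negatives most similar to it are the "hard" ones that most constrain the learned representation.

```python
def hardest_negatives(anchor, candidates, k=1):
    """Rank candidate negative embeddings by dot-product similarity to
    the anchor and return the k most similar ('hardest') negatives."""
    sim = lambda a, b: sum(x * y for x, y in zip(a, b))
    ranked = sorted(candidates, key=lambda c: sim(anchor, c), reverse=True)
    return ranked[:k]
```

In practice contrastive pipelines compute such similarities in batches on normalized embeddings; this sketch only shows the selection criterion.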

Lightweight Retinal Layer Segmentation With Global Reasoning.

IEEE Trans Instrum Meas

May 2024

School of Mechanical Engineering, Shandong University, Jinan 250061, Shandong, China.

Automatic retinal layer segmentation of medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, achieving accurate segmentation is challenging due to the low contrast and blood-flow noise present in the images. In addition, the algorithm should be lightweight so it can be deployed in practical clinical applications.

Purpose: To investigate image quality and agreement of derived cardiac function parameters in a novel joint image reconstruction and segmentation approach based on disentangled representation learning, enabling real-time cardiac cine imaging during free-breathing.

Methods: A multi-tasking neural network architecture, incorporating disentangled representation learning, was trained using simulated examinations based on data from a public repository along with MR scans specifically acquired for model development. An exploratory feasibility study evaluated the method on undersampled real-time acquisitions using an in-house developed spiral bSSFP pulse sequence in eight healthy participants and five patients with intermittent atrial fibrillation.

Although the Transformer architecture has established itself as the standard for natural language processing tasks, its applications in computer vision remain limited. In vision, attention is either used in conjunction with convolutional networks or used to replace certain components of convolutional networks while keeping the overall network design intact. Differences between the two domains, such as large variations in the scale of visual entities and the much higher resolution of pixels in images compared to words in text, make it difficult to transfer Transformer from language to vision.
