In this work we address the task of semantic image segmentation with deep learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks (DCNNs). It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
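The two ideas of atrous convolution and ASPP described in the abstract can be illustrated with a minimal 1-D sketch in plain Python. This is not the DeepLab implementation: the function names are hypothetical, the signal is one-dimensional, every ASPP branch reuses the same kernel, and the branches are fused by summation, all of which are simplifying assumptions made for illustration only.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding; output has the input's length.

    Kernel taps are applied `rate` samples apart. rate=1 is ordinary convolution
    (in the cross-correlation form conventionally used in DCNNs).
    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    span = (k - 1) * rate              # distance between the first and last tap
    pad = span // 2                    # zeros added on the left
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    return [
        sum(kernel[j] * padded[i + j * rate] for j in range(k))
        for i in range(len(signal))
    ]


def effective_field_of_view(kernel_size, rate):
    """Extent covered by a kernel whose taps sit `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def aspp_1d(signal, kernel, rates):
    """Toy ASPP: probe the same input at several rates and sum the responses.

    Assumption: real ASPP branches have their own learned filters; one shared
    kernel is used here only to keep the sketch short.
    """
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    kernel = [1, 2, 3]                 # three parameters at every rate
    for rate in (1, 2, 4):
        print(rate, effective_field_of_view(len(kernel), rate),
              atrous_conv1d(impulse, kernel, rate))
    print(aspp_1d(impulse, kernel, rates=[1, 2]))
```

With a three-tap kernel, the effective field of view grows from 3 samples at rate 1 to 5 at rate 2 and 9 at rate 4, while the number of kernel weights and the multiplications per output sample stay fixed at three.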
DOI: http://dx.doi.org/10.1109/TPAMI.2017.2699184
Front Plant Sci
January 2025
College of Big Data, Yunnan Agricultural University, Kunming, China.
Introduction: Weeds are a major factor affecting crop yield and quality. Accurate identification and localization of crops and weeds are essential for achieving automated weed management in precision agriculture, especially given the challenges in recognition accuracy and real-time processing in complex field environments. To address this issue, this paper proposes an efficient crop-weed segmentation model based on an improved UNet architecture and attention mechanisms to enhance both recognition accuracy and processing speed.
Comput Biol Med
January 2025
School of Automation Science and Engineering, South China University of Technology, Guangzhou, China.
Breast cancer poses a significant health threat worldwide. Contrastive learning has emerged as an effective method to extract critical lesion features from mammograms, thereby offering a potent tool for breast cancer screening and analysis. A crucial aspect of contrastive learning is negative sampling, where the selection of hard negative samples is essential for driving representations to retain detailed lesion information.
IEEE Trans Instrum Meas
May 2024
School of Mechanical Engineering, Shandong University, Jinan 250061, Shandong, China.
Automatic retinal layer segmentation in medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, accurate segmentation is challenging because of the low contrast and blood flow noise present in the images. In addition, the algorithm should be lightweight so that it can be deployed in practical clinical applications.
J Cardiovasc Magn Reson
January 2025
Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany.
Purpose: To investigate image quality and agreement of derived cardiac function parameters in a novel joint image reconstruction and segmentation approach based on disentangled representation learning, enabling real-time cardiac cine imaging during free-breathing.
Methods: A multi-tasking neural network architecture, incorporating disentangled representation learning, was trained using simulated examinations based on data from a public repository along with MR scans specifically acquired for model development. An exploratory feasibility study evaluated the method on undersampled real-time acquisitions using an in-house developed spiral bSSFP pulse sequence in eight healthy participants and five patients with intermittent atrial fibrillation.
Sci Rep
January 2025
Department of Electrical Power, Adama Science and Technology University, Adama, 1888, Ethiopia.
Although the Transformer architecture has established itself as the standard for natural language processing tasks, it still has few applications in computer vision. In vision, attention is used either in conjunction with convolutional networks or to replace individual components of convolutional networks while preserving the overall network design. Differences between the two domains, such as the large variation in the scale of visual entities and the much finer granularity of pixels in images compared to words in text, make it difficult to transfer the Transformer from language to vision.