Learning to capture dependencies between spatial positions is essential to many visual tasks, especially dense labeling problems such as scene parsing. Existing methods capture long-range dependencies effectively with self-attention and short-range ones with local convolution. However, a large gap remains between long-range and short-range dependencies, which substantially reduces a model's flexibility in handling the diverse spatial scales and relationships found in complicated natural scene images. To fill this gap, we develop a Middle-Range (MR) branch that captures middle-range dependencies by restricting self-attention to local patches. We also observe that spatial regions that correlate strongly with others can be emphasized to exploit long-range dependencies more accurately, and we therefore propose a Reweighed Long-Range (RLR) branch. Based on the MR and RLR branches, we build an Omni-Range Dependencies Network (ORDNet) that effectively captures short-, middle- and long-range dependencies. ORDNet extracts more comprehensive context information and adapts well to the complex spatial variance in scene images. Extensive experiments show that ORDNet outperforms previous state-of-the-art methods on three scene parsing benchmarks, PASCAL Context, COCO Stuff and ADE20K, demonstrating the benefit of capturing omni-range dependencies in deep models for scene parsing.
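
The idea of restricting self-attention to local patches can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain dot-product attention without learned query/key/value projections, non-overlapping square patches, and a feature map stored as nested Python lists. The names `attend` and `patch_self_attention` and the `patch` parameter are hypothetical, introduced only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(vectors):
    """Dot-product self-attention over a list of C-dim vectors.

    Assumption: queries, keys and values are the raw features
    (no learned projections), to keep the sketch short.
    """
    channels = len(vectors[0])
    out = []
    for query in vectors:
        weights = softmax([dot(query, key) for key in vectors])
        out.append([
            sum(w * v[c] for w, v in zip(weights, vectors))
            for c in range(channels)
        ])
    return out


def patch_self_attention(fmap, patch):
    """Apply self-attention independently inside each patch x patch window.

    fmap  : H x W x C feature map as nested lists.
    patch : window side length; must divide H and W in this sketch.
    Positions only attend to positions in the same window, so the
    dependency range is bounded by the patch size.
    """
    height, width = len(fmap), len(fmap[0])
    if height % patch or width % patch:
        raise ValueError("patch must divide both H and W in this sketch")
    out = [[None] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            coords = [(i, j)
                      for i in range(top, top + patch)
                      for j in range(left, left + patch)]
            attended = attend([fmap[i][j] for i, j in coords])
            for (i, j), vec in zip(coords, attended):
                out[i][j] = vec
    return out
```

Under these assumptions, `patch_self_attention(fmap, 1)` returns the input unchanged (each position attends only to itself), while setting `patch` to the full map size for a square map recovers ordinary global self-attention; intermediate patch sizes sit between the two ranges.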

Source
http://dx.doi.org/10.1109/TIP.2020.3013142

Publication Analysis

Top Keywords

scene parsing: 16
omni-range dependencies: 12
long-range dependencies: 12
dependencies: 9
capturing omni-range: 8
effectively capture: 8
scene images: 8
scene: 6
long-range: 5
ordnet: 4

Similar Publications

Oral Microbe Community and Pyramid Scene Parsing Network-based Periodontitis Risk Prediction.

Int Dent J

November 2024

Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an, China. Electronic address:

Article Synopsis
  • Periodontitis (PD) is a serious gum disease that can lead to tooth loss and other health issues, emphasizing the need for early detection.
  • This study uses a deep learning model called PSPNet along with dental plaque microbial data to create a Periodontitis Risk Score (PRS), aiming to identify individuals at high risk for developing PD.
  • The research found 27 key indicators for PD risk, demonstrating that the PRS can effectively differentiate between healthy individuals and PD patients with high accuracy in just 10 seconds per sample, paving the way for better early screening and preventive care.

DDNet: Depth Dominant Network for Semantic Segmentation of RGB-D Images.

Sensors (Basel)

October 2024

Division of Science, Engineering and Health Studies, School of Professional Education and Executive Development, The Hong Kong Polytechnic University, Hong Kong 999077, China.

Article Synopsis
  • Convolutional neural networks (CNNs) are commonly used for indoor scene parsing and object segmentation in color images, but they struggle with the lack of geometric and context information from RGB data alone.
  • This study introduces a new network called the Depth Dominant Network (DDNet) that emphasizes the utilization of depth map context, leveraging the geometric information found in depth images for better segmentation results.
  • DDNet features a dual-branch CNN design that prioritizes depth information for segmentation, while also incorporating RGB data to enrich the depth features, demonstrating superior performance on various RGB-D semantic segmentation benchmarks.

Hierarchical-Concatenate Fusion TDNN for sound event classification.

PLoS One

October 2024

School of Information Science and Engineering, Shenyang University of Technology, Shenyang City, Liaoning Province, China.

Semantic feature combination and parsing is one of the key problems in sound event classification for acoustic scene analysis, environmental sound monitoring, and urban soundscape analysis. The input audio signal in acoustic scene classification is composed of multiple acoustic events, which usually leads to a low recognition rate in complex environments. To address this issue, this paper proposes the Hierarchical-Concatenate Fusion (HCF)-TDNN model, which adds an HCF module to the ECAPA-TDNN model for sound event classification.

A neural mechanism for optic flow parsing in macaque visual cortex.

Curr Biol

November 2024

Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14627, USA. Electronic address:

For the brain to compute object motion in the world during self-motion, it must discount the global patterns of image motion (optic flow) caused by self-motion. Optic flow parsing is a proposed visual mechanism for computing object motion in the world, and studies in both humans and monkeys have demonstrated perceptual biases consistent with the operation of a flow-parsing mechanism. However, the neural basis of flow parsing remains unknown.

Article Synopsis
  • Despite advancements in arbitrary image style transfer (AST), inconsistent evaluation methods make it difficult to compare different approaches effectively.
  • The study introduces a multi-granularity assessment system that uses both objective metrics and subjective feedback to evaluate AST methods more reliably.
  • By analyzing various AST techniques like CNN, flow, transformer, and diffusion-based methods, this research enhances evaluation standards, helping researchers make better comparisons and foster innovation in the field.