Learning to capture dependencies between spatial positions is essential to many visual tasks, especially the dense labeling problems like scene parsing. Existing methods can effectively capture long-range dependencies with self-attention mechanism while short ones by local convolution. However, there is still much gap between long-range and short-range dependencies, which largely reduces the models' flexibility in application to diverse spatial scales and relationships in complicated natural scene images. To fill such a gap, we develop a Middle-Range (MR) branch to capture middle-range dependencies by restricting self-attention into local patches. Also, we observe that the spatial regions which have large correlations with others can be emphasized to exploit long-range dependencies more accurately, and thus propose a Reweighed Long-Range (RLR) branch. Based on the proposed MR and RLR branches, we build an Omni-Range Dependencies Network (ORDNet) which can effectively capture short-, middle- and long-range dependencies. Our ORDNet is able to extract more comprehensive context information and well adapt to complex spatial variance in scene images. Extensive experiments show that our proposed ORDNet outperforms previous state-of-the-art methods on three scene parsing benchmarks including PASCAL Context, COCO Stuff and ADE20K, demonstrating the superiority of capturing omni-range dependencies in deep models for scene parsing task.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1109/TIP.2020.3013142 | DOI Listing |
Int Dent J
November 2024
Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an, China. Electronic address:
Sensors (Basel)
October 2024
Division of Science, Engineering and Health Studies, School of Professional Education and Executive Development, The Hong Kong Polytechnic University, Hong Kong 999077, China.
PLoS One
October 2024
School of Information Science and Engineering, Shenyang University of Technology, Shenyang City, Liaoning Province, China.
Semantic feature combination/parsing issue is one of the key problems in sound event classification for acoustic scene analysis, environmental sound monitoring, and urban soundscape analysis. The input audio signal in the acoustic scene classification is composed of multiple acoustic events, which usually leads to low recognition rate in complex environments. To address this issue, this paper proposes the Hierarchical-Concatenate Fusion(HCF)-TDNN model by adding HCF Module to ECAPA-TDNN model for sound event classification.
View Article and Find Full Text PDFCurr Biol
November 2024
Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14627, USA. Electronic address:
For the brain to compute object motion in the world during self-motion, it must discount the global patterns of image motion (optic flow) caused by self-motion. Optic flow parsing is a proposed visual mechanism for computing object motion in the world, and studies in both humans and monkeys have demonstrated perceptual biases consistent with the operation of a flow-parsing mechanism. However, the neural basis of flow parsing remains unknown.
View Article and Find Full Text PDFIEEE Trans Vis Comput Graph
September 2024
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!