AI Article Synopsis

  • Predictive scene parsing involves assigning labels to the pixels in future video frames, crucial for applications like autonomous driving and robot navigation.
  • The proposed STC-GAN model combines convolutional neural networks and convolutional LSTMs in an encoder-decoder architecture to capture both spatial layout and motion dynamics.
  • Evaluated on two public datasets, Cityscapes and CamVid, STC-GAN outperforms existing methods, demonstrating its effectiveness in leveraging unlabeled video data.

Article Abstract

Predictive scene parsing is the task of assigning pixel-level semantic labels to a future frame of a video. It has many applications in vision-based artificial intelligence systems, e.g., autonomous driving and robot navigation. Although previous work has shown promising performance in semantic segmentation of images and videos, anticipating future scene parsing with limited annotated training data remains quite challenging. In this paper, we propose a novel model called STC-GAN, Spatio-Temporally Coupled Generative Adversarial Networks for predictive scene parsing, which employs both convolutional neural networks and convolutional long short-term memory (LSTM) in an encoder-decoder architecture. By virtue of STC-GAN, both spatial layout and semantic context are captured effectively by the spatial encoder, while motion dynamics are extracted accurately by the temporal encoder. Furthermore, a coupled architecture is presented for joint adversarial training, in which weights are shared and features are transformed adaptively between the future frame generation model and the predictive scene parsing model. Consequently, the proposed STC-GAN is able to learn valuable features from unlabeled video data. We evaluate our proposed STC-GAN on two public datasets, i.e., Cityscapes and CamVid. Experimental results demonstrate that our method outperforms the state-of-the-art.
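
The architecture the abstract describes can be illustrated compactly. Below is a minimal PyTorch sketch of the encoder-decoder idea only: a CNN spatial encoder, a convolutional LSTM temporal encoder, and a decoder that produces per-pixel class logits for the future frame. All module names and layer sizes are illustrative assumptions, and the adversarial coupling and weight sharing are omitted entirely; this is a sketch, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ConvLSTMCell(nn.Module):
        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            # One convolution computes all four LSTM gates over the feature map.
            self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

        def forward(self, x, state):
            h, c = state
            i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c

    class PredictiveParser(nn.Module):
        def __init__(self, n_classes, feat=64):
            super().__init__()
            self.feat = feat
            # Spatial encoder: captures the layout/semantic context of a frame.
            self.spatial = nn.Sequential(
                nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
            # Temporal encoder: a convolutional LSTM accumulates motion dynamics.
            self.temporal = ConvLSTMCell(feat, feat)
            # Decoder: upsamples the recurrent state to per-pixel class logits.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(feat, n_classes, 4, stride=2, padding=1))

        def forward(self, frames):  # frames: (B, T, 3, H, W); H, W divisible by 4
            B, T, _, H, W = frames.shape
            h = frames.new_zeros(B, self.feat, H // 4, W // 4)
            c = torch.zeros_like(h)
            for t in range(T):  # fold the observed frames into the recurrent state
                h, c = self.temporal(self.spatial(frames[:, t]), (h, c))
            return self.decoder(h)  # parsing logits for the unseen future frame

    logits = PredictiveParser(n_classes=19)(torch.randn(2, 4, 3, 64, 64))
    print(logits.shape)  # torch.Size([2, 19, 64, 64])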

Source
http://dx.doi.org/10.1109/TIP.2020.2983567

Publication Analysis

Top Keywords

Keyword                        Frequency
scene parsing                  20
predictive scene               16
stc-gan spatio-temporally      8
spatio-temporally coupled      8
coupled generative             8
generative adversarial         8
adversarial networks           8
networks predictive            8
future frame                   8
scene                          5

Similar Publications

Oral Microbe Community and Pyramid Scene Parsing Network-based Periodontitis Risk Prediction.

Int Dent J

November 2024

Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an, China.

Article Synopsis
  • Periodontitis (PD) is a serious gum disease that can lead to tooth loss and other health issues, emphasizing the need for early detection.
  • This study uses a deep learning model called PSPNet along with dental plaque microbial data to create a Periodontitis Risk Score (PRS), aiming to identify individuals at high risk for developing PD (a pyramid-pooling sketch follows this synopsis).
  • The research found 27 key indicators for PD risk, demonstrating that the PRS can effectively differentiate between healthy individuals and PD patients with high accuracy in just 10 seconds per sample, paving the way for better early screening and preventive care.
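
For readers unfamiliar with the PSPNet backbone named above, the following is a minimal PyTorch sketch of its pyramid pooling idea: features are average-pooled at several grid sizes, projected, upsampled, and concatenated so the classifier sees both local and global context. The bin sizes and channel counts are illustrative assumptions, and the study's risk-score (PRS) computation is not shown.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidPooling(nn.Module):
        def __init__(self, in_ch, bins=(1, 2, 3, 6)):
            super().__init__()
            out_ch = in_ch // len(bins)
            # One pooling stage per grid size, each with a 1x1 projection.
            self.stages = nn.ModuleList(
                nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, out_ch, 1))
                for b in bins)

        def forward(self, x):
            h, w = x.shape[2:]
            # Upsample every pooled map back to the input size, then concatenate
            # with the input so local detail and global context are both kept.
            pooled = [F.interpolate(s(x), size=(h, w), mode='bilinear',
                                    align_corners=False) for s in self.stages]
            return torch.cat([x] + pooled, dim=1)

    feats = torch.randn(1, 256, 32, 32)
    print(PyramidPooling(256)(feats).shape)  # torch.Size([1, 512, 32, 32])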

DDNet: Depth Dominant Network for Semantic Segmentation of RGB-D Images.

Sensors (Basel)

October 2024

Division of Science, Engineering and Health Studies, School of Professional Education and Executive Development, The Hong Kong Polytechnic University, Hong Kong 999077, China.

Article Synopsis
  • Convolutional neural networks (CNNs) are commonly used for indoor scene parsing and object segmentation in color images, but they struggle with the lack of geometric and context information from RGB data alone.
  • This study introduces a new network called the Depth Dominant Network (DDNet) that emphasizes the utilization of depth map context, leveraging the geometric information found in depth images for better segmentation results.
  • DDNet features a dual-branch CNN design that prioritizes depth information for segmentation, while also incorporating RGB data to enrich the depth features, demonstrating superior performance on various RGB-D semantic segmentation benchmarks.
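
A depth-dominant dual-branch design of the kind this synopsis describes might look roughly as follows in PyTorch: depth features drive the prediction, and RGB features are reduced to a gate that enriches them. The gating scheme, layer shapes, and class count are assumptions for illustration, not DDNet's actual architecture.

    import torch
    import torch.nn as nn

    class DepthDominantFusion(nn.Module):
        def __init__(self, ch=64, n_classes=40):
            super().__init__()
            self.rgb_branch = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
            self.depth_branch = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
            # RGB features become a multiplicative gate that enriches the depth
            # features, keeping depth as the dominant cue for segmentation.
            self.gate = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
            self.head = nn.Conv2d(ch, n_classes, 1)  # per-pixel class logits

        def forward(self, rgb, depth):
            f_rgb = self.rgb_branch(rgb)
            f_depth = self.depth_branch(depth)
            return self.head(f_depth * (1 + self.gate(f_rgb)))

    out = DepthDominantFusion()(torch.randn(1, 3, 48, 48), torch.randn(1, 1, 48, 48))
    print(out.shape)  # torch.Size([1, 40, 48, 48])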

Hierarchical-Concatenate Fusion TDNN for sound event classification.

PLoS One

October 2024

School of Information Science and Engineering, Shenyang University of Technology, Shenyang City, Liaoning Province, China.

The semantic feature combination/parsing issue is one of the key problems in sound event classification for acoustic scene analysis, environmental sound monitoring, and urban soundscape analysis. The input audio signal in acoustic scene classification is composed of multiple acoustic events, which usually leads to low recognition rates in complex environments. To address this issue, this paper proposes the Hierarchical-Concatenate Fusion (HCF)-TDNN model, which adds an HCF module to the ECAPA-TDNN model for sound event classification.
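
The hierarchical-concatenate idea can be illustrated with a small PyTorch block over TDNN-style 1-D features: each dilated convolution operates on the previous scale's output, and all scales are concatenated before a fusing convolution. This is an assumption-laden sketch of the general pattern, not the published HCF-TDNN model.

    import torch
    import torch.nn as nn

    class HCFBlock(nn.Module):
        def __init__(self, ch=64, dilations=(1, 2, 4)):
            super().__init__()
            # Dilated 1-D convolutions with progressively larger receptive fields.
            self.branches = nn.ModuleList(
                nn.Conv1d(ch, ch, 3, padding=d, dilation=d) for d in dilations)
            self.fuse = nn.Conv1d(ch * (len(dilations) + 1), ch, 1)

        def forward(self, x):  # x: (B, C, T) frame-level acoustic features
            outs = [x]
            for branch in self.branches:
                # Each scale is computed from the previous one and kept for
                # concatenation, so coarse and fine context are fused jointly.
                outs.append(torch.relu(branch(outs[-1])))
            return self.fuse(torch.cat(outs, dim=1))

    y = HCFBlock()(torch.randn(2, 64, 200))
    print(y.shape)  # torch.Size([2, 64, 200])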


A neural mechanism for optic flow parsing in macaque visual cortex.

Curr Biol

November 2024

Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14627, USA.

For the brain to compute object motion in the world during self-motion, it must discount the global patterns of image motion (optic flow) caused by self-motion. Optic flow parsing is a proposed visual mechanism for computing object motion in the world, and studies in both humans and monkeys have demonstrated perceptual biases consistent with the operation of a flow-parsing mechanism. However, the neural basis of flow parsing remains unknown.
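
Computationally, flow parsing amounts to subtracting the optic-flow field predicted from self-motion from the measured retinal flow; whatever remains is attributed to objects moving in the world. A toy NumPy illustration with made-up values (not data from the study):

    import numpy as np

    # Two sample retinal-motion vectors (deg/s) and the flow a forward
    # self-motion would be expected to produce at those retinal locations.
    retinal_flow = np.array([[1.2, 0.1], [0.9, -0.1]])
    self_motion_flow = np.array([[1.0, 0.0], [1.0, 0.0]])

    # The residual after discounting self-motion is the parsed object motion.
    object_flow = retinal_flow - self_motion_flow
    print(object_flow)  # [[ 0.2  0.1] [-0.1 -0.1]]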

Article Synopsis
  • Despite advancements in arbitrary image style transfer (AST), inconsistent evaluation methods make it difficult to compare different approaches effectively.
  • The study introduces a multi-granularity assessment system that uses both objective metrics and subjective feedback to evaluate AST methods more reliably.
  • By analyzing various AST techniques, including CNN-, flow-, transformer-, and diffusion-based methods, this research strengthens evaluation standards, helping researchers make better comparisons and fostering innovation in the field.
