Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst.

Sensors (Basel)

Department of Artificial Intelligence Convergence, Chonnam National University, 77 Yongbong-ro, Gwangju 500-757, Republic of Korea.

Published: December 2022

Speech emotion recognition (SER) has attracted intense research interest in recent years. However, emotion recognition from non-verbal speech (known as vocal bursts) remains sparsely studied. Vocal bursts are brief and carry no lexical content, which makes them harder to handle than verbal speech. In this paper, we therefore propose a self-relation attention and temporal awareness (SRA-TA) module to address this problem: it captures long-term dependencies in the signal while focusing on its salient parts. The proposed method consists of three main stages. First, latent features are extracted from the raw audio signal and its Mel-spectrogram using a self-supervised learning model. The SRA-TA module then distills the most informative content from these latent features, after which all features are concatenated and fed into ten individual fully-connected layers that predict scores for 10 emotions. The proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
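The abstract describes the pipeline only at a high level, so the following is a minimal, illustrative PyTorch sketch of that three-stage idea, not the authors' implementation: the SRA-TA internals, feature extractors, dimensions, and all names below (SelfRelationTemporalBlock, VocalBurstRegressor, concordance_cc) are assumptions made purely for illustration.

```python
# Illustrative sketch only: the exact SRA-TA design, feature extractors and
# dimensions are not given in the abstract; everything here is assumed.
import torch
import torch.nn as nn


class SelfRelationTemporalBlock(nn.Module):
    """Stand-in for the SRA-TA idea: self-attention over time to capture
    long-range dependencies, followed by attention-weighted temporal pooling
    that emphasises salient frames (hypothetical approximation)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)              # saliency score per time step

    def forward(self, x):                           # x: (batch, time, dim)
        ctx, _ = self.attn(x, x, x)                 # self-relation across all frames
        w = torch.softmax(self.score(ctx), dim=1)   # temporal saliency weights
        return (w * ctx).sum(dim=1)                 # (batch, dim) pooled feature


class VocalBurstRegressor(nn.Module):
    """Two feature streams (e.g. SSL features from raw audio and features from
    the Mel-spectrogram), each passed through an SRA-TA-style block, then
    concatenated and scored by ten independent fully-connected heads."""

    def __init__(self, wav_dim: int = 768, mel_dim: int = 128, n_emotions: int = 10):
        super().__init__()
        self.wav_block = SelfRelationTemporalBlock(wav_dim)
        self.mel_block = SelfRelationTemporalBlock(mel_dim)
        self.heads = nn.ModuleList(
            [nn.Linear(wav_dim + mel_dim, 1) for _ in range(n_emotions)]
        )

    def forward(self, wav_feats, mel_feats):
        fused = torch.cat([self.wav_block(wav_feats), self.mel_block(mel_feats)], dim=-1)
        return torch.cat([head(fused) for head in self.heads], dim=-1)   # (batch, 10)


def concordance_cc(pred, target, eps: float = 1e-8):
    """Concordance correlation coefficient for one emotion dimension."""
    pm, tm = pred.mean(), target.mean()
    pv, tv = pred.var(unbiased=False), target.var(unbiased=False)
    cov = ((pred - pm) * (target - tm)).mean()
    return 2 * cov / (pv + tv + (pm - tm) ** 2 + eps)


if __name__ == "__main__":
    model = VocalBurstRegressor()
    wav = torch.randn(8, 200, 768)   # dummy SSL features from raw audio
    mel = torch.randn(8, 200, 128)   # dummy features from the Mel-spectrogram
    print(model(wav, mel).shape)     # torch.Size([8, 10])
```

The challenge score quoted above corresponds to averaging this per-emotion CCC across the ten predicted emotion dimensions.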

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9824564
DOI: http://dx.doi.org/10.3390/s23010200

Publication Analysis

Top Keywords

vocal burst (16), emotion recognition (12), self-relation attention (8), attention temporal (8), temporal awareness (8), SRA-TA module (8), audio signal (8), proposed method (8), latent features (8), vocal (5)

Similar Publications

Affective voice signaling has significant biological and social relevance across various species, and different affective signaling types have emerged through the evolution of voice communication. These types range from basic affective voice bursts and nonverbal affective vocalizations up to affective intonations superimposed on speech utterances in humans, in the form of paraverbal prosodic patterns. These different types of affective signaling should have evolved to be acoustically and perceptually distinctive, allowing accurate and nuanced affective communication.

Zebra finches undergo a gradual refinement of their vocalizations, transitioning from variable juvenile songs to the stereotyped song of adulthood. To investigate the neural mechanisms underlying song crystallization, a critical phase in this developmental process, we performed intracellular recordings in HVC (a premotor nucleus essential for song learning and production) of juvenile birds. We then compared these recordings to previously published electrophysiological data from adult birds.

Model of the HVC neural network as a song motor in zebra finch.

Front Comput Neurosci

November 2024

Department of Physics, University of California, San Diego, La Jolla, CA, United States.

The nucleus HVC within the avian song system produces crystallized instructions that lead to precise, learned vocalization in zebra finches (Taeniopygia guttata). This paper proposes a model of the HVC neural network based on the physiological properties of individual HVC neurons, their synaptic interactions calibrated by experimental measurements, and the synaptic signal into this region that triggers song production. The neural network model comprises two major neural populations in this area: neurons projecting to the nucleus RA, and interneurons.

Understanding the effects of different transcranial magnetic stimulation control protocols: a behavioral and neural perspective.

J Neurophysiol

December 2024

Department of Rehabilitation Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China.

Transcranial magnetic stimulation (TMS) is a noninvasive technique for modulating brain activity. However, selecting optimal control protocols that account for its neural and non-neural effects remains a challenge. To this end, the present event-related potential (ERP) study investigated the behavioral and neural effects of three commonly used control protocols, namely sham stimulation and real continuous theta-burst stimulation (c-TBS) over the vertex and over the primary visual cortex (V1), during a task that manipulated pitch in voice auditory feedback.

Article Synopsis
  • Continuous active sonar produces lower sound pressure levels compared to traditional pulsed active sonar but can cause higher auditory masking due to its constant operation.
  • The study evaluates how different noise types, including continuous active sonar, affect signal detection in killer whales using both a pure tone and a whale call.
  • Results show that while other noise types allowed for some frequency detection, continuous active sonar significantly overlaps with killer whale calls, making it a strong auditory masker.