Speech emotion recognition (SER) has attracted considerable research attention in recent years. Despite this activity, work on emotion recognition from non-verbal speech (known as vocal bursts) remains sparse. Vocal bursts are brief and carry no linguistic content, which makes them harder to handle than verbal speech. In this paper, we propose a self-relation attention and temporal awareness (SRA-TA) module to tackle this problem; it captures long-term dependencies and focuses on the salient parts of the audio signal. Our proposed method contains three main stages. First, latent features are extracted from the raw audio signal and its Mel-spectrogram using a self-supervised learning model. Second, the SRA-TA module captures the valuable information in these latent features. Finally, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of the 10 emotions. Our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
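The reported metric, the mean concordance correlation coefficient, can be illustrated with a minimal sketch. It uses the standard CCC definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population statistics. The function names `ccc` and `mean_ccc`, the row-per-sample data layout, and the choice to return 0.0 when the denominator is zero are assumptions made for illustration, not details taken from the paper or the challenge's scoring code.

```python
from typing import Sequence


def ccc(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Concordance correlation coefficient between two equal-length series."""
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    # Population (divide-by-n) variance and covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Assumed convention: two identical constant series score 0.0.
        return 0.0
    return 2.0 * cov / denom


def mean_ccc(pred_rows: Sequence[Sequence[float]],
             gold_rows: Sequence[Sequence[float]]) -> float:
    """Average the per-emotion CCC; each row holds one sample's scores."""
    n_emotions = len(pred_rows[0])
    per_emotion = [
        ccc([row[k] for row in pred_rows], [row[k] for row in gold_rows])
        for k in range(n_emotions)
    ]
    return sum(per_emotion) / n_emotions
```

Unlike Pearson correlation, CCC penalizes shifts in mean and scale, so predictions that track the labels but are uniformly offset score below 1.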
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9824564 | PMC |
| http://dx.doi.org/10.3390/s23010200 | DOI Listing |
Emotion
January 2025
Department of Psychology, Cognitive and Affective Neuroscience Unit, University of Zurich.
Affective voice signaling has significant biological and social relevance across various species, and different affective signaling types have emerged through the evolution of voice communication. These types range from basic affective voice bursts and nonverbal affective vocalizations up to affective intonations superimposed on speech utterances in humans in the form of paraverbal prosodic patterns. These different types of affective signaling should have evolved to be acoustically and perceptually distinctive, allowing accurate and nuanced affective communication.
BMC Neurosci
December 2024
Max Planck Institute for Biological Intelligence, Eberhard-Gwinner-Str., 82319, Seewiesen, Germany.
Zebra finches undergo a gradual refinement of their vocalizations, transitioning from variable juvenile songs to the stereotyped song of adulthood. To investigate the neural mechanisms underlying song crystallization, a critical phase in this developmental process, we performed intracellular recordings in HVC (a premotor nucleus essential for song learning and production) of juvenile birds. We then compared these recordings to previously published electrophysiological data from adult birds.
Front Comput Neurosci
November 2024
Department of Physics, University of California, San Diego, La Jolla, CA, United States.
The nucleus HVC within the avian song system produces crystallized instructions which lead to precise, learned vocalization in zebra finches. This paper proposes a model of the HVC neural network based on the physiological properties of individual HVC neurons, their synaptic interactions calibrated by experimental measurements, as well as the synaptic signal into this region which triggers song production. This neural network model comprises two major neural populations in this area: neurons projecting to the nucleus RA and interneurons.
J Neurophysiol
December 2024
Department of Rehabilitation Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China.
Transcranial magnetic stimulation (TMS) is a noninvasive stimulation technique for modulating brain activity. However, selecting optimal control protocols to account for their neural and non-neural effects remains a challenge. To this end, the present event-related potential (ERP) study investigated the behavioral and neural effects of three commonly used control protocols, namely, sham stimulation and real stimulation with continuous theta burst stimulation (c-TBS) over the vertex and primary visual cortex (V1), on a given task manipulating pitch in voice auditory feedback.
J Acoust Soc Am
October 2024
Naval Information Warfare Center Pacific, San Diego, California 92152, USA.