Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning regarding the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect the accuracy, and to what extent. This study aims to extend the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and the lengths and overlaps of classification regions), frequency ranges, frequency scales, processing of whole speech (spectrograms), vocal tract (filter banks, linear prediction coefficient (LPC) modeling), and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross-validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings with accuracies in the 75% range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0-8 kHz frequency range. Spectrograms carrying both vocal tract and excitation information also scored well. It was found that even basic processing such as pre-emphasis, segmentation, and magnitude modifications can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
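To make the "basic processing" steps named in the abstract concrete, the following is a minimal sketch of pre-emphasis followed by windowed segmentation. It is not the study's implementation: the function names, the pre-emphasis coefficient of 0.97, the 25 ms frame with a 10 ms hop, the Hamming window, and the 16 kHz sampling rate (chosen only because its Nyquist frequency matches the 0-8 kHz range mentioned above) are all assumptions picked as common defaults.

```python
import numpy as np

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used value, not one from the study.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return signal
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))

def frame_signal(signal, frame_len, hop_len, window="hamming"):
    """Split a 1-D signal into overlapping frames and apply a window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Index matrix: one row of sample indices per frame.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx]
    if window == "hamming":
        frames = frames * np.hamming(frame_len)
    elif window != "rectangular":
        raise ValueError("unsupported window type: " + str(window))
    return frames

# Usage on a synthetic 1-second, 200 Hz tone (hypothetical test signal).
fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)
frames = frame_signal(pre_emphasize(x), frame_len=400, hop_len=160)
print(frames.shape)
```

Changing `alpha`, `frame_len`, `hop_len`, or `window` here corresponds to the kind of setting the abstract reports as having a large effect on accuracy; the sketch only shows where those choices enter the pipeline.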
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7962835 | PMC |
| http://dx.doi.org/10.3390/s21051888 | DOI Listing |
J Acoust Soc Am
January 2025
USC Viterbi School of Engineering, University of Southern California, Los Angeles, California 90089-1455, USA.
Voice quality serves as a rich source of information about speakers, providing listeners with impressions of identity, emotional state, age, sex, reproductive fitness, and other biologically and socially salient characteristics. Understanding how this information is transmitted, accessed, and exploited requires knowledge of the psychoacoustic dimensions along which voices vary, an area that remains largely unexplored. Recent studies of English speakers have shown that two factors related to speaker size and arousal consistently emerge as the most important determinants of quality, regardless of who is speaking.
J Rehabil Assist Technol Eng
January 2025
School of Nursing, University of British Columbia, Vancouver, BC, Canada.
The need for Artificial Intelligence (AI) in gerontology education is underscored by the potential benefits it offers in addressing loneliness and supporting social connection among older adults in long-term care (LTC) homes. While the workforce in LTC is often overburdened, AI-enabled service robots present possible solutions to enhance residents' quality of life. However, the incorporation of AI and service robots in current gerontology curricula is lacking, and the views of students on this subject remain largely unexamined.
Front Child Adolesc Psychiatry
June 2024
Department of Psychology, Bishop's University, Sherbrooke, QC, Canada.
Background: Children with speech, language, and communication disorders require specialized support in response to their emotional expression challenges. Not only is such support key for their development, but it is also essential for their mental well-being. Art making emerges as a valuable tool for enabling these children to convey emotions both verbally and non-verbally, fostering a positive self-concept.
Health Qual Life Outcomes
January 2025
Department of Speech and Language Therapy, School of Health Sciences, University of Ioannina, Ioannina, Greece.
Background: Laryngeal cancer often leads to total laryngectomy (TL), which results in the loss of natural voice, necessitates voice rehabilitation, and affects the individual's Quality of Life (QoL). Despite advancements in treatment, Voice-Related QoL (VRQoL) post TL remains a neglected area in the field of rehabilitation. This study seeks to fill this gap by evaluating, through a scoping review, the impacts of TL on patients' voice-related QoL.
J Acoust Soc Am
January 2025
Leiden University Centre for Linguistics, Leiden University, Leiden, The Netherlands.
Previous studies suggested that pitch characteristics of lexical tones in Standard Chinese influence various sensory perceptions, but whether they iconically bias emotional experience remained unclear. We analyzed the arousal and valence ratings of bi-syllabic words in two corpora (Study 1) and conducted an affect rating experiment using a carefully designed corpus of bi-syllabic words (Study 2). Two-alternative forced-choice tasks further tested the robustness of lexical tones' affective iconicity in an auditory nonce word context (Study 3).