On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition.

Juraj Kacur, Boris Puterka, Jarmila Pavlovicova, Milos Oravec

Sensors (Basel)

Institute of Computer Science and Mathematics, Faculty of Electrical Engineering and Information Technology, Slovak University of Technology in Bratislava, 2412 Bratislava, Slovakia.

Published: March 2021

Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning regarding the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect the accuracy, and to what extent. This study aims to extend the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification regions, including their lengths and overlaps), frequency ranges, frequency scales, processing of whole speech (spectrograms), vocal tract (filter banks, linear prediction coefficient (LPC) modeling), and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings reaching accuracies in the 75% range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0-8 kHz frequency range. Spectrograms carrying both vocal tract and excitation information also scored well. It was found that even basic processing such as pre-emphasis, segmentation, and magnitude modifications can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
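
To make the processing chain named in the abstract more concrete, the following is a minimal sketch of pre-emphasis, windowed segmentation, and a psychoacoustic filter bank covering 0-8 kHz. It is not the authors' implementation. The mel scale, the Hamming window, the 0.97 pre-emphasis coefficient, the 25 ms frames with a 10 ms hop, the 26 filters, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def pre_emphasis(signal, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1] (coeff is assumed)."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=8000.0):
    """Triangular filters equally spaced on the mel scale between f_min and f_max."""
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_filter_bank_energies(signal, sample_rate=16000, frame_ms=25,
                             hop_ms=10, n_filters=26, n_fft=512):
    """Return one row of log filter-bank energies per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = frame_signal(pre_emphasis(signal), frame_len, hop)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    bank = mel_filter_bank(n_filters, n_fft, sample_rate)
    # Small constant avoids log(0) for silent frames.
    return np.log(power @ bank.T + 1e-10)
```

Calling `log_filter_bank_energies(x)` on a one-dimensional float array `x` sampled at 16 kHz yields a matrix with one row per frame and one column per filter. Changing `coeff`, `frame_ms`, `hop_ms`, or `f_max` is one way to probe the kind of sensitivity to basic settings that the abstract reports.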

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7962835
DOI: http://dx.doi.org/10.3390/s21051888


Similar Publications

Biological, linguistic, and individual factors govern voice quality.

J Acoust Soc Am

January 2025

USC Viterbi School of Engineering, University of Southern California, Los Angeles, California 90089-1455, USA.

Jody Kreiman, Yoonjeong Lee

Voice quality serves as a rich source of information about speakers, providing listeners with impressions of identity, emotional state, age, sex, reproductive fitness, and other biologically and socially salient characteristics. Understanding how this information is transmitted, accessed, and exploited requires knowledge of the psychoacoustic dimensions along which voices vary, an area that remains largely unexplored. Recent studies of English speakers have shown that two factors related to speaker size and arousal consistently emerge as the most important determinants of quality, regardless of who is speaking.

Students' perspectives on the development and deployment of an AI-enabled service robot in long-term care.

J Rehabil Assist Technol Eng

January 2025

School of Nursing, University of British Columbia, Vancouver, BC, Canada.

Lillian Hung, Abdul-Fatawu Abdulai, Albin Soni, Karen Lok Yi Wong, Lily Haopu Ren

The need for Artificial Intelligence (AI) in gerontology education is underscored by the potential benefits it offers in addressing loneliness and supporting social connection among older adults in long-term care (LTC) homes. While the workforce in LTC is often overburdened, AI-enabled service robots present possible solutions to enhance residents' quality of life. However, the incorporation of AI and service robots in current gerontology curricula is lacking, and the views of students on this subject remain largely unexamined.

Feasibility, acceptability, and perceived benefits of a creative arts intervention for elementary school children living with speech, language and communication disorders.

Front Child Adolesc Psychiatry

June 2024

Department of Psychology, Bishop's University, Sherbrooke, QC, Canada.

T Léger-Goodes, C M Herba, Z Moula, A Mendrek, K Hurtubise

Background: Children with speech, language, and communication disorders require specialized support in response to their emotional expression challenges. Not only is such support key for their development, but it is also essential for their mental well-being. Art making emerges as a valuable tool for enabling these children to convey emotions both verbally and non-verbally, fostering a positive self-concept.

Voice-related quality of life after total laryngectomy: a scoping review of recent evidence.

Health Qual Life Outcomes

January 2025

Department of Speech and Language Therapy, School of Health Sciences, University of Ioannina, Ioannina, Greece.

Tatiana Pourliaka, Efcharis Panagopoulou, Vassiliki Siafaka

Background: Laryngeal cancer often leads to total laryngectomy (TL), which results in the loss of natural voice, necessitates voice rehabilitation, and affects the individual's Quality of Life (QoL). Despite advancements in treatment, Voice-Related QoL (VRQoL) post TL remains a neglected area in the field of rehabilitation. This study seeks to fill this gap by evaluating, through a scoping review, the impacts of TL on patients' voice-related QoL.

The affective iconicity of lexical tone: Evidence from standard Chinese.

J Acoust Soc Am

January 2025

Leiden University Centre for Linguistics, Leiden University, Leiden, The Netherlands.

Tingting Zheng, Clara C Levelt, Yiya Chen

Previous studies suggested that pitch characteristics of lexical tones in Standard Chinese influence various sensory perceptions, but whether they iconically bias emotional experience remained unclear. We analyzed the arousal and valence ratings of bi-syllabic words in two corpora (Study 1) and conducted an affect rating experiment using a carefully designed corpus of bi-syllabic words (Study 2). Two-alternative forced-choice tasks further tested the robustness of lexical tones' affective iconicity in an auditory nonce word context (Study 3).
