Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders.

Sensors (Basel)

Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea.

Published: July 2023

Recognizing emotional cues in human speech is a crucial part of human-computer communication. It relies on computational analysis of emotion and on extracting emotion-relevant features from speech. This study proposes a speech emotion recognition framework based on spectrogram and semantic feature encoders, with the aim of improving accuracy by addressing shortcomings of existing methods. Two feature extraction strategies were used. First, a fully convolutional neural network encoded speech spectrograms. Second, a Mel-frequency cepstral coefficient (MFCC) feature extraction approach was combined with Speech2Vec for semantic feature encoding. The two feature types were processed separately and then fed into a long short-term memory (LSTM) network and a fully connected layer for further representation. The proposed model was evaluated on two datasets, RAVDESS and EMO-DB. It outperformed established models, reaching an accuracy of 94.8% on RAVDESS and 94.0% on EMO-DB.
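The abstract does not say which MFCC variant was used, so the following is only a minimal sketch of the textbook steps behind MFCC features: the Hz-to-mel mapping, mel-spaced filter centers, and a DCT-II over log filterbank energies. The function names, the HTK-style mel formula, and the defaults (13 coefficients, a 1e-10 energy floor) are assumptions for illustration, not details taken from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (common HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of n_filters filters spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def mfcc_from_filterbank_energies(energies, n_coeffs=13):
    """Cepstral coefficients for one frame of mel filterbank energies.

    Energies are floored before the log to avoid log(0); the floor value
    is an arbitrary choice for this sketch.
    """
    log_energies = [math.log(max(e, 1e-10)) for e in energies]
    return dct_ii(log_energies, n_coeffs)
```

In a full pipeline, the per-frame filterbank energies would come from a windowed short-time Fourier transform of the audio. That step is omitted here to keep the sketch dependency-free.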

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10383041 (PMC)
http://dx.doi.org/10.3390/s23146640 (DOI)
