Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest.

Mohammad Mahdi Rezapour Mashhadi, Kofi Osei-Bonsu

PLoS One

Published: November 2023

The study investigates how to classify emotions in speech using machine learning and various audio features.
The researchers extracted audio features such as Mel-frequency cepstral coefficients and zero-crossing rate, augmented their limited dataset, and combined all audio files before analysis.
The random forest model with feature selection outperformed the one-dimensional convolutional neural network, reaching 69% average accuracy. Both models showed recurring confusions between certain emotions, such as anger being mistaken for happiness.
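Of the features named above, the zero-crossing rate is the simplest to compute by hand. The following is a minimal sketch in plain Python, not the authors' code: the function names, frame length, and hop length are illustrative assumptions, and audio libraries may use slightly different conventions for normalization and for samples that are exactly zero.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Convention assumed here: a sample of exactly 0 counts as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(frame, frame[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def framewise_zcr(signal, frame_length=4, hop_length=2):
    """Zero-crossing rate of each full frame in a sliding window.

    frame_length and hop_length are toy defaults chosen for illustration;
    real audio pipelines typically use frames of hundreds or thousands
    of samples.
    """
    last_start = len(signal) - frame_length
    return [
        zero_crossing_rate(signal[start:start + frame_length])
        for start in range(0, last_start + 1, hop_length)
    ]


if __name__ == "__main__":
    # Made-up toy signal: rapid sign changes first, then a constant stretch.
    toy_signal = [1, -1, 1, -1, 1, 1, 1, 1]
    print(framewise_zcr(toy_signal))
```

A per-frame series like this is usually summarized (for example by its mean) before being passed to a classifier alongside the other features.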

Speech is a direct and rich way of transmitting information and emotions from one point to another. In this study, we aimed to classify different emotions in speech using various audio features and machine learning models. We extracted several types of audio features: Mel-frequency cepstral coefficients, chromagram, Mel-scale spectrogram, spectral contrast, Tonnetz representation, and zero-crossing rate. We used a limited speech emotion recognition (SER) dataset and augmented it with additional audio recordings. In addition, in contrast to many previous studies, we combined all audio files before conducting our analysis. We compared the performance of two models: a one-dimensional convolutional neural network (conv1D) and a random forest (RF) with RF-based feature selection. Our results showed that RF with feature selection achieved higher average accuracy (69%) than conv1D and had the highest precision for fear (72%) and the highest recall for calm (84%). Our study demonstrates the effectiveness of RF with feature selection for speech emotion classification using a limited dataset. We found that, for both algorithms, anger was most often confused with happy, disgust with sad and neutral, and fear with sad. This could be due to the similarity of some acoustic features between these emotions, such as pitch, intensity, and tempo.
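The per-emotion precision and recall figures, and the pairs of emotions that get confused, all come from a confusion matrix. The sketch below shows one way to tabulate these from true and predicted labels in plain Python. It is an illustration under stated assumptions, not the authors' evaluation code: the function name is hypothetical and the labels in the usage example are made-up toy data, not results from the study.

```python
def per_class_precision_recall(y_true, y_pred, labels):
    """Return (confusion counts, per-label (precision, recall)).

    counts[(t, p)] is the number of samples whose true label is t
    and whose predicted label is p.
    """
    counts = {(t, p): 0 for t in labels for p in labels}
    for t, p in zip(y_true, y_pred):
        counts[(t, p)] += 1

    metrics = {}
    for label in labels:
        true_positives = counts[(label, label)]
        # Column sum: everything the model predicted as this label.
        predicted_as_label = sum(counts[(t, label)] for t in labels)
        # Row sum: everything that truly carries this label.
        actually_label = sum(counts[(label, p)] for p in labels)
        precision = (
            true_positives / predicted_as_label if predicted_as_label else 0.0
        )
        recall = true_positives / actually_label if actually_label else 0.0
        metrics[label] = (precision, recall)
    return counts, metrics


if __name__ == "__main__":
    # Made-up toy labels for illustration only.
    labels = ["anger", "happy", "sad"]
    y_true = ["anger", "anger", "happy", "happy", "sad"]
    y_pred = ["anger", "happy", "happy", "happy", "sad"]
    counts, metrics = per_class_precision_recall(y_true, y_pred, labels)
    for label in labels:
        precision, recall = metrics[label]
        print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
    print("anger predicted as happy:", counts[("anger", "happy")])
```

Off-diagonal entries such as `counts[("anger", "happy")]` are what reveal which emotions a model tends to mix up.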

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662716	PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291500	PLOS

Publication Analysis

Top Keywords

speech emotion

feature selection

emotion recognition

machine learning

convolutional neural

neural network

random forest

audio features

limited dataset

speech
