The study investigates how to classify emotions in speech using machine-learning models trained on various audio features.
The researchers extracted audio features such as Mel-frequency cepstral coefficients (MFCCs) and zero-crossing rate, augmented their limited dataset, and merged the audio files into a combined dataset for analysis.
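A minimal sketch of what this feature-extraction and augmentation step might look like, assuming the librosa library; the helper names (`extract_features`, `augment`), the choice of 13 MFCCs, mean-pooling over time, and the specific augmentations (additive noise, pitch shift) are illustrative assumptions, not the authors' exact pipeline:

```python
import numpy as np
import librosa

def extract_features(path, sr=22050):
    """Load a clip and compute a fixed-length feature vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, frames)
    zcr = librosa.feature.zero_crossing_rate(y)         # shape (1, frames)
    # Mean-pool each per-frame feature over time so every clip,
    # regardless of duration, yields one vector of the same length.
    return np.concatenate([mfcc.mean(axis=1), zcr.mean(axis=1)])

def augment(y, sr):
    """Two common waveform-level augmentations for small speech datasets."""
    noisy = y + 0.005 * np.random.randn(len(y))               # additive noise
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # pitch shift
    return [noisy, shifted]
```

Mean-pooling the frame-level features is one simple way to produce the fixed-size inputs a random forest requires; a CNN could instead consume the full (feature, frame) matrix.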
The random forest model with feature selection outperformed the one-dimensional convolutional neural network, reaching 69% overall accuracy, with notable misclassifications between emotions such as anger and happiness.
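The feature-selection step could be realized in several ways; the sketch below, assuming scikit-learn, uses importance-based selection via `SelectFromModel`, which is an assumption since the paper's exact selection method and hyperparameters are not given here. The random data stands in for the study's combined dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, confusion_matrix

# X: one feature vector per clip (e.g. pooled MFCCs + ZCR); y: emotion labels.
# Placeholder random data; replace with the real extracted features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))
y = rng.integers(0, 6, size=500)  # e.g. six emotion classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = Pipeline([
    # Keep only features whose importance exceeds the median importance
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        threshold="median")),
    ("forest", RandomForestClassifier(n_estimators=300, random_state=0)),
])
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy:", accuracy_score(y_te, pred))
# Rows = true emotion, columns = predicted; off-diagonal mass shows
# which pairs (e.g. anger vs. happiness) the model confuses.
print(confusion_matrix(y_te, pred))
```

Inspecting the confusion matrix in this way is how per-emotion misclassifications, like the anger/happiness confusion the study reports, are typically identified.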