Deep-Learning-Based Multimodal Emotion Classification for Music Videos.

Sensors (Basel)

Department of Computer Science and Engineering, Jeonbuk National University, Jeonju-City 54896, Korea.

Published: July 2021

Music videos contain a great deal of visual and acoustic information. Each information source within a music video influences the emotions conveyed through the audio and video, suggesting that only a multimodal approach is capable of efficient affective computing. This paper presents an affective computing system that relies on music, video, and facial expression cues, making it useful for emotional analysis. We applied audio-video information exchange and boosting methods to regularize the training process and reduced the computational costs by using a separable convolution strategy. In sum, our empirical findings are as follows: (1) multimodal representations efficiently capture all acoustic and visual emotional clues included in each music video; (2) the computational cost of each neural network is significantly reduced by factorizing the standard 2D/3D convolution into separate channel and spatiotemporal interactions; and (3) information-sharing methods incorporated into multimodal representations help guide individual information flow and boost overall performance. We tested our findings across several unimodal and multimodal networks against various evaluation metrics and visual analyzers. Our best classifier attained 74% accuracy, an F1-score of 0.73, and an area-under-the-curve score of 0.926.
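The cost reduction from factorizing a standard 3D convolution can be illustrated with a simple parameter count. The sketch below assumes a depthwise-separable style of factorization (per-channel spatiotemporal filtering followed by 1×1×1 pointwise channel mixing); the layer sizes are hypothetical and not taken from the paper.

```python
def conv3d_params(c_in, c_out, t, k):
    """Weights in a standard 3D convolution (bias omitted):
    every output channel mixes all input channels over a t x k x k window."""
    return c_in * c_out * t * k * k

def separable_conv3d_params(c_in, c_out, t, k):
    """Depthwise-separable factorization: each input channel gets its own
    t x k x k spatiotemporal filter, then a 1x1x1 pointwise convolution
    handles the channel interactions."""
    depthwise = c_in * t * k * k   # per-channel spatiotemporal filtering
    pointwise = c_in * c_out       # 1x1x1 channel mixing
    return depthwise + pointwise

# Hypothetical layer: 64 -> 128 channels, 3x3x3 kernel.
standard = conv3d_params(64, 128, t=3, k=3)            # 221184 weights
separable = separable_conv3d_params(64, 128, t=3, k=3) # 9920 weights
print(standard, separable, round(standard / separable, 1))
```

For this layer the factorized form uses roughly 22× fewer weights, which is the kind of saving that motivates the separable strategy described in the abstract.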


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309938
DOI: http://dx.doi.org/10.3390/s21144927


Similar Publications

The purpose of this study is to investigate how deep learning and other artificial intelligence (AI) technologies can be used to enhance the intelligent level of dance instruction. The study develops a dance action recognition and feedback model based on a Graph Attention mechanism (GA) and a Bidirectional Gated Recurrent Unit (BiGRU) combined with a 3D-ResNet backbone (3D-ResNet-BiGRU). In this model, 3D-ResNet extracts video features, after which BiGRU captures the time-series features.
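The shape flow of such a pipeline can be sketched without any framework code: a 3D-ResNet backbone reduces a video to a sequence of clip-level feature vectors, and a bidirectional GRU then concatenates forward and backward hidden states at each step. The sequence length, feature dimension, and hidden size below are hypothetical, chosen only to illustrate the dimension bookkeeping.

```python
def bigru_output_shape(seq_len, feat_dim, hidden):
    """A bidirectional GRU reads a (seq_len, feat_dim) feature sequence and,
    at each time step, concatenates the forward and backward hidden states,
    so the output feature dimension is 2 * hidden."""
    return (seq_len, 2 * hidden)

# Hypothetical sizes: 16 clip-level features of dimension 512 from the
# 3D-ResNet backbone, fed to a BiGRU with 256 hidden units per direction.
clip_features = (16, 512)
out_shape = bigru_output_shape(*clip_features, hidden=256)
print(out_shape)  # (16, 512)
```

The per-step concatenation is why bidirectional recurrent layers double the downstream feature dimension relative to a unidirectional one with the same hidden size.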


Background: Difficulties in emotional regulation are often observed in children and adolescents with attention-deficit/hyperactivity disorder (ADHD). Innovative complementary treatments, such as video games and virtual reality, have become increasingly appealing to patients. The Secret Trail of Moon (MOON) is a serious video game developed by a multidisciplinary team featuring cognitive training exercises.


Background: Pediatric patients with cancer have limited options to self-manage their health while they are undergoing treatments in the hospital and after they are discharged to their homes. Extended reality (ER) using head-mounted displays has emerged as an immersive method of improving pain and mental health and promoting health-enhancing physical activity among a variety of clinical groups, but there is currently no established protocol for improving both physical and mental health in pediatric cancer rehabilitation.

Objective: This phase I, pilot, feasibility randomized controlled trial aims to investigate the potential effects of a 14-week ER program on physical activity participation and indicators of health among pediatric patients with cancer who undergo bone marrow transplantation.


Multimodal critical discourse analysis is a dynamic approach to qualitative data analysis that expands critical discourse analysis to include multiple communicative modes-such as images, graphics, video, and sound/music-into the semiotic analysis of ideology and power relations within contemporary forms of communication. We reflect on the potential of multimodal critical discourse analysis to be combined with arts-based health research as an analytic method to deconstruct discourses that shape the health and well-being of marginalized communities. Specifically, we frame this potential within our research about men's body image based a project using cellphilming and the deconstruction of cis-heteronormative and related ideologies.

