Human action recognition (HAR) in RGB-D videos has been widely investigated since the release of affordable depth sensors. Unimodal approaches (e.g., skeleton-based and RGB-video-based) have achieved substantial improvements with increasingly larger datasets, but multimodal methods, particularly those with model-level fusion, have seldom been investigated. In this article, we propose a model-based multimodal network (MMNet) that fuses the skeleton and RGB modalities at the model level. Our objective is to improve ensemble recognition accuracy by effectively exploiting the mutually complementary information in different data modalities. In the model-based fusion scheme, a spatiotemporal graph convolutional network for the skeleton modality learns attention weights that are transferred to the network of the RGB modality. Extensive experiments are conducted on five benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA Multiview, and Toyota Smarthome. Upon aggregating the results of multiple modalities, our method outperforms state-of-the-art approaches on six evaluation protocols across the five datasets; thus, the proposed MMNet effectively captures mutually complementary features in different RGB-D video modalities and provides more discriminative features for HAR. We also tested MMNet on the RGB video dataset Kinetics 400, which contains more outdoor actions; the results are consistent with those on the RGB-D video datasets.
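The model-level fusion described in the abstract can be illustrated with a minimal NumPy sketch. All shapes, the stand-in joint activations, and the single-scalar ensemble weight below are illustrative assumptions, not the paper's actual architecture: a skeleton branch yields attention weights over body joints, those weights reweight spatially aligned RGB features, and the final score ensembles both modalities' logits.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, J, D, C = 16, 25, 8, 60          # frames, joints, RGB feature dim, classes
rng = np.random.default_rng(0)

# --- skeleton branch: stand-in for per-joint activations from an ST-GCN ---
joint_activ = rng.normal(size=(T, J))

# attention over joints: average over time, then softmax so weights sum to 1
attn = softmax(joint_activ.mean(axis=0))          # shape (J,)

# --- RGB branch: region features spatially aligned with the J joints ---
rgb_feats = rng.normal(size=(J, D))
fused = (attn[:, None] * rgb_feats).sum(axis=0)   # attention-weighted pooling, (D,)

# --- late ensemble of the two modalities' class scores ---
skel_logits = rng.normal(size=C)
rgb_logits = fused @ rng.normal(size=(D, C))      # toy linear classifier head
scores = softmax(0.5 * skel_logits + 0.5 * rgb_logits)
pred = int(scores.argmax())
```

The key design choice mirrored here is that the attention is learned on the skeleton side, where motion is explicit, and only then applied to the RGB side, rather than learning attention independently in each branch.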


Source
DOI: http://dx.doi.org/10.1109/TPAMI.2022.3177813

Publication Analysis

Top Keywords

model-based multimodal (8); multimodal network (8); human action (8); action recognition (8); rgb-d videos (8); modalities model-based (8); mutually complementary (8); ntu rgb+d (8); rgb-d video (8); mmnet (4)

Similar Publications

Objective: Accurate preoperative evaluation of myometrial invasion (MI) is essential for treatment decisions in endometrial cancer (EC). However, the diagnostic accuracy of commonly used magnetic resonance imaging (MRI) techniques for this assessment varies considerably. This study aims to improve preoperative discrimination of the absence or presence of MI by developing and validating a multimodal deep learning radiomics (MDLR) model based on MRI.


Introduction: Assessing drivers' olfactory preferences can help improve the odor environment and enhance comfort during driving. However, current evaluation methods, including subjective ratings, electroencephalography, and behavioral measures, have limited availability. Therefore, this study explores the potential of autonomic response signals for assessing olfactory preferences.


Given the heterogeneous nature of attention-deficit/hyperactivity disorder (ADHD) and the absence of established biomarkers, accurate diagnosis and effective treatment remain a challenge in clinical practice. This study investigates the predictive utility of multimodal data, including eye tracking, EEG, actigraphy, and behavioral indices, in differentiating adults with ADHD from healthy individuals. Using a support vector machine model, we analyzed independent training (n = 50) and test (n = 36) samples from two clinically controlled studies.
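As a hedged illustration of the classification setup named above (the toy 2-D features, group labels, hyperparameters, and training loop are assumptions for demonstration, not the study's actual pipeline), a linear SVM can be trained with a simple hinge-loss sub-gradient loop:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy 2-D "multimodal feature" vectors for two well-separated groups
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)),   # e.g., healthy controls
               rng.normal(+2.0, 0.5, (20, 2))])  # e.g., ADHD group
y = np.array([-1] * 20 + [1] * 20)

w, b = np.zeros(2), 0.0
lr, lam = 0.1, 0.01                 # learning rate, L2 regularization
for _ in range(100):                # hinge-loss sub-gradient descent
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) < 1:   # margin violated: pull toward the point
            w = (1 - lr * lam) * w + lr * yi * xi
            b += lr * yi
        else:                       # margin satisfied: only shrink weights
            w = (1 - lr * lam) * w

accuracy = (np.sign(X @ w + b) == y).mean()
```

In practice such studies typically use a library implementation with cross-validation rather than a hand-rolled loop; the sketch only shows the margin-maximization idea behind the classifier.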


In medical image segmentation, although multi-modality training is possible, clinical translation is challenged by the limited availability of all image types for a given patient. Unlike typical segmentation models, modality-agnostic (MAG) learning trains a single model on all available modalities while remaining input-agnostic, allowing one model to produce accurate segmentations given any modality combination. In this paper, we propose a novel framework, MAG learning through Multi-modality Self-distillation (MAG-MS), for medical image segmentation.
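The self-distillation idea can be sketched numerically. The logit values and the single KL term below are illustrative assumptions, not the MAG-MS objective itself: the prediction made from a partial modality set is pushed toward the prediction made when all modalities are available.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL divergence D(p || q) between two discrete distributions
    return float((p * np.log(p / q)).sum())

# hypothetical class logits when all imaging modalities are fed to the model
logits_full = np.array([2.0, 0.5, -1.0])
# logits when the same model sees only a subset of modalities
logits_partial = np.array([1.2, 0.8, -0.5])

# self-distillation loss: the full-input prediction teaches the partial-input one
loss = kl(softmax(logits_full), softmax(logits_partial))
```

Minimizing such a term across randomly dropped modality subsets during training is what lets a single network stay accurate under any input combination.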


Multimodal Deep Learning Fusing Clinical and Radiomics Scores for Prediction of Early-Stage Lung Adenocarcinoma Lymph Node Metastasis.

Acad Radiol

December 2024

School of Public Health, Jiangxi Medical College, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.); Jiangxi Provincial Key Laboratory of Disease Prevention and Public Health, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.).

Rationale And Objectives: To develop and validate a multimodal deep learning (DL) model based on computed tomography (CT) images and clinical knowledge to predict lymph node metastasis (LNM) in early lung adenocarcinoma.

Materials And Methods: A total of 724 pathologically confirmed early invasive lung adenocarcinoma patients were retrospectively included from two centers. Clinical and CT semantic features of the patients were collected, and 3D radiomics features were extracted from nonenhanced CT images.

