Human action recognition (HAR) in RGB-D videos has been widely investigated since the release of affordable depth sensors. Unimodal approaches (e.g., skeleton-based and RGB video-based) have achieved substantial improvements with increasingly larger datasets. However, multimodal methods, specifically those with model-level fusion, have seldom been investigated. In this article, we propose a model-based multimodal network (MMNet) that fuses the skeleton and RGB modalities. The objective of our method is to improve ensemble recognition accuracy by effectively exploiting mutually complementary information from different data modalities. In the model-based fusion scheme, a spatiotemporal graph convolution network for the skeleton modality learns attention weights that are then transferred to the network of the RGB modality. Extensive experiments are conducted on five benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA Multiview, and Toyota Smarthome. Upon aggregating the results of multiple modalities, our method outperforms state-of-the-art approaches on six evaluation protocols of the five datasets; thus, the proposed MMNet can effectively capture mutually complementary features in different RGB-D video modalities and provide more discriminative features for HAR. We also tested MMNet on Kinetics 400, an RGB video dataset containing more outdoor actions, and obtained results consistent with those on the RGB-D video datasets.
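The core fusion idea above can be sketched in a few lines: a skeleton branch produces per-joint attention weights that reweight the RGB branch's features before the two branches' scores are ensembled. This is a minimal, hedged sketch, not the authors' code; all tensor shapes, the norm-based attention score, and the linear classifier heads are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, J, C_sk = 16, 25, 64        # frames, joints, skeleton feature channels
C_rgb = 128                    # RGB feature channels per joint region
num_classes = 60               # e.g., NTU RGB+D 60 action classes

skeleton_feat = rng.standard_normal((T, J, C_sk))
rgb_feat = rng.standard_normal((T, J, C_rgb))

def joint_attention(feat):
    """Softmax over joints of the pooled skeleton feature magnitude."""
    score = np.linalg.norm(feat, axis=-1)            # (T, J)
    score -= score.max(axis=1, keepdims=True)        # numerical stability
    w = np.exp(score)
    return w / w.sum(axis=1, keepdims=True)          # rows sum to 1

# Attention learned from the skeleton branch, applied to the RGB branch.
attn = joint_attention(skeleton_feat)                # (T, J)
fused_rgb = rgb_feat * attn[..., None]               # reweighted RGB features

# Late ensemble of the two branches' class scores (illustrative heads).
W_sk = rng.standard_normal((C_sk, num_classes))
W_rgb = rng.standard_normal((C_rgb, num_classes))
logits = skeleton_feat.mean(axis=(0, 1)) @ W_sk + fused_rgb.mean(axis=(0, 1)) @ W_rgb
pred = int(np.argmax(logits))
```

The sketch shows only the transfer direction (skeleton attention modulating RGB features); the actual model learns both branches end to end.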
DOI: http://dx.doi.org/10.1109/TPAMI.2022.3177813
Abdom Radiol (NY)
January 2025
The First Affiliated Hospital of Jinan University, Guangzhou, China.
Objective: Accurate preoperative evaluation of myometrial invasion (MI) is essential for treatment decisions in endometrial cancer (EC). However, the diagnostic accuracy of commonly used magnetic resonance imaging (MRI) techniques for this assessment varies considerably. This study aims to enhance preoperative discrimination of the absence or presence of MI by developing and validating a multimodal deep learning radiomics (MDLR) model based on MRI.
Front Bioeng Biotechnol
December 2024
School of Intelligent Manufacturing Engineering, Chongqing University of Arts and Sciences, Chongqing, China.
Introduction: Assessing the olfactory preferences of drivers can help improve the in-vehicle odor environment and enhance comfort during driving. However, current evaluation methods, including subjective ratings, electroencephalography, and behavioral observation, have limited practical applicability. Therefore, this study explores the potential of autonomic response signals for assessing olfactory preferences.
Transl Psychiatry
December 2024
Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany.
Given the heterogeneous nature of attention-deficit/hyperactivity disorder (ADHD) and the absence of established biomarkers, accurate diagnosis and effective treatment remain a challenge in clinical practice. This study investigates the predictive utility of multimodal data, including eye tracking, EEG, actigraphy, and behavioral indices, in differentiating adults with ADHD from healthy individuals. Using a support vector machine model, we analyzed independent training (n = 50) and test (n = 36) samples from two clinically controlled studies.
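The multimodal pipeline described above (per-modality features combined for one classifier) can be sketched as follows. This is a hedged illustration, not the study's code: the feature dimensions, the simulated data, and the tiny perceptron standing in for the study's support vector machine are all assumptions for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # illustrative training-sample size, as in the study's training set

# Assumed per-modality feature dimensions (eye tracking, EEG, actigraphy, behavior).
modalities = {"eye": 8, "eeg": 32, "acti": 6, "behav": 10}

def zscore(x):
    """Standardize each feature column so modalities share one scale."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

# Simulated per-modality features, standardized then concatenated.
feats = [zscore(rng.standard_normal((n, d))) for d in modalities.values()]
X = np.concatenate(feats, axis=1)          # (50, 56) combined feature matrix
y = rng.integers(0, 2, size=n) * 2 - 1     # labels in {-1, +1}: ADHD vs. control

# Tiny perceptron as a placeholder linear decision rule (the study uses an SVM).
w = np.zeros(X.shape[1])
for _ in range(20):
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:             # misclassified: update the weights
            w += yi * xi
train_acc = float(np.mean(np.sign(X @ w) == y))
```

Per-modality standardization before concatenation matters here: without it, the highest-dimensional or largest-scale modality would dominate the linear decision boundary.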
Proc IEEE Int Symp Biomed Imaging
May 2024
Department of Human Oncology, University of Wisconsin-Madison, Madison, WI, USA.
In medical image segmentation, although multi-modality training is possible, clinical translation is challenged by the limited availability of all image types for a given patient. Unlike typical segmentation models, modality-agnostic (MAG) learning trains a single model on all available modalities while remaining input-agnostic, allowing one model to produce accurate segmentations given any combination of modalities. In this paper, we propose a novel framework, MAG learning through Multi-modality Self-distillation (MAG-MS), for medical image segmentation.
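The modality-agnostic idea above can be sketched minimally: encode each available modality separately, pool over whichever modalities are present, and distill the subset embedding toward the full-modality embedding. This is a hedged toy sketch, not MAG-MS itself; the modality names, linear encoders, and mean pooling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16  # shared embedding dimension (assumed)

# One tiny linear encoder per modality (e.g., MRI sequences; names assumed).
encoders = {m: rng.standard_normal((4, D)) for m in ("T1", "T2", "FLAIR")}

def embed(inputs):
    """Average the embeddings of whichever modalities are present."""
    zs = [x @ encoders[m] for m, x in inputs.items()]
    return np.mean(zs, axis=0)

full = {m: rng.standard_normal(4) for m in encoders}
z_full = embed(full)                    # teacher: all modalities available
z_sub = embed({"T1": full["T1"]})       # student: only one modality available

# Self-distillation objective: pull the subset embedding toward the
# full-modality embedding so any input combination stays usable.
distill_loss = float(np.mean((z_full - z_sub) ** 2))
```

Because pooling averages only over the modalities supplied, the same model handles any input combination; the distillation term is what keeps subset predictions close to full-modality quality.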
Acad Radiol
December 2024
School of Public Health, Jiangxi Medical College, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.); Jiangxi Provincial Key Laboratory of Disease Prevention and Public Health, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.). Electronic address:
Rationale And Objectives: To develop and validate a multimodal deep learning (DL) model based on computed tomography (CT) images and clinical knowledge to predict lymph node metastasis (LNM) in early lung adenocarcinoma.
Materials And Methods: A total of 724 pathologically confirmed early invasive lung adenocarcinoma patients were retrospectively included from two centers. Clinical and CT semantic features of the patients were collected, and 3D radiomics features were extracted from nonenhanced CT images.