On-the-fly Modulation for Balanced Multimodal Learning.

IEEE Trans Pattern Anal Mach Intell

Published: September 2024

Multimodal learning is expected to boost model performance by integrating information from different modalities. However, its potential is not fully exploited because the widely used joint training strategy, which has a uniform objective for all modalities, leads to imbalanced and under-optimized uni-modal representations. Specifically, we point out that there often exists a modality with more discriminative information, e.g., the vision of playing football or the sound of blowing wind. Such a modality can dominate the joint training process, leaving the other modalities significantly under-optimized. To alleviate this problem, we first analyze the under-optimization phenomenon in both the feed-forward and the back-propagation stages of optimization. We then propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM), two strategies that modulate the optimization of each modality by monitoring the discriminative discrepancy between modalities during training. Concretely, OPM weakens the influence of the dominant modality by dropping its feature with a dynamic probability in the feed-forward stage, while OGM mitigates its gradient in the back-propagation stage. In experiments, our methods demonstrate considerable improvement across a variety of multimodal tasks. These simple yet effective strategies enhance performance not only in vanilla and task-oriented multimodal models but also in more complex multimodal tasks, showcasing their effectiveness and flexibility.
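The two strategies can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the discrepancy is taken as a ratio of per-modality confidence scores, the modulation coefficient uses a tanh form with a hypothetical hyperparameter `alpha`, and the function names are invented. It is not the authors' implementation.

```python
import math
import random


def modulation_coefficient(ratio, alpha=1.0):
    """Return a value in [0, 1]; 1.0 means "leave this modality untouched".

    `ratio` is an assumed discrepancy measure: this modality's confidence
    divided by the other modality's. Only a dominant modality (ratio > 1)
    is damped. The tanh form and `alpha` are illustrative assumptions.
    """
    if ratio <= 1.0:
        return 1.0
    return 1.0 - math.tanh(alpha * (ratio - 1.0))


def opm_drop(feature, ratio, alpha=1.0, rng=random):
    """Feed-forward stage (OPM-style): drop the feature with dynamic probability.

    The drop probability grows as the modality becomes more dominant.
    """
    p_drop = 1.0 - modulation_coefficient(ratio, alpha)
    if rng.random() < p_drop:
        return [0.0] * len(feature)  # feature is zeroed out for this step
    return list(feature)


def ogm_scale(grad, ratio, alpha=1.0):
    """Back-propagation stage (OGM-style): shrink the dominant modality's gradient."""
    k = modulation_coefficient(ratio, alpha)
    return [k * g for g in grad]


if __name__ == "__main__":
    # Hypothetical confidence scores: modality A is twice as confident as B.
    ratio_a = 0.8 / 0.4
    print(ogm_scale([1.0, -2.0], ratio_a))      # gradient of A is shrunk
    print(ogm_scale([1.0, -2.0], 1 / ratio_a))  # gradient of B is unchanged
```

In this sketch the weaker modality always gets a coefficient of 1.0, so only the dominant modality is slowed down; both strategies share the same monitored discrepancy and differ only in where they intervene.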


Source
http://dx.doi.org/10.1109/TPAMI.2024.3468315
