Recent advances in Multimodal Large Language Models (MLLMs) underscore the importance of scaling models and data to boost performance, yet such scaling often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to scale large language and vision-language models efficiently, these efforts typically involve few experts and a limited set of modalities. To address this, our work presents a pioneering attempt to develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities. Specifically, it features modality-specific encoders with connectors that produce a unified multimodal representation. We also implement a sparse MoE architecture within the LLM to enable efficient training and inference through modality-level data parallelism and expert-level model parallelism. To enhance multi-expert collaboration and generalization, we present a progressive training strategy: 1) cross-modality alignment, using various connectors with different cross-modality data; 2) training modality-specific experts with cross-modality instruction data to activate the experts' preferences; and 3) tuning the whole Uni-MoE framework with Low-Rank Adaptation (LoRA) on mixed multimodal instruction data. We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets. The extensive experimental results demonstrate Uni-MoE's principal advantage: it significantly reduces performance bias when handling mixed multimodal datasets, alongside improved multi-expert collaboration and generalization. Our findings highlight the substantial potential of MoE frameworks in advancing MLLMs. The code is available at https://github.com/HITsz-TMG/UMOE-Scaling-Unified-Multimodal-LLMs.
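
The sparse MoE idea mentioned in the abstract can be illustrated with a small routing sketch. The abstract does not specify Uni-MoE's router, so everything here is an assumption made for illustration: softmax gating over a linear router, top-2 expert selection, renormalized weights, and toy experts that only scale their input. The names `top_k_route`, `sparse_moe_layer`, and `make_scaling_expert` are hypothetical and are not taken from the Uni-MoE code base.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1
    (an assumed convention, not confirmed by the abstract).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def sparse_moe_layer(token, router_weights, experts, k=2):
    """Run only the top-k experts on one token vector and mix their outputs."""
    # Linear router: one logit per expert (dot product of a router row and the token).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(token)
    for idx, weight in routes:
        expert_out = experts[idx](token)  # experts that were not selected are never called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, routes


def make_scaling_expert(scale):
    """Toy stand-in for an expert feed-forward network (hypothetical)."""
    return lambda vec: [scale * v for v in vec]


experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
output, routes = sparse_moe_layer([0.5, 2.0], router_weights, experts, k=2)
print(routes)
print(output)
```

With four experts and k=2, only half of the expert computations run for each token, which is the source of the efficiency the abstract attributes to sparse activation. A real layer would use learned feed-forward networks as experts and operate on batches of tokens.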

Source
DOI: http://dx.doi.org/10.1109/TPAMI.2025.3532688
