GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text.

Comput Biol Med

Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China.

Published: March 2024

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates graph, image, and text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.
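The abstract does not say how GIT-Former aligns modalities, so the following is only a generic illustration of what "aligning modalities into a unified latent space" can mean, not the authors' method. It is a minimal sketch of contrastive alignment: each modality's features are projected into a shared space, normalized, and scored so that matching graph/text pairs are more similar than mismatched ones. The function names, the linear projections, the toy feature vectors, and the temperature value are all assumptions made for this example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def project(features, weights):
    """Hypothetical linear projection into the shared latent space.
    `weights` holds one row per latent dimension."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def similarity_matrix(a_batch, b_batch):
    """Dot products between every vector in a_batch and every vector in b_batch."""
    return [[sum(x * y for x, y in zip(a, b)) for b in b_batch] for a in a_batch]

def diagonal_cross_entropy(sim, temperature):
    """Mean of -log softmax(sim[i] / temperature)[i]: row i should match column i."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(sim)

def contrastive_alignment_loss(a_latents, b_latents, temperature=0.1):
    """Symmetric contrastive loss over paired latents from two modalities."""
    sim = similarity_matrix(a_latents, b_latents)
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (diagonal_cross_entropy(sim, temperature)
                  + diagonal_cross_entropy(sim_t, temperature))

# Toy data (assumed): two molecules, each with a graph and a text feature vector.
graph_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_feats = [[2.0, 0.0], [0.0, 3.0]]
w_graph = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3-d graph features -> 2-d latent
w_text = [[1.0, 0.0], [0.0, 1.0]]              # 2-d text features -> 2-d latent

graph_latents = [l2_normalize(project(f, w_graph)) for f in graph_feats]
text_latents = [l2_normalize(project(f, w_text)) for f in text_feats]

aligned = contrastive_alignment_loss(graph_latents, text_latents)
shuffled = contrastive_alignment_loss(graph_latents, text_latents[::-1])
print(aligned < shuffled)  # correctly paired modalities give the lower loss
```

In a trained model the projections would be learned so that this loss decreases; here they are fixed by hand only to show that correctly paired inputs score better than shuffled ones.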

Source
http://dx.doi.org/10.1016/j.compbiomed.2024.108073
