Self-Supervised Molecular Pretraining Strategy for Low-Resource Reaction Prediction Scenarios.

J Chem Inf Model

Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China.

Published: October 2022

Faced with scarce reaction training data, we construct a chemical platform for addressing small-scale reaction prediction problems. Using a self-supervised pretraining strategy called MAsked Sequence to Sequence (MASS), the Transformer model can absorb chemical information from about 1 billion molecules and then be fine-tuned on small-scale reaction prediction tasks. To further strengthen the model's predictive performance, we combine MASS with a reaction transfer learning strategy. Here, we show that the average accuracy improvements of the Transformer model reach 14.07, 24.26, 40.31, and 57.69% on the Baeyer-Villiger, Heck, C-C bond formation, and functional group interconversion reaction data sets, respectively, marking an important step toward low-resource reaction prediction.
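The abstract names MASS but does not describe how its training examples are formed. As a rough illustration only, the following sketch builds one MASS-style example from a SMILES string: the encoder sees the molecule with one contiguous span of tokens masked, and the decoder is trained to regenerate exactly that span. The atom-level regex tokenizer, the `<mask>` token name, the 50% mask ratio, and the helper names `tokenize` and `mass_example` are assumptions chosen for this sketch, not details taken from the paper.

```python
from __future__ import annotations

import random
import re

# Atom-level SMILES tokenizer regex (a common choice in reaction-prediction
# Transformers; assumed here, not taken from the paper).
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

MASK = "<mask>"  # hypothetical mask token name


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens."""
    tokens = SMILES_REGEX.findall(smiles)
    # The tokens must reassemble into the original string; otherwise some
    # characters were not covered by the regex.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles}")
    return tokens


def mass_example(tokens: list[str], mask_ratio: float = 0.5, rng=random):
    """Build one MASS-style (encoder input, decoder input, target) triple.

    A contiguous span of about mask_ratio * len(tokens) tokens is replaced
    by mask tokens on the encoder side; the decoder must regenerate that
    span, conditioned on a right-shifted copy of it (teacher forcing).
    """
    if not tokens:
        raise ValueError("cannot mask an empty token sequence")
    n = len(tokens)
    span = max(1, round(n * mask_ratio))
    start = rng.randint(0, n - span)  # randint is inclusive on both ends
    end = start + span
    encoder_input = tokens[:start] + [MASK] * span + tokens[end:]
    target = tokens[start:end]
    # Decoder input: the target shifted right by one, with a mask token
    # standing in as the start symbol.
    decoder_input = [MASK] + target[:-1]
    return encoder_input, decoder_input, target


if __name__ == "__main__":
    rng = random.Random(0)
    enc, dec, tgt = mass_example(tokenize("CC(=O)Oc1ccccc1C(=O)O"), rng=rng)
    print("encoder:", " ".join(enc))
    print("decoder:", " ".join(dec))
    print("target: ", " ".join(tgt))
```

Because no reaction labels are needed, triples like these can be generated from any collection of unlabeled molecules, which is what allows pretraining to draw on roughly 1 billion molecules before fine-tuning on a small reaction data set.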

Source
http://dx.doi.org/10.1021/acs.jcim.2c00588