We examined pretraining tasks that leverage abundant labeled data to enhance molecular representation learning for downstream tasks, focusing on graph transformers for predicting ADMET properties. Our investigation revealed limitations in previous pretraining tasks and identified more informative training targets, ranging from 2D molecular descriptors to extensive quantum chemistry simulations, which we integrated into supervised pretraining tasks. Combined with multitask learning, this pretraining strategy outperforms conventional methods, achieving state-of-the-art results on 7 of 22 ADMET tasks in the Therapeutics Data Commons while using a single shared encoder across all tasks. Our approach underscores the effectiveness of the learned molecular representations and highlights the potential for scalability when leveraging extensive data sets, marking a significant advancement in this domain.
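To make the setup concrete, the sketch below illustrates supervised multitask pretraining on 2D molecular descriptors with a shared encoder and one head per target. It is a minimal illustration, not the paper's implementation: the paper uses a graph transformer encoder and a much larger target set, whereas this sketch substitutes a small MLP over Morgan fingerprints and three RDKit descriptors (MolWt, MolLogP, TPSA) as stand-in targets.

```python
# Hedged sketch: supervised multitask pretraining with a shared encoder.
# Assumptions (not from the paper): MLP over Morgan fingerprints instead of a
# graph transformer; MolWt/MolLogP/TPSA instead of the full descriptor set.
import numpy as np
import torch
import torch.nn as nn
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors

def featurize(smiles):
    """Fingerprint input plus 2D-descriptor regression targets."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    x = torch.tensor(np.array(fp), dtype=torch.float32)
    y = torch.tensor([Descriptors.MolWt(mol),
                      Descriptors.MolLogP(mol),
                      Descriptors.TPSA(mol)], dtype=torch.float32)
    return x, y

class SharedEncoderMultitask(nn.Module):
    def __init__(self, in_dim=2048, hidden=256, n_targets=3):
        super().__init__()
        # Shared encoder: the part that would later be reused for ADMET fine-tuning.
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        # One regression head per pretraining target.
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_targets)])

    def forward(self, x):
        h = self.encoder(x)
        return torch.cat([head(h) for head in self.heads], dim=-1)

# Toy pretraining step on two molecules.
xs, ys = zip(*[featurize(s) for s in ["CCO", "c1ccccc1O"]])
x, y = torch.stack(xs), torch.stack(ys)
model = SharedEncoderMultitask()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()  # gradients flow through all heads into the shared encoder
```

In the same spirit, downstream ADMET fine-tuning would replace the descriptor heads with task-specific heads while keeping the pretrained shared encoder.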
DOI: http://dx.doi.org/10.1021/acs.jcim.4c00772