Liquid chromatography-mass spectrometry (LC-MS) can provide a large amount of information to assist in metabolite identification. However, different liquid chromatographic methods (CMs) can produce different retention times for the same metabolite. Predicting the retention times of a local dataset from online datasets has become a common approach, but datasets downloaded from different databases differ in size, and such imbalanced data can degrade model predictions. Therefore, based on quantitative structure-retention relationships (QSRRs), this study built an ensemble model, named RT-Ensemble Pred, to predict retention times across different LC-MS systems. A total of 76,807 metabolites (76,909 retention times) were collected across 9 CMs, and 19 natural products and 1 antifungal drug (20 retention times) were collected to test the model's applicability. Ensemble sampling was applied during preprocessing to address the data imbalance, which allowed RT-Ensemble Pred to make better use of online datasets for retention time prediction. RT-Ensemble Pred was built on the online datasets and tested on the local dataset, and its predictive accuracy was higher than that of models built without any sampling method. The results showed that RT-Ensemble Pred could predict retention times for metabolites not included in the database and for metabolites measured with new CMs. It could also be used to predict the retention times of compounds other than metabolites. Furthermore, RT-Ensemble Pred has been packaged as a tool that can be freely downloaded at https://gitlab.com/mikic93/rt-ensemble-pred, making it convenient for users who need to predict metabolite retention times.
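The abstract does not say how the ensemble sampling is carried out. As a rough illustration only, here is one common form of ensemble sampling for imbalanced sources. It repeatedly draws balanced subsets containing the same number of records from each CM, undersampling large CMs and oversampling small ones with replacement. It then fits one model per subset and averages the predictions. This is an assumption, not the authors' method. The single-descriptor linear model, the function names (`balanced_subsets`, `ensemble_fit`, `ensemble_predict`), and the parameters `n_subsets` and `per_group` are all hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for rt = a + b * x on ONE descriptor (hypothetical toy QSRR)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def balanced_subsets(records, n_subsets, per_group, seed=0):
    """Yield n_subsets samples, each holding per_group (x, rt) pairs from every CM.

    records: iterable of (cm_label, descriptor, retention_time) tuples.
    CMs with at least per_group records are undersampled without replacement;
    smaller CMs are oversampled with replacement, so every CM carries equal weight.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for cm, x, rt in records:
        groups[cm].append((x, rt))
    for _ in range(n_subsets):
        subset = []
        for rows in groups.values():
            if len(rows) >= per_group:
                subset.extend(rng.sample(rows, per_group))
            else:
                subset.extend(rng.choices(rows, k=per_group))
        yield subset


def ensemble_fit(records, n_subsets=10, per_group=20, seed=0):
    """Fit one toy model per balanced subset (hypothetical ensemble)."""
    models = []
    for subset in balanced_subsets(records, n_subsets, per_group, seed):
        xs, ys = zip(*subset)
        models.append(fit_line(xs, ys))
    return models


def ensemble_predict(models, x):
    """Average the members' predictions for descriptor value x."""
    return mean(a + b * x for a, b in models)


if __name__ == "__main__":
    # Synthetic, imbalanced data: CM "A" has 500 records, CM "B" only 5.
    data = [("A", i / 100, 2 + 3 * (i / 100)) for i in range(500)]
    data += [("B", float(i), 2 + 3 * i) for i in range(5)]
    models = ensemble_fit(data)
    print(round(ensemble_predict(models, 1.0), 6))
```

A real QSRR model would use many molecular descriptors and would likely need to account for CM-specific differences in retention behaviour. The one-descriptor linear fit here only keeps the example self-contained, while the balanced resampling step is what addresses the imbalance between data sources.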
DOI: http://dx.doi.org/10.1016/j.chroma.2023.464304