We describe the application of ensemble methods to binary classification problems on two pharmaceutical compound data sets. Several variants of single and ensemble models of k-nearest neighbors classifiers, support vector machines (SVMs), and single ridge regression models are compared. All methods exhibit robust classification even when there are more features than observations. On two data sets dealing with specific properties of drug-like substances (cytochrome P450 inhibition and "Frequent Hitters", i.e., unspecific protein inhibition), we achieve classification rates above 90%. For the Frequent Hitters problem, we reduce the cross-validated misclassification rate by a factor of 2 compared to previous results obtained on the same data set with different modeling techniques.
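The abstract does not say how the ensembles were constructed, so the following is only a minimal sketch of one common approach: a bagged k-nearest neighbors ensemble in which each member sees a bootstrap sample and a random feature subset, compared against a single k-nearest neighbors model by cross-validated misclassification rate. The synthetic data (40 observations, 100 features), all function names, and all parameter values are assumptions made for illustration; none of them come from the paper.

```python
import random
from collections import Counter

def make_data(n_samples, n_features, n_informative, rng):
    """Synthetic two-class data (an assumption, not the paper's data).
    Only the first n_informative features carry class signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(1.0 if (label and j < n_informative) else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def knn_predict(X_train, y_train, x, k, features):
    """Majority vote among the k nearest training points,
    with squared Euclidean distance computed on `features` only."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in features), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def ensemble_predict(X_train, y_train, x, k, n_members, n_sub, rng):
    """Bagged kNN: each member uses a bootstrap sample of the rows and a
    random subset of n_sub features; members vote by simple majority."""
    n, d = len(X_train), len(x)
    votes = Counter()
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap rows
        feats = rng.sample(range(d), n_sub)          # random feature subset
        Xb = [X_train[i] for i in idx]
        yb = [y_train[i] for i in idx]
        votes[knn_predict(Xb, yb, x, k, feats)] += 1
    return votes.most_common(1)[0][0]

def cv_error(X, y, predict, n_folds=5):
    """Cross-validated misclassification rate for predict(X_tr, y_tr, x)."""
    n = len(X)
    errors = 0
    for fold in range(n_folds):
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        errors += sum(predict(X_tr, y_tr, X[i]) != y[i] for i in test)
    return errors / n

rng = random.Random(0)
# More features (100) than observations (40), as in the setting described above.
X, y = make_data(n_samples=40, n_features=100, n_informative=10, rng=rng)
single = cv_error(X, y, lambda Xt, yt, x: knn_predict(Xt, yt, x, 5, range(len(x))))
bagged = cv_error(X, y, lambda Xt, yt, x: ensemble_predict(Xt, yt, x, 5, 15, 30, rng))
print(f"single kNN error: {single:.2f}, bagged kNN error: {bagged:.2f}")
```

Odd values for `k` and `n_members` avoid tied votes in the two-class case. Whether the ensemble beats the single model depends on the data and the settings, so the printed rates should not be read as a reproduction of the reported results.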
DOI: http://dx.doi.org/10.1021/ci049850e
JMIR Med Inform
January 2025
Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, United States.
Background: Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large-scale cohort studies are both time- and resource-intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult.
BMC Complement Med Ther
January 2025
Department of Traditional Chinese Medicine, Shenzhen Maternity and Child Healthcare Hospital, Southern Medical University, Shenzhen, P.R. China.
Introduction: Anzi Tiaochong Fang (ATF) is a traditional Chinese medicine (TCM) Fangji widely used to treat antiphospholipid syndrome-related recurrent pregnancy loss (APS-RPL). This study aimed to identify the quality markers and elucidate the mechanisms of ATF in treating APS-RPL.
Methods: Chemical, network pharmacology, and in vitro verification were employed to identify quality markers and mechanisms of ATF.
BMC Cancer
January 2025
Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.
Background: Identifying high-risk factors and predicting lung cancer incidence risk are essential to the prevention of and intervention in lung cancer among the elderly. We aim to develop a lung cancer incidence risk prediction model for the elderly to facilitate early intervention and prevention of lung cancer.
Methods: We stratified the population into six subgroups according to age and gender.
Introduction: Activation of the inflammatory response system is involved in the pathogenesis of generalized anxiety disorder (GAD). The purpose of this study was to identify and characterize inflammatory biomarkers in the diagnosis of GAD based on machine learning algorithms.
Methods: The evaluation of peripheral immune parameters and lymphocyte subsets was performed on patients with GAD.
Protein Sci
February 2025
Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Atasehir, Istanbul, Turkey.
Protein structure holds immense potential for pathogenicity prediction, although structure-based predictors are limited compared with their sequence-based counterparts because of the "structure knowledge gap" between the large number of available protein sequences and the relatively limited number of structures. Leveraging the highly accurate protein structures predicted by AlphaFold2 (AF2), we introduce AFFIPred, an ensemble machine learning classifier that combines sequence and AF2-based structural characteristics to predict missense variant pathogenicity. Based on assessments on unseen datasets, AFFIPred reached a level of performance comparable to that of state-of-the-art predictors such as AlphaMissense.