We developed a reusable and open-source machine learning (ML) pipeline that can provide an analytical framework for rigorous biomarker discovery. We implemented the ML pipeline to determine the predictive potential of clinical and immunoproteome antibody data for outcomes associated with Chlamydia trachomatis () infection collected from 222 cis-gender females with high exposure. We compared the predictive performance of 4 ML algorithms (naive Bayes, random forest, extreme gradient boosting with linear booster [xgbLinear], and -nearest neighbors [KNN]), screened from 215 ML methods, in combination with two different feature selection strategies, Boruta and recursive feature elimination.
View Article and Find Full Text PDF