Objectives: To assess and compare the performance of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets.
Materials and Methods: We generated a large dataset from historical de-identified administrative claims, including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performance on several metrics.
Results: The AutoML tools improved on the baseline random forest model but did not differ significantly from each other. All models recorded a low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications.
Discussion: Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available features. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance.
Conclusion: Among the three explored, no AutoML tool consistently outperforms the rest in terms of predictive performance. The performances of the models in this study suggest that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application.
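The threshold selection mentioned in the Results and Conclusion can be illustrated with a minimal sketch. It assumes Youden's J statistic (true positive rate minus false positive rate) as the balancing criterion, which is one common choice and not necessarily the one used in the study. The function names and the toy labels and scores are hypothetical and do not come from the study's data.

```python
from typing import List, Sequence, Tuple


def roc_points(
    y_true: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, TPR, FPR) using every distinct score as a cutoff."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # A case is predicted positive when its score reaches the cutoff.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def best_threshold_youden(
    y_true: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Pick the cutoff maximising Youden's J = TPR - FPR (assumed criterion)."""
    return max(roc_points(y_true, scores), key=lambda p: p[1] - p[2])


# Toy, made-up example: 2 positives among 10 cases (imbalanced).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.02, 0.05, 0.07, 0.10, 0.12, 0.15, 0.30, 0.40, 0.35, 0.80]
threshold, tpr, fpr = best_threshold_youden(y_true, scores)
print(f"threshold={threshold}, TPR={tpr}, FPR={fpr}")
```

In a real application the criterion would likely be weighted by the relative cost of false positives and false negatives for the specific use case, rather than treating the two rates symmetrically as this sketch does.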
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9327416 | PMC |
| http://dx.doi.org/10.1186/s13040-022-00300-2 | DOI Listing |
Lung Cancer
December 2024
Università Vita-Salute San Raffaele, Milan, Italy; Department of Medical Oncology, IRCCS Ospedale San Raffaele, Milan, Italy.
Background: Artificial intelligence (AI) models are emerging as promising tools to identify predictive features among data coming from health records. Their application in clinical routine is still challenging, due to technical limits and to explainability issues in this specific setting. Metastatic Non-Small-Cell Lung Cancer (NSCLC) treated with standard first-line immunotherapy (ICI) is an interesting setting for machine learning (ML), since up to 30% of patients do not benefit.
J Cheminform
December 2024
CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal.
Neurosurg Rev
December 2024
Neurosurgery, Kasturba Medical College, Manipal Academy of Higher Education, Manipal, 576104, India.
Machine Learning (ML) techniques require novel computer programming skills along with clinical domain knowledge to produce a useful model. We demonstrate the use of a cloud-based ML tool that does not require any programming expertise to develop, validate and deploy a prognostic model for Intracerebral Haemorrhage (ICH). The data of patients admitted with Spontaneous Intracerebral haemorrhage from January 2015 to December 2019 was accessed from our prospectively maintained hospital stroke registry.
Pharmaceutics
November 2024
Department of Pharmaceutical Sciences, University of Perugia, via del Liceo 1, 06123 Perugia, Italy.
Methods Mol Biol
November 2024
Department of Microbiology, Universidade Federal de Viçosa, Viçosa, Brazil.