Benchmarking AutoML frameworks for disease prediction using medical claims.

BioData Min

Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, 90069, CA, USA.

Published: July 2022

Objectives: Ascertain and compare the performances of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets.

Materials and Methods: We generated a large dataset from historical de-identified administrative claims, including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performance on several metrics.

Results: The AutoML tools improved on the baseline random forest model but did not differ significantly from each other. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications.
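
One common way to balance true and false positive rates is to pick the threshold that maximizes Youden's J statistic (TPR minus FPR). The sketch below assumes that criterion; the abstract does not say which rule the authors used, and the labels and scores are made-up toy values, not study data.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (TPR, FPR) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def best_threshold(y_true, scores):
    """Scan every observed score and keep the one with the largest TPR - FPR."""
    best = None
    for t in sorted(set(scores)):
        tpr, fpr = rates_at_threshold(y_true, scores, t)
        j = tpr - fpr
        if best is None or j > best[1]:
            best = (t, j, tpr, fpr)
    return best


# Toy, imbalanced example (2 positives out of 10); values are illustrative only.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.55, 0.80]
threshold, j_stat, tpr, fpr = best_threshold(y_true, scores)
```

In practice the criterion can be swapped for one that weights false negatives and false positives according to the clinical use case, which is the point the abstract makes about application-specific thresholds.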

Discussion: Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available features. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance.
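
To make the idea of combining resampling with ensembles concrete, the sketch below draws several class-balanced subsets by randomly undersampling the majority class; each subset could then train one member of an ensemble. This is a generic illustration under assumed names (`balanced_subsets`), not the approach evaluated in the study.

```python
import random


def balanced_subsets(labels, n_subsets=3, seed=0):
    """Return index lists with equal numbers of positives and negatives.

    Each subset keeps as many samples per class as the smaller class holds,
    drawing them at random without replacement.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos), len(neg))
    subsets = []
    for _ in range(n_subsets):
        subsets.append(rng.sample(pos, k) + rng.sample(neg, k))
    return subsets


# Toy labels with 3% prevalence; illustrative only.
toy_labels = [1] * 3 + [0] * 97
toy_subsets = balanced_subsets(toy_labels, n_subsets=4, seed=1)
```

Training one model per subset and averaging their predicted probabilities uses all minority-class samples in every model while exposing the ensemble to different parts of the majority class.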

Conclusion: Among the three explored, no AutoML tool consistently outperforms the rest in terms of predictive performance. The performances of the models in this study suggest that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9327416
DOI: http://dx.doi.org/10.1186/s13040-022-00300-2

Similar Publications

Background: Artificial intelligence (AI) models are emerging as promising tools to identify predictive features in data from health records. Their application in clinical routine is still challenging because of technical limits and explainability issues in this specific setting. Patients receiving standard first-line immunotherapy (ICI) for metastatic Non-Small-Cell Lung Cancer (NSCLC) are an interesting population for machine learning (ML), since up to 30% of patients do not benefit.

Article Synopsis
  • The introduction of Machine Learning technologies has transformed computational chemistry, but challenges like algorithm selection and data pre-processing remain.
  • DeepMol addresses these issues as a pioneering AutoML tool that automates crucial steps in the ML pipeline, effectively optimizing methods for predicting molecular properties.
  • With competitive performance on benchmark datasets and robust features such as open-source code, comprehensive documentation, and support for various models, DeepMol establishes itself as a leading tool in the computational chemistry domain.

Machine Learning (ML) techniques require novel computer programming skills along with clinical domain knowledge to produce a useful model. We demonstrate the use of a cloud-based ML tool that does not require any programming expertise to develop, validate and deploy a prognostic model for Intracerebral Haemorrhage (ICH). Data on patients admitted with spontaneous intracerebral haemorrhage from January 2015 to December 2019 were accessed from our prospectively maintained hospital stroke registry.

Article Synopsis
  • The aryl hydrocarbon receptor (AhR) is significant in immune and metabolic processes, but its complexity and diverse ligands make drug discovery challenging.
  • Researchers created quantitative structure-activity relationship (QSAR) models by analyzing 978 molecules to improve predictions of AhR activity.
  • The best classification model achieved 76% accuracy and led to the development of a user-friendly web application, potentially paving the way for further studies on how ligand structure affects AhR modulation.

Automated Machine Learning Tools to Build Regression Models for Schizosaccharomyces pombe Omics Data.

Methods Mol Biol

November 2024

Department of Microbiology, Universidade Federal de Viçosa, Viçosa, Brazil.

Article Synopsis
  • Machine learning is essential for analyzing large biological datasets and improving predictions, especially given the recent increase in biological data from high-throughput technologies.
  • The study highlights the necessity for effective modeling approaches to interpret complex molecular systems better.
  • The authors demonstrate how to use automated machine learning (AutoML) to create a model for predicting protein abundance in Schizosaccharomyces pombe by analyzing data from codon usage bias and quantitative proteomics.
