Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection.

Conor K Corbin, Michael Baiocchi, Jonathan H Chen

AMIA Jt Summits Transl Sci Proc

Center for Biomedical Informatics Research, Stanford, California, USA.

Published: June 2023

When evaluating the performance of clinical machine learning models, one must consider the deployment population. When the population of patients with observed labels is only a subset of the deployment population (label selection), standard model performance estimates computed on the observed population may be misleading. In this study, we describe three classes of label selection and simulate five causally distinct scenarios to assess how particular selection mechanisms bias a suite of commonly reported binary machine learning model performance metrics. Simulations reveal that when selection is affected by observed features, naive estimates of model discrimination may be misleading. When selection is affected by labels, naive estimates of calibration fail to reflect reality. We borrow traditional weighting estimators from the causal inference literature and find that, when selection probabilities are properly specified, they recover full-population estimates. We then tackle the real-world task of monitoring the performance of deployed machine learning models whose interactions with clinicians feed back into and affect the selection mechanism of the labels. We train three machine learning models to flag low-yield laboratory diagnostics and simulate their intended consequence of reducing wasteful laboratory utilization. We find that naive estimates of AUROC on the observed population undershoot actual performance by up to 20%. Such a disparity could be large enough to lead to the wrongful termination of a successful clinical decision support tool. We propose an altered deployment procedure, one that combines injected randomization with traditional weighted estimates, and find that it recovers true model performance.
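The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code or simulation design: the data-generating process, the `selection_prob` mechanism, and the function names are all assumptions made for illustration. Labels are observed with a probability that depends only on an observed feature, and each observed patient is weighted by the inverse of that (here, known) probability when computing AUROC.

```python
import math
import random


def weighted_auroc(scores, labels, weights):
    """Weighted share of (positive, negative) pairs that the score ranks correctly."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    neg_below = 0.0  # total weight of negatives scoring strictly below the current group
    numer = pos_total = neg_total = 0.0
    i = 0
    while i < len(order):
        # Collect one group of tied scores so that ties count as half-correct.
        j, tie_pos, tie_neg = i, 0.0, 0.0
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            k = order[j]
            if labels[k] == 1:
                tie_pos += weights[k]
            else:
                tie_neg += weights[k]
            j += 1
        numer += tie_pos * (neg_below + 0.5 * tie_neg)
        neg_below += tie_neg
        pos_total += tie_pos
        neg_total += tie_neg
        i = j
    return numer / (pos_total * neg_total)


def selection_prob(x):
    # Hypothetical mechanism: labels are mostly observed for mid-range feature values.
    return 0.9 if abs(x) < 0.5 else 0.1


def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]                # observed feature
    scores = [1.0 / (1.0 + math.exp(-2.0 * x)) for x in xs]     # model risk score
    labels = [1 if rng.random() < s else 0 for s in scores]     # true outcome
    observed = [rng.random() < selection_prob(x) for x in xs]   # is the label seen?

    # Full-population AUROC (unavailable in practice; shown as the target).
    full = weighted_auroc(scores, labels, [1.0] * n)

    idx = [i for i in range(n) if observed[i]]
    s_obs = [scores[i] for i in idx]
    y_obs = [labels[i] for i in idx]

    # Naive estimate: unweighted AUROC on the observed subset only.
    naive = weighted_auroc(s_obs, y_obs, [1.0] * len(idx))

    # Inverse probability weighting: weight = 1 / P(label observed | feature).
    ipw_weights = [1.0 / selection_prob(xs[i]) for i in idx]
    ipw = weighted_auroc(s_obs, y_obs, ipw_weights)
    return full, naive, ipw


full, naive, ipw = simulate()
print(f"full population AUROC: {full:.3f}")
print(f"naive (observed only): {naive:.3f}")
print(f"inverse-probability weighted: {ipw:.3f}")
```

In this sketch the selection probabilities are known by construction. In a real deployment they would have to be estimated or fixed by design, and the weighted estimate is only as good as that specification.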

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283136

Publication Analysis

Top Keywords

machine learning; model performance; label selection; learning models; naive estimates; clinical machine; learning model; performance estimates; selection; deployment population

Similar Publications

Role of data-driven regional growth model in shaping brain folding patterns.

Soft Matter

January 2025

School of Environmental, Civil, Agricultural and Mechanical Engineering, College of Engineering, University of Georgia, Athens, GA 30602, USA.

Jixin Hou, Zhengwang Wu, Xianyan Chen, Li Wang, Dajiang Zhu

The surface morphology of the developing mammalian brain is crucial for understanding brain function and dysfunction. Computational modeling offers valuable insights into the underlying mechanisms for early brain folding. Recent findings indicate significant regional variations in brain tissue growth, while the role of these variations in cortical development remains unclear.

Predicting cognitive decline from neuropsychiatric symptoms and Alzheimer's disease biomarkers: A machine learning approach to a population-based data.

J Alzheimers Dis

January 2025

Department of Neurology and the Franke Barrow Global Neuroscience Education Center, Barrow Neurological Institute, Phoenix, AZ, USA.

Jay Shah, Janina Krell-Roesch, Erica Forzani, David S Knopman, Cliff R Jack

Background: The aim of this study was to examine the potential added value of including neuropsychiatric symptoms (NPS) in machine learning (ML) models, along with demographic features and Alzheimer's disease (AD) biomarkers, to predict decline or non-decline in global and domain-specific cognitive scores among community-dwelling older adults.

Objective: To evaluate the impact of adding NPS to AD biomarkers on ML model accuracy in predicting cognitive decline among older adults.

Methods: The study was conducted in the setting of the Mayo Clinic Study of Aging, including participants aged ≥ 50 years with information on demographics (i.

Identification of Programmed Cell Death-related Biomarkers for the Potential Diagnosis and Treatment of Osteoporosis.

Endocr Metab Immune Disord Drug Targets

January 2025

Department of Orthopaedic Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing 100020, China.

Yancheng Huo, Meng Guo, Yihan Li, Xingchen Yao, Qingxian Tian

Background: Osteoporosis (OP) is a skeletal condition characterized by increased susceptibility to fractures. Programmed cell death (PCD) is the orderly process of cells ending their own life that has not been thoroughly explored in relation to OP.

Objective: This study is to investigate PCD-related genes in OP, shedding light on potential mechanisms underlying the disease.

The impact of war on people with type 2 diabetes in Ukraine: a survey study.

EClinicalMedicine

January 2025

Medical Laboratory CSD, Kyiv 02000, Ukraine.

Oksana Sulaieva, Viktoriia Yerokhovych, Sergii Zemskov, Iuliia Komisarenko, Vitalii Gurianov

Background: Although the number of studies reporting war-induced effects on the health of the Ukrainian population has been growing, there are still little data on assessing patients with type 2 diabetes (T2D) during the war. This study aimed to evaluate the impact of war on T2D patients' health to define key risk factors promoting disease progression.

Methods: A survey covering various aspects of T2D patients' experience and glycemic control data was conducted from June 2022 to February 2024.

Differentiating Cystic Lesions in the Sellar Region of the Brain Using Artificial Intelligence and Machine Learning for Early Diagnosis: A Prospective Review of the Novel Diagnostic Modalities.

Cureus

December 2024

Department of Technology and Clinical Trials, Advanced Research, Deerfield Beach, USA.

Kaivan Patel, Harshal Sanghvi, Gurnoor S Gill, Ojas Agarwal, Abhijit S Pandya

This paper investigates the potential of artificial intelligence (AI) and machine learning (ML) to enhance the differentiation of cystic lesions in the sellar region, such as pituitary adenomas, Rathke cleft cysts (RCCs) and craniopharyngiomas (CP), through the use of advanced neuroimaging techniques, particularly magnetic resonance imaging (MRI). The goal is to explore how AI-driven models, including convolutional neural networks (CNNs), deep learning, and ensemble methods, can overcome the limitations of traditional diagnostic approaches, providing more accurate and early differentiation of these lesions. The review incorporates findings from critical studies, such as using the Open Access Series of Imaging Studies (OASIS) dataset (Kaggle, San Francisco, USA) for MRI-based brain research, highlighting the significance of statistical rigor and automated segmentation in developing reliable AI models.
