Hurdles to Artificial Intelligence Deployment: Noise in Schemas and "Gold" Labels.

Radiol Artif Intell

Institute for Better Health, Trillium Health Partners, Mississauga, Ontario, Canada (M.A., B.F.); and Centre for Information Technology, Department of Computer Science (M.A.), and Department of Medical Imaging (B.F.), University of Toronto, 40 St George St, Room 4283, Toronto, ON, Canada M5S 2E4.

Published: March 2023

Despite frequent reports of imaging artificial intelligence (AI) that parallels human performance, clinicians often question the safety and robustness of AI products in practice. This work explores two underreported sources of noise that negatively affect imaging AI: variation in labeling schema definitions and noise in the labeling process. First, the overlap between the labeling schemas of two publicly available datasets and that of a third-party vendor is compared, showing low agreement (<50%) between them. The authors also highlight the problem of label inconsistency, where different annotation schemas are selected for the same clinical prediction task; this results in inconsistent use of medical ontologies, with observations and diseases intermingled or duplicated. Second, the individual radiologist annotations for the CheXpert test set are used to quantify noise in the labeling process. The analysis demonstrated that label noise varies by class: agreement was high for pneumothorax and medical devices (percent agreement > 90%). Among low-agreement classes (pneumonia, consolidation), the labels assigned as "ground truth" were unreliable, suggesting that the result of majority voting is highly dependent on which group of radiologists is assigned to annotation. Noise in labeling schemas and gold label annotations is pervasive in medical imaging classification and affects downstream clinical deployment. Possible solutions (eg, changes to task design, annotation methods, and model training) and their potential to improve trust in clinical AI are discussed.

Keywords: Radiology AI, Dataset Creation, Noise in Datasets

© RSNA, 2023

See also the commentary by Ursprung and Woitek in this issue.
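The quantities named in the abstract (schema overlap, percent agreement, and the sensitivity of majority voting to the choice of radiologists) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not state how overlap or agreement were computed, so Jaccard overlap and mean pairwise agreement are stand-ins, the function names are hypothetical, and the label sets and ratings are toy values, not CheXpert data.

```python
from itertools import combinations


def schema_overlap(schema_a, schema_b):
    """Jaccard overlap between two label schemas (assumed metric, not the paper's)."""
    a, b = set(schema_a), set(schema_b)
    return len(a & b) / len(a | b)


def percent_agreement(annotations):
    """Mean pairwise agreement over all rater pairs for binary labels.

    annotations: one list of 0/1 labels per rater, all of equal length.
    """
    pairs = list(combinations(annotations, 2))
    n_items = len(annotations[0])
    matches = sum(sum(x == y for x, y in zip(r1, r2)) for r1, r2 in pairs)
    return matches / (len(pairs) * n_items)


def majority_vote(annotations):
    """Per-item majority label; a tie resolves to 0."""
    n_raters = len(annotations)
    return [int(2 * sum(column) > n_raters) for column in zip(*annotations)]


def vote_instability(annotations, panel_size=3):
    """Fraction of items whose majority label changes across rater panels."""
    votes = [majority_vote(list(panel))
             for panel in combinations(annotations, panel_size)]
    n_items = len(annotations[0])
    unstable = sum(len({v[i] for v in votes}) > 1 for i in range(n_items))
    return unstable / n_items


# Toy schemas (hypothetical label names, not the datasets compared in the paper).
schema_a = ["pneumonia", "consolidation", "pneumothorax", "support devices"]
schema_b = ["pneumonia", "pneumothorax", "infiltration", "mass"]

# Toy ratings: 5 raters x 4 images. One class is rated consistently, one is not.
consistent_class = [[1, 0, 1, 0] for _ in range(5)]
noisy_class = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]

if __name__ == "__main__":
    print("schema overlap:", schema_overlap(schema_a, schema_b))
    for name, ratings in [("consistent", consistent_class), ("noisy", noisy_class)]:
        print(name, percent_agreement(ratings), vote_instability(ratings))
```

In this toy setup, the consistently rated class yields the same "ground truth" whichever three raters are chosen, while the noisy class does not, which is the kind of panel dependence the abstract describes for low-agreement classes.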

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10077093
DOI: http://dx.doi.org/10.1148/ryai.220056
