A fundamental goal of evaluating the performance of a clinical model is to ensure it performs well across a diverse intended patient population. A primary challenge is that the data used in model development and testing often consist of many overlapping, heterogeneous patient subgroups that may not be explicitly defined or labeled. While a model's average performance on a dataset may be high, the model can have significantly lower performance for certain subgroups, which may be hard to detect.
View Article and Find Full Text PDFIEEE J Biomed Health Inform
November 2024
The future of artificial intelligence (AI) safety is expected to include bias mitigation methods from development to application. The complexity and integration of these methods could grow in conjunction with advances in AI and human-AI interactions. Numerous methods are being proposed to mitigate bias, but without a structured way to compare their strengths and weaknesses.
View Article and Find Full Text PDFIn the past decade, artificial intelligence (AI) algorithms have made promising impacts in many areas of healthcare. One application is AI-enabled prioritization software known as computer-aided triage and notification (CADt). This type of software as a medical device is intended to prioritize reviews of radiological images with time-sensitive findings, thus shortening the waiting time for patients with these findings.
View Article and Find Full Text PDFMachine learning (ML) models often fail with data that deviates from their training distribution. This is a significant concern for ML-enabled devices as data drift may lead to unexpected performance. This work introduces a new framework for out of distribution (OOD) detection and data drift monitoring that combines ML and geometric methods with statistical process control (SPC).
View Article and Find Full Text PDFPurpose: Synthetic datasets hold the potential to offer cost-effective alternatives to clinical data, ensuring privacy protections and potentially addressing biases in clinical data. We present a method leveraging such datasets to train a machine learning algorithm applied as part of a computer-aided detection (CADe) system.
Approach: Our proposed approach utilizes clinically acquired computed tomography (CT) scans of a physical anthropomorphic phantom into which manufactured lesions were inserted to train a machine learning algorithm.
Background: Positive and negative likelihood ratios (PLR and NLR) are important metrics of accuracy for diagnostic devices with a binary output. However, the properties of Bayesian and frequentist interval estimators of PLR/NLR have not been extensively studied and compared. In this study, we explore the potential use of the Bayesian method for interval estimation of PLR/NLR, and, more broadly, for interval estimation of the ratio of two independent proportions.
View Article and Find Full Text PDFThe National Institutes of Health-US Food and Drug Administration Joint Leadership Council Next-Generation Sequencing and Radiomics Working Group was formed by the National Institutes of Health-Food and Drug Administration Joint Leadership Council to promote the development and validation of innovative next-generation sequencing tests, radiomic tools, and associated data analysis and interpretation enhanced by artificial intelligence and machine learning technologies. A 2-day workshop was held on September 29-30, 2021, to convene members of the scientific community to discuss how to overcome the "ground truth" gap that has frequently been acknowledged as 1 of the limiting factors impeding high-quality research, development, validation, and regulatory science in these fields. This report provides a summary of the resource gaps identified by the working group and attendees, highlights existing resources and the ways they can potentially be employed to accelerate growth in these fields, and presents opportunities to support next-generation sequencing and radiomic tool development and validation using technologies such as artificial intelligence and machine learning.
View Article and Find Full Text PDFPurpose: The Medical Imaging and Data Resource Center (MIDRC) was created to facilitate medical imaging machine learning (ML) research for tasks including early detection, diagnosis, prognosis, and assessment of treatment response related to the coronavirus disease 2019 pandemic and beyond. The purpose of this work was to create a publicly available metrology resource to assist researchers in evaluating the performance of their medical image analysis ML algorithms.
Approach: An interactive decision tree, called MIDRC-MetricTree, has been developed, organized by the type of task that the ML algorithm was trained to perform.
The adoption of artificial intelligence (AI) tools in medicine poses challenges to existing clinical workflows. This commentary discusses the necessity of context-specific quality assurance (QA), emphasizing the need for robust QA measures with quality control (QC) procedures that encompass (1) acceptance testing (AT) before clinical use, (2) continuous QC monitoring, and (3) adequate user training. The discussion also covers essential components of AT and QA, illustrated with real-world examples.
View Article and Find Full Text PDFPurpose: Understanding an artificial intelligence (AI) model's ability to generalize to its target population is critical to ensuring the safe and effective usage of AI in medical devices. A traditional generalizability assessment relies on the availability of large, diverse datasets, which are difficult to obtain in many medical imaging applications. We present an approach for enhanced generalizability assessment by examining the decision space beyond the available testing data distribution.
View Article and Find Full Text PDFAnnu Int Conf IEEE Eng Med Biol Soc
July 2023
Labeled ECG data in diseased state are, however, relatively scarce due to various concerns including patient privacy and low prevalence. We propose the first study in its kind that synthesizes atrial fibrillation (AF)-like ECG signals from normal ECG signals using the AFE-GAN, a generative adversarial network. Our AFE-GAN adjusts both beat morphology and rhythm variability when generating the atrial fibrillation-like ECG signals.
View Article and Find Full Text PDFPurpose: The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional effort to accelerate medical imaging machine intelligence research and create a publicly available image repository/commons as well as a sequestered commons for performance evaluation and benchmarking of algorithms. After de-identification, approximately 80% of the medical images and associated metadata become part of the open commons and 20% are sequestered from the open commons. To ensure that both commons are representative of the population available, we introduced a stratified sampling method to balance the demographic characteristics across the two datasets.
View Article and Find Full Text PDFPurpose: The Medical Imaging and Data Resource Center (MIDRC) open data commons was launched to accelerate the development of artificial intelligence (AI) algorithms to help address the COVID-19 pandemic. The purpose of this study was to quantify longitudinal representativeness of the demographic characteristics of the primary MIDRC dataset compared to the United States general population (US Census) and COVID-19 positive case counts from the Centers for Disease Control and Prevention (CDC).
Approach: The Jensen-Shannon distance (JSD), a measure of similarity of two distributions, was used to longitudinally measure the representativeness of the distribution of (1) all unique patients in the MIDRC data to the 2020 US Census and (2) all unique COVID-19 positive patients in the MIDRC data to the case counts reported by the CDC.
J Med Imaging (Bellingham)
September 2023
Purpose: To introduce developers to medical device regulatory processes and data considerations in artificial intelligence and machine learning (AI/ML) device submissions and to discuss ongoing AI/ML-related regulatory challenges and activities.
Approach: AI/ML technologies are being used in an increasing number of medical imaging devices, and the fast evolution of these technologies presents novel regulatory challenges. We provide AI/ML developers with an introduction to U.
J Med Imaging (Bellingham)
November 2023
Purpose: To recognize and address various sources of bias essential for algorithmic fairness and trustworthiness and to contribute to a just and equitable deployment of AI in medical imaging, there is an increasing interest in developing medical imaging-based machine learning methods, also known as medical imaging artificial intelligence (AI), for the detection, diagnosis, prognosis, and risk assessment of disease with the goal of clinical implementation. These tools are intended to help improve traditional human decision-making in medical imaging. However, biases introduced in the steps toward clinical deployment may impede their intended function, potentially exacerbating inequities.
View Article and Find Full Text PDFData drift refers to differences between the data used in training a machine learning (ML) model and that applied to the model in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and used in clinical operation, differences between medical practices or context of use between training and clinical use, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in ML literature related to data drift, define distinct types of drift, and discuss in detail potential causes within the context of medical applications with an emphasis on medical imaging.
View Article and Find Full Text PDFPurpose: Machine learning algorithms are best trained with large quantities of accurately annotated samples. While natural scene images can often be labeled relatively cheaply and at large scale, obtaining accurate annotations for medical images is both time consuming and expensive. In this study, we propose a cooperative labeling method that allows us to make use of weakly annotated medical imaging data for the training of a machine learning algorithm.
View Article and Find Full Text PDFRapid advances in artificial intelligence (AI) and machine learning, and specifically in deep learning (DL) techniques, have enabled broad application of these methods in health care. The promise of the DL approach has spurred further interest in computer-aided diagnosis (CAD) development and applications using both "traditional" machine learning methods and newer DL-based methods. We use the term CAD-AI to refer to this expanded clinical decision support environment that uses traditional and DL-based AI methods.
View Article and Find Full Text PDFObjective: After deploying a clinical prediction model, subsequently collected data can be used to fine-tune its predictions and adapt to temporal shifts. Because model updating carries risks of over-updating/fitting, we study online methods with performance guarantees.
Materials And Methods: We introduce 2 procedures for continual recalibration or revision of an underlying prediction model: Bayesian logistic regression (BLR) and a Markov variant that explicitly models distribution shifts (MarBLR).
The Abstract is intended to provide a concise summary of the study and its scientific findings. For AI/ML applications in medical physics, a problem statement and rationale for utilizing these algorithms are necessary while highlighting the novelty of the approach. A brief numerical description of how the data are partitioned into subsets for training of the AI/ML algorithm, validation (including tuning of parameters), and independent testing of algorithm performance is required.
View Article and Find Full Text PDF: The breast pathology quantitative biomarkers (BreastPathQ) challenge was a grand challenge organized jointly by the International Society for Optics and Photonics (SPIE), the American Association of Physicists in Medicine (AAPM), the U.S. National Cancer Institute (NCI), and the U.
View Article and Find Full Text PDFPurpose: Most state-of-the-art automated medical image analysis methods for volumetric data rely on adaptations of two-dimensional (2D) and three-dimensional (3D) convolutional neural networks (CNNs). In this paper, we develop a novel unified CNN-based model that combines the benefits of 2D and 3D networks for analyzing volumetric medical images.
Methods: In our proposed framework, multiscale contextual information is first extracted from 2D slices inside a volume of interest (VOI).