In the past decade, artificial intelligence (AI) algorithms have made promising impacts in many areas of healthcare. One application is AI-enabled prioritization software known as computer-aided triage and notification (CADt). This type of software as a medical device is intended to prioritize reviews of radiological images with time-sensitive findings, thus shortening the waiting time for patients with these findings.
View Article and Find Full Text PDFBackground: This study reports the results of a set of discrimination experiments using simulated images that represent the appearance of subtle lesions in low-dose computed tomography (CT) of the lungs. Noise in these images has a characteristic ramp-spectrum before apodization by noise control filters. We consider three specific diagnostic features that determine whether a lesion is considered malignant or benign, two system-resolution levels, and four apodization levels for a total of 24 experimental conditions.
View Article and Find Full Text PDFMedical image interpretation is central to detecting, diagnosing, and staging cancer and many other disorders. At a time when medical imaging is being transformed by digital technologies and artificial intelligence, understanding the basic perceptual and cognitive processes underlying medical image interpretation is vital for increasing diagnosticians' accuracy and performance, improving patient outcomes, and reducing diagnostician burnout. Medical image perception remains substantially understudied.
View Article and Find Full Text PDFWe investigate a series of two-alternative forced-choice (2AFC) discrimination tasks based on malignant features of abnormalities in low-dose lung CT scans. A total of 3 tasks are evaluated, and these consist of a size-discrimination task, a boundary-sharpness task, and an irregular-interior task. Target and alternative signal profiles for these tasks are modulated by one of two system transfer functions and embedded in ramp-spectrum noise that has been apodized for noise control in one of 4 different ways.
View Article and Find Full Text PDFA recent study reported on an imaging trial that evaluated the performance of digital breast tomosynthesis (DBT) as a replacement for full-field digital mammography (FFDM) for breast cancer screening. In this trial, the whole imaging chain was simulated, including the breast phantom generation, the x-ray transport process, and computational readers for image interpretation. We focus on the design and performance characteristics of the computational reader in the above-mentioned trial.
View Article and Find Full Text PDFStat Methods Med Res
June 2020
Evaluation of medical imaging devices often involves clinical studies where multiple readers (MR) read images of multiple cases (MC) for a clinical task, which are often called MRMC studies. In addition to sizing patient cases as is required in most clinical trials, MRMC studies also require sizing readers, since both readers and cases contribute to the uncertainty of the estimated diagnostic performance, which is often measured by the area under the ROC curve (AUC). Due to limited prior information, sizing of such a study is often unreliable.
View Article and Find Full Text PDFJ Med Imaging (Bellingham)
January 2019
We investigated effects of prevalence and case distribution on radiologist diagnostic performance as measured by area under the receiver operating characteristic curve (AUC) and sensitivity-specificity in lab-based reader studies evaluating imaging devices. Our retrospective reader studies compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with dense breasts. Mammograms were acquired from the prospective Digital Mammographic Imaging Screening Trial.
View Article and Find Full Text PDFImportance: Expensive and lengthy clinical trials can delay regulatory evaluation of innovative technologies, affecting patient access to high-quality medical products. Simulation is increasingly being used in product development but rarely in regulatory applications.
Objectives: To conduct a computer-simulated imaging trial evaluating digital breast tomosynthesis (DBT) as a replacement for digital mammography (DM) and to compare the results with a comparative clinical trial.
Purpose: The task-based assessment of image quality using model observers is increasingly used for the assessment of different imaging modalities. However, the performance computation of model observers needs standardization as well as a well-established trust in its implementation methodology and uncertainty estimation. The purpose of this work was to determine the degree of equivalence of the channelized Hotelling observer performance and uncertainty estimation using an intercomparison exercise.
View Article and Find Full Text PDFPurpose: This study investigates forced localization of targets in simulated images with statistical properties similar to trans-axial sections of x-ray computed tomography (CT) volumes. A total of 24 imaging conditions are considered, comprising two target sizes, three levels of background variability, and four levels of frequency apodization. The goal of the study is to better understand how human observers perform forced-localization tasks in images with CT-like statistical properties.
View Article and Find Full Text PDFRationale And Objectives: In this paper we examine which comparisons of reading performance between diagnostic imaging systems made in controlled retrospective laboratory studies may be representative of what we observe in later clinical studies. The change in a meaningful diagnostic figure of merit between two diagnostic modalities should be qualitatively or quantitatively comparable across all kinds of studies.
Materials And Methods: In this meta-study we examine the reproducibility of relative measures of sensitivity, false positive fraction (FPF), area under the receiver operating characteristic (ROC) curve, and expected utility across laboratory and observational clinical studies for several different breast imaging modalities, including screen film mammography, digital mammography, breast tomosynthesis, and ultrasound.
Int J Biostat
November 2016
Schatzkin et al. and other authors demonstrated that the ratios of some conditional statistics such as the true positive fraction are equal to the ratios of unconditional statistics, such as disease detection rates, and therefore we can calculate these ratios between two screening tests on the same population even if negative test patients are not followed with a reference procedure and the true and false negative rates are unknown. We demonstrate that this same property applies to an expected utility metric.
View Article and Find Full Text PDFScores produced by statistical classifiers in many clinical decision support systems and other medical diagnostic devices are generally on an arbitrary scale, so the clinical meaning of these scores is unclear. Calibration of classifier scores to a meaningful scale such as the probability of disease is potentially useful when such scores are used by a physician. In this work, we investigated three methods (parametric, semi-parametric, and non-parametric) for calibrating classifier scores to the probability of disease scale and developed uncertainty estimation techniques for these methods.
View Article and Find Full Text PDFProc SPIE Int Soc Opt Eng
February 2016
Breast cancer risk prediction algorithms are used to identify subpopulations that are at increased risk for developing breast cancer. They can be based on many different sources of data such as demographics, relatives with cancer, gene expression, and various phenotypic features such as breast density. Women who are identified as high risk may undergo a more extensive (and expensive) screening process that includes MRI or ultrasound imaging in addition to the standard full-field digital mammography (FFDM) exam.
View Article and Find Full Text PDFJ Med Imaging (Bellingham)
October 2014
The evaluation of medical imaging devices often involves studies that measure the ability of observers to perform a signal detection task on images obtained from those devices. Data from such studies are frequently regressed ordinally using two-sample receiver operating characteristic (ROC) models. We applied some of these models to a number of randomly chosen data sets from medical imaging and evaluated how well they fit using the Akaike and Bayesian information criteria and cross-validation.
View Article and Find Full Text PDFJ Opt Soc Am A Opt Image Sci Vis
November 2014
There is a lack of consensus in measuring observer performance in search tasks. To pursue a consensus, we set our goal to obtain metrics that are practical, meaningful, and predictive. We consider a metric practical if it can be implemented to measure human and computer observers' performance.
View Article and Find Full Text PDFPurpose: Variance estimates for detector energy resolution metrics can be used as stopping criteria in Monte Carlo simulations for the purpose of ensuring a small uncertainty of those metrics and for the design of variance reduction techniques.
Methods: The authors derive an estimate for the variance of two energy resolution metrics, the Swank factor and the Fano factor, in terms of statistical moments that can be accumulated without significant computational overhead. The authors examine the accuracy of these two estimators and demonstrate how the estimates of the coefficient of variation of the Swank and Fano factors behave with data from a Monte Carlo simulation of an indirect x-ray imaging detector.
Rationale And Objectives: Our objective is to determine whether expected utility (EU) and the area under the receiver operator characteristic (AUC) are consistent with one another as endpoints of observer performance studies in mammography. These two measures characterize receiver operator characteristic performance somewhat differently. We compare these two study endpoints at the level of individual reader effects, statistical inference, and components of variance across readers and cases.
View Article and Find Full Text PDFComputer-aided detection and diagnosis (CAD) systems are increasingly being used as an aid by clinicians for detection and interpretation of diseases. Computer-aided detection systems mark regions of an image that may reveal specific abnormalities and are used to alert clinicians to these regions during image interpretation. Computer-aided diagnosis systems provide an assessment of a disease using image-based information alone or in combination with other relevant diagnostic data and are used by clinicians as a decision support in developing their diagnoses.
View Article and Find Full Text PDFBackground: The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al.
View Article and Find Full Text PDFRationale And Objectives: Before using a new diagnostic imaging device regularly in a clinic, it should be studied using patients and radiologists. Often such studies report diagnostic performance in terms of sensitivity, specificity, area under the receiver operating characteristic curve (AUC), or differences thereof. In this report we look at how these studies differ from actual future clinical practice and how those differences may affect reported performance measures.
View Article and Find Full Text PDFRationale And Objectives: The purpose of this investigation is to compare the statistical power of the most common measure of performance for observer performance studies, area under the ROC curve (AUC), to an expected utility (EU) endpoint.
Materials And Methods: We have modified a well-known simulation procedure developed by Roe and Metz for statistical power analysis in receiver operating characteristic (ROC) studies. Starting from a set of baseline simulations, we investigate the effects of three parameters that describe properties of the observers (iso-utility slope, unequal variance, and tendency to favor more aggressive or conservative actions) and three parameters that affect experimental design (number of readers, number of cases, and fraction of positive cases).
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods.
View Article and Find Full Text PDF