Publications by authors named "Ben Glocker"

Article Synopsis
  • There is a significant risk of reinforcing existing health inequalities in AI health technologies due to biases, primarily stemming from the datasets used.
  • The STANDING Together recommendations focus on transparency in health datasets and proactive evaluation of their impacts on different population groups, informed by a comprehensive research process with over 350 global contributors.
  • The 29 recommendations are divided into guidance for documenting health datasets and strategies for using them, aiming to identify and reduce algorithmic biases while promoting awareness of the inherent limitations in all datasets.
View Article and Find Full Text PDF

Background: The reporting of adverse events (AEs) relating to medical devices is a long-standing area of concern, with suboptimal reporting due to a range of factors including a failure to recognize the association of AEs with medical devices, lack of knowledge of how to report AEs, and a general culture of nonreporting. The introduction of artificial intelligence as a medical device (AIaMD) requires a robust safety monitoring environment that recognizes both generic risks of a medical device and some of the increasingly recognized risks of AIaMD (such as algorithmic bias). There is an urgent need to understand the limitations of current AE reporting systems and explore potential mechanisms for how AEs could be detected, attributed, and reported with a view to improving the early detection of safety signals.

View Article and Find Full Text PDF

Robustness of deep learning segmentation models is crucial for their safe incorporation into clinical practice. However, these models can falter when faced with distributional changes. This challenge is evident in magnetic resonance imaging (MRI) scans due to the diverse acquisition protocols across various domains, leading to differences in image characteristics such as textural appearances.

View Article and Find Full Text PDF
Article Synopsis
  • There is a need for thorough analysis of performance errors in AI medical devices, including issues like false correlations and specific failure modes, which can harm patients; guidelines for reporting these errors are not well-defined.
  • This systematic review will evaluate how often and severely AI errors occur in randomized controlled trials (RCTs) of AI medical devices, as well as how performance errors are investigated, focusing on subgroup outcomes and adverse events.
View Article and Find Full Text PDF

Purpose: Automated prostate disease classification on multi-parametric MRI has recently shown promising results with the use of convolutional neural networks (CNNs). The vision transformer (ViT) is a convolution-free architecture that exploits only the self-attention mechanism and has surpassed CNNs in some natural imaging classification tasks. However, these models are not very robust to textural shifts in the input space.

View Article and Find Full Text PDF

The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement was published in 2015 to provide the minimum reporting recommendations for studies developing or evaluating the performance of a prediction model. Methodological advances in the field of prediction have since included the widespread use of artificial intelligence (AI) powered by machine learning methods to develop prediction models. An update to the TRIPOD statement is thus needed.

View Article and Find Full Text PDF

Objective: To evaluate the effectiveness of a new strategy for using artificial intelligence (AI) as supporting reader for the detection of breast cancer in mammography-based double reading screening practice.

Methods: Large-scale multi-site, multi-vendor data were used to retrospectively evaluate a new paradigm of AI-supported reading. Here, the AI served as the second reader only if it agreed with the recall/no-recall decision of the first human reader.
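The decision rule described in the Methods can be sketched as below. This is a minimal illustration, not the study's implementation: the function name `ai_supported_read`, the string labels, and the fallback to a second human reader on disagreement are assumptions, since the excerpt only states that the AI acts as second reader when it agrees with the first reader.

```python
from typing import Callable, Tuple

RECALL = "recall"
NO_RECALL = "no-recall"


def ai_supported_read(first_reader: str,
                      ai: str,
                      second_human: Callable[[], str]) -> Tuple[str, str]:
    """Return (second-read decision, who supplied the second read)."""
    if ai == first_reader:
        # The AI agrees with the first human reader, so it serves as second reader.
        return ai, "ai"
    # Assumption: on disagreement the case is passed to a second human reader.
    return second_human(), "human"
```

Passing the second human read as a callable means it is only requested when the AI and the first reader disagree, which is where any workload saving in such a scheme would come from.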

View Article and Find Full Text PDF

Background: Breast density is an important risk factor for breast cancer complemented by a higher risk of cancers being missed during screening of dense breasts due to reduced sensitivity of mammography. Automated, deep learning-based prediction of breast density could provide subject-specific risk assessment and flag difficult cases during screening. However, there is a lack of evidence for generalisability across imaging techniques and, importantly, across race.

View Article and Find Full Text PDF

Objectives: MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining "real-world" and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation.

Methods: Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods.

View Article and Find Full Text PDF

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics.

View Article and Find Full Text PDF

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers.

View Article and Find Full Text PDF

Purpose: To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race.

Materials and Methods: This Health Insurance Portability and Accountability Act-compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race.
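The two-sample Kolmogorov-Smirnov test mentioned above can be sketched in plain Python. The sketch assumes each image's feature vector has already been reduced to a single component (for example, one principal component), so each subgroup is a list of floats. The function names and the large-sample critical-value rule are illustrative choices, not taken from the study.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    # The maximum gap can only occur at an observed value.
    return max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
               for v in xs + ys)


def shift_detected(a: Sequence[float], b: Sequence[float],
                   alpha: float = 0.05) -> bool:
    """Reject 'same distribution' using the asymptotic critical value."""
    n, m = len(a), len(b)
    # c(alpha) = sqrt(-ln(alpha / 2) / 2); a large-sample approximation.
    c_alpha = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical
```

In use, `a` and `b` would hold the reduced feature values for two subgroups (for example, two sex or race groups); a detected shift indicates that the feature distributions differ, not by itself that performance differs.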

View Article and Find Full Text PDF
Article Synopsis
  • AI can help find breast cancer earlier by working alongside doctors when checking medical images.
  • In tests, using AI alongside regular reading found more cancer cases while producing very few additional false alarms.
  • Most cancers found with AI were small and serious, meaning using this technology could make breast cancer screenings even better.
View Article and Find Full Text PDF

Image-based prediction models for disease detection are sensitive to changes in data acquisition such as the replacement of scanner hardware or updates to the image processing software. The resulting differences in image characteristics may lead to drifts in clinically relevant performance metrics which could cause harm in clinical decision making, even for models that generalise in terms of area under the receiver-operating characteristic curve. We propose Unsupervised Prediction Alignment, a generic automatic recalibration method that requires no ground truth annotations and only limited amounts of unlabelled example images from the shifted data distribution.
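The excerpt does not describe how the recalibration works. One simple label-free way to align predictions, shown here purely as an illustration, is quantile matching: each score from the shifted data is mapped to the score at the same quantile of a reference distribution. This is an assumed stand-in technique, not necessarily the method of the paper, and `fit_quantile_alignment` is a hypothetical name.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def fit_quantile_alignment(reference_scores: Sequence[float],
                           shifted_scores: Sequence[float]
                           ) -> Callable[[float], float]:
    """Build a mapping from shifted-domain scores to reference-domain scores.

    Needs only unlabelled model outputs from both domains.
    """
    ref = sorted(reference_scores)
    shifted = sorted(shifted_scores)

    def align(score: float) -> float:
        # Rank of the score within the shifted distribution.
        rank = bisect_right(shifted, score)
        # Index of the same quantile in the reference distribution.
        idx = (rank * len(ref)) // len(shifted) - 1
        idx = min(max(idx, 0), len(ref) - 1)
        return ref[idx]

    return align
```

After alignment, a decision threshold chosen on the reference data flags roughly the same fraction of cases on the shifted data, which is the kind of drift in operating point the abstract is concerned with.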

View Article and Find Full Text PDF

Objectives: Whole-body magnetic resonance imaging (WB-MRI) has been demonstrated to be efficient and cost-effective for cancer staging. The study aim was to develop a machine learning (ML) algorithm to improve radiologists' sensitivity and specificity for metastasis detection and reduce reading times.

Materials and Methods: A retrospective analysis of 438 prospectively collected WB-MRI scans from multicenter Streamline studies (February 2013-September 2016) was undertaken.

View Article and Find Full Text PDF

Rationale and Objectives: To develop a method for automatic localisation of brain lesions on head CT, suitable for both population-level analysis and lesion management in a clinical setting.

Materials and Methods: Lesions were located by mapping a bespoke CT brain atlas to the patient's head CT in which lesions had been previously segmented. The atlas mapping was achieved through robust intensity-based registration enabling the calculation of per-region lesion volumes.
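Once the atlas has been registered to the patient's CT, the per-region lesion volume reduces to counting lesion voxels under each atlas label. The sketch below assumes flattened voxel arrays on a common grid, a label of 0 for background, and a known voxel volume; the function name and these conventions are illustrative, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Sequence


def per_region_lesion_volume(atlas_labels: Sequence[int],
                             lesion_mask: Sequence[int],
                             voxel_volume_ml: float) -> Dict[int, float]:
    """Sum lesion volume (in ml) inside each atlas region."""
    if len(atlas_labels) != len(lesion_mask):
        raise ValueError("atlas and lesion mask must share one voxel grid")
    volumes: Dict[int, float] = defaultdict(float)
    for region, is_lesion in zip(atlas_labels, lesion_mask):
        # Assumption: label 0 marks voxels outside every atlas region.
        if is_lesion and region != 0:
            volumes[region] += voxel_volume_ml
    return dict(volumes)
```

For example, with `atlas_labels = [0, 1, 1, 2, 2, 2]`, `lesion_mask = [1, 1, 0, 1, 1, 0]` and a voxel volume of 0.5 ml, region 1 receives 0.5 ml and region 2 receives 1.0 ml.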

View Article and Find Full Text PDF

This paper presents an effective and general data augmentation framework for medical image segmentation. We adopt a computationally efficient and data-efficient gradient-based meta-learning scheme to explicitly align the distribution of training and validation data which is used as a proxy for unseen test data. We improve the current data augmentation strategies with two core designs.

View Article and Find Full Text PDF
Article Synopsis
  • Traumatic brain injury (TBI) is linked to chronic neurodegeneration, potentially due to systemic inflammation signaling the brain and activating microglia, which can lead to widespread brain damage.
  • The study, TBI-braINFLAMM, will analyze data from two major TBI research projects—CREACTIVE and BIO-AX-TBI—to assess the relationship between systemic inflammation, injury severity, and ongoing neurodegeneration.
  • Ethical approval has been obtained, and findings will be shared through peer-reviewed publications and conferences to enhance understanding and inform future research in this area.
View Article and Find Full Text PDF

Background: Double reading (DR) in screening mammography increases cancer detection and lowers recall rates, but has sustainability challenges due to workforce shortages. Artificial intelligence (AI) as an independent reader (IR) in DR may provide a cost-effective solution with the potential to improve screening performance. Evidence for AI to generalise across different patient populations, screening programmes and equipment vendors, however, is still lacking.

View Article and Find Full Text PDF
Article Synopsis
  • The study aims to analyze how healthy aging affects retinal changes using deep learning techniques, specifically focusing on the structural variations in the retina across individuals aged 40 to 75.
  • Researchers utilized a generative adversarial network (GAN) to create synthetic OCT images, allowing for the exploration of different hypothetical aging scenarios while keeping certain variables constant.
  • The findings reveal that retinal layer changes occur at specific rates per decade, highlighting the potential of the GAN model to visualize individual aging processes and enhance understanding beyond average population trends.
View Article and Find Full Text PDF

Background samples provide key contextual information for segmenting regions of interest (ROIs). However, they always cover a diverse set of structures, causing difficulties for the segmentation model to learn good decision boundaries with high sensitivity and precision. The issue concerns the highly heterogeneous nature of the background class, resulting in multi-modal distributions.

View Article and Find Full Text PDF
Article Synopsis
  • Predicting recovery outcomes after mild traumatic brain injury (mTBI) is difficult, especially since conventional MRI often shows normal results despite incomplete recovery in patients.
  • Advanced imaging techniques like diffusion MRI (dMRI) can reveal microstructural brain changes, possibly improving the accuracy of outcome predictions using machine learning models known as linear support vector classifiers (linearSVCs).
  • The study involved analyzing dMRI data from 179 mTBI patients and 85 controls, aiming to differentiate between patients with complete versus incomplete recovery, while also experimenting with a method called ComBat to standardize imaging data and enhance classification accuracy.
View Article and Find Full Text PDF

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers.

View Article and Find Full Text PDF

Purpose: In curriculum learning, the idea is to train on easier samples first and gradually increase the difficulty, while in self-paced learning, a pacing function defines the speed to adapt the training progress. While both methods heavily rely on the ability to score the difficulty of data samples, an optimal scoring function is still under exploration.
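The interplay between a difficulty score and a pacing function can be sketched as follows. The linear pacing schedule, the starting fraction of 0.2, and the function names are assumptions chosen for illustration; the abstract notes that the choice of scoring function is still an open question.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def linear_pacing(step: int, total_steps: int,
                  start_fraction: float = 0.2) -> float:
    """Fraction of the easiest-first training set available at `step`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start_fraction + (1.0 - start_fraction) * progress


def curriculum_subset(samples: Sequence[T], difficulty: Sequence[float],
                      step: int, total_steps: int) -> List[T]:
    """Return the samples the model may train on at this step."""
    # Order sample indices from easiest to hardest.
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    k = max(1, round(linear_pacing(step, total_steps) * len(samples)))
    return [samples[i] for i in order[:k]]
```

Early in training only the lowest-scoring (easiest) samples are returned; by the final step the full set is available.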

Methodology: Distillation is a knowledge transfer approach where a teacher network guides a student network by feeding a sequence of random samples.

View Article and Find Full Text PDF