Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM.
Background/aims: Deep learning systems (DLSs) for diabetic retinopathy (DR) detection show promising results but can underperform in racial and ethnic minority groups; external validation within these populations is therefore critical for health equity. This study evaluates the performance of a DLS for DR detection among Indigenous Australians, an understudied ethnic group who suffer disproportionately from DR-related blindness.
Methods: We performed a retrospective external validation study comparing the performance of a DLS against a retinal specialist for the detection of more-than-mild DR (mtmDR), vision-threatening DR (vtDR) and all-cause referable DR.
Background: Diabetic retinopathy is a leading cause of preventable blindness, especially in low-income and middle-income countries (LMICs). Deep-learning systems have the potential to enhance diabetic retinopathy screenings in these settings, yet prospective studies assessing their usability and performance are scarce.
Methods: We did a prospective interventional cohort study to evaluate the real-world performance and feasibility of deploying a deep-learning system into the health-care system of Thailand.
Importance: Most dermatologic cases are initially evaluated by nondermatologists such as primary care physicians (PCPs) or nurse practitioners (NPs).
Objective: To evaluate an artificial intelligence (AI)-based tool that assists with diagnoses of dermatologic conditions.
Design, Setting, And Participants: This multiple-reader, multiple-case diagnostic study developed an AI-based tool and evaluated its utility.
Objective: To evaluate diabetic retinopathy (DR) screening via deep learning (DL) and trained human graders (HG) in a longitudinal cohort, as case spectrum shifts based on treatment referral and new-onset DR.
Methods: We randomly selected patients with diabetes screened twice, two years apart within a nationwide screening program. The reference standard was established via adjudication by retina specialists.
Importance: Expert-level artificial intelligence (AI) algorithms for prostate biopsy grading have recently been developed. However, the potential impact of integrating such algorithms into pathologist workflows remains largely unexplored.
Objective: To evaluate an expert-level AI-based assistive tool when used by pathologists for the grading of prostate biopsies.
Transl Vis Sci Technol
November 2019
Purpose: To present and evaluate a remote, tool-based system and structured grading rubric for adjudicating image-based diabetic retinopathy (DR) grades.
Methods: We compared three different procedures for adjudicating DR severity assessments among retina specialist panels, including (1) in-person adjudication based on a previously described procedure (Baseline), (2) remote, tool-based adjudication for assessing DR severity alone (TA), and (3) remote, tool-based adjudication using a feature-based rubric (TA-F). We developed a system allowing graders to review images remotely and asynchronously.
Purpose: To develop and validate a deep learning (DL) algorithm that predicts referable glaucomatous optic neuropathy (GON) and optic nerve head (ONH) features from color fundus images, to determine the relative importance of these features in referral decisions by glaucoma specialists (GSs) and the algorithm, and to compare the performance of the algorithm with eye care providers.
Design: Development and validation of an algorithm.
Participants: Fundus images from screening programs, studies, and a glaucoma clinic.
Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population and to compare the algorithm's performance with that of human graders. A total of 25,326 gradable retinal images of patients with diabetes from the community-based, nationwide screening program of DR in Thailand were analyzed for DR severity and referable diabetic macular edema (DME).
Purpose: To understand the impact of deep learning diabetic retinopathy (DR) algorithms on physician readers in computer-assisted settings.
Design: Evaluation of diagnostic technology.
Participants: One thousand seven hundred ninety-six retinal fundus images from 1612 diabetic patients.
While the fourth human visual field map (hV4) has been studied for two decades, there remain uncertainties about its spatial organization. In analyzing fMRI measurements designed to resolve these issues, we discovered a significant problem that afflicts measurements from ventral occipital cortex, and particularly measurements near hV4. In most hemispheres the fMRI hV4 data are contaminated by artifacts from the transverse sinus (TS).
Repeating object images produces stimulus-specific repetition suppression referred to as functional magnetic resonance imaging-adaptation (fMRI-A) in ventral temporal cortex (VTC). However, the effects of stimulus repetition on functional selectivity are largely unknown. We investigated the effects of short-lagged (SL, immediate) and long-lagged (LL, many intervening stimuli) repetitions on category selectivity in VTC using high-resolution fMRI.
What is the relationship between retinotopy and object selectivity in human lateral occipital (LO) cortex? We used functional magnetic resonance imaging (fMRI) to examine sensitivity to retinal position and category in LO, an object-selective region positioned posterior to MT along the lateral cortical surface. Six subjects participated in phase-encoded retinotopic mapping experiments as well as block-design experiments in which objects from six different categories were presented at six distinct positions in the visual field. We found substantial position modulation in LO using standard nonobject retinotopic mapping stimuli; this modulation extended beyond the boundaries of visual field maps LO-1 and LO-2.
A region in ventral human cortex (fusiform face area, FFA) thought to be important for face perception responds strongly to faces and less strongly to nonface objects. This pattern of response may reflect a uniform face-selective neural population or activity averaged across populations with heterogeneous selectivity. Using high-resolution functional magnetic resonance imaging (MRI), we found that the FFA has a reliable heterogeneous structure: localized subregions within the FFA highly selective to faces are spatially interdigitated with localized subregions highly selective to different object categories.
J Neurophysiol
February 2006
Object-selective cortical regions exhibit a decreased response when an object stimulus is repeated [repetition suppression (RS)]. RS is often associated with priming: reduced response times and increased accuracy for repeated stimuli. It is unknown whether RS reflects stimulus-specific repetition, the associated changes in response time, or the combination of the two.