Publications by authors named "Judy Gichoya"

Large language models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)-LLM statement, an extension of the TRIPOD+AI statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion.

Article Synopsis
  • There is a significant risk of reinforcing existing health inequalities in AI health technologies due to biases, primarily stemming from the datasets used.
  • The STANDING Together recommendations focus on transparency in health datasets and proactive evaluation of their impacts on different population groups, informed by a comprehensive research process with over 350 global contributors.
  • The 29 recommendations are divided into guidance for documenting health datasets and strategies for using them, aiming to identify and reduce algorithmic biases while promoting awareness of the inherent limitations in all datasets.
Article Synopsis
  • This review analyzes various mammography datasets used for AI development in breast cancer screening, focusing on their transparency, content, and accessibility.
  • A search identified 254 datasets, with only 28 being accessible; most datasets came from Europe, East Asia, and North America, raising concerns over poor demographic representation.
  • The findings highlight significant gaps in diversity within these datasets, underscoring the need for better documentation and inclusivity to enhance the effectiveness of AI technologies in breast cancer research.

Objective: To demonstrate and test the capabilities of the American College of Radiology (ACR) Connect and AI-LAB software platform by implementing multi-institutional artificial intelligence (AI) training and validation for breast density classification.

Methods: In this proof-of-concept study, six U.S.


Musculoskeletal (MSK) pain leads to significant healthcare utilization, decreased productivity, and disability globally. Due to its complex etiology, MSK pain is often chronic and challenging to manage effectively. Disparities in pain management, influenced by provider implicit biases and by patient race, gender, age, and socioeconomic status, contribute to inconsistent outcomes.


Pressure injury (PI) detection is challenging, especially in dark skin tones, due to the unreliability of visual inspection. Thermography may serve as a viable alternative as temperature differences in the skin can indicate impending tissue damage. Although deep learning models hold considerable promise toward reliably detecting PI, existing work fails to evaluate performance on diverse skin tones and varying data collection protocols.

Article Synopsis
  • Some people are excited because large language models (LLMs) can pass important medical tests.
  • This makes doctors think about what skills they need to work well with these AI tools.
  • To prepare future doctors for using AI, medical schools might need to change how they teach.

Objective: Pulse oximetry, a ubiquitous vital sign in modern medicine, has inequitable accuracy that disproportionately affects minority Black and Hispanic patients, with associated increases in mortality, organ dysfunction, and oxygen therapy. Previous retrospective studies used self-reported race or ethnicity as a surrogate for skin tone which is believed to be the root cause of the disparity. Our objective was to determine the utility of skin tone in explaining pulse oximetry discrepancies.


Healthcare AI faces an ethical dilemma between selective and equitable deployment, exacerbated by flawed performance metrics. These metrics inadequately capture real-world complexities and biases, leading to premature assertions of effectiveness. Improved evaluation practices, including continuous monitoring and silent evaluation periods, are crucial.


This narrative review focuses on the role of clinical prediction models in supporting informed decision-making in critical care, emphasizing their two forms: traditional scores and artificial intelligence (AI)-based models. Acknowledging the potential for both types to embed biases, the authors underscore the importance of critical appraisal to increase our trust in models. The authors outline recommendations and critical care examples for managing the risk of bias in AI models.

Article Synopsis
  • TRIPOD-LLM is a new set of reporting guidelines specifically designed for the use of Large Language Models (LLMs) in biomedical research, aiming to standardize transparency and quality in healthcare applications.
  • The guidelines include a checklist with 19 main items and 50 subitems, adaptable to various research designs, emphasizing the importance of human oversight and task-specific performance.
  • An interactive website is provided to help researchers easily complete the guidelines and generate submissions, with the intention of continually updating the document as the field evolves.

Background: Although hypothesized to be the root cause of pulse oximetry disparities, skin tone and its use for improving medical therapies have yet to be extensively studied. Previous studies used self-reported race as a proxy for skin tone. However, this approach cannot account for skin tone variability within racial groups and risks confounding by other non-biological factors when modeling data.

Article Synopsis
  • Current tools to measure health equity are limited, often focusing on specific areas of patient care rather than the entire healthcare process.
  • A study introduced a process mining framework to track patient care actions, revealing that while treatment was similar for men and women, non-English speaking patients experienced delays despite having similar illness severity.
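The disparity this kind of process mining surfaces, an arrival-to-treatment delay that differs by patient group despite similar severity, can be sketched with stdlib Python alone. The event tuples and group names below are hypothetical illustrations, not the study's data:

```python
from statistics import median

def median_delay_by_group(events):
    """Median arrival-to-treatment delay (hours) per patient group.

    events: iterable of (group, arrival_hour, treatment_hour) tuples.
    """
    delays = {}
    for group, arrival, treatment in events:
        delays.setdefault(group, []).append(treatment - arrival)
    return {group: median(d) for group, d in delays.items()}

# Hypothetical event log: non-English speakers wait longer for the same step.
log = [
    ("english", 0.0, 1.5), ("english", 2.0, 3.0),
    ("non_english", 0.0, 4.0), ("non_english", 1.0, 4.5),
]
print(median_delay_by_group(log))  # {'english': 1.25, 'non_english': 3.75}
```

A real process-mining pipeline would track many more event types; the point here is only that the fairness signal is a per-group summary over a shared event log.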

Rationale And Objectives: Radiology residents often receive limited feedback on preliminary reports issued during independent call. This study aimed to determine if Large Language Models (LLMs) can supplement traditional feedback by identifying missed diagnoses in radiology residents' preliminary reports.

Materials & Methods: A randomly selected subset of 500 paired preliminary and final reports (250 training/250 validation) issued between 12/17/2022 and 5/22/2023 was extracted and de-identified from our institutional database.
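Before any LLM is involved, the paired-report comparison can be approximated by surfacing sentences that appear in the final report but not in the resident's preliminary one. This is an illustrative stdlib baseline, not the study's LLM pipeline, and the sample reports are invented:

```python
import re

def candidate_discrepancies(preliminary, final):
    """Return sentences in the final report that never appear in the preliminary one.

    These are candidate missed findings to surface as feedback.
    """
    def sentences(text):
        return {s.strip().lower() for s in re.split(r"[.\n]+", text) if s.strip()}
    return sorted(sentences(final) - sentences(preliminary))

prelim = "Lungs are clear. No acute fracture."
final = "Lungs are clear. No acute fracture. Small right pleural effusion."
print(candidate_discrepancies(prelim, final))  # ['small right pleural effusion']
```

An LLM adds value over this diff by matching paraphrased findings and judging clinical significance, which exact sentence matching cannot do.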


Increasing evidence supports reduced accuracy of noninvasive assessment tools, such as pulse oximetry, temperature probes, and AI skin diagnosis benchmarks, in patients with darker skin tones. The FDA is exploring potential strategies for device regulation to improve performance across diverse skin tones by including skin tone criteria. However, there is no consensus about how prospective studies should perform skin tone assessment in order to take this bias into account.


De-identification of medical images intended for research is a core requirement for data sharing initiatives, particularly as the demand for data for artificial intelligence (AI) applications grows. The Center for Biomedical Informatics and Information Technology (CBIIT) of the United States National Cancer Institute (NCI) convened a two half-day virtual workshop with the intent of summarizing the state of the art in de-identification technology and processes and exploring interesting aspects of the subject. This paper summarizes the highlights of the second day of the workshop, the recordings and presentations of which are publicly available for review.

Article Synopsis
  • The study aimed to create and validate machine learning models to predict failure of high-flow nasal cannula (HFNC) therapy in COVID-19 patients, while comparing these models to the traditional ROX index and examining accuracy across different races.
  • Conducted as a retrospective cohort study at four Emory University hospitals, it analyzed data from 984 adult COVID-19 patients who received HFNC therapy, identifying that 32.2% experienced HFNC failure.
  • The eXtreme Gradient Boosting (XGB) model showed superior performance (AUROC of 0.707) compared to the ROX index (AUROC of 0.616), but also highlighted significant racial disparities in prediction accuracy, which were less pronounced
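The headline metric in that comparison, AUROC, is the probability that a model scores a randomly chosen failure case above a randomly chosen non-failure. A minimal sketch of how figures like 0.707 vs 0.616 are computed, using invented scores rather than the study's XGB or ROX values:

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney (rank-sum) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairs where the positive outranks the negative; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]                 # 1 = HFNC failure (hypothetical)
model = [0.10, 0.40, 0.35, 0.80]      # hypothetical risk scores
baseline = [0.30, 0.30, 0.30, 0.30]   # uninformative constant score
print(auroc(model, labels))     # 0.75
print(auroc(baseline, labels))  # 0.5
```

Reporting AUROC separately per racial subgroup, as the study does, is what exposes disparities that a single pooled AUROC would hide.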

As artificial intelligence (AI) rapidly approaches human-level performance in medical imaging, it is crucial that it does not exacerbate or propagate healthcare disparities. Previous research established AI's capacity to infer demographic data from chest X-rays, leading to a key concern: do models using demographic shortcuts have unfair predictions across subpopulations? In this study, we conducted a thorough investigation into the extent to which medical AI uses demographic encodings, focusing on potential fairness discrepancies within both in-distribution training sets and external test sets. Our analysis covers three key medical imaging disciplines (radiology, dermatology, and ophthalmology) and incorporates data from six global chest X-ray datasets.


The potential of artificial intelligence (AI) in medicine lies in its ability to enhance clinicians' capacity to analyse medical images, thereby improving diagnostic precision and accuracy and thus enhancing current tests. However, the integration of AI within health care is fraught with difficulties. Heterogeneity among health care system applications, reliance on proprietary closed-source software, and rising cybersecurity threats pose significant challenges.


Background: Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when used on new populations, model generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging models.


Pulse oximeters noninvasively measure peripheral oxygen saturation (SpO2), while the gold standard, arterial oxygen saturation (SaO2), requires arterial blood gas measurement. There are known racial and ethnic disparities in pulse oximeter performance. BOLD is a dataset that aims to underscore the importance of addressing biases in pulse oximetry accuracy, which disproportionately affect darker-skinned patients.
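The disparity a dataset like BOLD exposes is typically summarized as mean bias, SpO2 minus SaO2, computed per demographic group over paired measurements. A minimal stdlib sketch with invented readings (the group labels and values are hypothetical, not BOLD data):

```python
from collections import defaultdict

def mean_bias_by_group(pairs):
    """Mean SpO2 - SaO2 bias per group; positive means the oximeter overreads."""
    totals = defaultdict(lambda: [0.0, 0])
    for group, spo2, sao2 in pairs:
        totals[group][0] += spo2 - sao2
        totals[group][1] += 1
    return {group: round(total / n, 2) for group, (total, n) in totals.items()}

# Hypothetical paired readings: (group, SpO2 %, SaO2 %).
pairs = [
    ("lighter", 95.0, 94.5), ("lighter", 97.0, 96.9),
    ("darker", 96.0, 93.0), ("darker", 95.0, 92.8),
]
print(mean_bias_by_group(pairs))  # {'lighter': 0.3, 'darker': 2.6}
```

A positive bias is clinically dangerous because it can mask occult hypoxemia: the oximeter reports a reassuring SpO2 while the true SaO2 is low.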


Background: The ethical governance of Artificial Intelligence (AI) in health care and public health continues to be an urgent issue for attention in policy, research, and practice. In this paper we report on central themes related to challenges and strategies for promoting ethics in research involving AI in global health, arising from the Global Forum on Bioethics in Research (GFBR), held in Cape Town, South Africa in November 2022.

Methods: The GFBR is an annual meeting organized by the World Health Organization and supported by the Wellcome Trust, the US National Institutes of Health, the UK Medical Research Council (MRC) and the South African MRC.


Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare, but also due to structural issues in the machine learning for healthcare (MLHC) community which broadly reward technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large, and offered a series of clearly defined "Impact Challenges" to which the field should orient itself.
