Large language models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)-LLM, an extension of the TRIPOD+AI (artificial intelligence) statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion.
Objective: To demonstrate and test the capabilities of the American College of Radiology (ACR) Connect and AI-LAB software platform by implementing multi-institutional artificial intelligence (AI) training and validation for breast density classification.
Methods: In this proof-of-concept study, six U.S.
Musculoskeletal (MSK) pain leads to significant healthcare utilization, decreased productivity, and disability globally. Due to its complex etiology, MSK pain is often chronic and challenging to manage effectively. Disparities in pain management, influenced by provider implicit biases and by patient race, gender, age, and socioeconomic status, contribute to inconsistent outcomes.
Pressure injury (PI) detection is challenging, especially in dark skin tones, due to the unreliability of visual inspection. Thermography may serve as a viable alternative as temperature differences in the skin can indicate impending tissue damage. Although deep learning models hold considerable promise toward reliably detecting PI, existing work fails to evaluate performance on diverse skin tones and varying data collection protocols.
Objective: Pulse oximetry, a ubiquitous vital sign in modern medicine, has inequitable accuracy that disproportionately affects minority Black and Hispanic patients, with associated increases in mortality, organ dysfunction, and oxygen therapy. Previous retrospective studies used self-reported race or ethnicity as a surrogate for skin tone, which is believed to be the root cause of the disparity. Our objective was to determine the utility of skin tone in explaining pulse oximetry discrepancies.
Healthcare AI faces an ethical dilemma between selective and equitable deployment, exacerbated by flawed performance metrics. These metrics inadequately capture real-world complexities and biases, leading to premature assertions of effectiveness. Improved evaluation practices, including continuous monitoring and silent evaluation periods, are crucial.
This narrative review focuses on the role of clinical prediction models in supporting informed decision-making in critical care, emphasizing their two forms: traditional scores and artificial intelligence (AI)-based models. Acknowledging the potential for both types to embed biases, the authors underscore the importance of critical appraisal to increase our trust in models. The authors outline recommendations and critical care examples to manage risk of bias in AI models.
Background: Although hypothesized to be the root cause of the pulse oximetry disparities, skin tone and its use for improving medical therapies have yet to be extensively studied. Studies previously used self-reported race as a proxy variable for skin tone. However, this approach cannot account for skin tone variability within race groups and risks confounding by other non-biological factors when modeling data.
Rationale And Objectives: Radiology residents often receive limited feedback on preliminary reports issued during independent call. This study aimed to determine if Large Language Models (LLMs) can supplement traditional feedback by identifying missed diagnoses in radiology residents' preliminary reports.
Materials & Methods: A randomly selected subset of 500 (250 train/250 validation) paired preliminary and final reports between 12/17/2022 and 5/22/2023 was extracted and de-identified from our institutional database.
Increasing evidence supports reduced accuracy of noninvasive assessment tools, such as pulse oximetry, temperature probes, and AI skin diagnosis benchmarks, in patients with darker skin tones. The FDA is exploring potential strategies for device regulation to improve performance across diverse skin tones by including skin tone criteria. However, there is no consensus on how prospective studies should assess skin tone in order to account for this bias.
De-identification of medical images intended for research is a core requirement for data sharing initiatives, particularly as the demand for data for artificial intelligence (AI) applications grows. The Center for Biomedical Informatics and Information Technology (CBIIT) of the United States National Cancer Institute (NCI) convened a two half-day virtual workshop with the intent of summarizing the state of the art in de-identification technology and processes and exploring interesting aspects of the subject. This paper summarizes the highlights of the second day of the workshop, the recordings and presentations of which are publicly available for review.
As artificial intelligence (AI) rapidly approaches human-level performance in medical imaging, it is crucial that it does not exacerbate or propagate healthcare disparities. Previous research established AI's capacity to infer demographic data from chest X-rays, leading to a key concern: do models using demographic shortcuts have unfair predictions across subpopulations? In this study, we conducted a thorough investigation into the extent to which medical AI uses demographic encodings, focusing on potential fairness discrepancies within both in-distribution training sets and external test sets. Our analysis covers three key medical imaging disciplines-radiology, dermatology and ophthalmology-and incorporates data from six global chest X-ray datasets.
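Fairness discrepancies of the kind described above are often quantified by comparing a per-subgroup error rate, such as the true-positive rate (TPR), across demographic groups. The following is a minimal sketch with entirely hypothetical labels and subgroup names, not the study's actual method:

```python
# Hypothetical predictions: (true_label, predicted_label, subgroup)
preds = [
    (1, 1, "A"), (1, 0, "A"), (1, 0, "A"), (0, 0, "A"),
    (1, 1, "B"), (1, 1, "B"), (1, 0, "B"), (0, 1, "B"),
]

def tpr(rows):
    """True-positive rate: correctly predicted positives / actual positives."""
    pos = [(y, yhat) for y, yhat, _ in rows if y == 1]
    return sum(yhat for _, yhat in pos) / len(pos)

# Per-subgroup TPR and the gap between best- and worst-served groups
rates = {g: tpr([r for r in preds if r[2] == g]) for g in ("A", "B")}
gap = max(rates.values()) - min(rates.values())
print(f"TPR by group: {rates}, gap: {gap:.3f}")
```

A large TPR gap indicates the model under-detects disease in one subgroup, which is one concrete way a "demographic shortcut" can surface as unfair predictions.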
The potential of artificial intelligence (AI) in medicine lies in its ability to enhance clinicians' capacity to analyse medical images, thereby improving diagnostic precision and accuracy and enhancing current tests. However, the integration of AI within health care is fraught with difficulties. Heterogeneity among health care system applications, reliance on proprietary closed-source software, and rising cybersecurity threats pose significant challenges.
Background: Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but model generalizability issues limit their efficacy when applied to new populations. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging research.
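DDPMs like those referenced above learn to reverse a fixed forward noising process with the closed form q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I). A minimal sketch of that forward step, assuming the linear beta schedule from the original DDPM formulation (not the study's actual training code):

```python
import math
import random

def forward_noise(x0, t, betas, rng=random.Random(0)):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    # abar_t is the cumulative product of (1 - beta) up to step t
    abar = 1.0
    for beta in betas[: t + 1]:
        abar *= 1.0 - beta
    return [
        math.sqrt(abar) * x + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0)
        for x in x0
    ]

# Linear schedule from 1e-4 to 0.02 over 1000 steps
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

# At the final step abar is near zero, so x_T is approximately pure Gaussian noise
noisy = forward_noise([0.5] * 4, t=T - 1, betas=betas)
```

Training then fits a network to predict the injected noise at a random step t; sampling runs the learned reverse process from pure noise to produce a synthetic image.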
Pulse oximeters measure peripheral arterial oxygen saturation (SpO2) noninvasively, while the gold standard (SaO2) involves arterial blood gas measurement. There are known racial and ethnic disparities in their performance. BOLD is a dataset that aims to underscore the importance of addressing biases in pulse oximetry accuracy, which disproportionately affect darker-skinned patients.
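Pulse oximeter accuracy against the SaO2 gold standard is commonly summarized as mean bias (SpO2 minus SaO2) and a root-mean-square error, computed per subgroup to expose disparities. The following is a minimal sketch over hypothetical paired readings, not an analysis of the BOLD dataset itself:

```python
import math

# Hypothetical paired readings: (SpO2 %, SaO2 %, skin-tone group)
readings = [
    (97, 94, "dark"), (98, 95, "dark"), (96, 95, "dark"),
    (96, 96, "light"), (95, 94, "light"), (97, 97, "light"),
]

def bias_and_rmse(rows):
    """Mean bias (SpO2 - SaO2) and root-mean-square error for paired readings."""
    diffs = [spo2 - sao2 for spo2, sao2, _ in rows]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse

for group in ("dark", "light"):
    rows = [r for r in readings if r[2] == group]
    bias, rmse = bias_and_rmse(rows)
    print(f"{group}: bias={bias:+.2f}, rmse={rmse:.2f}")
```

A positive per-group bias corresponds to occult hypoxemia risk: the oximeter reads higher than the true arterial saturation, so low oxygen levels can go undetected.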
Background: The ethical governance of Artificial Intelligence (AI) in health care and public health continues to be an urgent issue for attention in policy, research, and practice. In this paper we report on central themes related to challenges and strategies for promoting ethics in research involving AI in global health, arising from the Global Forum on Bioethics in Research (GFBR), held in Cape Town, South Africa in November 2022.
Methods: The GFBR is an annual meeting organized by the World Health Organization and supported by the Wellcome Trust, the US National Institutes of Health, the UK Medical Research Council (MRC) and the South African MRC.
Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare but also to structural issues in the machine learning for healthcare (MLHC) community, which broadly rewards technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large and offered a series of clearly defined "Impact Challenges" to which the field should orient itself.