Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score in United States Medical Licensing Examination style questions. However, challenges remain in long-form medical question answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval.
View Article and Find Full Text PDFImportance: Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas.
Objective: To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty.
Data Sources: A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024.
Background: Tools to increase the turnaround speed and accuracy of imaging reports could positively influence ED logistics. The Caire ICH is an artificial intelligence (AI) software developed for ED physicians to recognise intracranial haemorrhages (ICHs) on non-contrast enhanced cranial CT scans to manage the clinical care of these patients in a timelier fashion.
Methods: A dataset of 532 non-contrast cranial CT scans was reviewed by five board-certified emergency physicians (EPs) with an average of 14.
Objectives: To evaluate whether one summary metric of calculator performance sufficiently conveys equity across different demographic subgroups, as well as to evaluate how calculator predictive performance affects downstream health outcomes.
Study Design: We evaluate 3 commonly used clinical calculators-Model for End-Stage Liver Disease (MELD), CHA2DS2-VASc, and simplified Pulmonary Embolism Severity Index (sPESI)-on the cohort extracted from the Stanford Medicine Research Data Repository, following the cohort selection process as described in respective calculator derivation papers.
Methods: We quantified the predictive performance of the 3 clinical calculators across sex and race.
Background: Intracranial hemorrhage (ICH) requires emergent medical treatment for positive outcomes. While previous artificial intelligence (AI) solutions achieved rapid diagnostics, none were shown to improve the performance of radiologists in detecting ICHs. Here, we show that the Caire ICH artificial intelligence system enhances a radiologist's ICH diagnosis performance.
View Article and Find Full Text PDFImportance: Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied.
View Article and Find Full Text PDFBackground: One key aspect of a learning health system (LHS) is utilizing data generated during care delivery to inform clinical care. However, institutional guidelines that utilize observational data are rare and require months to create, making current processes impractical for more urgent scenarios such as those posed by the COVID-19 pandemic. There exists a need to rapidly analyze institutional data to drive guideline creation where evidence from randomized control trials are unavailable.
View Article and Find Full Text PDF