Publications by authors named "M Schaekermann"

Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score on United States Medical Licensing Examination-style questions. However, challenges remain in long-form medical question answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval.

Automated radiology report generation has the potential to improve patient care and reduce the workload of radiologists. However, the path toward real-world adoption has been stymied by the challenge of evaluating the clinical quality of artificial intelligence (AI)-generated reports. We build a state-of-the-art report generation system for chest radiographs, called Flamingo-CXR, and perform an expert evaluation of AI-generated reports by engaging a panel of board-certified radiologists.

Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM.

Article Synopsis
  • Artificial intelligence in healthcare often reflects existing historical inequities, prompting the need for a new framework to evaluate fairness in AI performance for different patient populations.
  • The Health Equity Assessment of machine Learning performance (HEAL) framework was developed to quantitatively analyze whether health AI tools better serve those experiencing worse health outcomes through a detailed four-step method.
  • In a case study involving a dermatology AI model using diverse teledermatology cases, the HEAL metric was used to assess the likelihood that the AI performed better for groups with poorer health outcomes, indicating its potential for promoting equity in AI technologies.

We propose a novel three-stage FIND-RESOLVE-LABEL workflow for crowdsourced annotation to reduce ambiguity in task instructions and, thus, improve annotation quality. Stage 1 (FIND) asks the crowd to find examples whose correct label seems ambiguous given task instructions. Workers are also asked to provide a short tag that describes the ambiguous concept embodied by the specific instance found.
