Red teaming ChatGPT in medicine to yield real-world insights on model behavior.

NPJ Digit Med

Department of Dermatology, Stanford University, Stanford, USA.

Published: March 2025

Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical to improving the equity and accuracy of large language models, but red teaming by groups unaffiliated with model creators is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six medically trained reviewers re-analyzed prompt-response pairs and added qualitative annotations. Of 376 unique prompts (1504 responses), 20.1% were inappropriate (GPT-3.5: 25.8%; GPT-4.0: 16%; GPT-4.0 with Internet: 17.8%). Subsequently, we show the utility of our benchmark by testing GPT-4o, a model released after our event (20.4% inappropriate). 21.5% of responses appropriate with GPT-3.5 were inappropriate in updated models. We share insights for constructing red teaming prompts, and present our benchmark for iterative model assessments.
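As a rough illustration of the tallying the abstract describes, the sketch below computes a per-model share of responses flagged as inappropriate on at least one of the four named axes. It is a minimal sketch under stated assumptions, not the authors' pipeline: the record layout, the function name `inappropriate_rate`, and the model labels are hypothetical, and the toy records are invented for illustration and are not study data.

```python
from collections import Counter

# The four axes named in the abstract; a response may be flagged on several.
AXES = ("safety", "privacy", "hallucinations/accuracy", "bias")


def inappropriate_rate(records):
    """Return {model: fraction of its responses flagged on at least one axis}."""
    totals, flagged = Counter(), Counter()
    for rec in records:
        totals[rec["model"]] += 1
        if any(axis in rec["flags"] for axis in AXES):
            flagged[rec["model"]] += 1
    return {model: flagged[model] / totals[model] for model in totals}


# Toy records (hypothetical layout, invented values; NOT data from the study).
toy_records = [
    {"model": "model-a", "flags": {"bias"}},
    {"model": "model-a", "flags": set()},
    {"model": "model-b", "flags": set()},
    {"model": "model-b", "flags": set()},
]

rates = inappropriate_rate(toy_records)
print(rates)
```

Rerunning the same tally on a fixed prompt set whenever a new model version appears is one way a benchmark of this kind could support iterative assessment.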

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11889229
DOI: http://dx.doi.org/10.1038/s41746-025-01542-0

Similar Publications

Impact of COVID-19 on key populations and people living with HIV: recommendations and sociopolitical responses from the EPIC community research program in Latin America.

BMC Public Health

March 2025

Coalition PLUS, Pantin, France.

Valeria Stuardo Ávila Océane Apffel Font Ángela León Cáceres Pablo Radusky Ines Aristegui

Background: Health inequality in Latin America is particularly severe for people living with HIV (PLHIV) and key populations, such as men who have sex with men, transgender women, people who use drugs, and sex workers. Despite regional programs aimed at reducing health inequalities, such as the Sustainable Development Goals and the Sustainable Health Agenda for the Americas 2018-2030, the COVID-19 health crisis has exposed significant shortcomings in national healthcare systems for PLHIV and key populations. The multi-country, community-based research program, EPIC, was developed by Coalition PLUS within a network of community-based organizations engaged in the response to HIV and viral hepatitis.

Comparing apples and pears? Evaluating the interchangeability of three different positions for hip abduction and adduction strength testing in academy footballers.

J Athl Train

March 2025

Red Bull Athlete Performance Center, Salzburg, Austria.

James O'Brien Markus Huthöfer Emanuel Santner Tatjana Becker Thomas Stöggl

Objectives: To compare strength parameters and pain ratings across three different positions for isometric hip abduction and adduction strength testing. Design: Cross-sectional study. Setting: Two elite European football academies.

Red teaming ChatGPT in medicine to yield real-world insights on model behavior.

NPJ Digit Med

March 2025

Department of Dermatology, Stanford University, Stanford, USA.

Crystal T Chang Hodan Farah Haiwen Gui Shawheen Justin Rezaei Charbel Bou-Khalil

Phosphatidylethanol clearance after packed red blood cell transfusion.

Clin Biochem

March 2025

Department of Laboratory Medicine and Pathology, Mayo Clinic Arizona, United States.

Olivia C Iverson Karlie A Smith Pragya Sharma Matthew R Buras Jaxon K Quillen

Objectives: Phosphatidylethanol (PEth) is a long-term marker of alcohol consumption used clinically for evaluating abstinence in patients including transplant candidates. Packed red blood cell (pRBC) transfusion can introduce exogenous PEth to recipients, complicating interpretation. This study evaluated the kinetics and duration of PEth 16:0/18:1 positivity post-transfusion.

Current management and future perspectives of covert hepatic encephalopathy in Japan: a nationwide survey.

J Gastroenterol

March 2025

Department of Gastroenterology/Internal Medicine, Graduate School of Medicine, Gifu University, 1-1 Yanagido, Gifu, 501-1194, Japan.

Takao Miwa Mio Tsuruoka Hajime Ueda Tamami Abe Hiroki Inada

Background: Covert hepatic encephalopathy (CHE) leads to devastating outcomes in patients with cirrhosis. This study aims to elucidate the current management and future perspectives of CHE in Japan.

Methods: A questionnaire-based cross-sectional study was conducted among physicians involved in managing cirrhosis in Japan.
