Taxonomy-based prompt engineering to generate synthetic drug-related patient portal messages.

J Biomed Inform

Institute for Computational Medicine, Johns Hopkins University, Baltimore, 21218, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21218, MD, USA; Division of General Internal Medicine, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, 21205, MD, USA.

Published: December 2024

Objective: The objectives of this study were to: (1) create a corpus of synthetic drug-related patient portal messages to address the current lack of publicly available datasets for model development, (2) assess differences in language use and linguistic characteristics among the synthetic patient portal messages, and (3) assess the accuracy of patient-reported drug side effects for different racial groups.

Methods: We leveraged a taxonomy for patient- and clinician-generated content to guide prompt engineering for synthetic drug-related patient portal messages. We generated two groups of messages: the first group (200 messages) used a subset of the taxonomy relevant to a broad range of drug-related messages, and the second group (250 messages) used a subset of the taxonomy relevant to a narrow range of messages focused on side effects. Each prompt also included one of five racial groups. Next, we assessed linguistic characteristics among message parts (subject, beginning, body, ending) across different prompt specifications (urgency, patient portal taxa, race). We also assessed the performance and frequency of patient-reported side effects across different racial groups and compared them with data from a real-world data source (SIDER).
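
The prompt specifications described in the Methods (taxonomy category, urgency, race, and the four message parts) can be illustrated with a minimal sketch. This is not the authors' pipeline: the template wording, the function name build_prompts, and every value in TAXA, URGENCY, and GROUPS are hypothetical placeholders, since the abstract does not list the actual taxa, urgency levels, or racial groups.

```python
from itertools import product

# Hypothetical placeholder values. The abstract does not name the actual
# taxonomy categories, urgency levels, or racial groups used in the study.
TAXA = ["taxon_a", "taxon_b"]
URGENCY = ["low", "high"]
GROUPS = ["group_1", "group_2"]

# Hypothetical template wording; only the four message parts come from the abstract.
TEMPLATE = (
    "Write a patient portal message about a medication. "
    "Message category: {taxon}. Urgency: {urgency}. Patient race: {group}. "
    "Include a subject, beginning, body, and ending."
)


def build_prompts(taxa, urgency_levels, groups):
    """Return one prompt record per combination of taxon, urgency, and group."""
    prompts = []
    for taxon, urgency, group in product(taxa, urgency_levels, groups):
        prompts.append(
            {
                "taxon": taxon,
                "urgency": urgency,
                "group": group,
                "prompt": TEMPLATE.format(taxon=taxon, urgency=urgency, group=group),
            }
        )
    return prompts


prompts = build_prompts(TAXA, URGENCY, GROUPS)
print(len(prompts))
print(prompts[0]["prompt"])
```

Keeping the specification fields alongside each prompt string makes it straightforward to group generated messages by urgency, taxon, or race in a later linguistic analysis.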

Results: The study generated 450 synthetic patient portal messages, and we assessed linguistic patterns, the accuracy of drug-side effect pairs, and the frequency of pairs compared with real-world data. Linguistic analysis revealed variations in language usage and politeness, and analysis of positive predictive values identified differences in reported symptoms based on the urgency level and racial group specified in the prompt. We also found that low-incidence SIDER drug-side effect pairs were observed less frequently in our dataset.
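
As a rough illustration of the positive predictive value analysis mentioned in the Results, the sketch below counts a reported drug-side effect pair as a true positive when it appears in a reference set of pairs, then reports the fraction of true positives per group. That definition is an assumption; the abstract does not state how PPV was computed. The function name ppv_by_group and all drug, effect, and group names are hypothetical placeholders, not SIDER records or study data.

```python
from collections import defaultdict


def ppv_by_group(mentions, reference_pairs):
    """Fraction of reported (drug, effect) pairs per group found in the reference set.

    mentions: iterable of (group, drug, effect) tuples extracted from messages.
    reference_pairs: iterable of (drug, effect) tuples treated as known pairs.
    """
    reference = set(reference_pairs)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, drug, effect in mentions:
        totals[group] += 1
        if (drug, effect) in reference:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical placeholder data, not taken from SIDER or from the study.
reference_pairs = [
    ("drug_a", "effect_x"),
    ("drug_a", "effect_y"),
    ("drug_b", "effect_x"),
]
mentions = [
    ("group_1", "drug_a", "effect_x"),  # in reference
    ("group_1", "drug_a", "effect_z"),  # not in reference
    ("group_2", "drug_b", "effect_x"),  # in reference
    ("group_2", "drug_a", "effect_y"),  # in reference
]

result = ppv_by_group(mentions, reference_pairs)
print(result)
```

The same grouping key could be swapped for urgency level to compare PPV across the other prompt specification named in the abstract.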

Conclusion: This study demonstrates the potential of synthetic patient portal messages as a valuable resource for healthcare research. After creating a corpus of synthetic drug-related patient portal messages, we identified significant language differences and provided evidence that drug-side effect pairs observed in messages are comparable to what is expected in real-world settings.

Source
http://dx.doi.org/10.1016/j.jbi.2024.104752
