Objective: The objectives of this study were to: (1) create a corpus of synthetic drug-related patient portal messages to address the current lack of publicly available datasets for model development, (2) assess differences in language used and linguistics among the synthetic patient portal messages, and (3) assess the accuracy of patient-reported drug side effects for different racial groups.
Methods: We leveraged a taxonomy for patient- and clinician-generated content to guide prompt engineering for synthetic drug-related patient portal messages. We generated two groups of messages: the first group (200 messages) used a subset of the taxonomy relevant to a broad range of drug-related messages and the second group (250 messages) used a subset of the taxonomy relevant to a narrow range of messages focused on side effects.