Reviewer Experience Detecting and Judging Human Versus Artificial Intelligence Content: The Stroke Journal Essay Contest.

Gisele S Silva, Rohan Khera, Lee H Schwamm

Stroke

Biomedical Informatics and Data Science (R.K., L.H.S.), Yale School of Medicine, New Haven, CT.

Published: October 2024

AI large language models (LLMs) can generate essays similar in quality to those written by human authors, raising questions about authorship in scientific writing.
A 2024 essay contest featured human and AI entries on controversial topics in stroke care, evaluated by a panel of experts blinded to author identity.
Reviewers found it challenging to identify AI-generated essays and rated them higher for composition, but were biased against choosing essays they judged to be AI generated as "best in topic." Scientific journals may need to educate reviewers about AI's role in writing and establish guidelines for its use.

Artificial intelligence (AI) large language models (LLMs) now produce human-like general text and images. LLMs' ability to generate persuasive scientific essays that undergo evaluation under traditional peer review has not been systematically studied. To measure perceptions of quality and the nature of authorship, we conducted a competitive essay contest in 2024 with both human and AI participants. Human authors and 4 distinct LLMs generated essays on controversial topics in stroke care and outcomes research. A panel of Editorial Board members (mostly vascular neurologists), blinded to author identity and with varying levels of AI expertise, rated the essays for quality, persuasiveness, best in topic, and author type. Among 34 submissions (22 human and 12 LLM) scored by 38 reviewers, human and AI essays received mostly similar ratings, though AI essays were rated higher for composition quality. Author type was accurately identified only 50% of the time, with prior LLM experience associated with improved accuracy. In multivariable analyses adjusted for author attributes and essay quality, only persuasiveness was independently associated with odds of a reviewer assigning AI as author type (adjusted odds ratio, 1.53 [95% CI, 1.09-2.16]; P=0.01). In conclusion, a group of experienced editorial board members struggled to distinguish human versus AI authorship, with a bias against best in topic for essays judged to be AI generated. Scientific journals may benefit from educating reviewers on the types and uses of AI in scientific writing and developing thoughtful policies on the appropriate use of AI in authoring manuscripts.
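
As a reading aid for the reported effect size, the following minimal sketch shows how an adjusted odds ratio and its 95% CI relate to a log-odds coefficient, standard error, and two-sided P value. It assumes a Wald-type interval that is symmetric on the log scale; the abstract does not state how the interval was computed, so this is an illustration and not the authors' analysis. The function name `wald_from_odds_ratio` is hypothetical.

```python
import math


def wald_from_odds_ratio(or_point, ci_low, ci_high, z_crit=1.959964):
    """Back out Wald statistics from an odds ratio and its 95% CI.

    Assumption (not stated in the abstract): the CI is a Wald interval,
    exp(beta +/- z_crit * se), so it is symmetric on the log scale.
    """
    beta = math.log(or_point)  # log-odds coefficient
    # Width of the CI on the log scale equals 2 * z_crit * se.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se  # Wald z statistic
    # Two-sided P value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Values reported in the abstract for persuasiveness.
beta, se, z, p = wald_from_odds_ratio(1.53, 1.09, 2.16)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.3f}")
```

Under these assumptions the implied P value rounds to 0.01, which agrees with the reported figure. An odds ratio of 1.53 means that each one-unit increase in the persuasiveness rating was associated with roughly 53% higher odds of a reviewer labeling the essay as AI authored.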

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529699
DOI: http://dx.doi.org/10.1161/STROKEAHA.124.045012
