A Comparison of Machine-Graded (ChatGPT) and Human-Graded Essay Scores in Veterinary Admissions.

J Vet Med Educ

Department of Health Management, Atlantic Veterinary College, University of Prince Edward Island, Charlottetown, PE, C1A 4P3, Canada.

Published: May 2024

Admissions committees have historically emphasized cognitive measures, but a paradigm shift toward holistic reviews now places greater importance on non-cognitive skills. These holistic reviews may include personal statements, experiences, references, interviews, multiple mini-interviews, and situational judgment tests, often requiring substantial faculty resources. Leveraging advances in artificial intelligence, particularly in natural language processing, this study assessed the agreement between essay scores assigned by humans and by a machine (OpenAI's ChatGPT). Correlations were calculated among these scores and the cognitive and non-cognitive measures used in the admissions process. Human-derived scores from 778 applicants in 2021 and 552 in 2022 had item-specific inter-rater reliabilities ranging from 0.07 to 0.41, while machine-derived inter-replicate reliabilities ranged from 0.41 to 0.61. Pairwise correlations between human- and machine-derived essay scores and other admissions criteria revealed a moderate correlation between the two scoring methods (0.41) and fair correlations between the essays and the multiple mini-interview (0.20 and 0.22 for human and machine scores, respectively). Although correlations with cognitive measures were very low for both methods, machine-graded scores correlated slightly more strongly with these measures (0.10 to 0.15) than human-graded scores did (0.01 to 0.02). Importantly, machine scores demonstrated higher precision, approximately two to three times greater than human scores in both years. This study emphasizes the importance of careful item design, rubric development, and prompt formulation when using machine-based essay grading. It also underscores the importance of employing replicates and robust statistical analyses to ensure equitable applicant ranking when integrating machine grading into the admissions process.
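As a rough illustration of the kinds of quantities reported above, the following Python sketch computes an agreement figure among replicate scorings and a correlation between averaged human and machine scores. It is a minimal sketch under stated assumptions, not the study's analysis: the abstract does not say which reliability statistic was used, so the mean pairwise Pearson correlation stands in as one simple option, and the function names and all scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(replicates):
    """Average Pearson correlation over every pair of raters or replicates.

    `replicates` is a list of score lists, one per rater (or per machine
    replicate), each ordered by the same applicants. This is an assumed
    stand-in for the reliability statistic, which the abstract does not name.
    """
    pairs = list(combinations(replicates, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def applicant_means(replicates):
    """Average each applicant's score across raters or replicates."""
    return [sum(scores) / len(scores) for scores in zip(*replicates)]


# Hypothetical scores for five applicants (illustrative only, not study data).
human = [[3, 4, 2, 5, 3], [4, 2, 3, 4, 5]]
machine = [[3, 4, 2, 5, 4], [3, 5, 2, 4, 4], [4, 4, 2, 5, 3]]

human_reliability = mean_pairwise_correlation(human)
machine_reliability = mean_pairwise_correlation(machine)
human_vs_machine = pearson(applicant_means(human), applicant_means(machine))
```

Averaging across replicates before ranking applicants is one way to act on the abstract's recommendation to employ replicates; a real analysis would likely use a dedicated reliability model rather than raw pairwise correlations.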

Source
http://dx.doi.org/10.3138/jvme-2023-0162

Similar Publications

Introduction: Polycystic ovarian syndrome (PCOS) is a heterogeneous endocrine disorder characterized by chronic oligo-ovulation or anovulation, hyperandrogenism, and polycystic ovarian morphology on transvaginal or abdominal ultrasound. Hyperandrogenism and insulin resistance are well-documented pathophysiological mechanisms in PCOS. In addition, autoimmunity has been hypothesized to play a role in its pathogenesis.

Script Concordance Test in Physiology: Preparation, Scoring, and Student Perceptions - A Mixed Method Study.

Int J Appl Basic Med Res

November 2024

Director, Simulation Centre, Mahatma Gandhi Medical College Research Institute, Sri Balaji Vidyapeeth (Deemed to be University), Pondicherry - Cuddalore Road, Pillayarkuppam, Puducherry, India.

Background: Although the curriculum has changed, assessment tools are not aligned with newer teaching approaches such as early clinical exposure (ECE) and self-directed learning. In both summative and formative assessment, the most commonly used tools for assessing the cognitive domain are written formats, including multiple-choice questions (MCQs). However, assessment tools such as MCQs and written essays cannot assess higher-order thinking and clinical reasoning skills.

Aim: Whether case-based modified essay questions (MEQs) are crucial to summative assessment in medical curriculum is still debatable. The current study aimed to evaluate third-year medical students' performance in case-based MEQs and multiple-choice questions (MCQs) in summative assessment in the endocrine module.

Methods: Students' scores in mid and final module MEQs and MCQs were analyzed over four successive years from 2018/2019 to 2021/2022, where comparisons were made between students' scores in MEQs and MCQs, and between scores of students of different categories.

Objective: Text analysis is a form of psychological assessment that involves converting qualitative information (text) into quantitative data. We tested whether automated text analysis using Generative Pre-trained Transformers (GPTs) can match the "gold standard" of manual text analysis, even when assessing a highly nuanced construct like spirituality.

Method: In Study 1, N = 2199 US undergraduates wrote about their goals (N = 6597 texts) and completed self-reports of spirituality and theoretically related constructs (religiousness and mental health).

In an era of continuous development in computer technology, artificial intelligence (AI) and big data are being applied ever more widely. With the help of powerful computer and network technology, the art of visual communication (VISCOM) has entered a new chapter of digitalization and intelligence. How visual art can better achieve interdisciplinary artistic expression between art and technology, and how to use more novel technology, richer forms, and more appropriate ways to express art, have become new problems in visual art creation.
