Machine Scoring of Medical Students' Written Clinical Reasoning: Initial Validity Evidence.

Anna T Cianciolo Noelle LaVoie James Parker

Acad Med

J. Parker is senior research associate, Parallel Consulting, Petaluma, California.

Published: July 2021

Purpose: Developing medical students' clinical reasoning requires a structured longitudinal curriculum with frequent targeted assessment and feedback. Performance-based assessments, which have the strongest validity evidence, are currently not feasible for this purpose because they are time-intensive to score. This study explored the potential of using machine learning technologies to score one such assessment-the diagnostic justification essay.

Method: From May to September 2018, machine scoring algorithms were trained to score a sample of 700 diagnostic justification essays written by 414 third-year medical students from the Southern Illinois University School of Medicine classes of 2012-2017. The algorithms applied semantically based natural language processing metrics (e.g., coherence, readability) to assess essay quality on 4 criteria (differential diagnosis, recognition and use of findings, workup, and thought process); the scores for these criteria were summed to create overall scores. Three sources of validity evidence (response process, internal structure, and association with other variables) were examined.

Results: Machine scores correlated more strongly with faculty ratings than faculty ratings did with each other (machine: .28-.53, faculty: .13-.33) and were less case-specific. Machine scores and faculty ratings were similarly correlated with medical knowledge, clinical cognition, and prior diagnostic justification. Machine scores were more strongly associated with clinical communication than were faculty ratings (.43 vs .31).

Conclusions: Machine learning technologies may be useful for assessing medical students' long-form written clinical reasoning. Semantically based machine scoring may capture the communicative aspects of clinical reasoning better than faculty ratings, offering the potential for automated assessment that generalizes to the workplace. These results underscore the potential of machine scoring to capture an aspect of clinical reasoning performance that is difficult to assess with traditional analytic scoring methods. Additional research should investigate machine scoring generalizability and examine its acceptability to trainees and educators.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8243833	PMC
http://dx.doi.org/10.1097/ACM.0000000000004010	DOI Listing

Publication Analysis

Top Keywords

machine scoring

clinical reasoning

faculty ratings

medical students'

validity evidence

diagnostic justification

machine scores

machine

written clinical

potential machine

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!