We analyzed a natural language processing (NLP) toolkit's ability to classify unstructured EHR data by psychiatric diagnosis. Informatics expertise can be a barrier to using NLP, so we employed an NLP toolkit (CLARK) designed to support studies led by investigators with a range of informatics knowledge. The EHRs of 652 patients were manually reviewed to establish labeled Depression and Substance Use Disorder (SUD) datasets, each of which was split into training and evaluation sets. We used CLARK to train depression and SUD classification models on the training sets and assessed model performance against the evaluation sets. The depression model accurately classified 69% of records (sensitivity = 0.68, specificity = 0.70, F1 = 0.68). The SUD model accurately classified 84% of records (sensitivity = 0.56, specificity = 0.92, F1 = 0.57). The depression model's performance was more balanced, whereas the SUD model paired high specificity with low sensitivity. NLP applications may be especially helpful when combined with a confidence threshold for routing records to manual review.
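The reported metrics follow directly from a binary confusion matrix, and the proposed manual-review step amounts to routing low-confidence predictions to a human reviewer. The sketch below illustrates both ideas; it does not use CLARK's actual API, and the classify(record) interface and the 0.80 threshold are hypothetical assumptions for illustration only.

```python
# Minimal sketch: confusion-matrix metrics and confidence-threshold routing.
# classify(record) -> (label, confidence) is a hypothetical interface, not CLARK's API.

from typing import Callable, List, Tuple


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float, float]:
    """Return accuracy, sensitivity (recall), specificity, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return accuracy, sensitivity, specificity, f1


def route_by_confidence(records: List[str],
                        classify: Callable[[str], Tuple[int, float]],
                        threshold: float = 0.80):
    """Accept the model's label when its confidence clears the threshold;
    otherwise flag the record for manual chart review."""
    auto, manual = [], []
    for record in records:
        label, confidence = classify(record)  # hypothetical classifier call
        (auto if confidence >= threshold else manual).append((record, label, confidence))
    return auto, manual
```

In this setup, lowering the threshold reduces the manual-review burden at the cost of accepting more uncertain classifications; the right trade-off depends on the downstream use of the labels.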

Source
DOI: http://dx.doi.org/10.1177/14604582241296411
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657637

Publication Analysis

Top Keywords

natural language: 8
language processing: 8
psychiatric diagnosis: 8
evaluation datasets: 8
depression model: 8
model accurately: 8
accurately classified: 8
records sensitivity: 8
processing toolkit: 4
toolkit classify: 4
