Background: Approximately 450 discrete inborn errors of immunity (IEI) have now been described; however, diagnostic rates remain suboptimal. Structured health record data have proven useful for patient detection, and natural language processing (NLP) may augment them. Here we present a machine learning model that can distinguish patients from controls well in advance of the date of ultimate diagnosis.

Objective: We sought to create an NLP machine learning algorithm that could identify IEI patients early during the disease course and shorten the diagnostic odyssey.

Methods: Our approach involved extracting a large corpus of IEI patient clinical-note text from a major referral center's electronic health record (EHR) system and a matched control corpus for comparison. We built text classifiers with simple machine learning methods and trained them on progressively longer time epochs before date of diagnosis.
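
The epoch-based training setup can be illustrated with a minimal sketch. It assumes one plausible reading of the Methods: for each epoch, only notes written at least a given number of months before the diagnosis date are kept as classifier input. The function names (`months_before`, `notes_in_epoch`) and the `(note_date, text)` tuple layout are hypothetical and are not taken from the paper.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    The day is clamped to the last valid day of the target month
    (for example, one month before 31 March 2020 is 29 February 2020).
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def notes_in_epoch(notes, diagnosis_date, gap_months):
    """Keep the text of notes dated at least `gap_months` before diagnosis.

    `notes` is an iterable of (note_date, text) pairs. This layout is an
    assumption made for illustration only.
    """
    cutoff = months_before(diagnosis_date, gap_months)
    return [text for note_date, text in notes if note_date <= cutoff]


# Hypothetical patient record, used only to show the windowing.
example_notes = [
    (date(2016, 1, 10), "recurrent sinusitis"),
    (date(2017, 6, 15), "second pneumonia this year"),
    (date(2018, 3, 1), "low IgG noted"),
    (date(2020, 6, 1), "referred to immunology"),
]
example_diagnosis = date(2020, 6, 15)

for gap in (36, 24, 12, 0):
    kept = notes_in_epoch(example_notes, example_diagnosis, gap)
    print(gap, len(kept))
```

Under this reading, a separate classifier would be trained on the text returned for each gap, so that performance at 36 months reflects only information available three years before diagnosis.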

Results: The top-performing NLP algorithm robustly distinguished cases from controls 36 months before ultimate clinical diagnosis (area under the precision-recall curve > 0.95). Corpus analysis demonstrated that statistically enriched, IEI-relevant terms were evident 24 or more months before diagnosis, validating that clinical notes can provide a signal for early prediction of IEI.
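
For readers unfamiliar with the reported metric, the following is a minimal sketch of average precision, one common estimator of the area under the precision-recall curve. The abstract does not state which estimator the authors used, so this is an illustration only. The function name `average_precision` is hypothetical, and the sketch assumes no tied scores.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at each true-positive rank.

    `labels` are 1 for cases and 0 for controls; `scores` are classifier
    outputs where higher means more likely to be a case. Tied scores are
    not handled specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_pos


# Toy data, invented for illustration; not results from the paper.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

A value near 1.0 means nearly all cases are ranked above nearly all controls. Unlike the area under the ROC curve, this quantity depends on the case-to-control ratio, which is worth keeping in mind when interpreting results from a matched corpus.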

Conclusion: Mining EHR notes with NLP holds promise for improving early IEI patient detection.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10910118
DOI: http://dx.doi.org/10.1016/j.jacig.2024.100224
