Background: ANCA-associated vasculitis (AAV) is a rare but serious disease. Traditional case-identification methods using claims data can be time-intensive and may miss important subgroups. We hypothesized that a deep learning model analyzing electronic health records (EHR) can more accurately identify AAV cases.

Methods: We examined the Mass General Brigham (MGB) repository of clinical documentation from 12/1/1979 to 5/11/2021, using expert-curated keywords and ICD codes to identify a large cohort of potential AAV cases. Three labeled datasets (I, II, III) were created, each containing note sections. We trained and evaluated a range of machine learning and deep learning algorithms for note-level classification, using positive predictive value (PPV), sensitivity, F-score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC) as metrics. The hierarchical attention network (HAN) was further evaluated for its ability to classify AAV cases at the patient level and was compared with rule-based algorithms in 2000 randomly chosen samples.
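For readers less familiar with the evaluation metrics named in the Methods, the following is a minimal sketch of how PPV, sensitivity, F-score (taken here to be F1, which is an assumption), and AUROC can be computed from binary labels and model scores. The toy labels, the scores, the 0.5 decision threshold, and all function names are illustrative assumptions and are not taken from the study; AUPRC is omitted for brevity.

```python
from typing import Sequence, Tuple


def ppv_sensitivity_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (PPV, sensitivity, F1) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0          # precision
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall
    denom = ppv + sensitivity
    f1 = 2 * ppv * sensitivity / denom if denom else 0.0
    return ppv, sensitivity, f1


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration but slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = note section indicates AAV, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
threshold = 0.5  # assumed cut-off, not the study's
y_pred = [1 if s >= threshold else 0 for s in scores]

ppv, sensitivity, f1 = ppv_sensitivity_f1(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"PPV={ppv:.3f} sensitivity={sensitivity:.3f} F1={f1:.3f} AUROC={auc:.3f}")
```

PPV, sensitivity, and F1 depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds. This is why a model can show a very high AUROC and still have a modest PPV when true cases are rare in the screened population.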

Results: Datasets I, II, and III comprised 6000, 3008, and 7500 note sections, respectively. HAN achieved the highest AUROC in all three datasets, with scores of 0.983, 0.991, and 0.991. The deep learning approach also had among the highest PPVs across the three datasets (0.941, 0.954, and 0.800, respectively). In a test cohort of 2000 cases, the HAN model achieved a PPV of 0.262 and an estimated sensitivity of 0.975. Compared to the best rule-based algorithm, HAN identified six additional AAV cases, representing 13% of the total.

Conclusion: The deep learning model effectively classifies clinical note sections for AAV diagnosis. Its application to EHR notes can potentially uncover additional cases missed by traditional rule-based methods.

Source
http://dx.doi.org/10.1016/j.ijmedinf.2025.105797
