Identification of an ANCA-associated vasculitis cohort using deep learning and electronic health records.

Liqin Wang, John Novoa-Laurentiev, Claire Cook, Shruthi Srivatsan, Yining Hua, Jie Yang, Eli Miloslavsky, Hyon K Choi, Li Zhou, Zachary S Wallace

Int J Med Inform

Rheumatology and Allergy Clinical Epidemiology Research Center and Division of Rheumatology, Allergy, and Immunology, and Mongan Institute, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA.

Published: January 2025

Background: ANCA-associated vasculitis (AAV) is a rare but serious disease. Traditional case-identification methods using claims data can be time-intensive and may miss important subgroups. We hypothesized that a deep learning model analyzing electronic health records (EHR) can more accurately identify AAV cases.

Methods: We examined the Mass General Brigham (MGB) repository of clinical documentation from 12/1/1979 to 5/11/2021, using expert-curated keywords and ICD codes to identify a large cohort of potential AAV cases. Three labeled datasets (I, II, III) were created, each containing note sections. We trained and evaluated a range of machine learning and deep learning algorithms for note-level classification, using positive predictive value (PPV), sensitivity, F-score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC) as metrics. The hierarchical attention network (HAN) was further evaluated for its ability to classify AAV cases at the patient level and was compared with rule-based algorithms in 2000 randomly chosen samples.
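The note-level metrics named above can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the labels, scores, and the 0.5 decision threshold are made-up toy values, and the function names are hypothetical. AUROC is computed through its rank interpretation (the probability that a randomly chosen positive note section scores higher than a randomly chosen negative one); AUPRC is omitted for brevity.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = AAV)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def ppv_sensitivity_f1(y_true, y_pred):
    """PPV (precision), sensitivity (recall), and their harmonic mean (F1)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = ppv + sensitivity
    f1 = 2 * ppv * sensitivity / denom if denom else 0.0
    return ppv, sensitivity, f1


def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    which is fine for a toy example but not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only): 1 = note section supports AAV.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.92, 0.81, 0.40, 0.35, 0.10, 0.65, 0.05, 0.77]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

ppv, sensitivity, f1 = ppv_sensitivity_f1(y_true, y_pred)
print(f"PPV={ppv:.3f} sensitivity={sensitivity:.3f} F1={f1:.3f}")
print(f"AUROC={auroc(y_true, scores):.4f}")
```

PPV and sensitivity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason a model can have a high AUROC and still show a low PPV when positives are rare.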

Results: Datasets I, II, and III comprised 6000, 3008, and 7500 note sections, respectively. HAN achieved the highest AUROC in all three datasets, with scores of 0.983, 0.991, and 0.991. The deep learning approach also had among the highest PPVs across the three datasets (0.941, 0.954, and 0.800, respectively). In a test cohort of 2000 cases, the HAN model achieved a PPV of 0.262 and an estimated sensitivity of 0.975. Compared to the best rule-based algorithm, HAN identified six additional AAV cases, representing 13% of the total.
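As a rough consistency check on the last figure, the sketch below lists the whole-number case totals for which six cases round to 13%. It assumes that "the total" means the number of confirmed AAV cases in the 2000-case test cohort and that the percentage was rounded to the nearest integer; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
def totals_consistent_with(extra_cases, reported_percent, max_total=200):
    """Return totals for which extra_cases / total rounds to reported_percent.

    Uses round-half-up on the percentage; max_total is an arbitrary search
    bound chosen for this illustration.
    """
    matches = []
    for total in range(extra_cases, max_total + 1):
        percent = 100 * extra_cases / total
        if int(percent + 0.5) == reported_percent:
            matches.append(total)
    return matches


# Six additional cases reported as 13% of the total.
print(totals_consistent_with(6, 13))
```

Under those assumptions the reported figures imply a total in the mid-forties, which gives a sense of how few true cases underlie the patient-level comparison.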

Conclusion: The deep learning model effectively classifies clinical note sections for AAV diagnosis. Its application to EHR notes can potentially uncover additional cases missed by traditional rule-based methods.

DOI: http://dx.doi.org/10.1016/j.ijmedinf.2025.105797

Similar Publications

Abundant repressor binding sites in human enhancers are associated with the fine-tuning of gene regulation.

iScience

January 2025

Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Wei Song, Ivan Ovcharenko

The regulation of gene expression relies on the coordinated action of transcription factors (TFs) at enhancers, including both activator and repressor TFs. We employed deep learning (DL) to dissect HepG2 enhancers into positive (PAR), negative (NAR), and neutral activity regions. Sharpr-MPRA and STARR-seq highlight the dichotomous impact of NARs and PARs, which modulate and catalyze the activity of enhancers, respectively.

Deep learning uncovers histological patterns of YAP1/TEAD activity related to disease aggressiveness in cancer patients.

iScience

January 2025

Sanofi, Paris, France.

Benoit Schmauch, Vincent Cabeli, Omar Darwiche Domingues, Jean-Eudes Le Douget, Alexandra Hardy

Over the last decade, Hippo signaling has emerged as a major tumor-suppressing pathway. Its dysregulation is associated with abnormal expression of YAP1 and TEAD-family genes. Recent works have highlighted the role of YAP1/TEAD activity in several cancers and its potential therapeutic implications.

Research on grading detection methods for diabetic retinopathy based on deep learning.

Pak J Med Sci

January 2025

Juan Chen, Department of Ophthalmology, Affiliated Hospital of Hangzhou Normal University, Hangzhou, Zhejiang, China.

Jing Zhang, Juan Chen

Objective: To design a deep learning-based model for early screening of diabetic retinopathy, predict the condition, and provide interpretable justifications.

Methods: The model structure is based on the Vision Transformer architecture. The work was initiated in March 2023, and the first version was produced in July 2023 at the Affiliated Hospital of Hangzhou Normal University. The publicly available EyePACS dataset was used as input to train the model.

Liver fibrosis classification on trichrome histology slides using weakly supervised learning in children and young adults.

J Pathol Inform

January 2025

Cincinnati Children's AI Imaging Research (CAIIR) Center, Cincinnati, OH, United States.

Mahdieh Shabanian, Zachary Taylor, Christopher Woods, Anas Bernieh, Jonathan Dillman

Background: Traditional liver fibrosis staging via percutaneous biopsy suffers from sampling bias and variable inter-pathologist agreement, highlighting the need for more objective techniques. Deep learning models for disease staging from medical images have shown potential to decrease diagnostic variability, with recent weakly supervised learning strategies showing promising results even with limited manual annotation.

Purpose: To study the clustering-constrained attention multiple instance learning (CLAM) approach for staging liver fibrosis on trichrome whole slide images (WSIs) of children and young adults.

Who is WithMe? EEG features for attention in a visual task, with auditory and rhythmic support.

Front Neurosci

January 2025

Department of Mathematics, University of Antwerp-Interuniversity Microelectronics Centre (imec), Antwerp, Belgium.

Renata Turkeš, Steven Mortier, Jorg De Winne, Dick Botteldooren, Paul Devos

Introduction: The study of attention has been pivotal in advancing our comprehension of cognition. The goal of this study is to investigate which EEG data representations or features are most closely linked to attention, and to what extent they can handle the cross-subject variability.

Methods: We explore the features obtained from the univariate time series from a single EEG channel, such as time domain features and recurrence plots, as well as representations obtained directly from the multivariate time series, such as global field power or functional brain networks.
