A disease-specific language representation model for cerebrovascular disease research.

Comput Methods Programs Biomed

Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan, Taiwan.

Published: November 2021

Background: Effectively utilizing disease-relevant text from unstructured clinical notes for medical research presents many challenges. BERT (Bidirectional Encoder Representations from Transformers)-related models such as BioBERT and ClinicalBERT, pre-trained on biomedical corpora and general clinical information, have shown promising performance in various biomedical language processing tasks.

Objectives: This study aims to explore whether a BERT-based model pre-trained on disease-related clinical information can be more effective for cerebrovascular disease-relevant research.

Methods: This study proposed StrokeBERT, which was initialized from BioBERT weights and further pre-trained on large-scale cerebrovascular disease-related clinical text. The pre-training corpora contained 113,590 discharge notes, 105,743 radiology reports, and 38,199 neurological reports. Two real-world clinical tasks were conducted to validate StrokeBERT's performance. The first task identified extracranial and intracranial artery stenosis from two independent sets of radiology angiography reports. The second task predicted the risk of recurrent ischemic stroke from patients' first discharge information.
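
As an illustration of the continued pre-training step described above, the following Python sketch resumes masked-language-model training from a publicly available BioBERT checkpoint on a file of clinical notes, using the Hugging Face transformers and datasets libraries. The checkpoint name, file path, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

# Minimal sketch (assumed setup, not the authors' code): continue masked-language-model
# pre-training from BioBERT weights on cerebrovascular clinical notes.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModelForMaskedLM.from_pretrained("dmis-lab/biobert-v1.1")

# "clinical_notes.txt" is a placeholder: one de-identified discharge, radiology,
# or neurology note per line.
notes = load_dataset("text", data_files={"train": "clinical_notes.txt"})["train"]
notes = notes.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="strokebert-mlm",
        num_train_epochs=3,             # illustrative hyperparameters
        per_device_train_batch_size=16,
    ),
    train_dataset=notes,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()
model.save_pretrained("strokebert-mlm")
tokenizer.save_pretrained("strokebert-mlm")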

Results: In stenosis detection, StrokeBERT showed improved performance on the targeted carotid arteries, with an average AUC of 0.968 ± 0.021, compared with 0.956 ± 0.018 for ClinicalBERT. In recurrent ischemic stroke prediction, after 10-fold cross-validation on 1,700 discharge records, StrokeBERT showed better predictive performance (AUC ± SD = 0.838 ± 0.017) than ClinicalBERT (AUC ± SD = 0.808 ± 0.045). The attention scores of StrokeBERT detected and associated cerebrovascular disease-related terms better than those of current BERT-based models.
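
As a rough illustration of how a single evaluation fold in the recurrence-prediction task could look, the sketch below fine-tunes a sequence-classification head on top of the domain-adapted checkpoint and scores held-out notes with AUC. The checkpoint path, toy texts, labels, and hyperparameters are placeholders, not the study's data or settings.

# Minimal sketch (assumed setup): fine-tune the domain-adapted checkpoint for binary
# recurrence prediction and compute AUC for one fold; texts and labels are toy placeholders.
import torch
from sklearn.metrics import roc_auc_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("strokebert-mlm")
model = AutoModelForSequenceClassification.from_pretrained("strokebert-mlm", num_labels=2)

def encode(texts, labels):
    # Tokenize notes and attach recurrence labels (1 = recurrent ischemic stroke).
    enc = tokenizer(texts, truncation=True, padding=True, max_length=512)
    return [
        {"input_ids": ids, "attention_mask": mask, "labels": lab}
        for ids, mask, lab in zip(enc["input_ids"], enc["attention_mask"], labels)
    ]

# One cross-validation fold; real data would be roughly 1,530 training / 170 test notes.
train_data = encode(["discharge note A ...", "discharge note B ..."], [0, 1])
test_texts, test_labels = ["discharge note C ...", "discharge note D ..."], [1, 0]

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="strokebert-recurrence",
        num_train_epochs=3,             # illustrative hyperparameters
        per_device_train_batch_size=16,
    ),
    train_dataset=train_data,
)
trainer.train()

logits = trainer.predict(encode(test_texts, test_labels)).predictions
probs = torch.softmax(torch.tensor(logits), dim=-1)[:, 1].numpy()
print("fold AUC:", roc_auc_score(test_labels, probs))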

Conclusions: This study shows that a disease-specific BERT model improved performance and accuracy on various disease-specific language processing tasks and can readily be fine-tuned to advance cerebrovascular disease research and be further developed for clinical applications.

Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551061
DOI: http://dx.doi.org/10.1016/j.cmpb.2021.106446

Publication Analysis

Top Keywords (frequency): cerebrovascular disease (16), disease-specific language (8), language processing (8), recurrent ischemic (8), ischemic stroke (8), improved performance (8), clinical (6), cerebrovascular (5), language representation (4), representation model (4)
