Deep learning (DL)-based predictive models built on electronic health records (EHRs) deliver impressive performance in many clinical tasks. These models, however, often require large training cohorts to achieve high accuracy, which hinders their adoption in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous success in the natural language processing domain. Pretraining BERT on a very large corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework, originally developed for the text domain, to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 1.21-6.14% in two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT obtains promising performance on tasks with small fine-tuning training sets: compared with deep learning models without Med-BERT, it can boost the AUC by more than 20% or match the AUC of a model trained on a training set ten times larger. We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence-aided healthcare.
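For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from binary labels and model scores, using the standard pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as half). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the Med-BERT study.

```python
def roc_auc(labels, scores):
    """Pairwise-ranking AUC: fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values purely for illustration (not results from the paper).
labels = [0, 0, 1, 1]
scores_a = [0.1, 0.4, 0.35, 0.8]   # one positive is ranked below a negative
scores_b = [0.1, 0.3, 0.35, 0.8]   # every positive outranks every negative

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
print(auc_a, auc_b)
```

This quadratic-time version is meant for clarity; on real cohorts a sort-based implementation or an established library routine would usually be preferred.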

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137882
DOI: http://dx.doi.org/10.1038/s41746-021-00455-y
