A machine learning-based framework to identify type 2 diabetes through electronic health records.

Int J Med Inform

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA. Electronic address:

Published: January 2017

Objective: To discover diverse genotype-phenotype associations related to Type 2 Diabetes Mellitus (T2DM) via genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS), more cases (T2DM subjects) and controls (subjects without T2DM) need to be identified, e.g., from Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from a low recall rate and, under conservative filtering standards, can miss a large number of valuable samples. The goal of this pilot study is to develop a semi-automated, machine learning-based framework that liberalizes the filtering criteria to improve the recall rate while keeping the false positive rate low.
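The recall versus false-positive trade-off in the objective can be made concrete with a score threshold. The sketch below assumes a classifier that outputs a score per subject (higher means more likely T2DM) and picks the most liberal cut-off whose false positive rate stays under a chosen cap. The function name `pick_threshold`, the cap values and the toy data are hypothetical illustrations, not the authors' procedure.

```python
def pick_threshold(scores, labels, max_fpr):
    """Most liberal score cut-off whose false positive rate stays <= max_fpr.

    scores: classifier scores, higher = more likely T2DM (hypothetical input).
    labels: 1 = confirmed case, 0 = confirmed control.
    Returns (threshold, recall, fpr), or None if no threshold qualifies.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one case and one control")
    best = None
    # Scan candidate thresholds from strictest (highest) to most liberal (lowest).
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fpr = fp / negatives
        if fpr > max_fpr:
            break  # lowering the threshold further can only add false positives
        best = (t, tp / positives, fpr)
    return best


# Toy scores and labels, made up for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 1, 0, 1, 1, 0, 0]
print(pick_threshold(scores, labels, max_fpr=0.0))   # strict cut-off: recall 0.6
print(pick_threshold(scores, labels, max_fpr=0.34))  # looser cut-off: recall 1.0
```

On this toy data, allowing one false positive out of three controls raises recall from 0.6 to 1.0, which is the kind of loosening the objective describes.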

Materials And Methods: We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. Within this framework, we evaluate and contrast the identification performance of widely used machine learning models: k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. The framework was applied to 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from a diabetes-related cohort of 23,281 subjects retrieved from a regional distributed EHR repository covering 2012 to 2014.
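To show how one of the listed models scores a subject, here is a minimal pure-Python k-Nearest-Neighbors sketch. The three engineered features (T2DM diagnosis-code count, maximum HbA1c, T2DM medication flag) and all values are hypothetical; the abstract does not list the actual features. The sketch also assumes that only confirmed cases and controls are used for training.

```python
import math


def knn_score(train_X, train_y, x, k=3):
    """k-nearest-neighbours score: share of the k closest labelled
    records (Euclidean distance) that are T2DM cases (label 1)."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training records")
    # Pair each training record's distance to x with its label, closest first.
    ranked = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    return sum(yi for _, yi in ranked[:k]) / k


# Hypothetical engineered features per subject:
# (number of T2DM diagnosis codes, maximum HbA1c in %, on a T2DM drug 0/1).
# Only confirmed cases (1) and controls (0) are used for training here.
train_X = [(3, 8.1, 1), (5, 9.0, 1), (2, 7.4, 1), (4, 7.9, 1),
           (0, 5.4, 0), (0, 5.6, 0), (1, 5.9, 0)]
train_y = [1, 1, 1, 1, 0, 0, 0]

print(knn_score(train_X, train_y, (3, 7.6, 1)))  # 1.0, neighbours are all cases
print(knn_score(train_X, train_y, (0, 5.5, 0)))  # 0.0, neighbours are all controls
```

Euclidean distance on raw features lets the widest-ranging feature dominate, so in practice one would usually standardize the features before fitting kNN.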

Results: We apply the top-performing machine learning algorithms to the engineered features and benchmark the accuracy, precision, AUC, sensitivity and specificity of the classification models against the state-of-the-art expert algorithm for identifying T2DM subjects. Our results indicate that the framework achieves high identification performance (average AUC ≈ 0.98), much higher than that of the state-of-the-art algorithm (AUC 0.71).
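For reference, the reported metrics have standard definitions, which the following sketch computes. It is a minimal pure-Python illustration with made-up labels and scores, not the study's evaluation code. AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC = P(random case outscores random control); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and scores; predictions use a 0.5 cut-off.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_metrics(y_true, y_pred))  # accuracy 0.6, specificity 0.5
print(round(roc_auc(y_true, scores), 3))  # 0.833
```

Unlike the other four metrics, AUC does not depend on the chosen cut-off, which makes it a convenient single number for comparing models.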

Discussion: Expert algorithm-based identification of T2DM subjects from EHR is often hampered by high miss rates caused by conservative selection criteria. Our framework leverages machine learning and feature engineering to loosen these criteria and achieve a high identification rate for both cases and controls.

Conclusions: Our proposed framework demonstrates a more accurate and efficient approach for identifying subjects with and without T2DM from EHR.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5144921
DOI: http://dx.doi.org/10.1016/j.ijmedinf.2016.09.014


Similar Publications

Purpose: This study aimed to initially test whether machine learning approaches could categorically predict two simple biological features, mouse age and mouse species, using the retinal segmentation metrics.

Methods: The retinal layer thickness data obtained from C57BL/6 and DBA/2J mice were processed for machine learning after segmenting mouse retinal SD-OCT scans. Twenty-two models were trained to predict the mouse groups.

Evaluating the Immunogenicity Risk of Protein Therapeutics by Augmenting T Cell Epitope Prediction with Clinical Factors.

AAPS J

January 2025

Department of BioAnalytical Sciences, Genentech Inc, South San Francisco, California, USA.

Protein-based therapeutics may elicit undesired immune responses in a subset of patients, leading to the production of anti-drug antibodies (ADA). In some cases, ADAs have been reported to affect the pharmacokinetics, efficacy and/or safety of the drug. Accurate prediction of the ADA response can help drug developers identify the immunogenicity risk of the drug candidates, thereby allowing them to make the necessary modifications to mitigate the immunogenicity.

Currently, the World Health Organization (WHO) grade of meningiomas is determined from biopsy results, so accurate non-invasive preoperative grading could significantly improve treatment planning and patient outcomes. Given recent advances in machine learning (ML) and deep learning (DL), this meta-analysis aimed to evaluate the performance of these models in predicting the WHO meningioma grade from imaging data.

Aesthetics have become central to dental clinics and prosthetic dental treatment. Agreeing on the appropriate prosthetic tooth color among the clinician, patient and technician is difficult because of metamerism, the phenomenon in which an object's color is perceived differently under different light sources; here it arises from the lighting differences between the laboratory and the dental clinic.

Background: Cyanobacteria, particularly Synechocystis sp. PCC 6803, serve as model organisms for studying acclimation strategies that enable adaptation to various environmental stresses. Understanding the molecular mechanisms underlying these adaptations provides insight into how cells adjust gene expression in response to challenging conditions.
