A machine learning-based framework to identify type 2 diabetes through electronic health records.

Tao Zheng, Wei Xie, Liling Xu, Xiaoying He, Ya Zhang, Mingrong You, Gong Yang, You Chen

Int J Med Inform

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Published: January 2017

Objective: Discovering diverse genotype-phenotype associations for Type 2 Diabetes Mellitus (T2DM) via genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) requires identifying more cases (T2DM subjects) and controls (subjects without T2DM), for example from Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from low recall and can miss a large number of valuable samples under conservative filtering standards. The goal of this pilot study is to develop a semi-automated, machine learning-based framework that relaxes the filtering criteria to improve recall while keeping the false positive rate low.

Materials and Methods: We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. The framework was evaluated on 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from a diabetes-related cohort of 23,281 subjects retrieved from a regional distributed EHR repository covering 2012 to 2014.
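The abstract gives the cohort composition but not the validation protocol. As a minimal sketch, assuming the 221 confirmed subjects (161 cases, 60 controls) were evaluated with stratified k-fold cross-validation, a split that preserves the case/control ratio in every fold might look like the following. The fold count, the seed and the helper name `stratified_folds` are illustrative assumptions, not details from the paper.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Labelled subset from the abstract: 161 cases (1) and 60 controls (0).
# The 79 unconfirmed subjects have no label and are left out of this sketch.
labels = [1] * 161 + [0] * 60
folds = stratified_folds(labels, k=5)

for i, fold in enumerate(folds):
    cases = sum(labels[j] for j in fold)
    print(f"fold {i}: {len(fold)} subjects, {cases} cases, {len(fold) - cases} controls")
```

Each fold can then serve once as the held-out test set while a classifier is trained on the remaining four, so that every labelled subject contributes to the performance estimate exactly once.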

Results: We apply the top-performing machine learning algorithms to the engineered features. We benchmark the accuracy, precision, AUC, sensitivity and specificity of the classification models against the state-of-the-art expert algorithm for identifying T2DM subjects. The framework achieved high identification performance (∼0.98 average AUC), well above that of the state-of-the-art algorithm (0.71 AUC).
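For readers less familiar with the five reported metrics, the sketch below shows how each is computed for a binary case/control classifier. The labels and scores are made-up toy values, not data from the study, and the function names and the 0.5 decision threshold are illustrative assumptions. AUC is computed here as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def auc(y_true, scores):
    """AUC via pairwise comparison of case scores against control scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 4 cases followed by 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Sensitivity is the recall rate the Objective refers to, and 1 minus specificity is the false positive rate, so a looser filter that keeps specificity high raises recall without admitting many false cases.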

Discussion: Expert algorithm-based identification of T2DM subjects from EHR is often hampered by high miss rates caused by conservative selection criteria. Our framework leverages machine learning and feature engineering to loosen these criteria and achieve a high identification rate for both cases and controls.

Conclusions: Our proposed framework demonstrates a more accurate and efficient approach for identifying subjects with and without T2DM from EHR.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5144921
DOI: http://dx.doi.org/10.1016/j.ijmedinf.2016.09.014

Similar Publications

Integrating Retinal Segmentation Metrics with Machine Learning for Predictions from Mouse SD-OCT Scans.

Curr Eye Res

January 2025

Department of Ophthalmology, Edward S. Harkness Eye Institute, Columbia University, Vagelos College of Physicians and Surgeons, New York, NY, USA.

Maide Gözde İnam, Onur İnam, Xiangjun Yang, Qun Zeng, Gülgün Tezel

Purpose: This study aimed to initially test whether machine learning approaches could categorically predict two simple biological features, mouse age and mouse species, using the retinal segmentation metrics.

Methods: The retinal layer thickness data obtained from C57BL/6 and DBA/2J mice were processed for machine learning after segmenting mouse retinal SD-OCT scans. Twenty-two models were trained to predict the mouse groups.

Evaluating the Immunogenicity Risk of Protein Therapeutics by Augmenting T Cell Epitope Prediction with Clinical Factors.

AAPS J

January 2025

Department of BioAnalytical Sciences, Genentech Inc, South San Francisco, California, USA.

Zicheng Hu, Patrick Wu, Steven J Swanson

Protein-based therapeutics may elicit undesired immune responses in a subset of patients, leading to the production of anti-drug antibodies (ADA). In some cases, ADAs have been reported to affect the pharmacokinetics, efficacy and/or safety of the drug. Accurate prediction of the ADA response can help drug developers identify the immunogenicity risk of the drug candidates, thereby allowing them to make the necessary modifications to mitigate the immunogenicity.

Performance of Radiomics-based machine learning and deep learning-based methods in the prediction of tumor grade in meningioma: a systematic review and meta-analysis.

Neurosurg Rev

January 2025

Department of Neurosurgery, Mount Sinai Hospital, Icahn School of Medicine, New York City, NY, USA.

Roozbeh Tavanaei, Mohammadhosein Akhlaghpasand, Alireza Alikhani, Bardia Hajikarimloo, Ali Ansari

Currently, the World Health Organization (WHO) grade of meningiomas is determined based on the biopsy results. Therefore, accurate non-invasive preoperative grading could significantly improve treatment planning and patient outcomes. Considering recent advances in machine learning (ML) and deep learning (DL), this meta-analysis aimed to evaluate the performance of these models in predicting the WHO meningioma grade using imaging data.

ML-based tooth shade assessment to prevent metamerism in different clinic lights.

Lasers Med Sci

January 2025

Erzincan University, 24002, Erzincan, Turkey.

Abdullah Ammar Karcioglu, Esra Efitli, Emrah Simsek, Alper Ozdogan, Furkan Karatas

The aesthetic understanding has found its place in dental clinics and prosthetic dental treatment. Determining the appropriate prosthetic tooth color between the clinician, patient and technician is a difficult process due to metamerism. Metamerism, known as the different perception of the color of an object under different light sources, is caused by the lighting differences between the laboratory and the dental clinic.

Integrative bioinformatics approaches reveal key hub genes in cyanobacteria: insights from Synechocystis sp. PCC 6803 and Geminocystis sp. NIES-3708 under abiotic stress conditions.

Genes Genomics

January 2025

Department of Molecular Biosciences, Wenner-Gren Institute, Stockholm University, 106 91, Stockholm, Sweden.

Abbas Karimi-Fard, Abbas Saidi, Masoud Tohidfar, Seyede N Emami

Background: Cyanobacteria, particularly Synechocystis sp. PCC 6803, serve as model organisms for studying acclimation strategies that enable adaptation to various environmental stresses. Understanding the molecular mechanisms underlying these adaptations provides insight into how cells adjust gene expression in response to challenging conditions.
