Prognostic modelling is important in clinical practice and epidemiology for patient management and research. Electronic health records (EHR) provide large quantities of data for such models, but conventional epidemiological approaches require significant researcher time to implement. Expert selection of variables, fine-tuning of variable transformations and interactions, and imputation of missing values are time-consuming and could bias subsequent analysis, particularly given that missingness in EHR is high and may itself carry meaning. Using a cohort of 80,000 patients from the CALIBER programme, we compared traditional modelling and machine-learning approaches in EHR. First, we used Cox models and random survival forests, with and without imputation, on 27 expert-selected, preprocessed variables to predict all-cause mortality. We then used Cox models, random forests and elastic net regression on an extended dataset with 586 variables to build prognostic models and identify novel prognostic factors without prior expert input. We observed that data-driven models applied to an extended dataset can outperform conventional models for prognosis, without data preprocessing or imputation of missing values. An elastic net Cox regression based on 586 unimputed variables, with continuous values discretised, achieved a C-index of 0.801 (bootstrapped 95% CI 0.799 to 0.802), compared to 0.793 (0.791 to 0.794) for a traditional Cox model comprising 27 expert-selected variables with imputation for missing values. We also found that data-driven models allow identification of novel prognostic variables; that the absence of values for particular variables carries meaning and can have significant implications for prognosis; and that variables often have a nonlinear association with mortality, which discretised Cox models and random forests can elucidate.
This demonstrates that machine-learning approaches applied to raw EHR data can be used to build models for use in research and clinical practice, and identify novel predictive variables and their effects to inform future research.
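
The two ideas the abstract leans on, discretising a continuous variable so that a missing value becomes its own category, and scoring a model by its C-index with a bootstrapped confidence interval, can be illustrated with a minimal sketch. This is not the authors' code. The function names, the quantile-based binning, the number of bins and the percentile bootstrap are all assumptions made for illustration, and only the Python standard library is used.

```python
import random


def discretise(values, n_bins=4):
    """Label each value with a quantile bin; None gets its own 'missing' label.

    Quantile binning and n_bins=4 are illustrative assumptions.
    """
    observed = sorted(v for v in values if v is not None)
    # Interior cut points taken at equally spaced ranks of the observed values.
    cuts = [observed[(len(observed) * k) // n_bins] for k in range(1, n_bins)]
    labels = []
    for v in values:
        if v is None:
            labels.append("missing")  # absence is kept as information, not imputed
        else:
            labels.append("bin%d" % sum(v >= c for c in cuts))
    return labels


def c_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i died (events[i] is truthy)
    strictly before patient j's death or censoring time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_ci(times, events, risks, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-index (resampling patients).

    Assumes the data contain at least one comparable pair.
    """
    rng = random.Random(seed)
    n = len(times)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(c_index([times[i] for i in idx],
                                 [events[i] for i in idx],
                                 [risks[i] for i in idx]))
        except ValueError:
            continue  # skip resamples with no comparable pairs
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper
```

In a pipeline of this shape, the labels returned by `discretise` would be one-hot encoded before being passed to a penalised Cox model, so that the "missing" category receives its own coefficient. The pairwise loop in `c_index` is quadratic in the number of patients, so a cohort of the size described above would need a more efficient implementation from a survival-analysis library.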

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6118376
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0202344
