Supervised learning is the dominant approach to automatic phenotyping based on electronic health records (EHRs), but it is expensive because its training labels must come from manual chart review. Semi-supervised learning takes advantage of both scarce labeled data and plentiful unlabeled data. In this work, we study a family of semi-supervised learning algorithms based on Expectation Maximization (EM) in the context of several phenotyping tasks. We first experiment with the basic EM algorithm. When the modeling assumptions are violated, basic EM leads to inaccurate parameter estimates. Augmented EM attenuates this shortcoming by introducing a weighting factor that downweights the unlabeled data. Cross-validation does not always find the best setting of the weighting factor, and other heuristic methods may be preferred. We show that accurate phenotyping models can be trained with only a few hundred labeled examples (plus a large number of unlabeled ones), potentially yielding substantial savings in the amount of manual chart review required.
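To make the role of the weighting factor concrete, here is a minimal sketch of augmented EM. It assumes a Bernoulli naive Bayes model over binary features (for example, presence or absence of codes in a patient record); the abstract does not state which model the authors use, so the model choice, the function names (`fit_nb`, `posterior`, `augmented_em`) and the parameters `lam` (the weighting factor) and `alpha` (Laplace smoothing) are all hypothetical. The E-step computes soft class posteriors for the unlabeled examples; the M-step refits the model on labeled examples with full weight and unlabeled examples scaled by `lam`.

```python
import numpy as np


def fit_nb(X, R, alpha=1.0):
    """M-step: Bernoulli naive Bayes parameters from soft class weights.

    X: (n, d) binary feature matrix; R: (n, k) non-negative class weights per row.
    Returns log class priors (k,) and P(feature = 1 | class) as a (k, d) array.
    """
    class_mass = R.sum(axis=0)  # effective number of examples per class
    k = R.shape[1]
    log_prior = np.log((class_mass + alpha) / (class_mass.sum() + alpha * k))
    # Laplace-smoothed feature probabilities, strictly inside (0, 1)
    theta = (R.T @ X + alpha) / (class_mass[:, None] + 2 * alpha)
    return log_prior, theta


def posterior(X, log_prior, theta):
    """E-step: P(class | x) for each row of X under the current model."""
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T  # (n, k)
    scores = log_lik + log_prior
    scores -= scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)


def augmented_em(X_lab, y_lab, X_unl, n_classes=2, lam=0.1, n_iter=20):
    """Semi-supervised EM; lam in [0, 1] downweights the unlabeled data.

    lam = 1 gives basic EM; lam = 0 ignores the unlabeled data entirely.
    """
    R_lab = np.eye(n_classes)[y_lab]  # hard labels as one-hot weights
    X_all = np.vstack([X_lab, X_unl])
    log_prior, theta = fit_nb(X_lab, R_lab)  # initialise from labeled data only
    for _ in range(n_iter):
        R_unl = posterior(X_unl, log_prior, theta)  # E-step on unlabeled rows
        R_all = np.vstack([R_lab, lam * R_unl])     # weighted soft labels
        log_prior, theta = fit_nb(X_all, R_all)     # M-step on all rows
    return log_prior, theta


# Usage on synthetic binary data (a stand-in for EHR features, not real data)
rng = np.random.default_rng(0)
n, d = 1000, 20
true_theta = np.vstack([np.full(d, 0.2), np.full(d, 0.7)])
y = rng.integers(0, 2, size=n)
X = (rng.random((n, d)) < true_theta[y]).astype(float)

log_prior, theta = augmented_em(X[:50], y[:50], X[50:], lam=0.1)
accuracy = (posterior(X, log_prior, theta).argmax(axis=1) == y).mean()
print(f"accuracy: {accuracy:.3f}")
```

In this sketch the weight scales the unlabeled contribution to both the class priors and the feature counts, so `lam` interpolates between purely supervised training (`lam=0`) and basic EM (`lam=1`). One way to pick `lam` is cross-validation over the labeled examples, although, as noted above, that does not always find the best setting.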


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765699

Similar Publications

The placebo effect is a serious confounder in the assessment of treatment effect, to the extent that it has become increasingly difficult to develop antidepressant medications that outperform placebo. Treatment effect in randomized, placebo-controlled trials is usually estimated as the mean baseline-adjusted difference in treatment response between the active and placebo arms, and it is a function of treatment-specific and non-specific effects. The non-specific treatment effect varies from subject to subject, conditional on each individual's propensity to respond to placebo.

The hippocampus is a small but intricate, seahorse-shaped structure located deep within the brain's medial temporal lobe. It is a crucial component of the limbic system, which is responsible for regulating emotion, memory, and spatial navigation. This research focuses on automatic segmentation of the hippocampus from magnetic resonance (MR) images of the human head, with high accuracy and lower false-positive and false-negative rates.

Comparative analysis of regression algorithms for drug response prediction using GDSC dataset.

BMC Res Notes

January 2025

Department of Computer Engineering, Chungbuk National University, Chungdae-ro 1, Cheongju, 28644, Republic of Korea.

Background: Drug response prediction can infer the relationship between an individual's genetic profile and a drug, which can be used to guide the choice of treatment for an individual patient. Drug response prediction has recently been performed using machine learning. However, high-throughput sequencing data produce thousands of features per patient.

Supervised machine learning statistical models for visual outcome prediction in macular hole surgery: a single-surgeon, standardized surgery study.

Int J Retina Vitreous

January 2025

Department of Retina and Vitreous, Narayana Nethralaya, #121/C, 1st R Block, Chord Road, Rajaji Nagar, Bengaluru, 560010, India.

Purpose: To evaluate the predictive accuracy of various machine learning (ML) statistical models in forecasting postoperative visual acuity (VA) outcomes following macular hole (MH) surgery using preoperative optical coherence tomography (OCT) parameters.

Methods: This retrospective study included 158 eyes (151 patients) with full-thickness MHs treated between 2017 and 2023 by the same surgeon and using the same intraoperative surgical technique. Data from electronic medical records and OCT scans were extracted, with OCT-derived qualitative and quantitative MH characteristics recorded.

To construct a clinical classification model that predicts hydrocephalus after intracerebral haemorrhage (ICH) and can guide clinical treatment decisions, this paper retrospectively analyses the clinical data of 844 ICH inpatients admitted to Yueyang People's Hospital from May 2019 to October 2022, of whom 95 developed hydrocephalus after ICH and 749 did not. The two groups were compared on the following indicators: gender, age, Glasgow Coma Scale (GCS) score, whether the bleeding volume exceeded 30 ml, whether the haemorrhage broke into the ventricle, modified Graeb score (MGS), modified Rankin Scale (MRS) score, whether surgery was performed, and red blood cell, white blood cell, and platelet counts. After variable screening, six variables were selected (GCS score, MGS, MRS score, whether the bleeding volume exceeded 30 ml, whether the haemorrhage broke into the ventricle, and whether surgery was performed), and these were modelled and analysed using logistic regression and support vector machine models.