Clustering datasets with demographics and diagnosis codes.

J Biomed Inform

School of Computer Science, Cardiff University, Cardiff, UK.

Published: February 2020

Clustering data derived from Electronic Health Record (EHR) systems is important for discovering relationships between the clinical profiles of patients and as a preprocessing step for analysis tasks such as classification. However, the heterogeneity of these data makes existing clustering methods difficult to apply and calls for new clustering approaches. In this paper, we propose the first approach for clustering a dataset in which each record contains a patient's values in demographic attributes and their set of diagnosis codes. Our approach represents the dataset in a binary form in which the features are selected demographic values, as well as combinations (patterns) of frequent and correlated diagnosis codes. This representation enables measuring similarity between records using cosine similarity, an effective measure for binary-represented data, and finding compact, well-separated clusters through hierarchical clustering. Our experiments using two publicly available EHR datasets, comprising over 26,000 and 52,000 records, respectively, demonstrate that our approach is able to construct clusters with correlated demographics and diagnosis codes, and that it is efficient and scalable.
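The pipeline described in the abstract (binary features, cosine similarity, hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy records, the function names, the `min_support` and `max_size` parameters, and the choice of average-linkage agglomerative merging are all assumptions made for illustration. For brevity, the sketch selects diagnosis-code patterns by frequency only, whereas the paper also requires the codes in a pattern to be correlated.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy records: (demographic values, set of diagnosis codes).
records = [
    ({"F", "60-69"}, {"250.00", "401.9"}),
    ({"F", "60-69"}, {"250.00", "401.9", "272.4"}),
    ({"M", "20-29"}, {"493.90"}),
    ({"M", "20-29"}, {"493.90", "477.9"}),
]

def frequent_patterns(records, min_support=2, max_size=2):
    """Return code combinations contained in at least min_support records.

    Simplification: frequency only; no correlation test is applied.
    """
    counts = {}
    for _, codes in records:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(codes), size):
                counts[combo] = counts.get(combo, 0) + 1
    return sorted(p for p, c in counts.items() if c >= min_support)

def encode(records, patterns):
    """Build one binary row per record: demographic values, then patterns."""
    demo_values = sorted({v for demo, _ in records for v in demo})
    rows = []
    for demo, codes in records:
        row = [1 if v in demo else 0 for v in demo_values]
        row += [1 if set(p) <= codes else 0 for p in patterns]
        rows.append(row)
    return rows

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(rows, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest average pairwise cosine similarity until k clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            sims = [cosine(rows[i], rows[j])
                    for i in clusters[a] for j in clusters[b]]
            avg = sum(sims) / len(sims)
            if best is None or avg > best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a is stable
        del clusters[b]
    return clusters

patterns = frequent_patterns(records)
rows = encode(records, patterns)
print(cluster(rows, k=2))
```

On these toy records the two older female patients end up in one cluster and the two younger male patients in the other, because records in different groups share no active features and so have cosine similarity 0. The quadratic search over cluster pairs is only suitable for small inputs; datasets of the size reported in the abstract would need an optimized hierarchical clustering routine.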

Source
http://dx.doi.org/10.1016/j.jbi.2019.103360
