Optimal clustering with missing values.

Shahin Boluki, Siamak Zamani Dadaneh, Xiaoning Qian, Edward R Dougherty

BMC Bioinformatics

Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA.

Published: June 2019

Background: Missing values frequently arise in modern biomedical studies for various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goal is to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values and then apply the clustering algorithm to the completed data.

Results: We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values when compared to various clustering approaches.

Conclusion: Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data, while at the same time achieving smaller clustering errors.
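
The marginalization idea in the Results paragraph can be illustrated with a minimal sketch. For a Gaussian, integrating out the missing coordinates of a point leaves a Gaussian over the observed coordinates, with the matching sub-vector of the mean and sub-block of the covariance. The sketch below is not the authors' optimal clustering operator, which is defined over a random labeled point process. It assumes, purely for illustration, that the cluster means and covariances are known and that missingness does not depend on the values or the labels. The names `marginal_log_density` and `assign` are hypothetical, and `None` marks a missing entry.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def marginal_log_density(x, mean, cov):
    """Gaussian log-density of the observed entries of x (None marks missing)."""
    obs = [i for i, v in enumerate(x) if v is not None]
    if not obs:
        return 0.0  # nothing observed: the marginal integrates to 1
    diff = [x[i] - mean[i] for i in obs]
    # Marginalizing a Gaussian keeps only the observed rows and columns.
    sub = [[cov[i][j] for j in obs] for i in obs]
    low = cholesky(sub)
    # Forward substitution: solve low * z = diff; z.z is the Mahalanobis term.
    z = []
    for i in range(len(obs)):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(obs)))
    quad = sum(v * v for v in z)
    return -0.5 * (len(obs) * math.log(2.0 * math.pi) + log_det + quad)


def assign(x, clusters):
    """Index of the (mean, cov) pair with the highest marginal log-density."""
    scores = [marginal_log_density(x, m, c) for m, c in clusters]
    return scores.index(max(scores))


# Two illustrative clusters with assumed (not estimated) parameters.
clusters = [
    ([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]),
    ([3.0, 3.0], [[1.0, 0.0], [0.0, 1.0]]),
]
label_a = assign([2.5, None], clusters)  # second feature missing
label_b = assign([None, 0.2], clusters)  # first feature missing
```

No imputed value enters the computation: each point is scored only on the features it actually has, which is the sense in which marginalization removes the need for an imputation step in this simplified setting.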

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6584727
DOI: http://dx.doi.org/10.1186/s12859-019-2832-3

Similar Publications

Imputation for Lipidomics and Metabolomics (ImpLiMet): a web-based application for optimization and method selection for missing data imputation.

Bioinform Adv

January 2025

Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON K1K 4P7, Canada.

Huiting Ou, Anuradha Surendra, Graeme S V McDowell, Emily Hashimoto-Roth, Jianguo Xia

Motivation: Missing values are prevalent in high-throughput measurements due to various experimental or analytical reasons. Imputation, the process of replacing missing values in a dataset with estimated values, plays an important role in multivariate and machine learning analyses. The three missingness patterns (missing completely at random, missing at random, and missing not at random) describe distinct dependencies between the missing and observed data.

Missing value replacement in strings and applications.

Data Min Knowl Discov

January 2025

CWI, Amsterdam, The Netherlands.

Giulia Bernardini, Chang Liu, Grigorios Loukides, Alberto Marchetti-Spaccamela, Solon P Pissis

Missing values arise routinely in real-world sequential (string) datasets due to: (1) imprecise data measurements; (2) flexible sequence modeling, such as binding profiles of molecular sequences; or (3) the existence of confidential information in a dataset which has been deleted deliberately for privacy protection. In order to analyze such datasets, it is often important to replace each missing value, with one or more letters, in an efficient and effective way. Here we formalize this task as a combinatorial optimization problem: the set of constraints includes the of the missing value (i.

The Effect of Coronary Artery Disease on the Prognosis of Hypertrophic Cardiomyopathy: A Multi-Center Cohort Study.

Rev Cardiovasc Med

January 2025

Department of Cardiology, Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, 610072 Chengdu, Sichuan, China.

Guoqing Hou, Qian Liao, Huihui Ma, Yan Shu, Shengzhi Zeng

Background: There is a shortage of patients with hypertrophic cardiomyopathy (HCM) with concurrent coronary artery disease (CAD), and the influence of CAD on the prognosis of patients with HCM is uncertain. This real-world cohort study was conducted to evaluate the prognosis of patients with HCM and concurrent CAD.

Methods: This cohort study of patients with HCM was conducted from May 2003 to September 2021.

Correlation of Dental and Periodontal Status With HIV Presence and Initial CD4 Counts: An Albanian Prospective Observational Study.

Cureus

December 2024

Infectious Diseases, Faculty of Medicine, University of Medicine, Tirana, ALB.

Eriselda Simoni Malushi, Leonard Simoni, Laureta Flaga, Arjan Harxhi, Najada Como

Background: Different pathologies, such as bacterial, fungal, and viral infections and neoplastic diseases, are encountered more often in human immunodeficiency virus (HIV)-infected patients. Recently, studies have shown that HIV-infected individuals have poorer oral health outcomes, worse dentition, and aggressive forms of periodontitis. This study aims to investigate the dental and periodontal status of HIV-infected patients and the correlation of CD4+ level and CD4 percentage with dentition and periodontal status.

PSMA PET/CT based multimodal deep learning model for accurate prediction of pelvic lymph-node metastases in prostate cancer patients identified as candidates for extended pelvic lymph node dissection by preoperative nomograms.

Eur J Nucl Med Mol Imaging

January 2025

Department of Nuclear Medicine, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Changsha, Hunan, 410008, P.R. China.

Qiaoke Ma, Bei Chen, Robert Seifert, Rui Zhou, Ling Xiao

Purpose: To develop and validate a prostate-specific membrane antigen (PSMA) PET/CT based multimodal deep learning model for predicting pathological lymph node invasion (LNI) in prostate cancer (PCa) patients identified as candidates for extended pelvic lymph node dissection (ePLND) by preoperative nomograms.

Methods: [68Ga]Ga-PSMA-617 PET/CT scans of 116 eligible PCa patients (82 in the training cohort and 34 in the test cohort) who underwent radical prostatectomy with ePLND were analyzed in our study. The Med3D deep learning network was utilized to extract discriminative features from the entire prostate volume of interest on the PET/CT images.
