Background: Missing values frequently arise in modern biomedical studies for various reasons, including missing tests or the complexity of profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, which aim to group points according to some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values and then apply the clustering algorithm to the completed data.
Results: We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with respect to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing-value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show that the proposed optimal clustering with missing values outperforms the various clustering approaches it was compared against.
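The abstract does not spell out the estimator, so the sketch below only illustrates the property that makes "marginalizing out" missing values tractable in the Gaussian case: integrating a multivariate normal over its missing coordinates leaves a normal over the observed coordinates, with the matching sub-vector of the mean and sub-matrix of the covariance. It assumes ignorable missingness (e.g. missing at random) and known component means, covariances and mixing weights, and it assigns each point independently by maximum posterior probability. This is a simplified plug-in classifier, not the Bayes-optimal clustering operator over an RLPP described in the paper; the function names `observed_gaussian_logpdf` and `assign_clusters` are hypothetical.

```python
import numpy as np


def observed_gaussian_logpdf(x, mean, cov):
    """Log-density of the observed entries of x under N(mean, cov).

    Missing entries are NaN. Marginalizing a multivariate normal over the
    missing coordinates gives a normal over the observed coordinates with
    the corresponding sub-vector of `mean` and sub-matrix of `cov`.
    """
    x = np.asarray(x, dtype=float)
    obs = ~np.isnan(x)
    if not obs.any():
        return 0.0  # nothing observed: the marginal likelihood is 1
    d = x[obs] - mean[obs]
    s = cov[np.ix_(obs, obs)]          # covariance of observed coordinates only
    _, logdet = np.linalg.slogdet(s)
    quad = d @ np.linalg.solve(s, d)   # Mahalanobis term without inverting s
    k = obs.sum()
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)


def assign_clusters(X, means, covs, weights):
    """Assign each row of X to the component with the highest posterior,
    using only that row's observed coordinates (no imputation step)."""
    labels = []
    for x in X:
        scores = [np.log(w) + observed_gaussian_logpdf(x, m, c)
                  for m, c, w in zip(means, covs, weights)]
        labels.append(int(np.argmax(scores)))
    return np.array(labels)


if __name__ == "__main__":
    means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
    covs = [np.eye(2), np.array([[1.0, 0.8], [0.8, 1.0]])]
    weights = [0.5, 0.5]
    X = np.array([[0.2, np.nan], [np.nan, 4.8], [5.1, 5.3], [-0.3, 0.1]])
    print(assign_clusters(X, means, covs, weights))  # [0 1 1 0]
```

Rows with a missing coordinate are scored on the remaining coordinate alone, so no imputed value ever enters the likelihood; the non-diagonal covariance of the second component shows that the marginalization is valid for arbitrary covariance structures, not only diagonal ones.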
Conclusion: Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data while also achieving smaller clustering errors.
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6584727 | PMC |
http://dx.doi.org/10.1186/s12859-019-2832-3 | DOI Listing |
Bioinform Adv
January 2025
Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON K1K 4P7, Canada.
Motivation: Missing values are prevalent in high-throughput measurements for various experimental or analytical reasons. Imputation, the process of replacing missing values in a dataset with estimated values, plays an important role in multivariate and machine learning analyses. The three missingness patterns (missing completely at random, missing at random, and missing not at random) each describe a distinct dependency between the missing and observed data.
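The three mechanisms differ only in what the probability of a value going missing may depend on. As a minimal illustration (the function `make_missing` and its deterministic "drop the top fraction" rule are assumptions made for clarity; simulation studies usually use probabilistic rules such as a logistic link), the sketch below removes entries from the second column of a two-column dataset under each mechanism.

```python
import numpy as np


def make_missing(X, mechanism, rate=0.3, rng=None):
    """Return a copy of X (n x 2) with some entries of column 1 set to NaN.

    Hypothetical helper illustrating the three missingness mechanisms:
      "MCAR": a random subset is dropped, independent of any data value.
      "MAR":  rows with the largest values of the always-observed column 0
              are dropped, so missingness depends only on observed data.
      "MNAR": rows with the largest values of column 1 itself are dropped,
              so missingness depends on the values that go missing.
    """
    rng = np.random.default_rng(rng)
    X = np.array(X, dtype=float, copy=True)
    n = X.shape[0]
    k = int(round(rate * n))           # number of entries to remove
    if mechanism == "MCAR":
        drop = rng.choice(n, size=k, replace=False)
    elif mechanism == "MAR":
        drop = np.argsort(X[:, 0])[n - k:]
    elif mechanism == "MNAR":
        drop = np.argsort(X[:, 1])[n - k:]
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    X[drop, 1] = np.nan
    return X


if __name__ == "__main__":
    # Column 0 increases while column 1 decreases, so MAR and MNAR differ.
    X = np.column_stack([np.arange(10.0), np.arange(10.0)[::-1]])
    for mech in ("MCAR", "MAR", "MNAR"):
        print(mech, np.flatnonzero(np.isnan(make_missing(X, mech, rng=0)[:, 1])))
```

Under MNAR the removed values are systematically the largest ones, which is why imputing from the remaining observed values tends to be biased in that setting, whereas under MCAR the observed values remain a random sample.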
Data Min Knowl Discov
January 2025
CWI, Amsterdam, The Netherlands.
Missing values arise routinely in real-world sequential (string) datasets due to: (1) imprecise data measurements; (2) flexible sequence modeling, such as binding profiles of molecular sequences; or (3) the existence of confidential information in a dataset which has been deleted deliberately for privacy protection. To analyze such datasets, it is often important to replace each missing value with one or more letters in an efficient and effective way. Here we formalize this task as a combinatorial optimization problem: the set of constraints includes the of the missing value (i.
Rev Cardiovasc Med
January 2025
Department of Cardiology, Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, 610072 Chengdu, Sichuan, China.
Background: There is a shortage of patients with hypertrophic cardiomyopathy (HCM) with concurrent coronary artery disease (CAD), and the influence of CAD on the prognosis of patients with HCM is uncertain. This real-world cohort study was conducted to evaluate the prognosis of patients with HCM and concurrent CAD.
Methods: This cohort study of patients with HCM was conducted from May 2003 to September 2021.
Cureus
December 2024
Infectious Diseases, Faculty of Medicine, University of Medicine, Tirana, ALB.
Background: Different pathologies, such as bacterial, fungal, and viral infections and neoplastic diseases, are encountered more often in human immunodeficiency virus (HIV)-infected patients. Recent studies have shown that HIV-infected individuals have poorer oral health outcomes, worse dentition, and aggressive forms of periodontitis. This study aims to investigate the dental and periodontal status of HIV-infected patients and the correlation of CD4+ level and CD4 percentage with dentition and periodontal status.
Eur J Nucl Med Mol Imaging
January 2025
Department of Nuclear Medicine, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Changsha, Hunan, 410008, P.R. China.
Purpose: To develop and validate a prostate-specific membrane antigen (PSMA) PET/CT based multimodal deep learning model for predicting pathological lymph node invasion (LNI) in prostate cancer (PCa) patients identified as candidates for extended pelvic lymph node dissection (ePLND) by preoperative nomograms.
Methods: [Ga]Ga-PSMA-617 PET/CT scans of 116 eligible PCa patients (82 in the training cohort and 34 in the test cohort) who underwent radical prostatectomy with ePLND were analyzed in our study. The Med3D deep learning network was used to extract discriminative features from the entire prostate volume of interest on the PET/CT images.