A model-based clustering algorithm with covariates adjustment and its application to lung cancer stratification.

J Bioinform Comput Biol

Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão 1010 São Paulo, São Paulo 05508-090, Brazil.

Published: August 2023

Clustering is usually the first step in many data analyses. It allows us to identify patterns we had not noticed before and helps raise new hypotheses. However, one challenge when analyzing empirical data is the presence of covariates, which may mask the clustering structure of interest. For example, suppose we want to cluster a set of individuals into controls and cancer patients. A clustering algorithm could instead group the subjects into young and elderly, because age at diagnosis is associated with cancer. To address this, we developed CEM-Co, a model-based clustering algorithm that removes or minimizes the effects of undesirable covariates during the clustering process. We applied CEM-Co to a gene expression dataset of 129 patients with stage I non-small cell lung cancer. As a result, we identified a subgroup with a poorer prognosis, which standard clustering algorithms failed to detect.
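The abstract does not state the CEM-Co model. The name suggests it may build on classification EM (CEM), which alternates hard cluster assignments with parameter updates. What follows is only an illustrative sketch under our own assumptions, not the authors' algorithm. It assumes each profile is a cluster mean plus a linear covariate effect shared by all clusters (x_i = mu_{c_i} + beta * z_i + noise), with a single scalar covariate z (for example, age), spherical equal-variance Gaussian noise and equal mixing proportions. Under those assumptions the classification step reduces to nearest-centre assignment on covariate-adjusted data, and the parameter step is an ANCOVA-style least-squares fit. The function names `cem_co_sketch`, `_m_step` and `_sqdist` are hypothetical.

```python
"""Hypothetical sketch of covariate-adjusted, CEM-style hard clustering.

Assumed model (ours, not the paper's): x_i = mu_{c_i} + beta * z_i + noise,
with one scalar covariate z_i, a slope vector beta shared by all clusters,
spherical equal-variance Gaussian noise and equal mixing proportions.
"""
from typing import List, Sequence, Tuple


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _m_step(X, z, labels, k, old_mu):
    """Least-squares fit of cluster intercepts plus one common slope per feature."""
    n, d = len(X), len(X[0])
    members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
    zc = [sum(z[i] for i in m) / len(m) if m else 0.0 for m in members]
    # Within-cluster centring removes the intercepts from the slope estimate.
    szz = sum((z[i] - zc[labels[i]]) ** 2 for i in range(n))
    beta, mu = [], [list(m) for m in old_mu]
    for j in range(d):
        xc = [sum(X[i][j] for i in m) / len(m) if m else 0.0 for m in members]
        sxz = sum((z[i] - zc[labels[i]]) * (X[i][j] - xc[labels[i]]) for i in range(n))
        b = sxz / szz if szz > 0 else 0.0
        beta.append(b)
        for c in range(k):
            if members[c]:  # an empty cluster keeps its previous centre
                mu[c][j] = xc[c] - b * zc[c]
    return beta, mu


def cem_co_sketch(
    X: Sequence[Sequence[float]], z: Sequence[float], k: int, n_iter: int = 100
) -> Tuple[List[int], List[List[float]], List[float]]:
    n, d = len(X), len(X[0])
    # Initial slopes: a global regression of each feature on z (one cluster).
    beta, _ = _m_step(X, z, [0] * n, 1, [[0.0] * d])
    resid = [[X[i][j] - beta[j] * z[i] for j in range(d)] for i in range(n)]
    # Initial centres: residuals at evenly spaced ranks after sorting by feature 0.
    order = sorted(range(n), key=lambda i: resid[i][0])
    mu = [list(resid[order[c * (n - 1) // max(k - 1, 1)]]) for c in range(k)]
    labels = [-1] * n
    for _ in range(n_iter):
        resid = [[X[i][j] - beta[j] * z[i] for j in range(d)] for i in range(n)]
        # C-step: under the assumptions above, the most probable label is the
        # nearest centre in covariate-adjusted space (k-means-like assignment).
        new = [min(range(k), key=lambda c: _sqdist(resid[i], mu[c])) for i in range(n)]
        if new == labels:
            break
        labels = new
        # M-step: refit intercepts and the shared slope given the hard labels.
        beta, mu = _m_step(X, z, labels, k, mu)
    return labels, mu, beta
```

Passing `z = [0.0] * n` turns the adjustment off and reduces the sketch to plain k-means. This makes it easy to compare adjusted and unadjusted partitions on the same data, for instance when age dominates the signal. A realistic method would also need several covariates, cluster-specific variances and multiple restarts, none of which this sketch attempts.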

Source
http://dx.doi.org/10.1142/S0219720023500191
