GMMchi: gene expression clustering using Gaussian mixture modeling.

BMC Bioinformatics

Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, Department of Oncology, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK.

Published: November 2022

Background: Cancer evolution consists of a stepwise acquisition of genetic and epigenetic changes, which alter the gene expression profiles of cells in a particular tissue and result in phenotypic alterations acted upon by natural selection. The recurrent appearance of specific genetic lesions across individual cancers and cancer types suggests the existence of certain "driver mutations," which likely make up the major contribution to tumors' selective advantages over surrounding normal tissue and as such are responsible for the most consequential aspects of the cancer cells' gene expression patterns and phenotypes. We hypothesize that such mutations are likely to cluster with specific dichotomous shifts in the expression of the genes they most closely control, and propose GMMchi, a Python package that leverages Gaussian Mixture Modeling to detect and characterize bimodal gene expression patterns across cancer samples, as a tool to analyze such correlations using 2 × 2 contingency table statistics.
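To make the 2 × 2 contingency table step concrete, the following Python sketch cross-tabulates two dichotomized variables (for example, a high/low expression call and a mutation status) and computes a Pearson chi-square test with one degree of freedom. It is an illustration only and is not taken from the GMMchi package: the function names `cross_tabulate` and `chi_square_2x2` and the toy data are hypothetical, and no continuity correction is applied.

```python
import math


def cross_tabulate(x, y):
    """Count the four cells of a 2 x 2 table for two equal-length 0/1 sequences.

    Returns (a, b, c, d) for the table [[a, b], [c, d]], where rows are
    x = 1 / x = 0 and columns are y = 1 / y = 0.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction).

    Returns (statistic, p_value) for the table [[a, b], [c, d]].
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("a table margin is zero; the test is undefined")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy data: 50 samples, expression call (1 = high) vs. mutation (1 = mutant).
expression_high = [1] * 25 + [0] * 25
mutation_present = [1] * 20 + [0] * 5 + [1] * 5 + [0] * 20

table = cross_tabulate(expression_high, mutation_present)
stat, p_value = chi_square_2x2(*table)
print(table, round(stat, 3), p_value)
```

With small expected cell counts, an exact test such as Fisher's would usually be preferred over the chi-square approximation.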

Results: Using well-defined simulated data, we confirmed the robust performance of GMMchi, which reached 85% accuracy with a sample size of n = 90. We also demonstrate several applications of GMMchi: characterizing background fluorescent signals in microarray data, filtering out uninformative background probe sets, and uncovering novel genetic interrelationships and tumor characteristics. Our approach to analysing gene expression in cancers provides an additional lens to supplement traditional continuous-valued statistical analysis by maximizing the information that can be gathered from bulk gene expression data.
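The general idea of detecting a bimodal pattern in simulated data can be sketched as follows. This is a minimal, standard-library-only expectation-maximization (EM) fit of a two-component one-dimensional Gaussian mixture, compared against a single Gaussian by the Bayesian information criterion (BIC). It is not the GMMchi implementation and does not reproduce the authors' benchmark: the simulation parameters (two modes at 2 and 8 with unit standard deviation, 45 samples each), the quartile-based initialization, and all function names are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_component_gmm(xs, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (weights, means, sds, posterior_high); posterior_high[i] is the
    probability, from the final E-step, that xs[i] belongs to the component
    with the higher mean.
    """
    n = len(xs)
    ordered = sorted(xs)
    mean_all = sum(xs) / n
    sd_all = max(math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n), 1e-3)
    w = [0.5, 0.5]
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]  # start at the quartiles
    sd = [sd_all, sd_all]
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability of component 1 for each sample.
        dens = [[w[k] * normal_pdf(x, mu[k], sd[k]) for k in (0, 1)] for x in xs]
        post = [d[1] / (d[0] + d[1]) for d in dens]
        # M-step: update weight, mean and standard deviation of each component.
        for k in (0, 1):
            r = post if k == 1 else [1 - p for p in post]
            nk = max(sum(r), 1e-12)
            w[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
    if mu[0] > mu[1]:  # keep component 1 as the higher-mean component
        w, mu, sd = w[::-1], mu[::-1], sd[::-1]
        post = [1 - p for p in post]
    return w, mu, sd, post


def bic(log_lik, n_params, n):
    return n_params * math.log(n) - 2 * log_lik


# Simulated bimodal "expression" values, n = 90 (assumed parameters).
random.seed(0)
xs = [random.gauss(2, 1) for _ in range(45)] + [random.gauss(8, 1) for _ in range(45)]
truth = [0] * 45 + [1] * 45

w, mu, sd, post = fit_two_component_gmm(xs)
calls = [1 if p > 0.5 else 0 for p in post]  # dichotomized high/low calls
accuracy = sum(c == t for c, t in zip(calls, truth)) / len(xs)

# Compare against a single Gaussian: lower BIC indicates the preferred model.
n = len(xs)
m = sum(xs) / n
s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
ll_one = sum(math.log(normal_pdf(x, m, s)) for x in xs)
ll_two = sum(
    math.log(sum(w[k] * normal_pdf(x, mu[k], sd[k]) for k in (0, 1))) for x in xs
)
bic_one = bic(ll_one, 2, n)  # mean and sd
bic_two = bic(ll_two, 5, n)  # two means, two sds, one free weight
print(mu, accuracy, bic_one, bic_two)
```

A probe set for which the single Gaussian has the lower BIC would, under this sketch, be treated as non-bimodal; the dichotomized `calls` are the kind of input a 2 × 2 contingency table analysis would then use.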

Conclusions: We confirm that GMMchi robustly and reliably extracts bimodal patterns from both colorectal cancer (CRC) cell line-derived microarray and tumor-derived RNA-Seq data and verify previously reported gene expression correlates of some well-characterized CRC phenotypes.

Availability: The Python package GMMchi and the cell line microarray data used in this paper are available for download on GitHub: https://github.com/jeffliu6068/GMMchi

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9632092
DOI: http://dx.doi.org/10.1186/s12859-022-05006-0
