Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization.

Kin-On Cheng, Ngai-Fong Law, Wan-Chi Siu, Alan Wee-Chung Liew

BMC Bioinformatics

School of Information and Communication Technology, Griffith University, Gold Coast Campus, QLD 4222, Queensland, Australia.

Published: April 2008

Background: DNA microarray technology allows the expression levels of thousands of genes to be measured under tens to hundreds of different conditions. In microarray data, genes with similar functions usually co-express only under certain conditions [1]. Biclustering, which clusters genes and conditions simultaneously, is therefore preferred over traditional clustering for discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of bicluster, the additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis.
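As an illustration of the additive model mentioned above, the following Python sketch assumes the usual formulation in which each entry of a bicluster is a background level plus a gene (row) offset plus a condition (column) offset. The function names, the example offsets, and the use of NumPy are assumptions made for this example and do not come from the article. The residue check shown is the standard row-mean/column-mean residue, not the article's own procedure.

```python
import numpy as np

def additive_bicluster(mu, row_offsets, col_offsets):
    """Build a noise-free additive bicluster: a[i, j] = mu + alpha[i] + beta[j].

    Hypothetical helper for illustration only.
    """
    alpha = np.asarray(row_offsets, dtype=float)
    beta = np.asarray(col_offsets, dtype=float)
    # Broadcasting adds every row offset to every column offset.
    return mu + alpha[:, None] + beta[None, :]

def additive_residue(block):
    """Residue of each entry with respect to the additive model.

    All residues are zero exactly when the block is perfectly additive.
    """
    block = np.asarray(block, dtype=float)
    row_mean = block.mean(axis=1, keepdims=True)
    col_mean = block.mean(axis=0, keepdims=True)
    return block - row_mean - col_mean + block.mean()

# Three genes under four conditions; the offsets are arbitrary example values.
block = additive_bicluster(5.0, [0.0, 1.0, -2.0], [0.0, 3.0, 1.0, 2.0])
print(block)
print(np.abs(additive_residue(block)).max())
```

Under this formulation each gene profile is a vertically shifted copy of the others, so on a PC plot whose axes share one scale the genes of a bicluster would appear as parallel polylines.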

Results: We develop a novel and efficient biclustering algorithm that can be regarded as a greedy version of an existing algorithm known as the pCluster algorithm. By relaxing the homogeneity constraint, the proposed algorithm has polynomial-time complexity in the worst case, instead of the exponential-time complexity of the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. A comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm.
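The homogeneity constraint referred to above can be made concrete with the pScore from the original pCluster formulation: for two genes and two conditions, the pScore is the absolute difference between the two genes' expression changes, and a submatrix is a delta-pCluster when every 2x2 pScore is at most delta. The Python sketch below only checks this criterion on a given submatrix. It is not the greedy algorithm proposed in the article, whose details are not given in this abstract. The function names and example matrices are hypothetical, and the logarithm step is an assumption based on the standard observation that a multiplicative pattern becomes additive after a log transform.

```python
from itertools import combinations

import numpy as np

def max_pscore(block):
    """Largest pScore over all 2x2 submatrices of `block`.

    For rows i, j and columns x, y the pScore is
    |(a[i, x] - a[i, y]) - (a[j, x] - a[j, y])|.
    """
    block = np.asarray(block, dtype=float)
    worst = 0.0
    for i, j in combinations(range(block.shape[0]), 2):
        diff = block[i] - block[j]  # per-condition difference of two genes
        # The largest |diff[x] - diff[y]| over column pairs is the range of diff.
        worst = max(worst, float(diff.max() - diff.min()))
    return worst

def is_delta_pcluster(block, delta):
    """True when every 2x2 submatrix has pScore <= delta."""
    return max_pscore(block) <= delta

additive = np.array([[1.0, 2.0, 4.0],
                     [3.0, 4.0, 6.0],
                     [0.0, 1.0, 3.0]])
multiplicative = np.array([[1.0, 2.0, 4.0],
                           [2.0, 4.0, 8.0],
                           [3.0, 6.0, 12.0]])

print(is_delta_pcluster(additive, delta=0.1))                  # additive pattern
print(is_delta_pcluster(multiplicative, delta=0.1))            # fails on raw values
print(is_delta_pcluster(np.log2(multiplicative), delta=0.1))   # passes after log
```

Checking every pair of rows keeps the example short but scales quadratically with the number of genes; it is meant to clarify the criterion rather than to be an efficient search.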

Conclusion: We have proposed a novel biclustering algorithm that works with PC plots for interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and capable of detecting co-regulated genes. The interactive analysis supports selecting the parameters of the biclustering algorithm that give the best result. In the future, we will modify the proposed algorithm for other bicluster models such as the coherent evolution model.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2396181	PMC
http://dx.doi.org/10.1186/1471-2105-9-210	DOI Listing

Publication Analysis

Top Keywords

biclustering algorithm

algorithm

biclustering

gene expression

expression data

efficient biclustering

parallel coordinate

biclustering algorithms

pcluster algorithm

proposed algorithm

Similar Publications

Online-adjusted evolutionary biclustering algorithm to identify significant modules in gene expression data.

Brief Bioinform

November 2024

Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Circuito Escolar, Ciudad Universitaria, 04510 Mexico city, México.

Raúl Galindo-Hernández Katya Rodríguez-Vázquez Edgardo Galán-Vásquez Carlos Ignacio Hernández Castellanos

Analyzing gene expression data helps the identification of significant biological relationships in genes. With a growing number of open biological datasets available, it is paramount to use reliable and innovative methods to perform in-depth analyses of biological data and ensure that informed decisions are made based on accurate information. Evolutionary algorithms have been successful in the analysis of biological datasets.

Clinical and Multiomic Features Differentiate Young Black and White Breast Cancer Cohorts Derived by Machine Learning Approaches.

Clin Breast Cancer

November 2024

Massachusetts College of Pharmacy and Health Sciences, Worcester, Massachusetts.

Kawther Abdilleh Boris Aguilar George Acquaah-Mensah

Background: There are documented differences in breast cancer (BrCA) presentations and outcomes between Black and White patients. In addition to molecular factors, socioeconomic, racial, and clinical factors result in disparities in outcomes for women in the United States. Using machine learning and unsupervised biclustering methods within a multiomics framework, we sought to shed light on the biological and clinical underpinnings of observed differences between Black and White BrCA patients.

funBIalign: a hierachical algorithm for functional motif discovery based on mean squared residue scores.

Stat Comput

December 2024

Department of Statistics, Penn State University, Joab L. Thomas Building, University Park, 16802 PA USA.

Jacopo Di Iorio Marzia A Cremona Francesca Chiaromonte

Motif discovery is gaining increasing attention in the domain of functional data analysis. Functional motifs are typical "shapes" or "patterns" that recur multiple times in different portions of a single curve and/or in misaligned portions of multiple curves. In this paper, we define functional motifs using an additive model and we propose funBIalign for their discovery and evaluation.

Uncovering hidden gene-trait patterns through biclustering analysis of the UK Biobank.

bioRxiv

November 2024

Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Milton Pividori Suraju Sadeeq Arjun Krishnan Barbara E Stranger Christopher R Gignoux

The growing availability of genome-wide association studies (GWAS) and large-scale biobanks provides an unprecedented opportunity to explore the genetic basis of complex traits and diseases. However, with this vast amount of data comes the challenge of interpreting numerous associations across thousands of traits, especially given the high polygenicity and pleiotropy underlying complex phenotypes. Traditional clustering methods, which identify global patterns in data, lack the resolution to capture overlapping associations relevant to subsets of traits or genes.

Imaging-genomic spatial-modality attentive fusion for studying neuropsychiatric disorders.

Hum Brain Mapp

December 2024

Georgia Institute of Technology, Atlanta, Georgia, USA.

Md Abdur Rahaman Yash Garg Armin Iraji Zening Fu Peter Kochunov

Multimodal learning has emerged as a powerful technique that leverages diverse data sources to enhance learning and decision-making processes. Adapting this approach to analyzing data collected from different biological domains is intuitive, especially for studying neuropsychiatric disorders. A complex neuropsychiatric disorder like schizophrenia (SZ) can affect multiple aspects of the brain and biologies.
