Biclustering data analysis: a comprehensive survey.

Brief Bioinform

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal.

Published: May 2024

Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved effective in bioinformatics because it produces local rather than global models. It has evolved from a key technique in gene expression data analysis into one of the most widely used approaches for pattern discovery and identification of biological modules, in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering to other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Overall, the survey provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
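To make the notion of a local model concrete, the following minimal sketch scores a candidate bicluster (a subset of rows and a subset of columns) with the mean squared residue, a coherence measure from the biclustering literature that is zero for a perfectly additive pattern. The abstract itself does not name a specific measure, so this choice is an assumption made for illustration; the toy matrix and the function name `mean_squared_residue` are likewise hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows x cols.

    A value near 0 means the selected cells follow an additive
    row-effect + column-effect pattern (a coherent bicluster).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical toy data: rows 0-2 restricted to columns 0-2 differ only by
# a constant row offset, while the remaining cells are unrelated values.
data = [
    [1.0, 2.0, 4.0, 9.0],
    [2.0, 3.0, 5.0, 0.0],
    [5.0, 6.0, 8.0, 3.0],
    [7.0, 1.0, 0.0, 6.0],
]

local_score = mean_squared_residue(data, [0, 1, 2], [0, 1, 2])
global_score = mean_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])
print("bicluster score:", local_score)
print("whole-matrix score:", global_score)
```

The submatrix scores essentially zero while the full matrix does not, which is the sense in which a bicluster captures a pattern that holds only locally, for some rows under some columns.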

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247412
DOI: http://dx.doi.org/10.1093/bib/bbae342

Similar Publications

Background: Acute spinal cord injury causes severe motor and sensory dysfunction, significantly burdening individuals and society. This study uses bibliometric analysis to identify research trends and key areas, providing insights for future advancements in treatment.

Methods: Scientific publications on acute spinal cord injury were collected from PubMed and the Web of Science Core Collection (WoSCC) between 2000 and 2022.

Online-adjusted evolutionary biclustering algorithm to identify significant modules in gene expression data.

Brief Bioinform

November 2024

Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Circuito Escolar, Ciudad Universitaria, 04510 Mexico city, México.

Analyzing gene expression data helps identify significant biological relationships among genes. With a growing number of open biological datasets available, it is paramount to use reliable and innovative methods to perform in-depth analyses of biological data and ensure that informed decisions are based on accurate information. Evolutionary algorithms have been successful in the analysis of biological datasets.

Background: There are documented differences in breast cancer (BrCA) presentations and outcomes between Black and White patients. In addition to molecular factors, socioeconomic, racial, and clinical factors result in disparities in outcomes for women in the United States. Using machine learning and unsupervised biclustering methods within a multiomics framework, we sought to shed light on the biological and clinical underpinnings of observed differences between Black and White BrCA patients.

funBIalign: a hierarchical algorithm for functional motif discovery based on mean squared residue scores.

Stat Comput

December 2024

Department of Statistics, Penn State University, Joab L. Thomas Building, University Park, 16802 PA USA.

Motif discovery is gaining increasing attention in the domain of functional data analysis. Functional motifs are typical "shapes" or "patterns" that recur multiple times in different portions of a single curve and/or in misaligned portions of multiple curves. In this paper, we define functional motifs using an additive model and we propose funBIalign for their discovery and evaluation.

Uncovering hidden gene-trait patterns through biclustering analysis of the UK Biobank.

bioRxiv

November 2024

Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

The growing availability of genome-wide association studies (GWAS) and large-scale biobanks provides an unprecedented opportunity to explore the genetic basis of complex traits and diseases. However, with this vast amount of data comes the challenge of interpreting numerous associations across thousands of traits, especially given the high polygenicity and pleiotropy underlying complex phenotypes. Traditional clustering methods, which identify global patterns in data, lack the resolution to capture overlapping associations relevant to subsets of traits or genes.
