AI Article Synopsis

  • Clustering algorithms are central to microarray data analysis: they identify co-regulated genes and can stratify patients by gene expression profile, yet the distance measure between patients has received little attention.
  • A new clustering algorithm combines gene selection with functional annotation data: each annotation-derived gene set defines its own distance measure between patients and therefore its own clustering, and only clusterings that pass a resampling-based significance filter are reported.
  • The method has recovered clinically relevant patient subgroups and may reveal previously unknown patient classes, supporting unsupervised classification in clinical studies.

Article Abstract

Motivation: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been suggested, but little attention has been given to the distance measure between patients. Even with the Euclidean metric, including and excluding genes from the analysis leads to different distances between the same objects, and consequently different clustering results.
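
The last point of the Motivation can be illustrated with a small sketch. The patient names, the four expression values per patient, and the two gene subsets below are invented toy data, not taken from the article; the sketch only shows that the Euclidean distance between the same patients, and hence a patient's nearest neighbour, changes with the genes included.

```python
import math

# Hypothetical expression values for three patients over four genes (toy data).
profiles = {
    "patient_A": [1.0, 1.0, 5.0, 5.0],
    "patient_B": [1.0, 1.2, 0.0, 0.0],
    "patient_C": [4.0, 4.0, 5.0, 5.1],
}


def euclidean(x, y, genes):
    """Euclidean distance restricted to the gene indices in `genes`."""
    return math.sqrt(sum((x[g] - y[g]) ** 2 for g in genes))


def nearest_neighbour(name, genes):
    """Return the other patient closest to `name` under the chosen genes."""
    others = [p for p in profiles if p != name]
    return min(others, key=lambda p: euclidean(profiles[name], profiles[p], genes))


set_one = [0, 1]  # hypothetical gene subset: first two genes only
set_two = [2, 3]  # hypothetical gene subset: last two genes only

for label, genes in (("set_one", set_one), ("set_two", set_two)):
    print(label, "-> nearest to patient_A:", nearest_neighbour("patient_A", genes))
```

Under the first subset patient_A is closest to patient_B, under the second to patient_C, so any distance-based clustering of these patients depends on the gene selection.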

Results: We describe a new clustering algorithm, in which gene selection is used to derive biologically meaningful clusterings of samples by combining expression profiles and functional annotation data. According to gene annotations, candidate gene sets with specific functional characterizations are generated. Each set defines a different distance measure between patients, leading to different clusterings. These clusterings are filtered using a resampling-based significance measure. Significant clusterings are reported together with the underlying gene sets and their functional definition.
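
The workflow described in the Results can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the adSplit implementation: the gene names, annotation terms, and expression values are invented, a small 2-means loop stands in for whatever splitting procedure the authors use, the between/within ratio is an assumed quality score, and the resampling step simply compares each annotation-defined gene set with random gene sets of the same size.

```python
import math
import random

# Toy data (hypothetical): expression[gene] holds one value per patient.
expression = {
    "g1": [0.1, 0.2, 0.0, 5.0, 5.1, 4.9],
    "g2": [0.0, 0.1, 0.2, 4.8, 5.2, 5.0],
    "g3": [1.0, 1.1, 0.9, 1.0, 1.1, 0.9],
    "g4": [2.0, 2.1, 1.9, 2.1, 2.0, 1.9],
    "g5": [3.0, 2.9, 3.1, 3.0, 3.1, 2.9],
    "g6": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
}
# Hypothetical annotation terms mapped to candidate gene sets.
gene_sets = {"term_signal": ["g1", "g2"], "term_flat": ["g3", "g4"]}
n_patients = 6


def sq_dist(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def two_split(genes):
    """Split patients into two groups using only `genes`; return (labels, score)."""
    points = [[expression[g][p] for g in genes] for p in range(n_patients)]
    pairs = [(a, b) for a in range(n_patients) for b in range(a + 1, n_patients)]
    # Seed the two centres with the most distant pair of patients (deterministic).
    i, j = max(pairs, key=lambda ab: sq_dist(points[ab[0]], points[ab[1]]))
    centres = [points[i], points[j]]
    labels = [0] * n_patients
    for _ in range(10):  # a few 2-means refinement rounds
        labels = [0 if sq_dist(pt, centres[0]) <= sq_dist(pt, centres[1]) else 1
                  for pt in points]
        for k in (0, 1):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    within = sum(sq_dist(pt, centres[lab]) for pt, lab in zip(points, labels))
    between = sq_dist(centres[0], centres[1])
    return labels, between / (within + 1e-9)  # assumed separation score


def empirical_p(genes, n_resamples=200, seed=0):
    """Share of random gene sets of equal size that split at least as well."""
    rng = random.Random(seed)
    observed = two_split(genes)[1]
    all_genes = sorted(expression)
    hits = sum(two_split(rng.sample(all_genes, len(genes)))[1] >= observed
               for _ in range(n_resamples))
    return (hits + 1) / (n_resamples + 1)


for term, genes in gene_sets.items():
    labels, score = two_split(genes)
    print(term, genes, labels, round(score, 2), round(empirical_p(genes), 3))
```

Each candidate gene set yields its own split of the patients, and the empirical p-value acts as the filter: a split is reported only when random gene sets of the same size rarely separate the patients as well.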

Conclusions: Our method reports clusterings defined by biologically focused sets of genes. In annotation-driven clusterings, we have recovered clinically relevant patient subgroups through biologically plausible sets of genes as well as new subgroupings. We conjecture that our method has the potential to reveal so far unknown, clinically relevant classes of patients in an unsupervised manner.

Availability: We provide the R package adSplit as part of Bioconductor release 1.9 and on http://compdiag.molgen.mpg.de/software.


Source
http://dx.doi.org/10.1093/bioinformatics/btm322


