Self-contained gene-set analysis of expression data: an evaluation of existing and novel methods.

PLoS One

Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA.

Published: September 2010

Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as a disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed. They can be divided into two types: competitive methods, which compare the genes in a set with genes outside it, and self-contained methods, which test the set against the null hypothesis that none of its genes is associated with the phenotype. Self-contained methods can be used for genome-wide, candidate gene, or pathway studies, and they have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete, and time-to-event phenotypes. To assess the power and type I error rate of the various previously proposed and novel approaches, we completed an extensive simulation study in which the scenarios varied according to the number of genes in a gene set, the number of genes associated with the phenotype, the effect sizes, the correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that, overall, Fisher's method and the global model with random effects have the highest power across a wide range of scenarios, while the analysis based on the first principal component and the Kolmogorov-Smirnov test tended to have the lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.
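Fisher's method, one of the two best performers in the simulations, combines gene-level p-values into a single set-level statistic, T = -2 Σ log p_i. The sketch below shows only this combination step and is not the authors' implementation. It assumes the gene-level p-values have already been computed (for example, from a per-gene test of association with the phenotype) and that they are independent, in which case T follows a chi-square distribution with 2k degrees of freedom for k genes. Expression of genes within a set is usually correlated, which invalidates that reference distribution; in practice the observed T would then be compared with a permutation null obtained by permuting the phenotype and recomputing T. The function name `fisher_combined_pvalue` is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> tuple[float, float]:
    """Combine gene-level p-values with Fisher's method (hypothetical helper).

    Returns (statistic, p_value), where statistic = -2 * sum(log p_i).
    ASSUMPTION: the p-values are independent, so under the null the
    statistic is chi-square with 2k degrees of freedom. For correlated
    genes, use a permutation null for the statistic instead.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    stat = -2.0 * sum(math.log(p) for p in pvalues)
    if stat == 0.0:
        # Every p-value equals 1: no evidence against the null.
        return stat, 1.0

    # For even degrees of freedom 2k the chi-square survival function has a
    # closed form: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    # Each term is evaluated in log space to avoid overflow for large sets.
    half = stat / 2.0
    log_half = math.log(half)
    p_value = sum(
        math.exp(i * log_half - math.lgamma(i + 1) - half) for i in range(k)
    )
    return stat, min(1.0, p_value)


if __name__ == "__main__":
    # Illustrative gene-level p-values for a hypothetical four-gene set.
    stat, p = fisher_combined_pvalue([0.01, 0.20, 0.03, 0.50])
    print(f"T = {stat:.3f}, p = {p:.4g}")
```

With a single gene the combined p-value reduces to that gene's own p-value, which is a quick sanity check on the closed-form tail probability.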

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2941449 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0012693 (PLOS)

Similar Publications

Background: Lung adenocarcinoma is one of the most common malignant tumors worldwide. Its complex molecular mechanisms and high tumor heterogeneity pose significant challenges for clinical treatment. The manganese ion metabolism family plays a crucial role in various biological processes, and the abnormal expression of the NUDT3 gene in multiple cancers has drawn considerable attention.

Average nucleotide identity (ANI) is a widely used metric to estimate genetic relatedness, especially in microbial species delineation. While ANI calculation has been well optimized for bacteria and closely related viral genomes, accurate estimation of ANI below 80%, particularly in large reference data sets, has been challenging due to a lack of accurate and scalable methods. To bridge this gap, we introduce MANIAC, an efficient computational pipeline optimized for estimating ANI and alignment fraction (AF) in viral genomes with divergence around ANI of 70%.

Single-Cell Proteomics Uncovers Dual Traits of Dermal Sheath Cells in Wound Repair.

Adv Wound Care (New Rochelle)

January 2025

Translational Medicine Center, Baotou Central Hospital (Baotou Clinical Medical College, Affiliated to Inner Mongolia Medical University), Baotou, China.

Wound healing is a dynamic process involving multiple cell types and signaling pathways. Dermal sheath cells (DSCs), which surround hair follicles, play a critical role in tissue repair, yet their regulatory mechanisms remain unclear. This study used single-cell proteomics in a mouse model to explore DSC function across different healing stages.

Molecular and functional convergences associated with complex multicellularity in Eukarya.

Mol Biol Evol

January 2025

Laboratório de Algoritmos em Biologia, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Brazil.

A key trait of Eukarya is the independent evolution of complex multicellularity (CM) in animals, plants, fungi, brown algae, and red algae. This phenotype is characterized by the initial exaptation of cell-cell adhesion genes, followed by the emergence of mechanisms for cell-cell communication, together with the expansion of transcription factor gene families responsible for cell and tissue identity. The number of cell types (NCT) is commonly used as a quantitative proxy for biological complexity in comparative genomics studies.