GOAT: efficient and robust identification of gene set enrichment.

Commun Biol

Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University, 1081 HV, Amsterdam, The Netherlands.

Published: June 2024

Gene set enrichment analysis is foundational to the interpretation of high-throughput biology. Identifying enriched Gene Ontology (GO) terms or disease-associated gene sets within a list of gene effect sizes that represent experimental outcomes is an everyday task in the life sciences, and it depends crucially on robust and sensitive statistical tools. Here we present GOAT, a parameter-free algorithm for gene set enrichment analysis of preranked gene lists. The algorithm can precompute null distributions from standardized gene scores, enabling enrichment testing of the GO database in one second. Validations using synthetic data show that estimated gene set p-values are well calibrated under the null hypothesis and invariant to gene list length and gene set size. Application to various real-world proteomics and gene expression studies demonstrates that GOAT identifies more significant GO terms than current methods. GOAT is freely available as an R package and as a user-friendly online tool for gene set enrichment analyses that includes interactive data visualizations: https://ftwkoopmans.github.io/goat
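The idea of testing a gene set against a null distribution that is precomputed once per gene set size can be illustrated with a short sketch. This is not the GOAT implementation or the API of its R package: the function names (`rank_scores`, `precompute_null`, `enrichment_pvalue`), the squared-rank scoring, and the plain resampling null are all assumptions chosen for illustration. Because the scores depend only on gene ranks, the null for a given set size can be reused for every gene set of that size.

```python
import numpy as np

def rank_scores(effect_sizes):
    """Map absolute effect sizes to rank-based scores in (0, 1]."""
    abs_effect = np.abs(np.asarray(effect_sizes, dtype=float))
    n = abs_effect.size
    # Rank 1 = weakest gene, rank n = strongest gene (ties broken by input order).
    ranks = np.empty(n)
    ranks[np.argsort(abs_effect, kind="stable")] = np.arange(1, n + 1)
    # Squaring emphasizes top-ranked genes (an illustrative assumption).
    return (ranks / n) ** 2

def precompute_null(scores, set_sizes, n_draws=2000, seed=0):
    """For each set size k, sample the mean score of k random genes."""
    rng = np.random.default_rng(seed)
    nulls = {}
    for k in sorted(set(set_sizes)):
        draws = [rng.choice(scores, size=k, replace=False).mean()
                 for _ in range(n_draws)]
        nulls[k] = np.sort(np.array(draws))  # sorted for fast lookup
    return nulls

def enrichment_pvalue(scores, members, nulls):
    """Empirical one-sided p-value for the mean score of a gene set."""
    null = nulls[len(members)]
    observed = scores[members].mean()
    # Number of null draws that are at least as large as the observed mean.
    n_ge = null.size - np.searchsorted(null, observed, side="left")
    return (n_ge + 1) / (null.size + 1)

# Synthetic example: 1000 genes, the first 20 carry a shifted effect size.
rng = np.random.default_rng(1)
effects = rng.normal(size=1000)
effects[:20] += 3.0

scores = rank_scores(effects)
nulls = precompute_null(scores, set_sizes=[20])
p_enriched = enrichment_pvalue(scores, np.arange(20), nulls)
p_random = enrichment_pvalue(scores, np.arange(500, 520), nulls)
print(p_enriched, p_random)
```

With an empirical null the smallest attainable p-value is 1 / (n_draws + 1), so a sketch like this trades resolution for simplicity; the abstract's claims about calibration and speed refer to GOAT itself, not to this example.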

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11187187
DOI: http://dx.doi.org/10.1038/s42003-024-06454-5

Publication Analysis

Top Keywords

gene set: 24
set enrichment: 16
gene: 13
enrichment analysis: 8
set: 6
enrichment: 5
goat: 4
goat efficient: 4
efficient robust: 4
robust identification: 4

Similar Publications

Large-scale gene-environment interaction (GxE) discovery efforts often involve analytical compromises for the sake of data harmonization and statistical power. Refinement of exposures, covariates, outcomes, and population subsets may be helpful to establish often-elusive replication and evaluate potential clinical utility. Here, we used additional datasets, an expanded set of statistical models, and interrogation of lipoprotein metabolism via nuclear magnetic resonance (NMR)-based lipoprotein subfractions to refine a previously discovered GxE modifying the relationship between physical activity (PA) and HDL-cholesterol (HDL-C).

SET domain bifurcated histone lysine methyltransferase 1 (SETDB1/ESET), a pivotal H3K9 methyltransferase, has been extensively studied since its discovery over two decades ago. SETDB1 plays critical roles in immune regulation, including B-cell maturation, T-cell activity modulation, and endogenous retrovirus (ERV) silencing. While essential for normal immune cell function, SETDB1 overexpression in cancer cells disrupts immune responses by suppressing tumor immunogenicity and facilitating immune evasion.

Hibernation, an adaptive mechanism to extreme environmental conditions, is prevalent among mammals. Its main characteristics include reduced body temperature and metabolic rate. However, the mechanisms by which hibernating animals re-enter deep sleep during the euthermic phase to sustain hibernation remain poorly understood.

Work in many systems has shown large-scale changes in gene expression during aging. However, many studies measure expression at just two arbitrarily chosen timepoints and can therefore observe only an increase or a decrease in expression between "young" and "old" animals, failing to capture any dynamic, non-linear changes that occur throughout the aging process. We used RNA sequencing to measure expression in male head tissue at 15 timepoints through the lifespan of an inbred strain.

The advent of single-cell RNA sequencing (scRNA-seq) has greatly enhanced our ability to explore cellular heterogeneity with high resolution. Identifying subpopulations of cells and their associated molecular markers is crucial in understanding their distinct roles in tissues. To address the challenges in marker gene selection, we introduce CORTADO, a computational framework based on hill-climbing optimization for the efficient discovery of cell-type-specific markers.
