Two subtle problems with overrepresentation analysis.

Bioinform Adv

Bioinformatics Working Group, Burnet Institute, Melbourne, VIC 3004, Australia.

Published: October 2024

AI Article Synopsis

  • ORA is a popular method for analyzing gene lists, but it has two main issues: the 'background problem' where genes not categorized are discarded, and the 'false discovery rate problem' where the number of tests is underestimated.
  • Researchers have demonstrated how these problems affect RNA-seq datasets, highlighting that their impact varies based on gene set libraries, list size, and dataset noise.
  • To address these issues, users can switch ORA tools or opt for methods like functional class scoring, and an R/Shiny tool has been developed for easier implementation.

Article Abstract

Motivation: Overrepresentation analysis (ORA) is used widely to assess the enrichment of functional categories in a gene list compared to a background list. ORA is therefore a critical method in the interpretation of 'omics data, relating gene lists to biological functions and themes. Although ORA is hugely popular, we and others have noticed two potentially undesired behaviours of some ORA tools. The first one we call the 'background problem', because it involves the software eliminating large numbers of genes from the background list if they are not annotated as belonging to any category. The second one we call the 'false discovery rate problem', because some tools underestimate the true number of parallel tests conducted.
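The 'background problem' can be illustrated with a minimal sketch based on a one-sided hypergeometric test, a common choice for ORA. All counts below are hypothetical and chosen only for illustration. The sketch also assumes that a tool which trims the background drops unannotated genes from the gene list as well, which may not hold for every tool.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the gene set."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts, not taken from the article.
detected_genes = 15000   # full background: every gene detected in the experiment
annotated_genes = 9000   # background genes annotated to at least one gene set
list_size = 500          # genes in the list of interest
list_annotated = 400     # genes in the list that carry any annotation
set_size = 200           # genes in the set being tested
overlap = 15             # genes shared by the list and the set

# Test against the full background.
p_full = hypergeom_sf(overlap, detected_genes, set_size, list_size)

# Test after unannotated genes are removed (assumed tool behaviour).
p_trimmed = hypergeom_sf(overlap, annotated_genes, set_size, list_annotated)

print(f"full background:    p = {p_full:.4g}")
print(f"trimmed background: p = {p_trimmed:.4g}")
```

With these particular numbers the same overlap looks less surprising against the trimmed background, so the p-value is larger. The direction and size of the shift depend on the counts used.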

Results: Here, we demonstrate the impact of these issues on several real RNA-seq datasets and use simulated RNA-seq data to quantify the impact of these problems. We show that the severity of these problems depends on the gene set library, the number of genes in the list, and the degree of noise in the dataset. These problems can be mitigated by changing packages/websites for ORA or by changing to another approach such as functional class scoring.
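The 'false discovery rate problem' described in the Motivation can be sketched in a similar way. The example below applies the Benjamini-Hochberg adjustment twice: once counting only the reported gene sets as tests, and once counting every set in the library. The p-values, the library size of 50, and the `n_tests` parameter are hypothetical. The sketch assumes that unreported sets have p-values at least as large as every reported one, for example sets with no overlap with the gene list.

```python
def bh_adjust(pvalues, n_tests=None):
    """Benjamini-Hochberg adjusted p-values.

    n_tests is a hypothetical override for the number of tests conducted.
    Any unlisted tests are assumed to have larger p-values than those listed.
    """
    m = len(pvalues) if n_tests is None else n_tests
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for the five gene sets a tool chose to report.
reported = [0.0005, 0.008, 0.02, 0.03, 0.3]

adj_reported = bh_adjust(reported)             # counts only the 5 reported sets
adj_full = bh_adjust(reported, n_tests=50)     # counts all 50 sets in the library

for p, a, b in zip(reported, adj_reported, adj_full):
    print(f"p = {p:<7} adjusted (5 tests) = {a:.4f}   adjusted (50 tests) = {b:.4f}")
```

In this toy case, four sets fall below an adjusted threshold of 0.05 when only the reported sets are counted, but only one does when all 50 tests are counted.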

Availability And Implementation: An R/Shiny tool has been provided at https://oratool.ziemann-lab.net/ and the supporting materials are available from Zenodo (https://zenodo.org/records/13823301).

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11557902
DOI: http://dx.doi.org/10.1093/bioadv/vbae159
