RummaGEO: Automatic Mining of Human and Mouse Gene Sets from GEO.

bioRxiv

Mount Sinai Center for Bioinformatics, Department of Pharmacological Sciences, Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York 10029, NY USA.

Published: April 2024

The Gene Expression Omnibus (GEO) is a major open biomedical research repository for transcriptomics and other omics datasets. It currently contains millions of gene expression samples from tens of thousands of studies, collected by biomedical research laboratories around the world. While users of the GEO repository can search the metadata describing studies to locate relevant datasets, there are currently no methods or resources that facilitate global search of GEO at the data level. To address this shortcoming, we developed RummaGEO, a web server application that enables gene expression signature search of a large collection of human and mouse RNA-seq studies deposited into GEO. To develop the search engine, we performed offline automatic identification of sample conditions from the uniformly aligned GEO studies available from ARCHS4. We then computed differential expression signatures to extract gene sets from these studies. In total, RummaGEO currently contains 135,264 human and 158,062 mouse gene sets extracted from 23,395 GEO studies. Next, we analyzed the contents of the RummaGEO database to identify statistical patterns and perform various global analyses. The RummaGEO database is served through a web-based search engine with signature search, PubMed search, and metadata search functionalities. Overall, RummaGEO provides an unprecedented resource for the biomedical research community, enabling hypothesis generation for many future studies. The RummaGEO search engine is available from: https://rummageo.com/.
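The abstract names two computational steps without detailing them: turning differential expression signatures into gene sets, and searching the resulting gene set library with a user's query gene set. The Python sketch that follows illustrates one common way to do both. The significance and size cutoffs, the hypergeometric (one-sided Fisher's exact) overlap test, the Benjamini-Hochberg correction, and every function and gene set name here are assumptions for illustration, not RummaGEO's documented implementation.

```python
from math import comb
from typing import Dict, List, Set, Tuple

# Hypothetical type for illustration: gene symbol -> (log2 fold change, adjusted p-value).
Signature = Dict[str, Tuple[float, float]]


def extract_gene_sets(
    signature: Signature, max_padj: float = 0.05, top_n: int = 500
) -> Tuple[Set[str], Set[str]]:
    """Split one differential expression signature into up and down gene sets.

    The cutoffs (max_padj, top_n) are illustrative assumptions, not the
    thresholds used to build the RummaGEO database.
    """
    significant = [(g, lfc) for g, (lfc, padj) in signature.items() if padj < max_padj]
    # Order by effect size: strongest up-regulation first, strongest down-regulation first.
    up = sorted((x for x in significant if x[1] > 0), key=lambda x: x[1], reverse=True)
    down = sorted((x for x in significant if x[1] < 0), key=lambda x: x[1])
    return {g for g, _ in up[:top_n]}, {g for g, _ in down[:top_n]}


def overlap_pvalue(query: Set[str], gene_set: Set[str], background: int) -> float:
    """One-sided Fisher's exact test: P(overlap >= observed) under the
    hypergeometric null, assuming both sets are drawn from `background` genes."""
    k = len(query & gene_set)
    n, m = len(query), len(gene_set)
    tail = sum(
        comb(m, i) * comb(background - m, n - i) for i in range(k, min(n, m) + 1)
    )
    return tail / comb(background, n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def signature_search(
    query: Set[str], library: Dict[str, Set[str]], background: int
) -> List[Tuple[str, int, float, float]]:
    """Rank every gene set in `library` by its overlap with `query`.

    Returns (name, overlap size, p-value, BH-adjusted p-value), best first.
    """
    names = list(library)
    pvals = [overlap_pvalue(query, library[name], background) for name in names]
    adj = benjamini_hochberg(pvals)
    rows = [
        (name, len(query & library[name]), p, q)
        for name, p, q in zip(names, pvals, adj)
    ]
    return sorted(rows, key=lambda r: r[2])
```

For example, `signature_search({"A", "B", "C"}, {"study1_up": {"A", "B", "D", "E"}, "study2_up": {"X", "Y"}}, background=10)` ranks the hypothetical `study1_up` set first (2 shared genes, p = 1/3). The exact tail sum with `math.comb` keeps the sketch dependency-free; a production search over hundreds of thousands of gene sets would more likely use a vectorized library routine such as `scipy.stats.fisher_exact` with `alternative="greater"`.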


Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11030343
DOI: http://dx.doi.org/10.1101/2024.04.09.588712


