MousiPLIER: A Mouse Pathway-Level Information Extractor Model.

bioRxiv

Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA.

Published: August 2023

AI Article Synopsis

  • High throughput gene expression profiling helps researchers generate hypotheses about biological function and disease, but it is limited in its ability to infer underlying biological pathways and by the burden of testing tens of thousands of individual genes.
  • The study adapts the Pathway-level information extractor (PLIER), an unsupervised machine learning method, to train mousiPLIER, the first mouse PLIER model, on 190,111 mouse brain RNA-sequencing samples. The model converts gene expression data into latent variables aligned with known pathway or cell marker gene sets, reducing dimensionality and improving interpretability.
  • The researchers applied mousiPLIER to a mouse brain aging study of microglia and astrocytes and identified latent variables significantly associated with aging, including LV41, which corresponds to a striatal signal. They also built a web server for exploring the learned latent variables.
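
The dimensionality reduction summarized above can be illustrated with a minimal sketch. It assumes the usual PLIER-style factorization, in which a genes-by-samples expression matrix Y is approximated by a loadings matrix Z (genes by latent variables) multiplied by a latent variable matrix B (latent variables by samples), and it estimates B for a dataset with a ridge-regularized least-squares projection onto fixed loadings. The function name `project_to_latent_space`, the penalty `l2`, and the toy data are hypothetical and are not taken from the mousiPLIER code.

```python
import numpy as np


def project_to_latent_space(expr, loadings, l2=1.0):
    """Estimate latent variable values B so that expr is approximately loadings @ B.

    expr:     genes x samples expression matrix (assumed already normalized)
    loadings: genes x latent-variables matrix Z from a trained model
    l2:       ridge penalty (hypothetical value, chosen for illustration)
    """
    k = loadings.shape[1]
    # Ridge solution: B = (Z^T Z + l2 * I)^-1 Z^T Y
    gram = loadings.T @ loadings + l2 * np.eye(k)
    return np.linalg.solve(gram, loadings.T @ expr)


# Toy data: 200 genes, 5 latent variables, 12 samples (all values synthetic).
rng = np.random.default_rng(0)
n_genes, n_lvs, n_samples = 200, 5, 12
Z = np.abs(rng.normal(size=(n_genes, n_lvs)))        # non-negative toy loadings
B_true = rng.normal(size=(n_lvs, n_samples))         # toy latent variable values
Y = Z @ B_true + 0.01 * rng.normal(size=(n_genes, n_samples))

B_hat = project_to_latent_space(Y, Z, l2=1e-3)
print(Y.shape, "->", B_hat.shape)
```

In this toy setting each sample is described by 5 latent variable values instead of 200 gene values, which is the kind of reduction the synopsis refers to, at a much smaller scale.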

Article Abstract

High throughput gene expression profiling is a powerful approach to generate hypotheses on the underlying causes of biological function and disease. Yet this approach is limited in its ability to infer underlying biological pathways and by the burden of testing tens of thousands of individual genes. Machine learning models that incorporate prior biological knowledge are necessary to extract meaningful pathways and generate rational hypotheses from the vast amount of gene expression data generated to date. We adopted an unsupervised machine learning method, Pathway-level information extractor (PLIER), to train the first mouse PLIER model on 190,111 mouse brain RNA-sequencing samples, the greatest amount of training data ever used by PLIER. mousiPLIER converted gene expression data into latent variables that align with known pathway or cell marker gene sets, substantially reducing data dimensionality and improving interpretability. To determine the utility of mousiPLIER, we applied it to a mouse brain aging study of microglia and astrocyte transcriptomic profiling. We found a specific set of latent variables that are significantly associated with aging, including one latent variable (LV41) corresponding to a striatal signal. We next performed k-means clustering on the training data to identify studies that respond strongly to LV41, finding that the variable is relevant to striatum and aging across the scientific literature. Finally, we built a web server (http://mousiplier.greenelab.com/) for users to easily explore the learned latent variables. Taken together, this study provides proof of concept that mousiPLIER can uncover meaningful biological processes in mouse transcriptomic studies.
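
The clustering step mentioned in the abstract can be sketched as follows. The abstract does not give the clustering details, so this is only an illustration under stated assumptions: it runs a simple one-dimensional k-means (k = 2) on per-sample values of a single latent variable, then reports the studies in which at least half of the samples fall in the high-value cluster. The study names, the values, and the 0.5 cutoff are all hypothetical.

```python
import random


def kmeans_1d(values, k=2, n_iter=50, seed=0):
    """Lloyd's algorithm on scalar values; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical latent variable values per sample, grouped by study.
lv_values = {
    "study_A": [0.1, -0.2, 0.0, 0.3],
    "study_B": [2.9, 3.1, 3.4, 2.7],
    "study_C": [0.2, -0.1, 0.15, 2.8],
}

studies = [s for s, vals in lv_values.items() for _ in vals]
flat = [v for vals in lv_values.values() for v in vals]

labels, centers = kmeans_1d(flat, k=2)
high = max(range(len(centers)), key=lambda c: centers[c])  # cluster with the larger mean

# Fraction of each study's samples that land in the high-value cluster.
fractions = {
    s: sum(1 for st, lab in zip(studies, labels) if st == s and lab == high) / len(vals)
    for s, vals in lv_values.items()
}
strong_studies = [s for s, f in fractions.items() if f >= 0.5]  # hypothetical cutoff
print(strong_studies)
```

A real analysis would cluster many more samples and could use more than one latent variable; the point here is only how cluster membership can be traced back to the studies the samples came from.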


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418102
DOI: http://dx.doi.org/10.1101/2023.07.31.551386

Publication Analysis

Top Keywords

gene expression: 12
latent variables: 12
pathway-level extractor: 8
underlying biological: 8
machine learning: 8
expression data: 8
mouse brain: 8
training data: 8
data: 5
mousiplier mouse: 4
