Background: Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. These approaches require biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data.
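Overrepresentation analysis is commonly scored with a one-sided hypergeometric test. It asks how likely it is to see at least the observed overlap between the experiment's selected genes and a gene set if the genes had been drawn at random from the background. The following is a minimal sketch of that test only. The function and parameter names are hypothetical, and it is not the implementation used by ADAGEpath or by any particular GSEA tool.

```python
from math import comb


def overrepresentation_pvalue(n_total, n_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_total:    genes in the background (universe)
    n_set:      genes in the gene set (for example, one pathway)
    n_selected: genes selected by the experiment (for example, differentially expressed)
    n_overlap:  selected genes that also belong to the gene set
    """
    # Number of ways to draw n_selected genes from the whole background.
    denom = comb(n_total, n_selected)
    # The overlap can never exceed the smaller of the two groups.
    upper = min(n_set, n_selected)
    # Sum the hypergeometric probabilities over the upper tail k >= n_overlap.
    # math.comb returns 0 when k exceeds n, so impossible terms contribute nothing.
    tail = sum(
        comb(n_set, k) * comb(n_total - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom
```

Exact integer arithmetic from `math.comb` avoids overflow in the binomial coefficients. A real analysis would also correct these p-values for the number of gene sets tested.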
Results: Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are learned directly from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open source, allowing easy extension to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on bronchial epithelial cells with a cystic fibrosis genotype. We mapped active signatures in the dataset to KEGG pathways and compared them with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr.
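The two steps described above, computing signature activity and then testing it between conditions, can be sketched as follows. This is a simplified illustration under stated assumptions. A signature's activity in a sample is approximated by the activation of the corresponding ADAGE hidden node, sigmoid(w · x + b), applied to expression values assumed to be scaled to [0, 1]. Plain Welch's t statistic stands in for the statistical test. ADAGEpath may compute activity differently and uses its own testing procedure. The model structure (`node -> (weights, bias)`) and all names are hypothetical.

```python
import math
from statistics import mean, variance


def node_activity(expression, weights, bias):
    """Hidden-node activation sigmoid(w . x + b) for one sample.

    expression: dict gene -> expression value (assumed scaled to [0, 1])
    weights:    dict gene -> weight for this node (hypothetical values)
    Genes missing from the sample are treated as 0.
    """
    z = bias + sum(w * expression.get(g, 0.0) for g, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b); each group needs >= 2 values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_signatures(samples_a, samples_b, model):
    """Score every node and return (node, t) pairs sorted by |t|, largest first.

    model: dict node name -> (weights dict, bias)  # hypothetical structure
    """
    results = []
    for node, (weights, bias) in model.items():
        act_a = [node_activity(s, weights, bias) for s in samples_a]
        act_b = [node_activity(s, weights, bias) for s in samples_b]
        results.append((node, welch_t(act_a, act_b)))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)
```

Ranking by |t| keeps signatures that are either more or less active under the treatment. A fuller analysis would convert the statistics to p-values and adjust them for the number of signatures tested.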
Conclusions: We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5700673
DOI: http://dx.doi.org/10.1186/s12859-017-1905-4
BMC Bioinformatics
November 2017
Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA.
Maintenance of periodontal health or transition to a periodontal lesion reflects the continuous and ongoing battle between the vast microbial ecology in the oral cavity and the array of resident and emigrating inflammatory/immune cells in the periodontium. This war clearly signifies many 'battlefronts' representing the interface of the mucosal-surface cells with the dynamic biofilms composed of commensal and potential pathogenic species, as well as more recent knowledge demonstrating active invasion of cells and tissues of the periodontium leading to skirmishes in connective tissue, the locality of bone and even in the local vasculature. Research in the discipline has uncovered a concerted effort of the microbiome, using an array of survival strategies, to interact with other bacteria and host cells.
Cell Syst
July 2017
Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA. Electronic address:
Cross-experiment comparisons in public data compendia are challenged by unmatched conditions and technical noise. The ADAGE method, which performs unsupervised integration with denoising autoencoder neural networks, can identify biological patterns, but because ADAGE models, like many neural networks, are over-parameterized, different ADAGE models perform equally well. To enhance model robustness and better build signatures consistent with biological pathways, we developed an ensemble ADAGE (eADAGE) that integrated stable signatures across models.
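The ensemble idea, pooling node weight vectors from several independently trained models and keeping only patterns that recur across models, can be illustrated with a deliberately simplified sketch. This is not the published eADAGE procedure, which clusters nodes from many models. Here a greedy Pearson-correlation threshold stands in for that clustering step, and each group that appears in at least `min_models` different models is averaged into one consensus node. The function names, the threshold, and `min_models` are hypothetical choices for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ensemble_nodes(models, threshold=0.8, min_models=2):
    """Greedily group similar node weight vectors across models and average stable groups.

    models: list of models; each model is a list of node weight vectors,
            all using the same gene order.
    Returns a list of consensus weight vectors.
    """
    clusters = []  # each cluster: {"seed": first vector, "members": [(model index, vector)]}
    for m_idx, nodes in enumerate(models):
        for vec in nodes:
            # Join the first cluster whose seed correlates strongly with this node.
            for c in clusters:
                if pearson(c["seed"], vec) >= threshold:
                    c["members"].append((m_idx, vec))
                    break
            else:
                # No similar cluster yet, so this node starts a new one.
                clusters.append({"seed": vec, "members": [(m_idx, vec)]})

    consensus = []
    for c in clusters:
        # A pattern counts as stable only if several distinct models produced it.
        support = {m for m, _ in c["members"]}
        if len(support) >= min_models:
            vecs = [v for _, v in c["members"]]
            consensus.append([sum(col) / len(vecs) for col in zip(*vecs)])
    return consensus
```

Greedy seeding makes the result depend on model order, which is one reason a real ensemble method would use a proper clustering algorithm instead.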
Expert Rev Proteomics
February 2013
ProtAffin Biotechnologie AG, Reininghausstrasse 13a, 8020 Graz, Austria.
Biological functions of a variety of proteins are mediated via their interaction with glycosaminoglycans (GAGs). The structural diversity within the wide GAG landscape provides individual interaction sites for a multitude of proteins involved in several pathophysiological processes. This 'GAG angle' of such proteins, as well as their specific GAG ligands, gives rise to novel therapeutic concepts for drug development.