Motivation: The recent development of sequencing technologies has revolutionized our understanding of the inner workings of the cell as well as the way disease is treated. A single RNA sequencing (RNA-Seq) experiment, however, measures tens of thousands of parameters simultaneously. While the results are information rich, their analysis poses a challenge. Dimensionality reduction methods help with this task: they extract patterns from the data by compressing it into compact vector representations.

Results: We present the factorized embeddings (FE) model, a self-supervised deep learning algorithm that simultaneously learns gene and sample representation spaces by tensor factorization. We ran the model on RNA-Seq data from two large-scale cohorts and observed that the sample representation captures information on both single-gene and global gene expression patterns. Moreover, we found that the gene representation space was organized such that tissue-specific genes, highly correlated genes, and genes annotated with the same Gene Ontology (GO) terms were grouped together. Finally, we compared the vector representations of samples learned by the FE model to those of other similar models on 49 regression tasks. We report that the representations trained with FE rank first or second in all of the tasks, surpassing other representations, sometimes by a considerable margin.
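
The idea of learning gene and sample coordinates jointly can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the predictor, so a plain dot product between a sample vector and a gene vector is swapped in here as an assumption, trained by gradient descent on squared error. The function name `fit_embeddings`, its parameters, and the synthetic matrix are all hypothetical.

```python
import numpy as np


def fit_embeddings(expr, dim=2, lr=0.02, epochs=3000, seed=0):
    """Fit expr[s, g] ~ sample_emb[s] @ gene_emb[g] by gradient descent.

    Hypothetical helper: the dot-product predictor is an assumption made
    for brevity, not necessarily the predictor used in the paper.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = expr.shape
    # Small random initial coordinates for every sample and every gene.
    sample_emb = 0.1 * rng.standard_normal((n_samples, dim))
    gene_emb = 0.1 * rng.standard_normal((n_genes, dim))
    losses = []
    for _ in range(epochs):
        err = sample_emb @ gene_emb.T - expr  # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        # Squared-error gradients, averaged over the other axis.
        grad_sample = err @ gene_emb / n_genes
        grad_gene = err.T @ sample_emb / n_samples
        sample_emb -= lr * grad_sample
        gene_emb -= lr * grad_gene
    return sample_emb, gene_emb, losses


# Synthetic rank-2 "expression" matrix standing in for real RNA-Seq data.
rng = np.random.default_rng(1)
true_samples = rng.standard_normal((12, 2))
true_genes = rng.standard_normal((20, 2))
expr = true_samples @ true_genes.T

sample_emb, gene_emb, losses = fit_embeddings(expr)
print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each row of `sample_emb` and `gene_emb` is a compact coordinate vector. In this toy setting, the only training signal is the expression matrix itself, which is the sense in which such a model can be called self-supervised.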

Availability and implementation: A toy example in the form of a Jupyter Notebook as well as the code and trained embeddings for this project can be found at: https://github.com/TrofimovAssya/FactorizedEmbeddings.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355243
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa488

Similar Publications

Background: Patient experience data from social media offer patient-centered perspectives on disease, treatments, and health service delivery. Current guidelines typically rely on systematic reviews, while qualitative health studies are often seen as anecdotal and nongeneralizable. This study explores combining personal health experiences from multiple sources to create generalizable evidence.

Article Synopsis
  • Genetic variation linked to complex traits is highly pleiotropic, meaning it affects multiple traits, which can be better understood through multi-phenotype analyses to identify shared and specific genetic factors.
  • Traditional matrix factorization (MF) methods struggle with issues like sample-sharing confounding and often yield factors too broad to map onto biological pathways, prompting a need for improvement.
  • The newly introduced method GLEANR effectively addresses these challenges by detecting sparse genetic factors from GWAS summary statistics, improves the replication of genetic factors across different studies, and offers clearer interpretations aligned with diseases and biological processes, as demonstrated through its evaluation of the UK Biobank.

Text embedding plays a crucial role in natural language processing (NLP). Among various approaches, nonnegative matrix factorization (NMF) is an effective method for this purpose. However, the standard NMF approach, fundamentally based on the bag-of-words model, fails to utilize the contextual information of documents and may result in a significant loss of semantics.

Article Synopsis
  • The review evaluates how network analysis methods apply to traditional Chinese medicine (TCM), including its medicinal substances, compatibility theories, and syndromes.
  • Researchers collected literature from various databases and categorized studies based on their research methods, focusing on constructing biological networks and analyzing their characteristics.
  • The findings suggest that network analysis techniques can significantly enhance understanding of TCM's complex systems, facilitating its modernization and internationalization while supporting personalized treatment and scientific research.

Fast polypharmacy side effect prediction using tensor factorization.

Bioinformatics

November 2024

MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, United Kingdom.

Motivation: Adverse reactions from drug combinations are increasingly common, making their accurate prediction a crucial challenge in modern medicine. Laboratory-based identification of these reactions is insufficient due to the combinatorial nature of the problem. While many computational approaches have been proposed, tensor factorization (TF) models have shown mixed results, necessitating a thorough investigation of their capabilities when properly optimized.
