Enhancing Gene Expression Predictions Using Deep Learning and Functional Annotations.

Genet Epidemiol

Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, USA.

Published: January 2025

Transcriptome-wide association studies (TWAS) aim to uncover genotype-phenotype relationships through a two-stage procedure: predicting gene expression from genotypes using an expression quantitative trait locus (eQTL) data set, then testing the predicted expression for trait associations. Accurate gene expression prediction in stage 1 is crucial, as it directly impacts the power to identify associations in stage 2. Currently, the first stage of such studies is primarily conducted using linear models like elastic net regression, which fail to capture the nonlinear relationships inherent in biological systems. Deep learning methods have the potential to model such nonlinear effects, but have yet to demonstrably outperform linear methods at this task. To address this gap, we propose a new deep learning architecture to predict gene expression from genotypic variation across individuals. Our method utilizes a learnable input scaling layer in conjunction with a convolutional encoder to capture nonlinear effects and higher-order interactions without compromising on interpretability. We further augment this approach to allow for parameter sharing across multiple networks, enabling us to utilize prior information for individual variants in the form of functional annotations. Evaluations on real-world genomic data show that our method consistently outperforms elastic net regression across a large set of heritable genes. Furthermore, leveraging functional annotations produced a statistically significant improvement in our model's predictive performance, whereas elastic net regression failed to show equivalent gains when using the same information, suggesting that our method can capture nonlinear functional information beyond the capability of linear models.
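
The two-stage procedure described in the abstract can be illustrated with a minimal sketch of the linear baseline only. It is not the authors' code or their deep learning architecture. Everything here is an assumption made for illustration: the simulated genotypes and effect sizes, the sample sizes, the penalty settings, the small coordinate-descent elastic net solver (using the common (1/2n)·RSS + alpha·l1_ratio·|w|_1 + 0.5·alpha·(1 - l1_ratio)·|w|_2^2 objective), and the simple correlation-based association test in stage 2.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def fit_elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=200):
    """Toy coordinate-descent elastic net (illustrative, not optimized)."""
    n, p = X.shape
    w = np.zeros(p)
    z = (X ** 2).sum(axis=0) / n          # per-column mean square
    resid = y.copy()                      # residual y - X @ w (w starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            resid += X[:, j] * w[j]       # remove column j's contribution
            rho = X[:, j] @ resid / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (
                z[j] + alpha * (1.0 - l1_ratio))
            resid -= X[:, j] * w[j]       # add the updated contribution back
    return w

def simulate_genotypes(rng, n, p, maf=0.3):
    """Hypothetical genotypes: allele counts 0/1/2 at p independent variants."""
    return rng.binomial(2, maf, size=(n, p)).astype(float)

rng = np.random.default_rng(0)
p = 50
true_w = np.zeros(p)
true_w[:3] = [0.8, -0.6, 0.5]             # assumed causal eQTL effects

# Stage 1: learn genotype -> expression weights in an eQTL reference panel.
n_ref = 400
G_ref = simulate_genotypes(rng, n_ref, p)
mu = G_ref.mean(axis=0)
expr_ref = (G_ref - mu) @ true_w + rng.normal(0.0, 1.0, n_ref)
w_hat = fit_elastic_net(G_ref - mu, expr_ref - expr_ref.mean())

# Stage 2: impute expression in a separate cohort and test it against a trait.
n_gwas = 2000
G_gwas = simulate_genotypes(rng, n_gwas, p)
trait = 0.3 * ((G_gwas - mu) @ true_w) + rng.normal(0.0, 1.0, n_gwas)
pred_expr = (G_gwas - mu) @ w_hat
r = np.corrcoef(pred_expr, trait)[0, 1]
t_stat = r * np.sqrt((n_gwas - 2) / (1.0 - r ** 2))  # t-statistic for a correlation

print("nonzero weights:", int(np.count_nonzero(w_hat)))
print("stage 2 t-statistic:", float(t_stat))
```

In this toy setting the stage 2 statistic depends directly on how well `w_hat` recovers the simulated effects, which mirrors the abstract's point that prediction accuracy in stage 1 drives power in stage 2.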

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656135
DOI: http://dx.doi.org/10.1002/gepi.22595

Similar Publications

Single-cell RNA sequencing (scRNA-seq) offers remarkable insights into cellular development and differentiation by capturing the gene expression profiles of individual cells. The role of dimensionality reduction and visualization in the interpretation of scRNA-seq data has gained wide acceptance. However, current methods face several challenges, including incomplete structure-preserving strategies and high distortion in embeddings, which fail to effectively model complex cell trajectories with multiple branches.

scMMAE: masked cross-attention network for single-cell multimodal omics fusion to enhance unimodal omics.

Brief Bioinform

November 2024

Guangdong Provincial Key Laboratory of Mathematical and Neural Dynamical Systems, Great Bay University, No. 16 Daxue Rd, Songshanhu District, Dongguan, Guangdong, 523000, China.

Multimodal omics, especially transcriptomics and proteomics, provide deeper insight into biological processes and cellular functions. Computational methods have been proposed for integrating single-cell transcriptomic and proteomic data. However, existing methods primarily concentrate on the alignment of different omics, overlooking the unique information inherent in each omics type.

Background: Protein-truncating mutations in the titin gene are associated with increased risk of atrial fibrillation. However, little is known about the underlying pathophysiology.

Methods: We identified a heterozygous titin truncating variant (TTNtv) in a patient with unexplained early onset atrial fibrillation and normal ventricular function.

Adeno-associated viral (AAV) vectors are increasingly used for preclinical and clinical cardiac gene therapy approaches. However, gene transfer to cardiomyocytes poses a challenge due to differences between AAV serotypes in terms of expression efficiency and . For example, AAV9 vectors work well in rodent heart muscle cells but not in cultivated neonatal rat ventricular cardiomyocytes (NRVCMs), necessitating the use of AAV6 vectors for studies.

Although not essential for their growth, the production of secondary metabolites increases the fitness of the producing microorganisms in their natural habitat by enhancing establishment, competition, and nutrient acquisition. The Gram-positive soil-dwelling bacterium, , produces a variety of secondary metabolites. Here, we investigated the regulatory relationship between the non-ribosomal peptide surfactin and the sactipeptide bacteriocin subtilosin A.