Artificial intelligence (AI)/deep learning (DL) models that predict molecular phenotypes like gene expression directly from DNA sequences have recently emerged. While these models have proven effective at capturing the variation across genes, their ability to explain inter-individual differences has been limited. We hypothesize that the performance gap can be narrowed through the use of pre-trained embeddings from the Nucleotide Transformer, a large foundation model trained on 3,000+ genomes. We train a transformer model using the pre-trained embeddings and compare its predictive performance to Enformer, the current state-of-the-art model, using genotype and expression data from 290 individuals. Our model significantly outperforms Enformer in terms of correlation across individuals, and narrows the performance gap with an elastic net regression approach that uses just the genetic variants as predictors. Although simple regression models have their advantages in personalized prediction tasks, DL approaches based on foundation models pre-trained on diverse genomes have unique strengths in flexibility and interpretability. With further methodological and computational improvements and more training data, these models may eventually predict molecular phenotypes from DNA sequences with an accuracy surpassing that of regression-based approaches. Our work demonstrates the potential for large pre-trained AI/DL models to advance functional genomics.
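The distinction the abstract draws between capturing variation across genes and explaining inter-individual differences can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names (`pearson`, `across_individual_r`, `across_gene_r`), the gene labels, and the toy numbers are all hypothetical, and Pearson correlation is assumed as the correlation measure.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (a convention chosen here,
    since the correlation is undefined in that case).
    """
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def across_individual_r(observed, predicted):
    """Per-gene correlation between observed and predicted expression,
    computed over individuals (one value per individual, same order)."""
    return {gene: pearson(observed[gene], predicted[gene]) for gene in observed}


def across_gene_r(observed, predicted):
    """Single correlation over genes, using each gene's mean across individuals."""
    genes = sorted(observed)
    obs_means = [mean(observed[g]) for g in genes]
    pred_means = [mean(predicted[g]) for g in genes]
    return pearson(obs_means, pred_means)


# Hypothetical toy data: three genes, three individuals each.
observed = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [10.0, 11.0, 12.0],
    "GENE_C": [20.0, 22.0, 21.0],
}
# A model that predicts each gene's average level but ignores individuals.
predicted = {
    "GENE_A": [2.0, 2.0, 2.0],
    "GENE_B": [11.0, 11.0, 11.0],
    "GENE_C": [21.0, 21.0, 21.0],
}

per_gene = across_individual_r(observed, predicted)
overall = across_gene_r(observed, predicted)
```

In this toy case the predictions track gene-level averages closely, so the across-gene correlation is high, while the per-gene correlation across individuals is zero under the convention above because the predictions do not vary between individuals.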
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11416237 | PMC |
http://dx.doi.org/10.1016/j.xhgg.2024.100347 | DOI Listing |
Exp Appl Acarol
January 2025
Laboratorio de Vectores y Enfermedades Transmitidas, Departamento de Ciencias Biológicas, CENUR Litoral Norte, Universidad de la República, Salto, Uruguay.
Babesia species (Piroplasmida) are hemoparasites that infect erythrocytes of mammals and birds and are mainly transmitted by hard ticks (Acari: Ixodidae). These hemoparasites are known to be the second most common parasites infecting mammals, after trypanosomes, and some species may cause malaria-like disease in humans. Diagnosis and understanding of Babesia diversity increasingly rely on genetic data obtained through molecular techniques.
Sci Rep
January 2025
Department of Life Sciences, Imperial College, London, SW7 2AZ, UK.
Many cellular patterns exhibit a reaction-diffusion component, suggesting that Turing instability may contribute to pattern formation. However, biological gene-regulatory pathways are more complex than simple Turing activator-inhibitor models and generally do not require fine-tuning of parameters as dictated by the Turing conditions. To address these issues, we employ random matrix theory to analyze the Jacobian matrices of larger networks with robust statistical properties.
Sci Rep
January 2025
Department of Experimental Biology, Genetics Area, University of Jaén, Campus Las Lagunillas s/n, 23071, Jaén, Spain.
Acanthocephalan parasites are often overlooked in many areas of research, and satellitome and cytogenetic analyses are no exception. The species of the genus Acanthocephalus are known for their very small chromosomes with ambiguous morphology, which makes karyotyping difficult. In this study, we performed the first satellitome analysis of three Acanthocephalus species to identify species- and chromosome-specific satellites that could serve as cytogenetic markers.
Sci Rep
January 2025
Plant Biotechnology Lab, Department of Botany, Faculty of Science, Dayalbagh Educational Institute (Deemed to be University), Dayalbagh, Agra, 282005, India.
Piper longum, commonly known as long pepper, is highly valued for its bioactive alkaloid piperine, which has diverse pharmaceutical and culinary applications. In this study, we used high-throughput sequencing and de novo transcriptome assembly to analyze the transcriptomes of P. longum leaves, roots, and spikes.
Sci Rep
January 2025
Institute of Dendrology, Polish Academy of Sciences, Parkowa 5, Kórnik, 62-035, Poland.
Genetic diversity is crucial to secure the survival and sustainability of ecosystems. Given anthropogenic pressure, as well as the projected alterations connected with the level and circulation of water, riparian forests are of particular concern. In this paper, we assessed the genetic variation of black poplar - one of the keystone tree species of riverine forests.