Publications by authors named "Liam Gonzalez"

The prediction of molecular phenotypes from DNA sequences remains a longstanding challenge in genomics, often hindered by limited annotated data and the inability to transfer learnings between tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named Nucleotide Transformer, ranging from 50 million to 2.5 billion parameters and integrating information from 3,202 human genomes and 850 genomes from diverse species.

Article Synopsis
  • AgroNT is a new large language model designed to predict regulatory annotations and gene expression in plants, with a focus on crop species, and it achieves state-of-the-art results on these prediction tasks.
  • In cassava, the model was used to evaluate the effects of over 10 million mutations. The compiled data are introduced as the Plants Genomic Benchmark (PGB) to support deep learning approaches in plant genomics, and AgroNT is publicly available on HuggingFace.