Recent advances in high-throughput sequencing have exponentially increased the amount of genomic data available for animals (Metazoa) in recent decades, with high-quality chromosome-level genomes being published almost daily. Nevertheless, generating a new genome is not an easy task due to the high cost of genome sequencing, the high complexity of assembly, and the lack of standardized protocols for genome annotation. The lack of consensus in the annotation and publication of genome files hinders research: researchers lose time reformatting the files for their purposes, and the quality of the genetic repertoire available for an evolutionary study can also suffer. Thus, transcriptomes processed with the same pipeline remain a valuable proxy for the genetic content of species, one that is easier to obtain, cheaper, and more comparable than genomes. In a previous study, we presented the Metazoan Assemblies from Transcriptomic Ensembles database (MATEdb), a repository of high-quality transcriptomic and genomic data for the two most diverse animal phyla, Arthropoda and Mollusca. Here, we present the newest version of MATEdb (MATEdb2), which overcomes some of the previous limitations of our database: (i) we include data from all animal phyla for which public data are available, and (ii) we provide gene annotations extracted from the original GFF genome files using the same pipeline. In total, we provide proteomes inferred from high-quality transcriptomic or genomic data for almost 1,000 animal species, including the longest isoforms, all isoforms, and functional annotation based on sequence homology and protein language models, as well as the embedding representations of the sequences. We believe this new version of MATEdb will accelerate research on animal phylogenomics while saving thousands of hours of computational work, in a plea for open, greener, and collaborative science.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11534026
DOI: http://dx.doi.org/10.1093/gbe/evae235

