MATEdb2, a Collection of High-Quality Metazoan Proteomes across the Animal Tree of Life to Speed Up Phylogenomic Studies.

Gemma I Martínez-Redondo, Carlos Vargas-Chávez, Klara Eleftheriadi, Lisandra Benítez-Álvarez, Marçal Vázquez-Valls, Rosa Fernández

Genome Biol Evol

Metazoa Phylogenomics Lab, Biodiversity Program, Institute of Evolutionary Biology (CSIC-University Pompeu Fabra), 08003 Barcelona, Spain.

Published: November 2024

Recent advances in high-throughput sequencing have exponentially increased the amount of genomic data available for animals (Metazoa) over the last decades, with high-quality chromosome-level genomes being published almost daily. Nevertheless, generating a new genome is not an easy task due to the high cost of genome sequencing, the high complexity of assembly, and the lack of standardized protocols for genome annotation. The lack of consensus in the annotation and publication of genome files not only hinders research by forcing researchers to spend time reformatting the files for their purposes but can also reduce the quality of the genetic repertoire used in an evolutionary study. Thus, transcriptomes processed with the same pipeline remain a valuable proxy for the genetic content of species: they are easier and cheaper to obtain than genomes, and more comparable to one another. In a previous study, we presented the Metazoan Assemblies from Transcriptomic Ensembles database (MATEdb), a repository of high-quality transcriptomic and genomic data for the two most diverse animal phyla, Arthropoda and Mollusca. Here, we present the newest version of MATEdb (MATEdb2), which overcomes some of the limitations of the previous version: (i) we include data from all animal phyla for which public data are available, and (ii) we provide gene annotations extracted from the original GFF genome files using the same pipeline. In total, we provide proteomes inferred from high-quality transcriptomic or genomic data for almost 1,000 animal species, including the longest isoforms, all isoforms, and functional annotation based on sequence homology and protein language models, as well as the embedding representations of the sequences. We believe this new version of MATEdb will accelerate research on animal phylogenomics while saving thousands of hours of computational work, in a plea for open, greener, and collaborative science.
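The abstract distinguishes between "all isoforms" and "longest isoforms" proteomes. As a minimal sketch of what the reduction to one sequence per gene means, the Python below keeps the longest protein for each gene in a FASTA file. It assumes Trinity-style transcript IDs (e.g. `TRINITY_DN1000_c0_g1_i1`, where `_i1` marks the isoform) with an optional TransDecoder-style `.p1` suffix; that naming convention and the helper names `read_fasta`, `gene_id`, and `longest_isoforms` are illustrative assumptions, not the MATEdb2 pipeline itself.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of each '>' header."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def gene_id(protein_id: str) -> str:
    """Map a protein ID to its gene, ASSUMING Trinity-style names such as
    'TRINITY_DN1000_c0_g1_i1' plus an optional TransDecoder-style '.p1' suffix."""
    transcript = protein_id.split(".p")[0]  # drop the assumed '.pN' suffix
    base, sep, suffix = transcript.rpartition("_i")
    return base if sep and suffix.isdigit() else transcript


def longest_isoforms(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Keep the longest sequence per gene (the first one seen wins on ties)."""
    best: Dict[str, Tuple[str, str]] = {}
    for pid, seq in records:
        gene = gene_id(pid)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (pid, seq)
    return best


# Hypothetical toy proteome: two isoforms of one gene, one isoform of another.
example = """>TRINITY_DN1_c0_g1_i1.p1
MKT
>TRINITY_DN1_c0_g1_i2.p1 len=10
MKTAYIA
KQR
>TRINITY_DN2_c0_g1_i1.p1
MAA""".splitlines()

for gene, (pid, seq) in longest_isoforms(read_fasta(example)).items():
    print(gene, pid, len(seq))
```

For genome-derived proteomes, the isoform-to-gene mapping would instead come from the annotation file (in GFF3, the `Parent` attribute of each mRNA feature points to its gene), so only `gene_id` would need to change.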

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11534026
DOI: http://dx.doi.org/10.1093/gbe/evae235

Similar Publications

Evaluation of Reference Gene Stability for Investigations of Intracellular Signalling in Human Cancer and Non-Malignant Mesenchymal Stromal Cells.

Front Biosci (Schol Ed)

December 2024

Laboratory of Intracellular Membranes Dynamics, Institute of Cytology of the Russian Academy of Sciences, 194064 Saint Petersburg, Russia.

Vera Kosheverova, Alexander Schwarz, Rimma Kamentseva, Marianna Kharchenko, Elena Kornilova

Background: Real-time reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a powerful tool for analysing target gene expression in biological samples. To achieve reliable results by RT-qPCR, the most stable reference genes must be selected for proper data normalisation, particularly when comparing cells of different types. We aimed to choose the least variable candidate reference genes among eight housekeeping genes tested within a set of human cancer cell lines (HeLa, MCF-7, SK-UT-1B, A549, A431, SK-BR-3), as well as four lines of normal, non-malignant mesenchymal stromal cells (MSCs) of different origins.

InPAS: An R/Bioconductor Package for Identifying Novel Polyadenylation Sites and Alternative Polyadenylation from Bulk RNA-seq Data.

Front Biosci (Schol Ed)

December 2024

Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA.

Jianhong Ou, Haibo Liu, Sungmi Park, Michael R Green, Lihua Julie Zhu

Background: Alternative cleavage and polyadenylation (APA) is a crucial post-transcriptional gene regulation mechanism that regulates gene expression in eukaryotes by increasing the diversity and complexity of both the transcriptome and proteome. Despite the development of more than a dozen experimental methods over the last decade to identify and quantify APA events, widespread adoption of these methods has been limited by technical, financial, and time constraints. Consequently, APA remains poorly understood in most eukaryotes.

Artificial Intelligence-Driven Precision Medicine: Multi-Omics and Spatial Multi-Omics Approaches in Diffuse Large B-Cell Lymphoma (DLBCL).

Front Biosci (Landmark Ed)

November 2024

Department of Hematology, Taizhou Hospital of Zhejiang Province Affiliated to Wenzhou Medical University, 317000 Taizhou, Zhejiang, China.

Yanping Shao, Xiuyan Lv, Shuangwei Ying, Qunyi Guo

In this comprehensive review, we delve into the transformative role of artificial intelligence (AI) in refining the application of multi-omics and spatial multi-omics within the realm of diffuse large B-cell lymphoma (DLBCL) research. We scrutinized the current landscape of multi-omics and spatial multi-omics technologies, accentuating their combined potential with AI to provide unparalleled insights into the molecular intricacies and spatial heterogeneity inherent to DLBCL. Despite current progress, we acknowledge the hurdles that impede the full utilization of these technologies, such as the integration and sophisticated analysis of complex datasets, the necessity for standardized protocols, the reproducibility of findings, and the interpretation of their biological significance.

Discovery of Spirosnuolides A-D, Type I/III Hybrid Polyketide Spiro-Macrolides for a Chemotherapeutic Lead against Lung Cancer.

JACS Au

December 2024

Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea.

Thanh-Hau Huynh, Sung Chul Jang, Yeon Hee Ban, Eun-Young Lee, Taeho Kim

Four new macrolides, spirosnuolides A-D (-, respectively), were discovered from the termite nest-derived sp. INHA29. Spirosnuolides A-D are 18-membered macrolides sharing an embedded [6,6]-spiroketal functionality inside the macrocycle and are conjugated with structurally uncommon side chains featuring cyclopentenone, 1,4-benzoquinone, hydroxyfuroic acid, or butenolide moieties.

: a Python package for simplifying cBioPortal data access in cancer research.

JAMIA Open

February 2025

Medical Oncology, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy.

Matteo Valerio, Alessandro Inno, Stefania Gori

Objectives: In recent years, the rise of big data and artificial intelligence has led to an increasing expansion of databases and web services in biomedical research. cBioPortal is one of the most widely used platforms for accessing cancer genomic and clinical data. The primary objective of this study was to develop a tool that simplifies programmatic interaction with cBioPortal's web service.
