A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins.

Sci Rep

Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China.

Published: November 2023

Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.
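For orientation, the sketch below shows what a "percent sequence identity" figure and a pairwise distance matrix look like for pre-aligned protein sequences. It is not the SD algorithm: the abstract does not specify how SD models site-to-site correlation, so this uses the standard p-distance (fraction of mismatched ungapped columns), which treats every site independently. The function names and the gap convention (`-`) are assumptions made for illustration.

```python
from itertools import combinations

GAP = "-"  # assumed gap character in the pre-aligned input


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def p_distance(a: str, b: str) -> float:
    """Baseline p-distance: fraction of ungapped columns that differ.

    Each site is treated independently; this is NOT the SD measure.
    """
    return 1.0 - percent_identity(a, b) / 100.0


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances keyed by (name_a, name_b), names in sorted order."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }
```

A matrix of this shape is what distance-based tree builders such as neighbor joining take as input. At the identity levels the abstract mentions (below 20%), an independent-site measure like this one is the kind of baseline that the authors report to be of limited effectiveness.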

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662474
DOI: http://dx.doi.org/10.1038/s41598-023-47496-9

Similar Publications

Applicability of AlphaFold2 in the modeling of dimeric, trimeric, and tetrameric coiled-coil domains.

Protein Sci

January 2025

Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland.

Coiled coils are a common protein structural motif involved in cellular functions ranging from mediating protein-protein interactions to facilitating processes such as signal transduction or regulation of gene expression. They are formed by two or more alpha helices that wind around a central axis to form a buried hydrophobic core. Various forms of coiled-coil bundles have been reported, each characterized by the number, orientation, and degree of winding of the constituent helices.

Analysis of mutations in precision oncology using the automated, accurate, and user-friendly web tool PredictONCO.

Comput Struct Biotechnol J

December 2024

Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic.

Next-generation sequencing technology has created many new opportunities for clinical diagnostics, but it faces the challenge of functional annotation of identified mutations. Various algorithms have been developed to predict the impact of missense variants that influence oncogenic drivers. However, computational pipelines that handle biological data must integrate multiple software tools, which can add complexity and hinder non-specialist users from accessing the pipeline.

GPTrans: A Biological Language Model-Based Approach for Predicting Disease-Associated Mutations in G Protein-Coupled Receptors.

J Chem Inf Model

December 2024

State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China.

Accurately predicting mutations in G protein-coupled receptors (GPCRs) is critical for advancing disease diagnosis and drug discovery. In response to this imperative, GPTrans has emerged as a highly accurate predictor of disease-related mutations in GPCRs. The core innovation of GPTrans resides in the design of a novel feature extraction network that is capable of integrating features from both wildtype and mutant protein variant sites, utilizing multifeature connections within a transformer framework to ensure comprehensive feature extraction.

The RhiE and RhiF proteins work together as RhiEF and function as a thiamine pyrophosphate (TPP)-dependent phosphonopyruvate decarboxylase to produce phosphonoacetaldehyde in the rhizocticin biosynthesis pathway. In this study, we determined the crystal structure of the RhiEF complexed with TPP and Mg2+. RhiEF forms a dimer of heterodimers, and the cofactor TPP is bound at the heterotetrameric subunit interface.

The secondary structures (SSs) and supersecondary structures (SSSs) underlie the three-dimensional structure of proteins. Prediction of the SSs and SSSs from protein sequences enjoys high levels of use and finds numerous applications in the development of a broad range of other bioinformatics tools. Numerous sequence-based predictors of SS and SSS were developed and published in recent years.
