Phylogenetic analysis remains challenging because prevailing methods perform poorly on highly divergent protein sequences. Here, we propose a sequence-based evolutionary distance algorithm, termed sequence distance (SD), which incorporates site-to-site correlation within protein sequences into the distance estimate. In protein superfamilies, SD distinguishes evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even when sequence identity is below 20%. SD is highly correlated with protein structural similarity and can calculate evolutionary distances for thousands of protein pairs within seconds on a single CPU, which is significantly faster than most protein structure prediction methods, as these demand substantial computational resources and long run times. SD provides researchers with a more accurate and reliable tool for exploring evolutionary relationships and should help advance phylogenetics.
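The abstract does not give the SD formula, so the following is only a minimal sketch of the general idea of an alignment-free distance that uses correlations between sequence positions. It is not the authors' algorithm. The residue-pair profile, the `max_gap` parameter, the Euclidean metric, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def pair_profile(seq, max_gap=3):
    """Normalized frequencies of residue pairs separated by 1..max_gap positions.

    Hypothetical stand-in for "site-to-site correlation"; not the SD definition.
    """
    counts = Counter()
    for gap in range(1, max_gap + 1):
        for i in range(len(seq) - gap):
            counts[(gap, seq[i], seq[i + gap])] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def profile_distance(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def distance_matrix(seqs, max_gap=3):
    """Pairwise distances for a dict mapping sequence name to sequence string."""
    names = list(seqs)
    profiles = {name: pair_profile(seqs[name], max_gap) for name in names}
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = profile_distance(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d  # the matrix is symmetric
    return dist


if __name__ == "__main__":
    # Toy sequences, made up for the example.
    toy = {"p1": "MKVLAAGIVGLK", "p2": "MKVLSAGIVGLR", "p3": "GSHWWPPDDEEQ"}
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 4))
```

A matrix of this kind could then be passed to a standard distance-based tree builder such as neighbor joining. Because no alignment is computed, the cost per pair grows linearly with sequence length, which is the sort of property that makes scoring thousands of pairs on one CPU plausible.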
Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662474
DOI: http://dx.doi.org/10.1038/s41598-023-47496-9
Protein Sci
January 2025
Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland.
Coiled coils are a common protein structural motif involved in cellular functions ranging from mediating protein-protein interactions to facilitating processes such as signal transduction or regulation of gene expression. They are formed by two or more alpha helices that wind around a central axis to form a buried hydrophobic core. Various forms of coiled-coil bundles have been reported, each characterized by the number, orientation, and degree of winding of the constituent helices.
Comput Struct Biotechnol J
December 2024
Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic.
Next-generation sequencing technology has created many new opportunities for clinical diagnostics, but it faces the challenge of functional annotation of identified mutations. Various algorithms have been developed to predict the impact of missense variants that influence oncogenic drivers. However, computational pipelines that handle biological data must integrate multiple software tools, which can add complexity and hinder non-specialist users from accessing the pipeline.
J Chem Inf Model
December 2024
State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China.
Accurately predicting mutations in G protein-coupled receptors (GPCRs) is critical for advancing disease diagnosis and drug discovery. In response to this need, GPTrans has emerged as a highly accurate predictor of disease-related mutations in GPCRs. The core innovation of GPTrans is a novel feature extraction network that integrates features from both wild-type and mutant protein variant sites, using multifeature connections within a transformer framework to ensure comprehensive feature extraction.
Biochemistry
December 2024
Department of Life Science, Faculty of Science, Gakushuin University, 1-5-1 Mejiro, Toshima-ku, Tokyo 171-8588, Japan.
The RhiE and RhiF proteins work together as RhiEF and function as a thiamine pyrophosphate (TPP)-dependent phosphonopyruvate decarboxylase to produce phosphonoacetaldehyde in the rhizocticin biosynthesis pathway. In this study, we determined the crystal structure of RhiEF in complex with TPP and Mg²⁺. RhiEF forms a dimer of heterodimers, and the cofactor TPP is bound at the heterotetrameric subunit interface.
Methods Mol Biol
November 2024
Department of Computer Science, College of Engineering, Virginia Commonwealth University, Virginia, VA, USA.
The secondary structures (SSs) and supersecondary structures (SSSs) underlie the three-dimensional structure of proteins. Prediction of the SSs and SSSs from protein sequences enjoys high levels of use and finds numerous applications in the development of a broad range of other bioinformatics tools. Numerous sequence-based predictors of SS and SSS were developed and published in recent years.