Reliabilities of identifying positive selection by the branch-site and the site-prediction methods.

Proc Natl Acad Sci U S A

Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, USA.

Published: April 2009

Natural selection operating on protein-coding genes is often studied by examining the ratio (omega) of the rates of nonsynonymous to synonymous nucleotide substitution. The branch-site method (BSM), which is based on a likelihood ratio test, is one such test for detecting positive selection on a predetermined branch of a phylogenetic tree. Because the number of nucleotide substitutions involved is often very small, we conducted a computer simulation to examine the reliability of BSM in comparison with the small-sample method (SSM), which is based on Fisher's exact test. The results indicate that BSM often generates false positives compared with SSM when the number of nucleotide substitutions is approximately 80 or smaller. Because the omega value is also used for predicting positively selected sites, we examined the reliabilities of the site-prediction methods, using nucleotide sequence data for the dim-light and color vision genes in vertebrates. The results showed that the site-prediction methods have a low probability of identifying experimentally determined functional changes of amino acids and often falsely identify other sites where amino acid substitutions are unlikely to be important. This low rate of predictability occurs because most of the current statistical methods are designed to identify codon sites with high omega values, which may not have anything to do with functional changes. The codon sites showing functional changes generally do not show a high omega value. To understand adaptive evolution, some form of experimental confirmation is necessary.
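As a rough illustration of the kind of test the abstract calls the small-sample method, the sketch below runs a one-sided Fisher's exact test on a 2x2 table of substitution counts for a single branch. The table layout (changed versus unchanged sites, split into nonsynonymous and synonymous), the function name `fisher_one_sided`, and the counts in the usage lines are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from math import comb


def fisher_one_sided(n, s, N, S):
    """One-sided Fisher's exact test for an excess of nonsynonymous changes.

    Assumed 2x2 table (hypothetical layout, not taken from the abstract):
                      changed   unchanged
      nonsynonymous      n        N - n
      synonymous         s        S - s
    n, s: counts of nonsynonymous / synonymous substitutions on the branch.
    N, S: numbers of nonsynonymous / synonymous sites, rounded to integers.
    Returns P(X >= n) under the hypergeometric null with fixed margins.
    """
    if not (0 <= n <= N and 0 <= s <= S):
        raise ValueError("counts must satisfy 0 <= n <= N and 0 <= s <= S")
    total_changes = n + s
    denominator = comb(N + S, total_changes)
    k_max = min(N, total_changes)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(
        comb(N, k) * comb(S, total_changes - k)
        for k in range(n, k_max + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts for one branch, chosen only to show the call.
    p_value = fisher_one_sided(n=9, s=1, N=300, S=100)
    print(f"one-sided p-value: {p_value:.4f}")
```

Because the test is exact, it needs no large-sample approximation, which is the property that matters when only a handful of substitutions fall on the branch of interest.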

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2672471
DOI: http://dx.doi.org/10.1073/pnas.0901855106

Publication Analysis

Top Keywords

site-prediction methods: 12
functional changes: 12
positive selection: 8
number nucleotide: 8
nucleotide substitutions: 8
codon sites: 8
high omega: 8
reliabilities identifying: 4
identifying positive: 4
selection branch-site: 4

Similar Publications

CryptoBench: Cryptic protein-ligand binding sites dataset and benchmark.

Bioinformatics

December 2024

Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic.

Motivation: Structure-based methods for detecting protein-ligand binding sites play a crucial role in various domains, from fundamental research to biomedical applications. However, current prediction methodologies often rely on holo (ligand-bound) protein conformations for training and evaluation, overlooking the significance of the apo (ligand-free) states. This oversight is particularly problematic in the case of cryptic binding sites (CBSs) where holo-based assessment yields unrealistic performance expectations.

Residue-Level Multiview Deep Learning for ATP Binding Site Prediction and Applications in Kinase Inhibitors.

J Chem Inf Model

December 2024

Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Republic of Korea.

Accurate identification of adenosine triphosphate (ATP) binding sites is crucial for understanding cellular functions and advancing drug discovery, particularly in targeting kinases for cancer treatment. Existing methods face significant challenges due to their reliance on time-consuming precomputed features and the heavily imbalanced nature of binding site data without further investigations on their utility in drug discovery. To address these limitations, we introduced Multiview-ATPBind and ResiBoost.

Mutations that affect RNA splicing significantly impact human diversity and disease. Here we present a method using transformers, a type of machine learning model, to detect splicing from raw 45,000-nucleotide sequences. We generate embeddings with residual neural networks and apply hard attention to select splice site candidates, enabling efficient training on long sequences.

Background: The precise prediction of transcription factor binding sites (TFBSs) is pivotal for unraveling the gene regulatory networks underlying biological processes. While numerous tools have emerged for in silico TFBS prediction in recent years, the evolving landscape of computational biology necessitates thorough assessments of tool performance to ensure accuracy and reliability. Only a limited number of studies have been conducted to evaluate the performance of TFBS prediction tools comprehensively.

Water molecules play a significant role in maintaining protein structural stability and facilitating molecular interactions. Accurate prediction of water molecule positions around protein structures is essential for understanding their biological roles and has significant implications for protein engineering and drug discovery. Here, we introduce SuperWater, a novel generative AI framework that integrates a score-based diffusion model with equivariant graph neural networks to predict water molecule placements around proteins with high accuracy.
