Predicting the relative solvent accessibility (RSA) of a protein is critical to understanding its 3D structure and biological function. RSA prediction, especially when homology transfer cannot provide information about a protein's structure, is a significant step toward addressing the protein structure prediction challenge. Today, deep learning is arguably the most powerful method for predicting RSA and other structural features of proteins. In particular, recent breakthroughs in deep learning, driven by the integration of natural language processing (NLP) algorithms, have significantly advanced the field of protein research. Inspired by the remarkable success of NLP techniques, this study leverages pre-trained language models (PLMs) to enhance RSA prediction. We present a deep neural network architecture that combines bidirectional recurrent neural networks with convolutional layers, can capture long-range interactions within protein sequences, and predicts protein RSA from ESM-2 encodings. The final predictor, PaleAle 6.0, predicts RSA as real values as well as in two-state (exposure threshold of 25%) and four-state (exposure thresholds of 4%, 25%, and 50%) discrete classifications. On the 2022 test set, PaleAle 6.0 achieved over 82% accuracy for two-state RSA (RSA_2C) and 59.75% accuracy for four-state RSA (RSA_4C), with a Pearson correlation coefficient (PCC) of 77.88 for real-value RSA prediction. When evaluated on the more challenging 2024 test set, PaleAle 6.0 maintained strong performance, achieving 79.74% accuracy in two-state prediction and 55.30% accuracy in four-state prediction, with a PCC of 73.08 for real-value predictions, outperforming all previously benchmarked predictors.
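
To make the two-state and four-state definitions concrete, the following is a minimal sketch of how real-valued RSA percentages could be binned with the thresholds stated above, and how accuracy and PCC could be computed. It is not the PaleAle 6.0 code: the function names are hypothetical, and the convention that a value exactly at a threshold falls into the more exposed class is an assumption that the abstract does not specify.

```python
from bisect import bisect_right
from math import sqrt

# Thresholds (in % exposure) as stated in the abstract.
TWO_STATE_THRESHOLDS = [25.0]
FOUR_STATE_THRESHOLDS = [4.0, 25.0, 50.0]


def rsa_class(rsa_percent, thresholds):
    """Map an RSA percentage to a class index (0 = most buried).

    Assumption: a value exactly equal to a threshold is assigned to
    the more exposed class.
    """
    return bisect_right(thresholds, rsa_percent)


def accuracy(predicted, observed):
    """Fraction of residues whose predicted class matches the observed class."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-residue RSA values (in %), for illustration only.
observed_rsa = [2.0, 10.0, 30.0, 80.0]
predicted_rsa = [3.0, 27.0, 35.0, 60.0]

observed_4c = [rsa_class(v, FOUR_STATE_THRESHOLDS) for v in observed_rsa]
predicted_4c = [rsa_class(v, FOUR_STATE_THRESHOLDS) for v in predicted_rsa]

print("four-state accuracy:", accuracy(predicted_4c, observed_4c))
print("PCC:", pearson(predicted_rsa, observed_rsa))
```

The same `rsa_class` helper covers the two-state case when called with `TWO_STATE_THRESHOLDS`, since the class index is simply the number of thresholds the value reaches.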

Source
http://dx.doi.org/10.3390/biom15010049
