Multi-indicator comparative evaluation for deep learning-based protein sequence design methods.

Bioinformatics

State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.

Published: February 2024

Motivation: Proteins found in nature represent only a fraction of the vast space of possible proteins. Protein design presents an opportunity to explore and expand this protein landscape. Within protein design, protein sequence design plays a crucial role, and numerous successful methods have been developed. Notably, deep learning-based protein sequence design methods have advanced significantly in recent years. However, a comprehensive and systematic comparison and evaluation of these methods has been lacking, and the indicators reported by different methods are often inconsistent or ineffective.

Results: To address this gap, we have designed a diverse set of indicators that cover several important aspects, including sequence recovery, diversity, root-mean-square deviation (RMSD) of protein structure, secondary structure, and the distribution of polar and nonpolar amino acids. We employed an improved weighted inferiority-superiority distance method to comprehensively assess the performance of eight widely used deep learning-based protein sequence design methods. Our evaluation not only ranks these methods but also offers optimization suggestions by analyzing the strengths and weaknesses of each method. Furthermore, we have developed a method to select the best temperature parameter and proposed solutions for a common issue in protein design methods: designed sequences that contain consecutive repetitive amino acids. These findings can help users select suitable protein sequence design methods. Overall, our work contributes to the field of protein sequence design by providing a comprehensive evaluation system and optimization suggestions for different methods.
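As an illustration only, the sketch below shows how three of the ideas named in the Results could be computed: per-position sequence recovery, the longest run of a consecutively repeated residue, and a ranking by weighted distance to the best and worst observed values. The ranking function is the standard weighted TOPSIS procedure, used here on the assumption that it is the basis of the "inferiority-superiority distance" approach; the abstract does not describe the authors' improved variant, so this is not their implementation. All function names, the weights, and the toy numbers are hypothetical.

```python
from itertools import groupby
from math import sqrt


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(native, designed)) / len(native)


def longest_repeat_run(seq: str) -> int:
    """Length of the longest stretch of one residue repeated consecutively."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def topsis_scores(matrix, weights, benefit):
    """Standard weighted TOPSIS closeness scores (assumption: not the paper's variant).

    matrix[i][j] is indicator j for method i; benefit[j] is True when a larger
    value of indicator j is better. Scores lie in [0, 1]; higher is better.
    """
    n_cols = len(weights)
    # Vector-normalise each column, then apply the indicator weight.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    columns = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in v:
        d_best = sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_worst = sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.5)
    return scores


# Toy, made-up numbers: columns are (sequence recovery, RMSD in angstroms).
toy = [[0.50, 1.2], [0.40, 2.0], [0.30, 3.0]]
scores = topsis_scores(toy, weights=[0.5, 0.5], benefit=[True, False])
recovery = sequence_recovery("ACDE", "ACDF")
run_length = longest_repeat_run("AAGGGGK")
```

In this toy input the first row is best on both indicators and the last row is worst on both, so they receive the highest and lowest scores respectively. A real evaluation would include the remaining indicators (diversity, secondary structure, polar and nonpolar distribution) as further columns.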

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10868333
DOI: http://dx.doi.org/10.1093/bioinformatics/btae037
