De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks.

J Chem Inf Model

Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States.

Published: December 2020

Although data on protein sequence and structure are accumulating rapidly, the number of protein architectural types (structural folds) remains small and limited. This study addresses the following question: how well can one reveal the underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? To answer it, we have developed novel deep generative models, namely semisupervised gcWGAN (guided, conditional Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design quality, we build our models on the conditional Wasserstein GAN (WGAN), which uses the Wasserstein distance in its loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into WGAN as an additional loss during model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds than a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one that is not even among the basis folds, gcWGAN designs have comparable or better fold accuracy yet much greater sequence diversity and novelty than cVAE designs. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign) by generating design seeds and tailoring the design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data.
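The abstract describes a generator trained on two signals: the Wasserstein critic and feedback from the sequence-to-fold oracle. The sketch below illustrates how such a combined objective could be computed from precomputed scores. It is a minimal sketch under stated assumptions, not the authors' implementation (which is in the linked repository): critic scores and oracle probabilities are assumed to be plain lists of floats, the oracle term is assumed to be the mean negative log probability of the target fold, and the function names and the weight `lam` are hypothetical. The Lipschitz constraint on the critic (weight clipping or gradient penalty) is omitted.

```python
import math
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    """Arithmetic mean; raises on empty input instead of dividing by zero."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """WGAN critic loss to minimize: E[D(fake)] - E[D(real)].

    Minimizing this maximizes the critic's estimate of the Wasserstein
    distance between real and generated sequences. The Lipschitz
    constraint (clipping or gradient penalty) is not modeled here.
    """
    return _mean(fake_scores) - _mean(real_scores)


def generator_loss(
    fake_scores: Sequence[float],
    oracle_target_probs: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Adversarial term -E[D(fake)] plus a hypothetical oracle-guided term.

    oracle_target_probs[i] is assumed to be the probability the oracle
    assigns to the target fold for the i-th generated sequence; the
    guidance term is the mean negative log of those probabilities,
    weighted by lam (an assumed hyperparameter).
    """
    adversarial = -_mean(fake_scores)
    eps = 1e-12  # guard against log(0) when the oracle rejects a design outright
    guidance = _mean([-math.log(max(p, eps)) for p in oracle_target_probs])
    return adversarial + lam * guidance


if __name__ == "__main__":
    # Toy numbers only, not results from the paper.
    print(critic_loss([1.0, 2.0], [0.5, -0.5]))
    print(generator_loss([0.5, -0.5], [0.5, 0.5], lam=1.0))
```

In this toy form, sequences the oracle confidently assigns to the target fold lower the generator loss, which is the intuition behind using oracle feedback to guide design toward the requested fold.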
Data, source code, and trained models are available at https://github.com/Shen-Lab/gcWGAN.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7775287
DOI: http://dx.doi.org/10.1021/acs.jcim.0c00593
