Fast activation maximization for molecular sequence design.

BMC Bioinformatics

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA.

Published: October 2021

Background: Optimization of DNA and protein sequences based on machine learning models is becoming a powerful tool for molecular design. Activation maximization offers a simple design strategy for differentiable models: one-hot encoded sequences are first approximated by a continuous representation, which is then iteratively optimized with respect to the predictor oracle by gradient ascent. While elegant, the current version of the method suffers from vanishing gradients and may cause predictor pathologies leading to poor convergence.
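The basic activation maximization loop described in the Background can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes a softmax relaxation of the one-hot sequence and a hypothetical toy predictor (a small position weight matrix, `PWM`) in place of a trained network. The names `PWM`, `predict`, and `design` are invented for the example. The gradient for each logit carries a factor `p_k`, so it shrinks as the softmax saturates, which is one plausible reading of the vanishing-gradient problem mentioned above.

```python
import math

ALPHABET = "ACGT"

# Hypothetical toy predictor: a position weight matrix (one row per position,
# one column per nucleotide). A real application would use a trained network.
PWM = [
    [0.2, 1.5, -0.3, 0.1],
    [1.1, -0.4, 0.3, 0.0],
    [-0.2, 0.1, 0.4, 1.3],
    [0.0, 0.2, 1.7, -0.5],
]


def softmax(row):
    """Numerically stable softmax over one position's logits."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x):
    """Toy fitness: PWM weights averaged under the relaxed sequence x."""
    return sum(w * p for w_row, x_row in zip(PWM, x) for w, p in zip(w_row, x_row))


def design(steps=200, lr=1.0):
    """Gradient ascent on the logits; returns (decoded sequence, fitness trace)."""
    logits = [[0.0] * len(ALPHABET) for _ in PWM]
    trace = []
    for _ in range(steps):
        x = [softmax(row) for row in logits]  # continuous stand-in for one-hot
        trace.append(predict(x))
        for i, (w_row, p_row) in enumerate(zip(PWM, x)):
            # Chain rule through the softmax: df/dlogit_k = p_k * (w_k - E_p[w]).
            expected = sum(w * p for w, p in zip(w_row, p_row))
            for k in range(len(ALPHABET)):
                logits[i][k] += lr * p_row[k] * (w_row[k] - expected)
    # Decode the relaxed sequence by taking the most likely letter per position.
    sequence = "".join(ALPHABET[row.index(max(row))] for row in logits)
    return sequence, trace


sequence, trace = design()
print(sequence, round(trace[0], 3), round(trace[-1], 3))
```

With a linear toy predictor the optimum is simply the highest-weight letter at each position, which makes the result easy to check by eye. A trained neural network would replace `predict` and its hand-written gradient with automatic differentiation.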

Results: Here, we introduce Fast SeqProp, an improved activation maximization method that combines straight-through approximation with normalization across the parameters of the input sequence distribution. Fast SeqProp overcomes bottlenecks in earlier methods arising from input parameters becoming skewed during optimization. Compared to prior methods, Fast SeqProp results in up to 100-fold faster convergence while also finding improved fitness optima for many applications. We demonstrate Fast SeqProp's capabilities by designing DNA and protein sequences for six deep learning predictors, including a protein structure predictor.
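The combination named in the Results, a straight-through approximation plus normalization of the input parameters, might look roughly like the sketch below. The abstract does not give the details, so several choices here are assumptions: the logits are standardized jointly over all positions and letters, the forward pass uses a deterministic argmax rather than sampling, no learnable scale or offset is applied, and the predictor is the same kind of hypothetical toy weight matrix (`PWM`) rather than a trained model. All function names are invented for illustration.

```python
import math
import random

ALPHABET = "ACGT"

# Hypothetical toy predictor weights (position x nucleotide), standing in for
# a trained differentiable model.
PWM = [
    [0.2, 1.5, -0.3, 0.1],
    [1.1, -0.4, 0.3, 0.0],
    [-0.2, 0.1, 0.4, 1.3],
    [0.0, 0.2, 1.7, -0.5],
]


def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(logits, eps=1e-5):
    """Standardize all logits jointly to zero mean and (near) unit variance."""
    flat = [v for row in logits for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in row] for row in logits], std


def normalize_backward(grad_z, z, std):
    """Backpropagate a gradient on z = normalize(logits) to the raw logits."""
    flat_g = [g for row in grad_z for g in row]
    flat_z = [v for row in z for v in row]
    n = len(flat_g)
    mean_g = sum(flat_g) / n
    mean_gz = sum(g * v for g, v in zip(flat_g, flat_z)) / n
    return [[(g - mean_g - v * mean_gz) / std for g, v in zip(g_row, z_row)]
            for g_row, z_row in zip(grad_z, z)]


def fitness(x):
    """Toy fitness of a one-hot sequence x."""
    return sum(w * v for w_row, x_row in zip(PWM, x) for w, v in zip(w_row, x_row))


def design(steps=100, lr=0.5, seed=0):
    """Returns (best sequence seen, its fitness, fitness trace)."""
    rng = random.Random(seed)
    logits = [[rng.gauss(0.0, 1.0) for _ in ALPHABET] for _ in PWM]
    best_seq, best_fit, trace = None, -math.inf, []
    for _ in range(steps):
        z, std = normalize(logits)
        # Forward pass: the predictor sees a discrete one-hot sequence.
        picks = [row.index(max(row)) for row in z]
        x = [[1.0 if k == j else 0.0 for k in range(len(ALPHABET))] for j in picks]
        fit = fitness(x)
        trace.append(fit)
        if fit > best_fit:
            best_seq, best_fit = "".join(ALPHABET[j] for j in picks), fit
        # Straight-through backward pass: the predictor gradient w.r.t. the
        # one-hot input (for this linear toy model, just PWM) is routed
        # through the softmax Jacobian as if x had been softmax(z).
        grad_z = []
        for w_row, z_row in zip(PWM, z):
            p_row = softmax(z_row)
            expected = sum(w * p for w, p in zip(w_row, p_row))
            grad_z.append([p * (w - expected) for w, p in zip(w_row, p_row)])
        grad_l = normalize_backward(grad_z, z, std)
        logits = [[v + lr * g for v, g in zip(l_row, g_row)]
                  for l_row, g_row in zip(logits, grad_l)]
    return best_seq, best_fit, trace


best_seq, best_fit, trace = design()
print(best_seq, best_fit)
```

Because the logits are re-standardized at every step, the softmax used in the backward pass cannot saturate without bound, while the predictor still only ever sees valid one-hot sequences. Whether this matches the published method in detail should be checked against the full paper.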

Conclusions: Fast SeqProp offers a reliable and efficient method for general-purpose sequence optimization through a differentiable fitness predictor. As demonstrated on a variety of deep learning models, the method is widely applicable, and can incorporate various regularization techniques to maintain confidence in the sequence designs. As a design tool, Fast SeqProp may aid in the development of novel molecules, drug therapies and vaccines.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8527647
DOI: http://dx.doi.org/10.1186/s12859-021-04437-5

Publication Analysis

Top Keywords (keyword: count)

fast seqprop: 20
activation maximization: 12
dna protein: 8
protein sequences: 8
learning models: 8
deep learning: 8
fast: 7
seqprop: 5
fast activation: 4
maximization molecular: 4

