Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning.

Nat Biotechnol

[1] Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada. [2] Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada. [3] Canadian Institute for Advanced Research, Programs on Genetic Networks and Neural Computation, Toronto, Ontario, Canada.

Published: August 2015

Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.
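As a rough illustration of the 'mutation map' idea mentioned in the abstract, the sketch below scores every single-base substitution of a sequence and records the change relative to the original. It is not DeepBind: the scorer is a toy position weight matrix (`TOY_PWM`, with made-up weights), and `pwm_score` and `mutation_map` are hypothetical helper names introduced only for this example. The published tool may define or scale these differences differently; any scoring function that maps a sequence to a number could be substituted for `pwm_score`.

```python
from typing import Callable, Dict, List

BASES = "ACGT"

# Hypothetical toy log-odds matrix (one dict per motif position).
# The weights are illustrative only and favour the motif "ACG".
TOY_PWM: List[Dict[str, float]] = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def pwm_score(seq: str, pwm: List[Dict[str, float]] = TOY_PWM) -> float:
    """Return the best-window score: the maximum, over all alignments,
    of the summed per-position weights."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def mutation_map(
    seq: str, score: Callable[[str], float] = pwm_score
) -> List[Dict[str, float]]:
    """For each position and each possible base, return
    score(mutant) - score(original). The reference base gives 0."""
    reference = score(seq)
    rows: List[Dict[str, float]] = []
    for i in range(len(seq)):
        row: Dict[str, float] = {}
        for base in BASES:
            mutant = seq[:i] + base + seq[i + 1:]
            row[base] = score(mutant) - reference
        rows.append(row)
    return rows


if __name__ == "__main__":
    for position, row in enumerate(mutation_map("TACGT")):
        print(position, row)
```

Negative entries mark substitutions that weaken the predicted binding score, and entries near zero mark positions the toy model does not depend on.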

Source
http://dx.doi.org/10.1038/nbt.3300

Similar Publications

A conifer metabolite corrects episodic ataxia type 1 by voltage sensor-mediated ligand activation of Kv1.1.

Proc Natl Acad Sci U S A

January 2025

Bioelectricity Laboratory, Department of Physiology and Biophysics, School of Medicine, University of California, Irvine, CA 92697.

Loss-of-function sequence variants in , which encodes the voltage-gated potassium channel Kv1.1, cause Episodic Ataxia Type 1 (EA1) and epilepsy. Due to a paucity of drugs that directly rescue mutant Kv1.

The interaction of bacteria and harmonine in harlequin ladybird confers an interspecies competitive edge.

Proc Natl Acad Sci U S A

January 2025

Zhejiang Key Laboratory of Biology and Ecological Regulation of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.

The harlequin ladybird, , is a predatory beetle used globally to control pests such as aphids and scale insects. Originating from East Asia, this species has become highly invasive since its introduction in the late 19th century to Europe and North America, posing a threat to local biodiversity. Intraguild predation is hypothesized to drive the success of this invasive species, but the underlying mechanisms remain unknown.

Learning the language of antibody hypervariability.

Proc Natl Acad Sci U S A

January 2025

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose "foundational" PLMs have limited performance in modeling antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples.

Dissecting the cellular architecture and genetic circuitry of the soybean seed.

Proc Natl Acad Sci U S A

January 2025

Department of Plant Biology, College of Biological Sciences, University of California, Davis, CA 95616.

Seeds are complex structures composed of three regions, embryo, endosperm, and seed coat, with each further divided into subregions that consist of tissues, cell layers, and cell types. Although the seed is well characterized anatomically, much less is known about the genetic circuitry that dictates its spatial complexity. To address this issue, we profiled mRNAs from anatomically distinct seed subregions at several developmental stages.

Deletion of metal transporter Zip14 reduces major histocompatibility complex II expression in murine small intestinal epithelial cells.

Proc Natl Acad Sci U S A

January 2025

Center for Nutritional Sciences, Food Science and Human Nutrition Department, College of Agricultural and Life Sciences, University of Florida, Gainesville, FL 32611.

Documented worldwide, impaired immunity is a cardinal signature resulting from loss of dietary zinc, an essential micronutrient. A steady supply of zinc to meet cellular requirements is regulated by an array of zinc transporters. Deletion of the transporter Zip14 (Slc39a14) in mice produced intestinal inflammation.
