Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of shared upstream layers on which independent modules (additional hidden layers with their own output units) are stacked in parallel, one for each output GO term (the tasks). MTDNN learns individual tasks partly from shared representations and partly from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or on homology transfer. More importantly, the results show that the binary classification accuracy of MTDNN is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement does not scale linearly with the number of tasks in MTDNN; medium-sized models provide the largest improvement in our case. One advantage of MTDNN is that, given a set of features, it does not require the bootstrap feature selection procedure that traditional machine learning algorithms typically rely on. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. At the same time, there remains considerable room for deep learning techniques to further enhance prediction ability.
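
The architecture described above can be sketched in code: a shared trunk of hidden layers feeds one small task-specific head per GO term, and each head makes an independent binary prediction. The PyTorch sketch below is a minimal illustration under assumed settings; the feature dimension, layer sizes, number of GO-term tasks, and training loss are illustrative choices, not the configuration reported in the paper.

# Minimal sketch of a multi-task deep neural network (MTDNN) for GO-term
# prediction: a shared trunk feeds one task-specific head per GO term, and
# each head performs an independent binary classification.
# Feature dimension, layer sizes and task count are illustrative assumptions.
import torch
import torch.nn as nn

class MTDNN(nn.Module):
    def __init__(self, n_features=1000, n_tasks=100, shared_dim=512, task_dim=64):
        super().__init__()
        # Shared upstream layers: a representation common to all GO terms.
        self.shared = nn.Sequential(
            nn.Linear(n_features, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim), nn.ReLU(),
        )
        # One independent module per GO term: its own hidden layer and output unit.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(shared_dim, task_dim), nn.ReLU(),
                          nn.Linear(task_dim, 1))
            for _ in range(n_tasks)
        ])

    def forward(self, x):
        h = self.shared(x)
        # Concatenate per-task logits into a (batch, n_tasks) matrix.
        return torch.cat([head(h) for head in self.heads], dim=1)

# One binary cross-entropy loss per task, summed over tasks: the shared layers
# receive gradients from every GO term, while each head specialises on its own label.
model = MTDNN()
x = torch.randn(8, 1000)                   # batch of protein feature vectors
y = torch.randint(0, 2, (8, 100)).float()  # multi-label GO annotations
loss = nn.BCEWithLogitsLoss()(model(x), y)
loss.backward()

Setting n_tasks to 1 reduces the same code to a single-task predictor, the kind of machine learning baseline the abstract compares against.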

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5995439
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0198216

Publication Analysis

Top Keywords

protein function (16), function prediction (12), multi-task deep (8), deep neural (8), neural networks (8), machine learning (8), deep learning (8), mtdnn (8), tasks mtdnn (8), learning (5)

Similar Publications

A conifer metabolite corrects episodic ataxia type 1 by voltage sensor-mediated ligand activation of Kv1.1.

Proc Natl Acad Sci U S A

January 2025

Bioelectricity Laboratory, Department of Physiology and Biophysics, School of Medicine, University of California, Irvine, CA 92697.

Loss-of-function sequence variants in KCNA1, which encodes the voltage-gated potassium channel Kv1.1, cause Episodic Ataxia Type 1 (EA1) and epilepsy. Due to a paucity of drugs that directly rescue mutant Kv1.1 …

In many plants, the asymmetric division of the zygote sets up the apical-basal body axis. In the cress Arabidopsis, the zygote coexpresses regulators of the apical and basal embryo lineages, the transcription factors WOX2 and WRKY2/WOX8, respectively. WRKY2/WOX8 activity promotes nuclear migration, cellular polarity, and mitotic asymmetry of the zygote, which are hallmarks of axis formation in many plant species.

Biophysical constraints limit the specificity with which transcription factors (TFs) can target regulatory DNA. While individual nontarget binding events may be low affinity, the sheer number of such interactions could present a challenge for gene regulation by degrading its precision or possibly leading to an erroneous induction state. Chromatin can prevent nontarget binding by rendering DNA physically inaccessible to TFs, at the cost of energy-consuming remodeling orchestrated by pioneer factors (PFs).

Norepinephrine in vertebrates and its invertebrate analog, octopamine, regulate the activity of neural circuits. We find that, when hungry, Drosophila larvae switch activity in type II octopaminergic motor neurons (MNs) to high-frequency bursts, which coincide with locomotion-driving bursts in type I glutamatergic MNs that converge on the same muscles. Optical quantal analysis across hundreds of synapses simultaneously reveals that octopamine potentiates glutamate release by tonic type Ib MNs, but not phasic type Is MNs, and occurs via the Gq-coupled octopamine receptor (OAMB).

Learning the language of antibody hypervariability.

Proc Natl Acad Sci U S A

January 2025

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose "foundational" PLMs have limited performance in modeling antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples.
