Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of shared upstream layers on which independent modules (additional hidden layers with their own output units) are stacked in parallel, one for each output GO term (the tasks). MTDNN learns individual tasks partly from shared representations and partly from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or on homology transfer. More importantly, the results show that the binary classification accuracy of MTDNN is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement does not scale linearly with the number of tasks in MTDNN; medium-sized models provide the largest improvement in our case. One advantage of MTDNN is that, given a set of features, it does not require the bootstrap feature selection procedure that traditional machine learning algorithms typically rely on. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. At the same time, there remains considerable room for deep learning techniques to further enhance prediction ability.
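
The architecture described above can be sketched in code: a shared trunk of hidden layers feeds one small task-specific head per GO term, and each head makes an independent binary prediction. The PyTorch sketch below is a minimal illustration under assumed settings; the feature dimension, layer sizes, number of GO-term tasks, and training loss are illustrative choices, not the configuration reported in the paper.

# Minimal sketch of a multi-task deep neural network (MTDNN) for GO-term
# prediction: a shared trunk feeds one task-specific head per GO term, and
# each head performs an independent binary classification.
# Feature dimension, layer sizes and task count are illustrative assumptions.
import torch
import torch.nn as nn

class MTDNN(nn.Module):
    def __init__(self, n_features=1000, n_tasks=100, shared_dim=512, task_dim=64):
        super().__init__()
        # Shared upstream layers: a representation common to all GO terms.
        self.shared = nn.Sequential(
            nn.Linear(n_features, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim), nn.ReLU(),
        )
        # One independent module per GO term: its own hidden layer and output unit.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(shared_dim, task_dim), nn.ReLU(),
                          nn.Linear(task_dim, 1))
            for _ in range(n_tasks)
        ])

    def forward(self, x):
        h = self.shared(x)
        # Concatenate per-task logits into a (batch, n_tasks) matrix.
        return torch.cat([head(h) for head in self.heads], dim=1)

# One binary cross-entropy loss per task, summed over tasks: the shared layers
# receive gradients from every GO term, while each head specialises on its own label.
model = MTDNN()
x = torch.randn(8, 1000)                   # batch of protein feature vectors
y = torch.randint(0, 2, (8, 100)).float()  # multi-label GO annotations
loss = nn.BCEWithLogitsLoss()(model(x), y)
loss.backward()

Setting n_tasks to 1 reduces the same code to a single-task predictor, the kind of machine learning baseline the abstract compares against.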

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5995439
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0198216

Publication Analysis

Top Keywords

protein function (16), function prediction (12), multi-task deep (8), deep neural (8), neural networks (8), machine learning (8), deep learning (8), mtdnn (8), tasks mtdnn (8), learning (5)

Similar Publications

A conifer metabolite corrects episodic ataxia type 1 by voltage sensor-mediated ligand activation of Kv1.1.

Proc Natl Acad Sci U S A

January 2025

Bioelectricity Laboratory, Department of Physiology and Biophysics, School of Medicine, University of California, Irvine, CA 92697.

Loss-of-function sequence variants in KCNA1, which encodes the voltage-gated potassium channel Kv1.1, cause Episodic Ataxia Type 1 (EA1) and epilepsy. Due to a paucity of drugs that directly rescue mutant Kv1.1 …

In many plants, the asymmetric division of the zygote sets up the apical-basal body axis. In the cress Arabidopsis, the zygote coexpresses regulators of the apical and basal embryo lineages, the transcription factors WOX2 and WRKY2/WOX8, respectively. WRKY2/WOX8 activity promotes nuclear migration, cellular polarity, and mitotic asymmetry of the zygote, which are hallmarks of axis formation in many plant species.

Biophysical constraints limit the specificity with which transcription factors (TFs) can target regulatory DNA. While individual nontarget binding events may be low affinity, the sheer number of such interactions could present a challenge for gene regulation by degrading its precision or possibly leading to an erroneous induction state. Chromatin can prevent nontarget binding by rendering DNA physically inaccessible to TFs, at the cost of energy-consuming remodeling orchestrated by pioneer factors (PFs).

Norepinephrine in vertebrates and its invertebrate analog, octopamine, regulate the activity of neural circuits. We find that, when hungry, Drosophila larvae switch activity in type II octopaminergic motor neurons (MNs) to high-frequency bursts, which coincide with locomotion-driving bursts in type I glutamatergic MNs that converge on the same muscles. Optical quantal analysis across hundreds of synapses simultaneously reveals that octopamine potentiates glutamate release by tonic type Ib MNs, but not phasic type Is MNs, and occurs via the Gq-coupled octopamine receptor (OAMB).

Learning the language of antibody hypervariability.

Proc Natl Acad Sci U S A

January 2025

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose "foundational" PLMs have limited performance in modeling antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples.
