Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck for supervised learning in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology (GO). In this work, we build on recent developments in deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers topped by parallel, independent modules (additional hidden layers with their own output units), one for each output GO term (task). MTDNN learns each task partly from shared representations and partly from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or on homology transfer. More importantly, the results show that the binary classification accuracy of MTDNN is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, the performance improvement over a single-task predictor is not linearly correlated with the number of tasks in MTDNN; in our case, medium-sized models provide the most improvement. One of the advantages of MTDNN is that, given a set of features, it does not require a bootstrap feature selection procedure, as traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still considerable room for deep learning techniques to further enhance prediction ability.
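The architecture described in the abstract (shared layers feeding one independent module per GO term) can be illustrated with a minimal NumPy sketch of the forward pass. This is not the authors' implementation: the single shared layer, the ReLU and sigmoid activations, the layer sizes, the summed binary cross-entropy loss, and the names `init_mtdnn`, `forward`, and `multitask_bce` are all assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mtdnn(n_features, n_shared, n_task_hidden, n_tasks, seed=0):
    """Randomly initialise one shared layer and one small module per task.

    All sizes are hypothetical; the abstract does not specify them.
    """
    rng = np.random.default_rng(seed)

    def dense(n_in, n_out):
        return {"W": rng.normal(0.0, 0.1, size=(n_in, n_out)),
                "b": np.zeros(n_out)}

    return {
        "shared": dense(n_features, n_shared),
        # One independent module (hidden layer + single output unit) per GO term.
        "heads": [
            {"hidden": dense(n_shared, n_task_hidden),
             "out": dense(n_task_hidden, 1)}
            for _ in range(n_tasks)
        ],
    }

def forward(params, X):
    """Return an (n_proteins, n_tasks) matrix of per-term probabilities."""
    # Representation shared by every task.
    shared = relu(X @ params["shared"]["W"] + params["shared"]["b"])
    columns = []
    for head in params["heads"]:
        # Task-specific hidden layer stacked on the shared representation.
        h = relu(shared @ head["hidden"]["W"] + head["hidden"]["b"])
        columns.append(sigmoid(h @ head["out"]["W"] + head["out"]["b"]))
    return np.hstack(columns)

def multitask_bce(probs, Y, eps=1e-12):
    """Sum over tasks of the mean binary cross-entropy (an assumed loss)."""
    probs = np.clip(probs, eps, 1.0 - eps)
    per_task = -(Y * np.log(probs) + (1.0 - Y) * np.log(1.0 - probs)).mean(axis=0)
    return per_task.sum()
```

For example, `forward(init_mtdnn(8, 16, 4, 3), X)` with `X` of shape `(5, 8)` yields a `(5, 3)` matrix, one column per hypothetical GO term. Because every head reads the same shared activations, gradients from all tasks would update the shared layer during training, while each head's weights are driven only by its own term's labels.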
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5995439 | PMC |
| http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0198216 | PLOS |
Proc Natl Acad Sci U S A
January 2025
Bioelectricity Laboratory, Department of Physiology and Biophysics, School of Medicine, University of California, Irvine, CA 92697.
Loss-of-function sequence variants in , which encodes the voltage-gated potassium channel Kv1.1, cause Episodic Ataxia Type 1 (EA1) and epilepsy. Due to a paucity of drugs that directly rescue mutant Kv1.
Proc Natl Acad Sci U S A
January 2025
State Key Laboratory of Wheat Improvement, College of Life Science, Shandong Agricultural University, Tai'an 271018, China.
In many plants, the asymmetric division of the zygote sets up the apical-basal body axis. In the cress , the zygote coexpresses regulators of the apical and basal embryo lineages, the transcription factors WOX2 and WRKY2/WOX8, respectively. WRKY2/WOX8 activity promotes nuclear migration, cellular polarity, and mitotic asymmetry of the zygote, which are hallmarks of axis formation in many plant species.
Proc Natl Acad Sci U S A
January 2025
Institute of Science and Technology Austria, AT-3400 Klosterneuburg, Austria.
Biophysical constraints limit the specificity with which transcription factors (TFs) can target regulatory DNA. While individual nontarget binding events may be low affinity, the sheer number of such interactions could present a challenge for gene regulation by degrading its precision or possibly leading to an erroneous induction state. Chromatin can prevent nontarget binding by rendering DNA physically inaccessible to TFs, at the cost of energy-consuming remodeling orchestrated by pioneer factors (PFs).
Proc Natl Acad Sci U S A
January 2025
Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720.
Norepinephrine in vertebrates and its invertebrate analog, octopamine, regulate the activity of neural circuits. We find that, when hungry, larvae switch activity in type II octopaminergic motor neurons (MNs) to high-frequency bursts, which coincide with locomotion-driving bursts in type I glutamatergic MNs that converge on the same muscles. Optical quantal analysis across hundreds of synapses simultaneously reveals that octopamine potentiates glutamate release by tonic type Ib MNs, but not phasic type Is MNs, and occurs via the G-coupled octopamine receptor (OAMB).
Proc Natl Acad Sci U S A
January 2025
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose "foundational" PLMs have limited performance in modeling antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples.