Investigation of protein family relationships with deep learning.

Bioinform Adv

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom.

Published: September 2024

Motivation: In this article, we propose a method for finding similarities between Pfam families based on the pre-trained neural network ProtENN2. We use ProtENN2's per-residue embeddings to produce new high-dimensional per-family embeddings, develop an approach for calculating inter-family similarity scores from these embeddings, and evaluate its predictions using structure comparison.
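
The abstract does not specify how per-residue embeddings are turned into per-family embeddings or how the similarity scores are computed. The sketch below fills both gaps with common stand-ins, and both choices are assumptions:

- mean pooling over all residues of all sequences in a family;
- cosine similarity between the pooled vectors.

The family identifiers ("PF_A" and so on) and the toy two-dimensional vectors are hypothetical. Real ProtENN2 embeddings are much higher-dimensional, and the paper's actual construction may differ.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]

def family_embedding(per_residue: Sequence[Sequence[Vector]]) -> Vector:
    """Mean-pool per-residue embeddings over every residue of every sequence
    in one family. (Assumed pooling choice, for illustration only.)"""
    total: Optional[Vector] = None
    count = 0
    for seq in per_residue:
        for residue in seq:
            if total is None:
                total = [0.0] * len(residue)
            for i, x in enumerate(residue):
                total[i] += x
            count += 1
    if total is None or count == 0:
        raise ValueError("family has no residues")
    return [x / count for x in total]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity_scores(
    families: Dict[str, Sequence[Sequence[Vector]]],
) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of families (keys sorted alphabetically)."""
    emb = {name: family_embedding(seqs) for name, seqs in families.items()}
    names = sorted(emb)
    scores: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            scores[(a, b)] = cosine(emb[a], emb[b])
    return scores

# Toy usage with hypothetical families and 2-D "embeddings".
toy = {
    "PF_A": [[[1.0, 0.0], [1.0, 0.0]]],
    "PF_B": [[[1.0, 0.1]], [[0.9, 0.0]]],
    "PF_C": [[[0.0, 1.0]]],
}
for pair, score in similarity_scores(toy).items():
    print(pair, round(score, 3))
```

Here PF_A and PF_B score close to 1 and PF_A and PF_C score 0. A pooled-vector approach like this keeps the number of free parameters small, which matches the abstract's description of the method as simple. Whether mean pooling is adequate for real families is an open question the sketch does not address.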

Results: We apply our method to Pfam annotation by refining clan membership for Pfam families, suggesting both new members of existing clans and potential new clans for future Pfam releases. We also investigate some of the failure modes of our approach, which point to directions for future improvements. Our method is relatively simple, with few parameters, and could be applied to other protein family classification models. Overall, our work suggests potential benefits of employing deep learning to improve our understanding of protein family relationships and of the functions of previously uncharacterized families.
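
The abstract does not describe how similarity scores are turned into clan suggestions. The sketch below is one hypothetical rule, and the threshold value and every identifier in it are assumptions. The input is a table of pairwise scores keyed by family-pair tuples. For a given family, the rule takes its highest-scoring partner and, if the score clears the threshold, does one of two things:

- if the partner belongs to a clan, it proposes that clan;
- if the partner has no clan, it flags the pair as a seed for a possible new clan.

```python
from typing import Dict, Optional, Tuple

def best_match(
    family: str, scores: Dict[Tuple[str, str], float]
) -> Tuple[Optional[str], float]:
    """Return the highest-scoring partner of `family` and its score."""
    partner, best = None, float("-inf")
    for (a, b), s in scores.items():
        if family not in (a, b):
            continue
        other = b if a == family else a
        if s > best:
            partner, best = other, s
    return partner, best

def suggest(
    family: str,
    scores: Dict[Tuple[str, str], float],
    clan_of: Dict[str, Optional[str]],
    threshold: float,
) -> Optional[Tuple[str, str]]:
    """Hypothetical rule: ("join", clan) if the best partner is in a clan,
    ("new_clan_with", partner) if it is not, None if the score is too low."""
    partner, score = best_match(family, scores)
    if partner is None or score < threshold:
        return None
    clan = clan_of.get(partner)
    if clan is not None:
        return ("join", clan)
    return ("new_clan_with", partner)

# Hypothetical scores and clan assignments.
scores = {("PF_X", "PF_Y"): 0.9, ("PF_X", "PF_Z"): 0.4, ("PF_W", "PF_V"): 0.8}
clan_of = {"PF_Y": "CL_1", "PF_Z": None, "PF_W": None, "PF_V": None}
print(suggest("PF_X", scores, clan_of, threshold=0.5))
print(suggest("PF_W", scores, clan_of, threshold=0.5))
```

The sketch uses a single nearest-neighbour rule because it has one parameter, the threshold. A real pipeline would also need curator review and, as the abstract notes, independent checks such as structure comparison.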

Availability and Implementation: github.com/iponamareva/ProtCNNSim, 10.5281/zenodo.10091909.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11467057
DOI: http://dx.doi.org/10.1093/bioadv/vbae132