Investigation of protein family relationships with deep learning.

Irina Ponamareva, Antonina Andreeva, Maxwell L Bileschi, Lucy Colwell, Alex Bateman

Bioinform Adv

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom.

Published: September 2024

Motivation: We propose a method for finding similarities between Pfam families based on the pre-trained neural network ProtENN2. We use ProtENN2's per-residue embeddings to produce new high-dimensional per-family embeddings, develop an approach for calculating inter-family similarity scores from these embeddings, and evaluate the resulting predictions using structure comparison.
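The general idea of going from per-residue embeddings to per-family embeddings and then to inter-family scores can be illustrated with a minimal sketch. The abstract does not say how the embeddings are aggregated or how the scores are defined, so mean pooling and cosine similarity are assumptions made here purely for illustration, and the function names and toy vectors are hypothetical. The authors' actual procedure is in the repository listed under Availability And Implementation.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def family_embedding(sequences):
    """Pool per-residue vectors into one vector per family.

    `sequences` is a list of member sequences, each a list of per-residue
    vectors. ASSUMPTION: mean pooling over residues, then over sequences;
    the paper may aggregate differently.
    """
    per_sequence = [mean_vector(residues) for residues in sequences]
    return mean_vector(per_sequence)

def cosine_similarity(u, v):
    """ASSUMPTION: cosine similarity stands in for the paper's score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy 2-dimensional "embeddings" (real per-residue embeddings are much larger).
family_a = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0]]]
family_b = [[[0.0, 2.0], [0.0, 2.0]]]
family_c = [[[2.0, 0.0], [0.0, 0.0]]]

emb_a = family_embedding(family_a)
emb_b = family_embedding(family_b)
emb_c = family_embedding(family_c)

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(f"A vs B: {sim_ab:.2f}, A vs C: {sim_ac:.2f}")
```

In this toy setup families A and C point in the same direction and would be flagged as candidates for a shared clan, while A and B would not. Any threshold on such a score would need to be calibrated, for example against structure comparison as the abstract describes.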

Results: We apply our method to Pfam annotation by refining clan membership for Pfam families, suggesting both new members of existing clans and potential new clans for future Pfam releases. We investigate some of the failure modes of our approach, and this analysis suggests directions for future improvements. Our method is relatively simple, has few parameters, and could be applied to other protein family classification models. Overall, our work suggests potential benefits of employing deep learning to improve our understanding of protein family relationships and of the functions of previously uncharacterized families.

Availability And Implementation: github.com/iponamareva/ProtCNNSim, 10.5281/zenodo.10091909.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11467057
DOI: http://dx.doi.org/10.1093/bioadv/vbae132
