Most proteins must form higher-order assemblies to perform their biological functions. Despite the importance of protein quaternary structure, few machine learning models can accurately and rapidly predict the symmetry of assemblies made up of multiple copies of the same protein chain. Here, we address this gap by training several classes of protein foundation models, including ESM-MSA, ESM2, and RoseTTAFold2, to predict homo-oligomer symmetry. Our best model, Seq2Symm, which is built on ESM2, outperforms existing template-based and deep learning methods. It achieves average PR-AUCs (areas under the precision-recall curve) of 0.48 and 0.44 across homo-oligomer symmetries on two different held-out test sets, compared with 0.32 and 0.23 for the template-based method. Because Seq2Symm needs only a single sequence as input, it can predict homo-oligomer symmetries rapidly (~80,000 proteins/hour). We have applied it to five entire proteomes and ~3.5 million unlabeled protein sequences to identify patterns in protein assembly complexity across biological kingdoms and species.
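
As a reading aid for the reported metric, the following minimal sketch shows one common way to compute a PR-AUC averaged across symmetry classes. It assumes binary per-class labels and uses the step-wise average-precision estimate of the area under the precision-recall curve; the abstract does not say how the authors computed PR-AUC, so this is an illustration and not their evaluation code. The function names `average_precision` and `mean_pr_auc`, the class names `C2` and `C3`, and the toy labels and scores are all hypothetical.

```python
from typing import Dict, List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for one class.

    labels: 1 if the protein has this symmetry, else 0 (assumed encoding).
    scores: predicted scores for the same proteins, higher = more confident.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank proteins from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def mean_pr_auc(y_true: Dict[str, List[int]],
                y_score: Dict[str, List[float]]) -> float:
    """Unweighted mean of the per-class average precision."""
    per_class = [average_precision(y_true[c], y_score[c]) for c in y_true]
    return sum(per_class) / len(per_class)


# Toy data (hypothetical): four proteins, two symmetry classes.
y_true = {"C2": [1, 0, 1, 0], "C3": [0, 1, 0, 0]}
y_score = {"C2": [0.9, 0.8, 0.7, 0.1], "C3": [0.2, 0.6, 0.7, 0.1]}

print(round(mean_pr_auc(y_true, y_score), 4))
```

Averaging per class without weighting gives rare symmetries the same influence as common ones, which is one reason a class-averaged PR-AUC is informative when class frequencies are very uneven.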

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11092833
DOI: http://dx.doi.org/10.21203/rs.3.rs-4215086/v1