Protein embeddings and deep learning predict binding residues for various ligand classes.

Sci Rep

Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.

Published: December 2021

AI Article Synopsis

  • Protein-ligand binding is a key aspect of protein function. bindEmbed21 is an AI-based method that predicts whether a residue binds metal ions, nucleic acids, or small molecules, using only embeddings from the ProtT5 protein language model and no multiple sequence alignments (MSAs).
  • From single sequences alone, bindEmbed21DL outperformed MSA-based methods. Adding homology-based inference raised performance to F1 = 48 ± 3% and MCC = 0.46 ± 0.04, and at least 73% of the 25% most strongly predicted binding residues were correct.
  • The method is fast, simple, and broadly applicable: it found binding residues in over 42% of human proteins not otherwise implied in binding and predicted about 6% of all residues as binding.

Article Abstract

One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress, many binding sites remain obscure. Here, we propose bindEmbed21, a method predicting whether a protein residue binds to metal ions, nucleic acids, or small molecules. The Artificial Intelligence (AI)-based method exclusively uses embeddings from the Transformer-based protein Language Model (pLM) ProtT5 as input. Using only single sequences without creating multiple sequence alignments (MSAs), bindEmbed21DL outperformed MSA-based predictions. Combination with homology-based inference increased performance to F1 = 48 ± 3% (95% CI) and MCC = 0.46 ± 0.04 when merging all three ligand classes into one. All results were confirmed by three independent data sets. Focusing on very reliably predicted residues could complement experimental evidence: for the 25% most strongly predicted binding residues, at least 73% were correctly predicted, even when ignoring the problem of missing experimental annotations. The new method bindEmbed21 is fast, simple, and broadly applicable, using neither structure nor MSAs. Thereby, it found binding residues in over 42% of all human proteins not otherwise implied in binding and predicted about 6% of all residues as binding to metal ions, nucleic acids, or small molecules.
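The F1, MCC, and "most strongly predicted" figures quoted in the abstract can be illustrated with a minimal sketch of per-residue binary evaluation. It uses the standard textbook definitions of F1 and the Matthews correlation coefficient. The toy labels, the 0.5 score cutoff, and the function names are assumptions made for illustration, and the paper's actual protocol (for example how scores are averaged or how the 95% CI is estimated) is not described in the abstract and is not reproduced here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary per-residue labels (1 = binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def precision_top_fraction(y_true, scores, fraction=0.25, cutoff=0.5):
    """Precision among the top `fraction` of residues predicted as binding.

    Residues with score >= cutoff count as predicted binding (the cutoff
    is an assumed value); the strongest `fraction` of those are kept.
    """
    predicted = [(s, t) for s, t in zip(scores, y_true) if s >= cutoff]
    if not predicted:
        return 0.0
    predicted.sort(key=lambda pair: pair[0], reverse=True)
    k = math.ceil(fraction * len(predicted))
    top = predicted[:k]
    return sum(t for _, t in top) / len(top)


# Toy example: 10 residues with made-up labels and scores.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1, 0.3, 0.55, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("F1:", round(f1_score(tp, fp, fn), 3))
print("MCC:", round(mcc(tp, fp, tn, fn), 3))
print("Precision, top 25%:", precision_top_fraction(y_true, scores))
```

Restricting the evaluation to the strongest predictions trades coverage for reliability, which is the sense in which the abstract suggests such predictions could complement experimental evidence.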

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8668950
DOI: http://dx.doi.org/10.1038/s41598-021-03431-4
