Protein sequence profile prediction using ProtAlbert transformer.

Comput Biol Chem

Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran.

Published: August 2022

Profiles are used to model protein families and domains. A profile is typically built from a multiple sequence alignment, obtained by searching a database with a query sequence and scoring matches with a substitution matrix. Profile-based applications therefore depend heavily on the alignment algorithm and on the scoring scheme for amino acid substitutions. Under a given scoring scheme, however, the database sometimes contains no sequences similar to the query, and in such cases no profile can be built. This paper proposes a method named PA_SPP, based on the pre-trained ProtAlbert transformer, that predicts the profile of a single protein sequence without alignment. Transformers perform impressively on natural languages, and because protein sequences can be viewed as a language, they can benefit from these models. We analyze the attention heads in different layers of ProtAlbert to show that the transformer can capture five essential protein characteristics from a single sequence. This assessment shows that ProtAlbert takes some protein properties into account when suggesting amino acids for each position in the sequence. In other words, transformers can be considered an appropriate alternative to alignment and a scoring scheme for predicting a profile. We evaluate PA_SPP on the CASP13 dataset, which includes 55 proteins. In addition, one thermophilic and two mesophilic proteins are used as case studies. The results show high similarity between the predicted profiles and HSSP profiles.
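
The contrast drawn in the abstract, a profile computed from an alignment versus one predicted position by position from a single sequence, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the `toy_predictor` stand-in (a real setup would query a masked language model such as ProtAlbert for each position), and the use of mean per-position cosine similarity as the comparison measure.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Profile = List[Dict[str, float]]  # one amino acid distribution per position


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn non-negative scores into a distribution over the 20 amino acids."""
    total = sum(scores.get(aa, 0.0) for aa in AMINO_ACIDS)
    if total == 0:  # e.g. an all-gap column: fall back to a uniform column
        return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return {aa: scores.get(aa, 0.0) / total for aa in AMINO_ACIDS}


def profile_from_alignment(aligned: Sequence[str]) -> Profile:
    """Alignment-based profile: column-wise residue frequencies, gaps ignored."""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in aligned if row[col] in AMINO_ACIDS)
        profile.append(_normalize(dict(counts)))
    return profile


def profile_from_masked_model(
    sequence: str,
    predict_masked: Callable[[str, int], Dict[str, float]],
) -> Profile:
    """Alignment-free profile: ask a predictor for each position in turn.

    `predict_masked(sequence, pos)` is a hypothetical callable that returns
    a score per amino acid for position `pos` given the rest of the sequence.
    """
    return [_normalize(predict_masked(sequence, pos)) for pos in range(len(sequence))]


def mean_cosine_similarity(p: Profile, q: Profile) -> float:
    """Average per-position cosine similarity (an assumed comparison measure)."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    sims = []
    for a, b in zip(p, q):
        dot = sum(a[aa] * b[aa] for aa in AMINO_ACIDS)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        sims.append(dot / (norm_a * norm_b))
    return sum(sims) / len(sims)


def toy_predictor(sequence: str, pos: int) -> Dict[str, float]:
    """Placeholder for a language model: favors the observed residue."""
    return {aa: (0.81 if aa == sequence[pos] else 0.01) for aa in AMINO_ACIDS}


reference = profile_from_alignment(["ACD", "ACE", "A-D"])
predicted = profile_from_masked_model("ACD", toy_predictor)
print(f"mean cosine similarity: {mean_cosine_similarity(reference, predicted):.3f}")
```

Passing the predictor in as a callable keeps the sketch independent of any particular model, so the toy function can be swapped for a real masked-language-model wrapper without changing the profile or comparison code.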


Source: http://dx.doi.org/10.1016/j.compbiolchem.2022.107717

Similar Publications

Determining epitope specificity of T-cell receptors with transformers.

Bioinformatics

November 2023

Leiden Computational Biology Center, Department of Molecular Epidemiology, Leiden University Medical Center, Leiden 2333 ZA, The Netherlands.

Summary: T-cell receptors (TCRs) on T cells recognize and bind to epitopes presented by the major histocompatibility complex in case of an infection or cancer. However, the high diversity of TCRs, as well as their unique and complex binding mechanisms underlying epitope recognition, make it difficult to predict the binding between TCRs and epitopes. Here, we present the utility of transformers, a deep learning strategy that incorporates an attention mechanism that learns the informative features, and show that these models pre-trained on a large set of protein sequences outperform current strategies.
