Alignment of protein sequences is a key step in most computational methods for predicting protein function and for homology-based modeling of three-dimensional (3D) structure. We investigated the correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pairwise sequence alignment. The results of this analysis enabled the development of a novel method for aligning a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure, especially on the continuous ungapped alignment segments, or "islands," between gaps. Approximately one third of the islands in the gold standard alignments have a negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the perspective of alignment accuracy, the time the algorithm spends working in these unalignable regions is wasted. We considered the features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm that explicitly addresses high-scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, while the resulting alignments are, on average, of the same quality with respect to the gold standard. This finding shows that a decrease in alignment accuracy is not necessarily the price of computational efficiency.
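
The notion of an "island" can be made concrete with a small sketch. The Python below splits a gapped pairwise alignment into its ungapped segments and scores each one. It is an illustration only, not the authors' implementation: the function names, the example sequences, and the toy scoring scheme (+2 for a match, -1 for a mismatch) are all assumptions made here, and a real analysis would use an amino acid substitution matrix instead.

```python
from typing import Callable, List, Tuple


def islands(aligned_a: str, aligned_b: str) -> List[Tuple[str, str]]:
    """Split a gapped pairwise alignment into ungapped segments ("islands").

    Both inputs are rows of the same alignment, with '-' marking a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    result: List[Tuple[str, str]] = []
    seg_a: List[str] = []
    seg_b: List[str] = []
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            # A gap column closes the current island, if one is open.
            if seg_a:
                result.append(("".join(seg_a), "".join(seg_b)))
                seg_a, seg_b = [], []
        else:
            seg_a.append(x)
            seg_b.append(y)
    if seg_a:
        result.append(("".join(seg_a), "".join(seg_b)))
    return result


def toy_score(x: str, y: str) -> int:
    """Hypothetical scoring: +2 for identical residues, -1 otherwise."""
    return 2 if x == y else -1


def island_score(seg_a: str, seg_b: str,
                 score: Callable[[str, str], int] = toy_score) -> int:
    """Sum the per-column scores of one ungapped segment."""
    return sum(score(x, y) for x, y in zip(seg_a, seg_b))


# Made-up alignment rows, for illustration only.
row_a = "MKTAY--IAKQR-GLV"
row_b = "MKSAYWWLGRNRPGLV"

segments = islands(row_a, row_b)
scores = [island_score(a, b) for a, b in segments]
for (a, b), s in zip(segments, scores):
    print(a, b, s)
```

In this made-up example the middle island scores below zero under the toy scheme, which is the kind of segment the abstract describes as falling below the sensitivity limit of a score-maximizing local alignment.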

Source
http://dx.doi.org/10.1002/prot.10503

Similar Publications

Construction of Immune Single Domain Antibodies Library for Development of Specific Nanobodies Using Phage Display Strategy.

Recent Pat Biotechnol

January 2025

Center of Excellence in Recombinant Biopharmaceutical Proteins, Biochemistry and Molecular Biology Department, Theodor Bilharz Research Institute, Giza, Egypt.

Background: Schistosoma mansoni poses a considerable global public health challenge. In Egypt, approximately 60% of the inhabitants in the Northern and Eastern areas of the Nile Delta are affected by this parasite, whereas the Southern region experiences a significantly lower infection rate of 6%.

Aim: Construction of an immune phage display Nbs library based on the VHH framework to select S. mansoni-specific Nbs, in search of cost-effective, sensitive, and specific diagnostic tools for rapid detection of S. mansoni.

Recent advancements in deep learning, particularly large language models (LLMs), have made a significant impact on how researchers study microbiome and metagenomics data. Microbial protein and genomic sequences, like natural languages, form a , enabling the adoption of LLMs to extract useful insights from complex microbial ecologies. In this paper, we review applications of deep learning and language models in analyzing microbiome and metagenomics data.

Predicting the location of coordinated metal ion-ligand binding sites using geometry-aware graph neural networks.

Comput Struct Biotechnol J

December 2024

Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.

More than 50% of proteins bind to metal ions. Interactions between metal ions and proteins, especially coordinated interactions, are essential for biological functions, such as maintaining protein structure and signal transport. Physiological metal-ion binding prediction is pivotal both for elucidating the biological functions of proteins and for designing new drugs.

Human rhinovirus C (HRV-C) is a significant contributor to respiratory tract infections in children and is implicated in asthma exacerbations across all age groups. Despite its impact, there is currently no licensed vaccine available for HRV-C. Here, we present a novel approach to address this gap by employing immunoinformatics techniques for the design of a multi-epitope-based vaccine against HRV-C.

An allelic atlas of immunoglobulin heavy chain variable regions reveals antibody binding epitope preference resilient to SARS-CoV-2 mutation escape.

Front Immunol

January 2025

State Key Laboratory of Respiratory Disease, Guangdong Laboratory of Computational Biomedicine, Center for Cell Lineage Research, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China.

Background: Although immunoglobulin (Ig) alleles play a pivotal role in the antibody response to pathogens, research to understand their role in the humoral immune response is still limited.

Methods: We retrieved the germline sequences for IGHV from the IMGT database to illustrate the amino acid polymorphism present within the germline sequences of IGHV genes. We assembled the sequences of the IgM and IgD repertoires from 130 people to investigate genetic variation in the population.
