MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions.

BMC Bioinformatics

Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.

Published: November 2015

Background: The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. Despite considerable recent research aimed at improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment of multiple protein sequences remains a challenging problem.

Results: We propose a novel and efficient algorithm, MSAIndelFR, for multiple sequence alignment. It uses the predicted locations of indel flanking regions (IndelFRs) and the average log-loss values computed from IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that a new variable gap penalty function, based on the predicted IndelFR locations and the computed average log-loss values, substantially improves protein alignment accuracy when introduced into the algorithm. We show this by evaluating the algorithm on sequences belonging to protein folds for which IndelFR predictors already exist, using the reference alignments of four popular benchmarks: BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65).
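The abstract does not give the form of the variable gap penalty. The sketch below is therefore only an illustration of the general idea: gaps are made cheaper at residues predicted to lie in an IndelFR, and the discount is weighted by how reliable the predictor is, as measured by its average log-loss. Several parts are assumptions and are not the authors' method: the confidence weight exp(-average log-loss), the parameters `base` and `max_reduction`, the pairwise Needleman-Wunsch aligner with per-residue linear gap costs, and all function names.

```python
import math
from typing import List, Sequence, Tuple


def average_log_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary log-loss of predicted probabilities against 0/1 labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def gap_penalties(indelfr: Sequence[bool], avg_log_loss: float,
                  base: float = 2.0, max_reduction: float = 0.5) -> List[float]:
    """Hypothetical per-residue gap cost: discounted inside predicted IndelFRs.

    The discount grows with predictor confidence exp(-avg_log_loss), which is
    1.0 for a perfect predictor and approaches 0 as the log-loss grows.
    """
    confidence = math.exp(-avg_log_loss)
    discounted = base * (1.0 - max_reduction * confidence)
    return [discounted if flag else base for flag in indelfr]


def align(a: str, b: str, gap_a: Sequence[float], gap_b: Sequence[float],
          match: float = 2.0, mismatch: float = -1.0) -> Tuple[float, str, str]:
    """Global pairwise alignment with position-specific linear gap costs.

    gap_a[i] is the cost of placing a[i] opposite a gap; likewise for gap_b.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_a[i - 1]
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_b[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # a[i-1] paired with b[j-1]
                              score[i - 1][j] - gap_a[i - 1],  # a[i-1] opposite a gap
                              score[i][j - 1] - gap_b[j - 1])  # b[j-1] opposite a gap
    # Traceback from the bottom-right cell, preferring the diagonal move.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap_a[i - 1]:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

Here is an example. Suppose the residues K and V of "MKVLA" are flagged as lying in a predicted IndelFR by a (hypothetical) perfect predictor. The same alignment "MKVLA"/"M--LA" is then found, but it scores higher because the two gaps are cheaper. The actual MSAIndelFR penalty and its use in a progressive multiple alignment may differ substantially from this sketch.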

Conclusions: We have proposed a novel and efficient algorithm, MSAIndelFR, for multiple protein sequence alignment that incorporates a new variable gap penalty function. Its performance is shown to be superior to that of the most widely used alignment algorithms, Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign, in terms of both the sum-of-pairs and total column metrics.
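The two metrics named above can be sketched using their usual definitions. The sum-of-pairs (SP) score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The total column (TC) score is the fraction of reference columns that are reproduced exactly. This is not any benchmark's official scoring program, and it makes three assumptions: every column is scored (benchmarks such as BAliBASE often restrict scoring to core blocks), both alignments list the same sequences in the same order, and '-' is the only gap character. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Column = Tuple[Optional[int], ...]  # per-sequence residue index, None for a gap


def columns(alignment: List[str]) -> List[Column]:
    """Turn gapped rows into columns of ungapped residue indices."""
    counters = [0] * len(alignment)
    cols: List[Column] = []
    for c in range(len(alignment[0])):
        col: List[Optional[int]] = []
        for s, row in enumerate(alignment):
            if row[c] == '-':
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def residue_pairs(cols: List[Column]) -> Set[Tuple[int, int, int, int]]:
    """All (seq_i, res_i, seq_j, res_j) pairs that share a column, with seq_i < seq_j."""
    pairs: Set[Tuple[int, int, int, int]] = set()
    for col in cols:
        present = [(seq, res) for seq, res in enumerate(col) if res is not None]
        for (s, r), (t, q) in combinations(present, 2):
            pairs.add((s, r, t, q))
    return pairs


def sp_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference residue pairs reproduced in the test alignment."""
    ref_pairs = residue_pairs(columns(ref))
    test_pairs = residue_pairs(columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference columns that appear identically in the test alignment."""
    ref_cols = columns(ref)
    test_cols = set(columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)
```

For the reference ["AC-G", "A-TG"] and the test ["ACG", "ATG"], every reference pair is recovered (SP = 1.0). However, only two of the four reference columns are reproduced (TC = 0.5), because the test alignment places C and T in a single column.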


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4657235
DOI: http://dx.doi.org/10.1186/s12859-015-0826-3

