RBLOSUM performs better than CorBLOSUM with lesser error per query.

BMC Res Notes

Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram, Kerala, India.

Published: May 2018

Objective: BLOSUM matrices serve as standard scoring matrices for many protein sequence alignment programs. They were constructed from a BLOCKS version containing 27,102 BLOCKS, whereas the latest updated version has 6,739,916 BLOCKS. We read with interest the research article by Hess et al. (BMC Bioinform 17:189, 2016) on CorBLOSUM, which argues that an inaccuracy in the BLOSUM code affects the cluster memberships of sequences. They show that replacing the integer-based clustering threshold with a floating-point one arguably improves the performance of CorBLOSUM over the BLOSUM and RBLOSUM matrices. They compare BLOSUM62 against RBLOSUM69, with relative entropies of 0.2685 and 0.2662 respectively. The present work attempts to repeat the computation to verify the respective analog matrices.
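
The effect of an integer versus a floating-point clustering threshold can be illustrated with a small sketch. This is not the BLOSUM or CorBLOSUM source code: the function names and the truncation step are assumptions, chosen only to show how rounding a threshold down to a whole number of matching positions could place two sequences in the same cluster when the exact percentage would not.

```python
def identity_count(a: str, b: str) -> int:
    """Count aligned positions where both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b))


def same_cluster_int(a: str, b: str, threshold_pct: int) -> bool:
    # Hypothetical integer variant: the required number of matches is
    # truncated to a whole number (e.g. 62% of 10 columns -> 6).
    min_matches = int(threshold_pct * len(a) / 100)
    return identity_count(a, b) >= min_matches


def same_cluster_float(a: str, b: str, threshold_pct: float) -> bool:
    # Hypothetical floating-point variant: the exact fractional
    # requirement is kept (e.g. 62% of 10 columns -> 6.2).
    return identity_count(a, b) >= threshold_pct * len(a) / 100.0


# Two toy segments of width 10 that share exactly 6 residues (60% identity).
seq_a = "ACDEFGHIKL"
seq_b = "ACDEFGMNPQ"

clustered_int = same_cluster_int(seq_a, seq_b, 62)
clustered_float = same_cluster_float(seq_a, seq_b, 62)
print(clustered_int, clustered_float)
```

In this toy case the pair is 60% identical, so the truncated threshold clusters it at a nominal 62% cutoff while the floating-point threshold does not.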

Results: In repeating the computation, we observed that the relative entropy of BLOSUM62 is 0.2360 and that of BLOSUM50 is 0.1198. Because only matrices of similar entropy can be compared, BLOSUM62 can be compared only with RBLOSUM66, and BLOSUM50 only with RBLOSUM56. We conducted experiments with Astral data sets and demonstrated improved accuracy in coverage. Our results imply that RBLOSUM performs statistically better than the CorBLOSUM and BLOSUM matrices.
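
For readers unfamiliar with the quantity being matched, the relative entropy of a substitution matrix can be sketched as below. This is a minimal illustration, not the authors' computation. It assumes a symmetric matrix of joint pair frequencies q[i][j] summing to 1 over ordered pairs, background frequencies taken as the row sums, and logarithms to base 2. The two-letter alphabet and its frequencies are invented for the example.

```python
import math


def relative_entropy(q):
    """Relative entropy H = sum_ij q_ij * log2(q_ij / (p_i * p_j)).

    q is a square, symmetric matrix of joint pair frequencies over
    ordered residue pairs; p_i is the marginal (row sum) of residue i.
    """
    total = sum(sum(row) for row in q)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("pair frequencies must sum to 1")
    p = [sum(row) for row in q]
    h = 0.0
    for i, row in enumerate(q):
        for j, q_ij in enumerate(row):
            if q_ij > 0:  # zero-frequency pairs contribute nothing
                h += q_ij * math.log2(q_ij / (p[i] * p[j]))
    return h


# Toy two-letter alphabet (assumed values): identical pairs are enriched.
conserved = [[0.4, 0.1],
             [0.1, 0.4]]
# Pair frequencies equal to the product of the marginals: no signal.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

h_conserved = relative_entropy(conserved)
h_independent = relative_entropy(independent)
print(round(h_conserved, 4), round(h_independent, 4))
```

A higher value means the pair frequencies depart further from chance. This is why matrices built at different clustering thresholds are paired by entropy before their alignment performance is compared.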

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963171
DOI: http://dx.doi.org/10.1186/s13104-018-3415-5

Publication Analysis

Top Keywords

blosum matrices: 12
rblosum performs: 8
better corblosum: 8
corblosum blosum: 8
repeat computation: 8
matrices: 6
blosum: 5
performs better: 4
corblosum: 4
corblosum lesser: 4

Similar Publications

Article Synopsis
  • The development of a vaccine for Hepatitis C Virus (HCV) is essential despite the effectiveness of existing treatments, particularly focusing on inducing Pangenomic neutralizing Antibodies (PnAbs) against the diverse HCV Envelope 2 protein.
  • Current algorithms for creating Consensus Sequences (CS) face challenges such as rigidity and insensitivity to evolutionary changes, prompting researchers to modify the "Majority" algorithm with BLOSUM matrices and assess it against the "Fitness" algorithm.
  • The "Fitness" algorithm outperformed others by producing well-defined HCVE2 sequences for all HCV genotypes, considering evolutionary factors and offering improved properties for vaccine development, suggesting its applicability for other variable pathogens as well.

Scoring alignments by embedding vector similarity.

Brief Bioinform

March 2024

Department of Computer Science, University of Western Ontario, London, N6A 5B7, Ontario, Canada.

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness, being contextually dependent.

Protein embedding based alignment.

BMC Bioinformatics

February 2024

Luddy School of Informatics, Computing and Engineering, Indiana University, 700 N. Woodlawn Avenue, Bloomington, IN, 47408, USA.

Purpose: Despite much progress in alignment algorithms, aligning divergent protein sequences with less than 20-35% pairwise identity (the so-called "twilight zone") remains a difficult problem. Many alignment algorithms have used substitution matrices to generate alignments since the matrices were created in the 1970s; however, these matrices do not score alignments well within the twilight zone. We developed Protein Embedding based Alignments, or PEbA, to better align sequences with low pairwise identity.

Phylogenetics is the study of ancestral relationships among biological species. Such sequence analyses are often represented as phylogenetic trees. The branching pattern of each tree and its topology reflect the evolutionary relatedness between analyzed sequences.

Bridging the gaps in statistical models of protein alignment.

Bioinformatics

June 2022

Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia.

Summary: Sequences of proteins evolve by accumulating substitutions together with insertions and deletions (indels) of amino acids. However, it remains a common practice to disconnect substitutions and indels, and infer approximate models for each of them separately, to quantify sequence relationships. Although this approach brings with it computational convenience (which remains its primary motivation), there is a dearth of attempts to unify and model them systematically and together.
