Improvement in the accuracy of multiple sequence alignment program MAFFT.

Genome Inform

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.

Published: January 2006

In 2002, we developed and released MAFFT, a rapid multiple sequence alignment program designed to handle large datasets (up to approximately 5,000 sequences) and long sequences (approximately 2,000 aa or approximately 5,000 nt) in a reasonable time on a standard desktop PC. In several benchmark tests, however, the previous versions of MAFFT (v.4 and lower) were less accurate than ProbCons and TCoffee v.2, both of which were released in 2004. Here we report a recent extension of MAFFT that aims to improve accuracy at as little cost in calculation time as possible. The extended version of MAFFT (v.5) has new iterative refinement options, G-INS-i and L-INS-i (collectively denoted as [GL]-INS-i in this report). These options use a new objective function that combines the weighted sum-of-pairs (WSP) score with a COFFEE-like score derived from all pairwise alignments. We discuss the improvement in accuracy brought by this extension, mainly using two recently released benchmark tests, BAliBASE v.3 (for protein alignments) and BRAliBASE (for RNA alignments). According to BAliBASE v.3, the overall average accuracy of L-INS-i was higher than that of the other methods released in 2004, although the differences among the most accurate methods (ProbCons, TCoffee v.2 and the new options of MAFFT) were small. The advantage in accuracy of [GL]-INS-i became greater for alignments consisting of approximately 50-100 sequences. Exploiting this feature of MAFFT, we also examined another possible approach to improving accuracy: incorporating homolog information collected from databases. The [GL]-INS-i options are applicable to alignments of up to approximately 200 sequences, but not to thousands of sequences, because of their time and space complexity.
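
To make the idea of such an objective function concrete, the following Python sketch scores a toy alignment with a weighted sum-of-pairs term plus a COFFEE-like consistency term. It is an illustration only, not the MAFFT implementation: the match/mismatch/gap values, the uniform sequence weights, the `library` of pairwise alignments, and the mixing factor `lam` are all assumptions introduced here, and the abstract does not state how the two terms are actually combined.

```python
from itertools import combinations


def pair_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score the pairwise alignment induced by two rows of an MSA.

    The scoring values are hypothetical placeholders, not MAFFT's.
    """
    total = 0.0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column is empty in this projection, so skip it
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def wsp_score(rows, weights):
    """Weighted sum-of-pairs: sum over i < j of w_i * w_j * pair_score."""
    return sum(
        weights[i] * weights[j] * pair_score(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    )


def aligned_pairs(a, b):
    """Return the set of (residue index in a, residue index in b) pairs
    that share a column in the two aligned rows."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(a, b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        i += x != "-"  # advance only past real residues
        j += y != "-"
    return pairs


def consistency_score(rows, library):
    """COFFEE-like term: count residue pairs aligned in the MSA that are
    also aligned in an independent pairwise alignment (the library)."""
    return sum(
        len(aligned_pairs(rows[i], rows[j]) & library.get((i, j), set()))
        for i, j in combinations(range(len(rows)), 2)
    )


def combined_objective(rows, weights, library, lam=0.5):
    """Assumed combination: WSP plus lam times the consistency term."""
    return wsp_score(rows, weights) + lam * consistency_score(rows, library)


# Toy data: three aligned rows and a hand-made pairwise library.
rows = ["AC-T", "ACGT", "A-GT"]
weights = [1.0, 1.0, 1.0]
library = {
    (0, 1): {(0, 0), (1, 1), (2, 3)},
    (0, 2): {(0, 0), (2, 2)},
    (1, 2): {(0, 0), (2, 1), (3, 2)},
}
score = combined_objective(rows, weights, library)
```

An iterative refinement procedure would repeatedly realign subsets of the rows and keep a change only when a score of this kind increases; the sketch covers the scoring step alone.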

Similar Publications

The feasibility of using machine learning to predict COVID-19 cases.

Int J Med Inform

January 2025

School of Geography and the Environment, University of Oxford, South Parks Road, Oxford OX1 3QY, United Kingdom.

Background: Coronavirus Disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, emerged as a global health crisis in 2019, resulting in widespread morbidity and mortality. A persistent challenge during the pandemic has been the accuracy of reported epidemic data, particularly in underdeveloped regions with limited access to COVID-19 test kits and healthcare infrastructure. In the post-COVID era, this issue remains crucial.

Introduction: Accurate measurement is critical for understanding the population health impact of nicotine pouches, yet precise, standardized measures of nicotine pouch use are lacking, possibly driving disparate prevalence estimates across studies. We implemented a split sample survey experiment to assess the impact of including a product image when asking about nicotine pouches.

Methods: We randomized an online sample of US adults ages 18-45 (N=2,130) recruited through the February 2023 wave of the Rutgers Omnibus Study to view either a text-only or text-plus-image description of oral nicotine pouches before being asked about awareness of the products.

Background: Drug-drug interactions (DDIs), especially antagonistic ones, present significant risks to patient safety, underscoring the urgent need for reliable prediction methods. Recently, substructure-based DDI prediction has garnered much attention due to the dominant influence of functional groups and substructures on drug properties. However, existing approaches face challenges regarding the insufficient interpretability of identified substructures and the isolation of chemical substructures.

Bibliometric analysis of global research trends in vestibular neuritis (1980-2024).

Eur Arch Otorhinolaryngol

January 2025

Faculty of Applied Sciences, Department of Accounting and Financial Management, Necmettin Erbakan University, Konya, Turkey.

Purpose: Vestibular neuritis (VN) is a common cause of vertigo with significant impact on patients' quality of life. This study aimed to analyze global research trends in VN using bibliometric methods to identify key themes, influential authors, institutions, and countries contributing to the field.

Methods: We conducted a comprehensive search of the Web of Science Core Collection database for publications related to VN from 1980 to 2024.
