Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification.

Bioinformatics

Department of Informatics and Bio-computing, Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, Ontario, Canada.

Published: May 2015

Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized.
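
To make the rearrangement argument concrete, here is a minimal Python sketch (not taken from the paper) that contrasts a Needleman-Wunsch global alignment score with an alignment-free k-mer composition measure on a toy sequence whose two blocks have been swapped. The toy sequences, k = 3, and the scoring parameters (match +1, mismatch -1, gap -1) are illustrative assumptions, and the two blocks were deliberately given no nucleotides in common so the effect is easy to see.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Alignment-free representation: counts of all overlapping k-mers."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two k-mer count vectors (1.0 = same composition)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def global_alignment_score(x: str, y: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(y) + 1)]  # row 0: y aligned against gaps
    for i in range(1, len(x) + 1):
        curr = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


# Hypothetical toy "genome" of two blocks, plus a rearranged copy with the blocks swapped.
block_a, block_b = "ACACACAC", "GTGTGTGT"
original = block_a + block_b
rearranged = block_b + block_a

print(global_alignment_score(original, original))    # 16: identical sequences
print(global_alignment_score(original, rearranged))  # at most 0 for this toy pair
print(cosine_similarity(kmer_profile(original), kmer_profile(rearranged)))  # ~0.947 (36/38)
```

Because the blocks share no nucleotides, any common subsequence of the two versions can come from only one block, which caps the global alignment score at 0 under these parameters. The block swap changes only the 3-mers that span the junction, so the composition profiles remain nearly identical. This is the kind of complementarity between the two families of measures that the abstract refers to.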

Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively to classify both reads from a real metagenome dataset and protein sequences.
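
The idea of per-query adaptive weights can be illustrated with a minimal Python sketch. It is not the authors' implementation. The weighting rule used here is an assumption: a measure's weight for a given query equals that measure's leave-one-out nearest-neighbour precision on the training set for the class it predicts for that query. This rule was chosen only because it is computed per query and derived from training-set discriminatory ability, as described above. The two measures are also stand-ins. `kmer_cosine` is an alignment-free composition measure, and `positional_identity` is a crude gap-free substitute for an alignment score. All function names, the toy sequences, and the labels X and Y are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List, Tuple

# A similarity measure maps two sequences to a score in [0, 1]; higher = more similar.
Measure = Callable[[str, str], float]
TrainingSet = List[Tuple[str, str]]  # (sequence, class label) pairs


def kmer_cosine(x: str, y: str, k: int = 3) -> float:
    """Alignment-free measure: cosine similarity of overlapping k-mer counts."""
    a = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    b = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def positional_identity(x: str, y: str) -> float:
    """Crude, position-dependent stand-in for an alignment score (no gaps allowed)."""
    n = max(len(x), len(y))
    return sum(p == q for p, q in zip(x, y)) / n if n else 0.0


def reliability_by_class(train: TrainingSet, measure: Measure) -> Dict[str, float]:
    """Leave-one-out 1-NN precision of `measure`, keyed by the class it predicted."""
    hits: Counter = Counter()
    total: Counter = Counter()
    for i, (seq, label) in enumerate(train):
        others = [j for j in range(len(train)) if j != i]
        nearest = max(others, key=lambda j: measure(seq, train[j][0]))
        predicted = train[nearest][1]
        total[predicted] += 1
        hits[predicted] += predicted == label
    return {c: hits[c] / total[c] for c in total}


def fit(train: TrainingSet, measures: Dict[str, Measure]) -> Dict[str, Dict[str, float]]:
    """Precompute per-measure, per-class reliabilities from the training set."""
    return {name: reliability_by_class(train, m) for name, m in measures.items()}


def classify(query: str, train: TrainingSet, measures: Dict[str, Measure],
             reliability: Dict[str, Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Return the predicted label and the query-specific weight of each measure."""
    labels = sorted({label for _, label in train})
    combined = {c: 0.0 for c in labels}
    weights: Dict[str, float] = {}
    for name, m in measures.items():
        # Best similarity of the query to any training sequence of each class.
        best = {c: max(m(query, s) for s, lab in train if lab == c) for c in labels}
        top = max(best, key=best.get)
        # Hypothetical rule: weight = this measure's training precision for `top`;
        # a class the measure never predicted in training gives it no vote (weight 0).
        weights[name] = reliability[name].get(top, 0.0)
        for c in labels:
            combined[c] += weights[name] * best[c]
    return max(combined, key=combined.get), weights


train = [("ACACACAC", "X"), ("ACACCACA", "X"), ("GGTTGGTT", "Y"), ("TTGGTTGG", "Y")]
measures = {"kmer_cosine": kmer_cosine, "positional_identity": positional_identity}
reliability = fit(train, measures)
print(classify("GTTGGTTG", train, measures, reliability))
# ('Y', {'kmer_cosine': 1.0, 'positional_identity': 0.0})
print(classify("CACACACA", train, measures, reliability))
# ('X', {'kmer_cosine': 1.0, 'positional_identity': 0.5})
```

In this toy run, the position-dependent measure receives weight 0.5 for the X-like query but no weight for the shifted Y-like query, because it never predicted class Y in leave-one-out testing. The composition measure keeps full weight in both cases. The combined score is a plain weighted sum of per-class best similarities, which assumes that all measures are on comparable [0, 1] scales.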

Availability and Implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html.

Contact: ivan.borozan@gmail.com

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410667
DOI: http://dx.doi.org/10.1093/bioinformatics/btv006
