Improved algorithms for approximate string matching (extended abstract).

BMC Bioinformatics

Department of Computer Science, University of Miami, Coral Gables, Miami, USA.

Published: January 2009

Background: The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string or calculation of the longest common subsequence that two strings share.

Results: We designed an output sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s - |n - m|).min(m, n, s) + m + n) and linear space, where s is the edit distance between the two strings. This worst-case time bound sets the quadratic factor of the algorithm independent of the longest string length and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in cases where the two strings compared differ significantly in length.

Conclusion: We have provided the design, analysis and implementation of a new algorithm for calculating the edit distance of two strings with both theoretical and practical implications. Source code of our algorithm is available online.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648743PMC
http://dx.doi.org/10.1186/1471-2105-10-S1-S10DOI Listing

Publication Analysis

Top Keywords

edit distance
12
approximate string
8
string matching
8
distance strings
8
implementation algorithm
8
strings
6
algorithm
5
improved algorithms
4
algorithms approximate
4
string
4

Similar Publications

Evaluating Sequence Alignment Tools for Antimicrobial Resistance Gene Detection in Assembly Graphs.

Microorganisms

October 2024

Department of Mathematics and Computing Science, Saint Mary's University, Halifax, NS B3H 3C3, Canada.

Antimicrobial resistance (AMR) is an escalating global health threat, often driven by the horizontal gene transfer (HGT) of resistance genes. Detecting AMR genes and understanding their genomic context within bacterial populations is crucial for mitigating the spread of resistance. In this study, we evaluate the performance of three sequence alignment tools-Bandage, SPAligner, and GraphAligner-in identifying AMR gene sequences from assembly and de Bruijn graphs, which are commonly used in microbial genome assembly.

View Article and Find Full Text PDF

Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance.

View Article and Find Full Text PDF

Several studies report the benefits and accuracy of using autosegmentation for organ at risk (OAR) outlining in radiotherapy treatment planning. Typically, evaluations focus on accuracy metrics, and other parameters such as perceived utility and safety are routinely ignored. Here, we report our finding from the implementation and clinical evaluation of OSAIRIS, an open-source AI model for radiotherapy image segmentation that was carried out as part of its development into a medical device.

View Article and Find Full Text PDF

Aptamers are short single-stranded DNA or RNA molecules with high affinity and specificity for targets and are generated using the iterative systematic evolution of ligands by exponential enrichment (SELEX) process. Next-generation sequencing (NGS) revolutionized aptamer selections by allowing a more comprehensive analysis of SELEX-enriched aptamers as compared to Sanger sequencing. The current challenge with aptamer NGS datasets is identifying a diverse cohort of candidate aptamers with the highest likelihood of successful experimental validation.

View Article and Find Full Text PDF

Improving tabular data extraction in scanned laboratory reports using deep learning models.

J Biomed Inform

November 2024

Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, USA. Electronic address:

Objective: Medical laboratory testing is essential in healthcare, providing crucial data for diagnosis and treatment. Nevertheless, patients' lab testing results are often transferred via fax across healthcare organizations and are not immediately available for timely clinical decision making. Thus, it is important to develop new technologies to accurately extract lab testing information from scanned laboratory reports.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!