Background: The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string or calculation of the longest common subsequence that two strings share.
Results: We designed an output sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s - |n - m|).min(m, n, s) + m + n) and linear space, where s is the edit distance between the two strings. This worst-case time bound sets the quadratic factor of the algorithm independent of the longest string length and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in cases where the two strings compared differ significantly in length.
Conclusion: We have provided the design, analysis and implementation of a new algorithm for calculating the edit distance of two strings with both theoretical and practical implications. Source code of our algorithm is available online.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648743 | PMC |
http://dx.doi.org/10.1186/1471-2105-10-S1-S10 | DOI Listing |
Microorganisms
October 2024
Department of Mathematics and Computing Science, Saint Mary's University, Halifax, NS B3H 3C3, Canada.
Antimicrobial resistance (AMR) is an escalating global health threat, often driven by the horizontal gene transfer (HGT) of resistance genes. Detecting AMR genes and understanding their genomic context within bacterial populations is crucial for mitigating the spread of resistance. In this study, we evaluate the performance of three sequence alignment tools-Bandage, SPAligner, and GraphAligner-in identifying AMR gene sequences from assembly and de Bruijn graphs, which are commonly used in microbial genome assembly.
View Article and Find Full Text PDFbioRxiv
October 2024
Department of Computational and Data Sciences, Indian Institute of Science, Bangalore KA 560012, India.
Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance.
View Article and Find Full Text PDFClin Oncol (R Coll Radiol)
October 2024
Oncology Department, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.
Several studies report the benefits and accuracy of using autosegmentation for organ at risk (OAR) outlining in radiotherapy treatment planning. Typically, evaluations focus on accuracy metrics, and other parameters such as perceived utility and safety are routinely ignored. Here, we report our finding from the implementation and clinical evaluation of OSAIRIS, an open-source AI model for radiotherapy image segmentation that was carried out as part of its development into a medical device.
View Article and Find Full Text PDFMol Ther Nucleic Acids
December 2024
Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA.
Aptamers are short single-stranded DNA or RNA molecules with high affinity and specificity for targets and are generated using the iterative systematic evolution of ligands by exponential enrichment (SELEX) process. Next-generation sequencing (NGS) revolutionized aptamer selections by allowing a more comprehensive analysis of SELEX-enriched aptamers as compared to Sanger sequencing. The current challenge with aptamer NGS datasets is identifying a diverse cohort of candidate aptamers with the highest likelihood of successful experimental validation.
View Article and Find Full Text PDFJ Biomed Inform
November 2024
Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, USA. Electronic address:
Objective: Medical laboratory testing is essential in healthcare, providing crucial data for diagnosis and treatment. Nevertheless, patients' lab testing results are often transferred via fax across healthcare organizations and are not immediately available for timely clinical decision making. Thus, it is important to develop new technologies to accurately extract lab testing information from scanned laboratory reports.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!