An efficient similarity search based on indexing in large DNA databases.

Comput Biol Chem

School of Electronics & Computer Eng., Chonnam National University, 300 YongBong-Dong, Buk-Gu, Gwangju 500-757, Republic of Korea.

Published: April 2010

Index-based search algorithms are an important part of genomic search, and index construction is the key component of an index-based algorithm for computing similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. The method requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, each sequence is partitioned into equal-length windows. We then select likely subsequences by computing their Hamming distance to the query sequence. Next, the algorithm transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experimental results show that the algorithm runs faster than other heuristic algorithms based on index structures, while being as accurate as those heuristic algorithms.
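
The pipeline described above (windowing, Hamming-distance filtering, then a frequency-plus-position vector per candidate) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not specify the exact transformation, so the vector layout used here (per-base counts followed by per-base sums of 1-based positions), the window length equal to the query length, the hypothetical helper names (`partition_windows`, `to_vector`, `rank_candidates`, and so on), and the L1 ranking step are all assumptions. Because the windows do not overlap, matches that span a window boundary are missed in this simplified version.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"


def partition_windows(seq: str, w: int) -> List[Tuple[int, str]]:
    """Split seq into non-overlapping windows of length w; a trailing partial window is dropped."""
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def to_vector(sub: str) -> List[int]:
    """Hypothetical transform: per-base counts, then per-base sums of 1-based positions."""
    counts = [0] * len(ALPHABET)
    pos_sums = [0] * len(ALPHABET)
    for pos, ch in enumerate(sub, start=1):
        k = ALPHABET.find(ch)
        if k >= 0:  # skip non-ACGT symbols such as N
            counts[k] += 1
            pos_sums[k] += pos
    return counts + pos_sums


def candidate_windows(seq: str, query: str, max_mismatch: int) -> Dict[int, List[int]]:
    """Keep windows within max_mismatch Hamming distance of the query, keyed by start offset."""
    w = len(query)
    return {
        start: to_vector(win)
        for start, win in partition_windows(seq, w)
        if hamming(win, query) <= max_mismatch
    }


def l1(u: List[int], v: List[int]) -> int:
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def rank_candidates(seq: str, query: str, max_mismatch: int) -> List[Tuple[int, int]]:
    """Return (start, vector distance) pairs, closest first; ties broken by start offset."""
    qv = to_vector(query)
    scored = [(start, l1(qv, vec)) for start, vec in candidate_windows(seq, query, max_mismatch).items()]
    return sorted(scored, key=lambda t: (t[1], t[0]))
```

For example, `rank_candidates("ACGTACGAACGTTTTT", "ACGT", 1)` keeps the windows starting at 0, 4 and 8 (the window `TTTT` has Hamming distance 4 and is filtered out) and returns `[(0, 0), (8, 0), (4, 10)]`. The cheap Hamming filter runs first so that vectors are only built for windows likely to match.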

Source (DOI): http://dx.doi.org/10.1016/j.compbiolchem.2010.03.007

Similar Publications

Background: Online digital materials are integral to patient education and health care outcomes in dermatology. Acanthosis nigricans (AN) is a common condition, often associated with underlying diseases such as insulin resistance. Patients frequently search the internet for information related to this cutaneous finding.

Objective: Carotid artery intima-media thickness (IMT) is a non-invasive ultrasound marker of early atherosclerosis. This systematic review and meta-analysis aims to report the published differences in IMT values between children living with overweight or obesity and normal-weight controls.

Methods: This review was conducted according to PRISMA guidelines, including only cohorts with normal controls.

[Mortality due to mesothelioma and asbestosis in Campania Region (Southern Italy): perspectives for reducing asbestos exposure].

Epidemiol Prev

December 2024

Dipartimento di Medicina, Epidemiologia, Igiene del lavoro e ambientale, Istituto Nazionale per l'Assicurazione contro gli Infortuni sul Lavoro, Roma.

Objectives: to provide an overview of the geographical distribution of mesothelioma and asbestosis deaths that occurred in the Campania Region (Southern Italy) from 2005 to 2018 and to identify areas at higher risk.

Design: for each municipality, Standardized Mortality Ratios (SMRs) for mesothelioma and asbestosis were estimated from mortality data provided by the Italian National Institute of Statistics (Istat). Deaths for which mesothelioma or asbestosis was identified as the underlying cause, according to ICD-10 codes (C45 and J61, respectively), were included.

Background: Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.

The Effect of Sampling Schedule on Assessment of Dietary Measures: Evidence From Blue Monkeys (Cercopithecus mitis stuhlmanni).

Am J Primatol

January 2025

Department of Ecology, Evolution, and Environmental Biology, Columbia University, New York, New York, USA.

Accurately assessing primate diets is important in studies of behavioral ecology and evolution. While previous research has compared sampling methods (scan, focal), we examined how sampling schedule influences accuracy of dietary measures. We define sampling schedule as the combined distribution (random vs.
