Indexing strategies for rapid searches of short words in genome sequences.

PLoS One

Ludwig Institute for Cancer Research, Bâtiment Génopode, Université de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Bâtiment Génopode, Université de Lausanne, Lausanne, Switzerland. Christian.

Published: June 2007

Searching for matches between large collections of short (14-30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
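The two strategies named in the abstract, indexing the database or indexing the query set, can be illustrated with a minimal Python sketch. This is not the implementation of fetchGWI or tagger; it assumes exact matching only, a single strand, a fixed word length `k` shared by all queries, and a sequence small enough to hold in memory. All function names here are hypothetical.

```python
from collections import defaultdict


def index_database(sequence, k):
    """Map every k-length word in the sequence to its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def search_with_database_index(index, queries):
    """Database-indexed search: one dictionary lookup per query word."""
    return {q: list(index.get(q, [])) for q in queries}


def search_with_query_index(sequence, queries, k):
    """Query-indexed search: hash the queries, then scan the sequence once."""
    hits = {q: [] for q in queries}  # the dict doubles as the query index
    for pos in range(len(sequence) - k + 1):
        word = sequence[pos:pos + k]
        if word in hits:
            hits[word].append(pos)
    return hits


if __name__ == "__main__":
    genome = "ACGTACGTTACG"  # toy sequence, assumed for illustration
    probes = ["ACGT", "TACG", "GGGG"]
    db_index = index_database(genome, 4)
    print(search_with_database_index(db_index, probes))
    print(search_with_query_index(genome, probes, 4))
```

In this sketch the database index is built once and can be reused across query sets, at the cost of storing every word position, whereas the query index is small but requires a full pass over the sequence for each search.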


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1894650
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000579
