Background: With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow.

Methodology/Principal Findings: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near-linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy.
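To make the idea of spaced-seed anchoring concrete, the following is a minimal Python sketch. It is an illustration only, not Murasaki's implementation: the function names `spaced_seed_keys` and `find_anchors`, the use of a plain dictionary in place of an adaptive hash function, and the toy sequences are all assumptions made for this example. A pattern such as `1101` means that positions marked `1` must match and positions marked `0` may differ.

```python
from collections import defaultdict


def spaced_seed_keys(seq, pattern):
    """Yield (position, key) for every window of `seq` as wide as `pattern`.

    The key keeps only the bases at positions where `pattern` has '1',
    so windows that differ only at '0' positions share the same key.
    """
    care = [i for i, c in enumerate(pattern) if c == "1"]
    width = len(pattern)
    for pos in range(len(seq) - width + 1):
        window = seq[pos:pos + width]
        yield pos, "".join(window[i] for i in care)


def find_anchors(seqs, pattern):
    """Return {key: [positions in seq 0, positions in seq 1, ...]}
    for every key that occurs in all input sequences.

    Hypothetical helper: a Python dict stands in for the hash table,
    and every sequence is indexed in a single pass.
    """
    index = defaultdict(lambda: [[] for _ in seqs])
    for sid, seq in enumerate(seqs):
        for pos, key in spaced_seed_keys(seq, pattern):
            index[key][sid].append(pos)
    # Keep only keys with at least one hit in every sequence.
    return {key: hits for key, hits in index.items() if all(hits)}


# Toy input: the three windows ACGT, ACTT, and ACAT differ only at the
# third base, which the pattern 1101 ignores.
toy_seqs = ["GGACGTGG", "TTACTTAA", "ACATCC"]
toy_anchors = find_anchors(toy_seqs, "1101")
print(toy_anchors)
```

Because each sequence is scanned once and candidate anchors are read directly from the shared index, the work in this sketch grows with total sequence length rather than with the number of sequence pairs, which is the intuition behind the near-linear scaling reported above. Real genomes would also need reverse-complement handling and filtering of repetitive keys, which this sketch omits.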

Conclusions/Significance: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with computational efficiency significantly greater than existing methods. Murasaki is available under GPL at http://murasaki.sourceforge.net.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2945767
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0012651
