HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

Aisling O'Driscoll, Vladislav Belogrudov, John Carroll, Kai Kropp, Paul Walsh, Peter Ghazal, Roy D Sleator

J Biomed Inform

Department of Biological Sciences, Cork Institute of Technology, Rossa Avenue, Bishopstown, Cork, Ireland.

Published: April 2015

The rapid growth of genomic databases has made sequence alignment a major challenge in computational biology, often requiring expensive High Performance Computing (HPC) solutions.
Many proposed parallel solutions struggle with scalability and with processing "Big Data": large, complex datasets that require rapid processing.
HBlast, a new parallelised BLAST algorithm built on Hadoop, improves scalability by using "virtual partitioning" to divide both the database and the input query sequences, making it suitable for the inexpensive, memory-constrained hardware used in clinical diagnostics to identify pathogenic DNA.

The recent exponential growth of genomic databases has made sequence alignment, a routine task, one of the major bottlenecks in computational biology. These large datasets and complex computations typically require cost-prohibitive High Performance Computing (HPC) resources. Parallelised solutions have therefore been proposed, but many exhibit scalability limitations and cannot effectively process "Big Data", the term for datasets that are extremely large and complex and require rapid processing. The Hadoop framework, composed of distributed storage and a parallel programming model known as MapReduce, is specifically designed for such datasets, but efficiently redesigning and implementing bioinformatics algorithms according to this paradigm is not trivial. The "divide and conquer" parallelisation strategy for alignment algorithms can be applied to both databases and input query sequences. However, scalability remains an issue because of memory constraints and large database sizes, and segmenting very large databases causes further performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that provides a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast offers improved scalability over existing solutions and a well-balanced computational workload, while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
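The abstract does not specify how HBlast's virtual partitioning is implemented, so the following is only a minimal sketch of the general divide-and-conquer idea it describes, written as a single-process Python simulation of the MapReduce pattern rather than Hadoop code. It assumes that a "virtual" partition is simply a (start, end) offset range over one shared database or query list, so no data is copied and no database is recompiled. Each map task pairs one query range with one database range, and the reduce step merges every task's hits per query. A toy shared k-mer count stands in for a real BLAST search, and all function names (`virtual_partitions`, `make_tasks`, `kmer_score`, `map_task`, `run_job`) are hypothetical.

```python
from collections import defaultdict
from itertools import product
import heapq


def virtual_partitions(total, n_parts):
    """Split [0, total) into n_parts contiguous (start, end) ranges of
    near-equal size. Only offsets are produced; no data is copied.
    (Hypothetical stand-in for virtual partitioning, not HBlast's code.)"""
    base, extra = divmod(total, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def make_tasks(n_queries, n_db_seqs, query_parts, db_parts):
    """One map task per (query range, database range) pair."""
    return list(product(virtual_partitions(n_queries, query_parts),
                        virtual_partitions(n_db_seqs, db_parts)))


def kmer_score(a, b, k=3):
    """Toy similarity (number of shared k-mers); a placeholder for BLAST."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(ka & kb)


def map_task(task, queries, database):
    """Map: compare every query in the query range against every sequence
    in the database range, emitting (query_id, (score, db_id)) pairs."""
    (q_start, q_end), (d_start, d_end) = task
    for qi in range(q_start, q_end):
        for di in range(d_start, d_end):
            score = kmer_score(queries[qi], database[di])
            if score > 0:
                yield qi, (score, di)


def run_job(queries, database, query_parts, db_parts, top_n=2):
    grouped = defaultdict(list)
    for task in make_tasks(len(queries), len(database), query_parts, db_parts):
        for qi, hit in map_task(task, queries, database):
            grouped[qi].append(hit)  # shuffle: group hits by query id
    # Reduce: keep the top_n highest-scoring hits per query across all partitions.
    return {qi: heapq.nlargest(top_n, hits) for qi, hits in grouped.items()}


queries = ["ACGTACGTGG", "TTTTGGGCCA"]
database = ["ACGTACGTAA", "GGGCCATTTT", "ACGTTTTT", "CCCCCCCC"]
hits = run_job(queries, database, query_parts=2, db_parts=3)
print(hits)  # {0: [(4, 0), (2, 2)], 1: [(5, 1), (1, 2)]}
```

Because each (query range, database range) pair is an independent task in this sketch, the number of partitions on either side can be tuned to the memory available per node without changing the merged result: `run_job(queries, database, 1, 1)` returns the same hits as the partitioned run above.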

DOI: http://dx.doi.org/10.1016/j.jbi.2015.01.008
