MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud.

Roberto R Expósito, Jorge Veiga, Jorge González-Domínguez, Juan Touriño

Bioinformatics

Grupo de Arquitectura de Computadores, Universidade da Coruña, Campus de A Coruña, A Coruña 15071, Spain.

Published: September 2017

Summary: This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool.
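The MapReduce approach to duplicate removal mentioned in the summary can be illustrated with a minimal, single-process sketch in plain Java. It is not MarDRe's code and uses no Hadoop API. The grouping key (a fixed-length read prefix), the mismatch threshold, and all names (`DedupSketch`, `mapPhase`, `reducePhase`, `prefixLen`, `maxMismatches`) are assumptions made for illustration; the abstract does not state how MarDRe groups or compares reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, no Hadoop dependency.
class DedupSketch {

    // "Map" step: emit (prefix, read) pairs. Collecting reads under the same
    // key stands in for the shuffle a MapReduce framework would perform.
    // Assumption: the key is the first prefixLen bases of the read.
    static Map<String, List<String>> mapPhase(List<String> reads, int prefixLen) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String read : reads) {
            String key = read.substring(0, Math.min(prefixLen, read.length()));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(read);
        }
        return groups;
    }

    // Number of differing positions; reads of different length never match.
    static int mismatches(String a, String b) {
        if (a.length() != b.length()) {
            return Integer.MAX_VALUE;
        }
        int count = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    // "Reduce" step: within one group, keep a read only if it differs from
    // every already-kept read by more than maxMismatches positions.
    static List<String> reducePhase(List<String> group, int maxMismatches) {
        List<String> kept = new ArrayList<>();
        for (String read : group) {
            boolean duplicate = false;
            for (String k : kept) {
                if (mismatches(read, k) <= maxMismatches) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(read);
            }
        }
        return kept;
    }

    // Runs the map step, then the reduce step on each group in turn.
    static List<String> removeDuplicates(List<String> reads, int prefixLen, int maxMismatches) {
        List<String> out = new ArrayList<>();
        for (List<String> group : mapPhase(reads, prefixLen).values()) {
            out.addAll(reducePhase(group, maxMismatches));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> reads = Arrays.asList("ACGTACGT", "ACGTACGT", "ACGTACGA", "TTTTACGT");
        // Prefix length 4, at most 1 mismatch counts as a near-duplicate.
        System.out.println(removeDuplicates(reads, 4, 1));
    }
}
```

Setting `maxMismatches` to 0 in this sketch removes only exact duplicates, while a larger value also collapses near-duplicates. Because each group is reduced independently, the groups could be processed on different nodes, which is the property a MapReduce framework exploits.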

Availability and Implementation: Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es.

Contact: rreye@udc.es.

DOI: http://dx.doi.org/10.1093/bioinformatics/btx307
