Mapping short reads against a reference genome is classically the first step of many next-generation sequencing data analyses, and it should be as accurate as possible. Because of the large number of reads to handle, numerous sophisticated algorithms have been developped in the last 3 years to tackle this problem. In this article, we first review the underlying algorithms used in most of the existing mapping tools, and then we compare the performance of nine of these tools on a well controled benchmark built for this purpose. We built a set of reads that exist in single or multiple copies in a reference genome and for which there is no mismatch, and a set of reads with three mismatches. We considered as reference genome both the human genome and a concatenation of all complete bacterial genomes. On each dataset, we quantified the capacity of the different tools to retrieve all the occurrences of the reads in the reference genome. Special attention was paid to reads uniquely reported and to reads with multiple hits.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3375638 | PMC |
http://dx.doi.org/10.1089/cmb.2012.0022 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!