EAGLE: Explicit Alternative Genome Likelihood Evaluator.

BMC Med Genomics

Artificial Intelligence Research Center, AIST, 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan.

Published: April 2018

Background: Reliable detection of genome variations, especially insertions and deletions (indels), from single sample DNA sequencing data remains challenging, partially due to the inherent uncertainty involved in aligning sequencing reads to the reference genome. In practice a variety of ad hoc quality filtering methods are employed to produce more reliable lists of putative variants, but the resulting lists typically still include numerous false positives. Thus it would be desirable to be able to rigorously evaluate the degree to which each putative variant is supported by the data. Unfortunately, users who wish to do this, e.g. for the purpose of prioritizing validation experiments, have been faced with limited options.

Results: Here we present EAGLE, a method for evaluating the degree to which sequencing data supports a given candidate genome variant. EAGLE incorporates candidate variants into explicit hypotheses about the individual's genome, and then computes the probability of the observed data (the sequencing reads) under each hypothesis. In comparison with methods which rely heavily on a particular alignment of the reads to the reference genome, EAGLE readily accounts for uncertainties that may arise from multi-mapping or local misalignment and uses the entire length of each read. We compared the scores assigned by several well-known variant callers to EAGLE for the task of ranking true putative variants on both simulated data and real genome sequencing based benchmarks. For indels, EAGLE obtained marked improvement on simulated data and a whole genome sequencing benchmark, and modest but statistically significant improvement on an exome sequencing benchmark.

Conclusions: EAGLE ranked true variants higher than the scores reported by the callers and can used to improve specificity in variant calling. EAGLE is freely available at https://github.com/tony-kuo/eagle .

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5918433PMC
http://dx.doi.org/10.1186/s12920-018-0342-1DOI Listing

Publication Analysis

Top Keywords

eagle
8
genome
8
sequencing data
8
sequencing reads
8
reads reference
8
reference genome
8
putative variants
8
simulated data
8
genome sequencing
8
sequencing
7

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!