Motivation: The detection of genomic variants has great significance in genomics, bioinformatics, biomedical research and its applications. However, despite a lot of effort, Indels and structural variants are still under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to be able to detect such types of variants accurately. However Indels, especially those close to each other, are still hard to detect accurately.
Results: We introduce a novel approach that leverages known variant information, e.g. provided by dbSNP, dbVar, ExAC or the 1000 Genomes Project, to improve sensitivity of detecting variants, especially close-by Indels. In our approach, the standard reference genome and the known variants are combined to build a meta-reference, which is expected to be probabilistically closer to the subject genomes than the standard reference. An alignment algorithm, which can take into account known variant information, is developed to accurately align reads to the meta-reference. This strategy resulted in accurate alignment and variant calling even with low coverage data. We showed that compared to popular methods such as GATK and SAMtools, our method significantly improves the sensitivity of detecting variants, especially Indels that are close to each other. In particular, our method was able to call these close-by Indels at a 15-20% higher sensitivity than other methods at low coverage, and still get 1-5% higher sensitivity at high coverage, at competitive precision. These results were validated using simulated data with variant profiles extracted from the 1000 Genomes Project data, and real data from the Illumina Platinum Genomes Project and ExAC database. Our finding suggests that by incorporating known variant information in an appropriate manner, sensitive variant calling is possible at a low cost.
Availability And Implementation: Implementation can be found in our public code repository https://github.com/namsyvo/IVC.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1093/bioinformatics/bty183 | DOI Listing |
Bioinformatics
September 2018
Department of Computer Science, The University of Memphis, Memphis, TN, USA.
Motivation: The detection of genomic variants has great significance in genomics, bioinformatics, biomedical research and its applications. However, despite a lot of effort, Indels and structural variants are still under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to be able to detect such types of variants accurately.
View Article and Find Full Text PDFPlant J
December 2017
Nishina Center for Accelerator-Based Science, RIKEN, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
Heavy-ion irradiation is a powerful mutagen that possesses high linear energy transfer (LET). Several studies have indicated that the value of LET affects DNA lesion formation in several ways, including the efficiency and the density of double-stranded break induction along the particle path. We assumed that the mutation type can be altered by selecting an appropriate LET value.
View Article and Find Full Text PDFBiochem Biophys Res Commun
August 2013
Abteilung für Botanik mit Schwerpunkt Genetik und Molekularbiologie, Botanisches Institut und Botanischer Garten, Christian-Albrechts-Universität zu Kiel, Kiel, Germany.
The bud emergence 46 (BEM46) protein from Neurospora crassa belongs to the α/β-hydrolase superfamily. Recently, we have reported that the BEM46 protein is localized in the perinuclear ER and also forms spots close by the plasma membrane. The protein appears to be required for cell type-specific polarity formation in N.
View Article and Find Full Text PDFNucleic Acids Res
May 2013
Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia.
Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!