Leveraging known genomic variants to improve detection of variants, especially close-by Indels.

Bioinformatics

Department of Computer Science, The University of Memphis, Memphis, TN, USA.

Published: September 2018

Motivation: The detection of genomic variants is of great significance in genomics, bioinformatics, biomedical research and their applications. Despite considerable effort, Indels and structural variants remain under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to detect such variants accurately. Even so, Indels, especially those close to each other, remain hard to detect accurately.

Results: We introduce a novel approach that leverages known variant information, e.g. from dbSNP, dbVar, ExAC or the 1000 Genomes Project, to improve the sensitivity of variant detection, especially for close-by Indels. In our approach, the standard reference genome and the known variants are combined to build a meta-reference, which is expected to be probabilistically closer to the subject genomes than the standard reference. We developed an alignment algorithm that takes known variant information into account to align reads accurately to the meta-reference. This strategy resulted in accurate alignment and variant calling even with low coverage data. We showed that, compared to popular methods such as GATK and SAMtools, our method significantly improves the sensitivity of detecting variants, especially Indels that are close to each other. In particular, our method called these close-by Indels at 15-20% higher sensitivity than other methods at low coverage, and still achieved 1-5% higher sensitivity at high coverage, at competitive precision. These results were validated using simulated data with variant profiles extracted from the 1000 Genomes Project data, and real data from the Illumina Platinum Genomes Project and the ExAC database. Our findings suggest that by incorporating known variant information in an appropriate manner, sensitive variant calling is possible at a low cost.
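
The meta-reference idea can be illustrated with a toy sketch. This is not the IVC implementation; it is a minimal example under stated assumptions: the slot representation, the function names (`build_meta_reference`, `align`, `called_alleles`), the mismatch penalty and the use of -log2(allele frequency) as an allele cost are all hypothetical choices made for illustration. Each known variant becomes a slot holding several alleles with prior frequencies, and a read is aligned by choosing, slot by slot, the allele that minimizes the total cost.

```python
from functools import lru_cache
from math import log2

MISMATCH_PENALTY = 4.0  # assumed cost per mismatched base (illustrative only)

def build_meta_reference(ref, known_variants):
    """Combine a reference string with known variants into a list of slots.

    known_variants: {pos: (ref_allele, {allele: frequency})}, 0-based pos.
    Each slot maps candidate alleles to prior frequencies; a slot without
    a known variant holds only the reference base with prior 1.0.
    """
    slots, i = [], 0
    while i < len(ref):
        if i in known_variants:
            ref_allele, alleles = known_variants[i]
            assert ref[i:i + len(ref_allele)] == ref_allele
            slots.append(dict(alleles))
            i += len(ref_allele)  # a slot may span several reference bases
        else:
            slots.append({ref[i]: 1.0})
            i += 1
    return slots

def align(read, slots, start=0):
    """Return (cost, path) of the cheapest allele choice per slot for the read."""
    @lru_cache(maxsize=None)
    def best(slot_idx, read_pos):
        if read_pos == len(read):      # whole read consumed
            return 0.0, ()
        if slot_idx == len(slots):     # read runs past the meta-reference
            return float("inf"), ()
        best_cost, best_path = float("inf"), ()
        for allele, prior in slots[slot_idx].items():
            segment = read[read_pos:read_pos + len(allele)]
            mismatches = sum(a != b for a, b in zip(allele, segment))
            cost = mismatches * MISMATCH_PENALTY - log2(prior)
            rest_cost, rest_path = best(slot_idx + 1, read_pos + len(segment))
            if cost + rest_cost < best_cost:
                best_cost = cost + rest_cost
                best_path = ((slot_idx, allele),) + rest_path
        return best_cost, best_path
    return best(start, 0)

def called_alleles(path, slots):
    """Keep only the choices made at slots that carry a known variant."""
    return {idx: allele for idx, allele in path if len(slots[idx]) > 1}

# Toy data: a known insertion (G -> GTT) and a nearby known deletion (CGT -> C).
ref = "ACGTACGTAC"
known = {2: ("G", {"G": 0.7, "GTT": 0.3}), 5: ("CGT", {"CGT": 0.6, "C": 0.4})}
slots = build_meta_reference(ref, known)
cost, path = align("ACGTTTACAC", slots)
print(called_alleles(path, slots))  # selects the insertion and the deletion
```

In this toy case the read carries both close-by Indels, and the alignment recovers them without opening any gaps, because both alleles are already present in the meta-reference. A real aligner would also need to handle novel variants, base qualities and unknown read positions, which this sketch omits.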

Availability And Implementation: Implementation can be found in our public code repository https://github.com/namsyvo/IVC.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/bty183

Similar Publications

Heavy-ion irradiation is a powerful mutagen that possesses high linear energy transfer (LET). Several studies have indicated that the value of LET affects DNA lesion formation in several ways, including the efficiency and the density of double-stranded break induction along the particle path. We assumed that the mutation type can be altered by selecting an appropriate LET value.

Characterization of bud emergence 46 (BEM46) protein: sequence, structural, phylogenetic and subcellular localization analyses.

Biochem Biophys Res Commun

August 2013

Abteilung für Botanik mit Schwerpunkt Genetik und Molekularbiologie, Botanisches Institut und Botanischer Garten, Christian-Albrechts-Universität zu Kiel, Kiel, Germany.

The bud emergence 46 (BEM46) protein from Neurospora crassa belongs to the α/β-hydrolase superfamily. Recently, we have reported that the BEM46 protein is localized in the perinuclear ER and also forms spots close by the plasma membrane. The protein appears to be required for cell type-specific polarity formation in N.

The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote.

Nucleic Acids Res

May 2013

Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia.

Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds.
