Background: Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample size, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous; sites with insufficient coverage may benefit from sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies by performing a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by the concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing.

Results: Using these calling methods, we detected over 27,500 variants at the targeted exons; >57% were singletons. The singletons identified by individual-based analyses were of the highest quality. However, individual-based analyses generated more missing genotypes (4.72%) than population-based (0.47%) and LD-aware (0.17%) analyses. Moreover, individual-based genotypes were the least concordant with array-based genotypes and replicates. Population-based genotypes were less concordant than genotypes from LD-aware analyses with extended haplotypes. We reanalyzed the same dataset with a second set of callers and showed again that the individual-based caller identified more high-quality singletons than the population-based caller. We also replicated this result in a second dataset of 57 genes sequenced at 127.5x in 3,124 individuals.

Conclusions: We recommend population-based analyses for high quality variant calls with few missing genotypes. With extended haplotypes, LD-aware methods generate the most accurate and complete genotypes. In addition, individual-based analyses should complement the above methods to obtain the most singleton variants.
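The two genotype metrics reported above, the missing-genotype rate and the concordance between call sets, can be illustrated with a minimal sketch. The abstract does not give the authors' exact definitions (for example, whether concordance is restricted to non-reference genotypes), so the genotype coding (0, 1, 2 copies of the alternate allele, `None` for a missing call), the function names, and the toy data below are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of genotype calls that are missing."""
    if not calls:
        raise ValueError("no genotype calls supplied")
    return sum(g is None for g in calls) / len(calls)


def concordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of identical genotypes among sites called in both sets."""
    if len(a) != len(b):
        raise ValueError("call sets must cover the same sites")
    # Only sites with a call in both sets are compared.
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("no sites called in both sets")
    return sum(x == y for x, y in compared) / len(compared)


# Hypothetical toy data: one sample's sequencing calls vs. array genotypes.
sequenced = [0, 1, None, 2, 0, 1]
array = [0, 1, 1, 2, 1, None]

print(f"missing rate: {missing_rate(sequenced):.3f}")
print(f"concordance:  {concordance(sequenced, array):.3f}")
```

The same `concordance` function would apply to a pair of sequencing replicates by passing the two replicate call sets instead of sequencing and array genotypes.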


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4359451
DOI: http://dx.doi.org/10.1186/s12859-015-0489-0


