Comparing nominal and real quality scores on next-generation sequencing genotype calls.

BMC Proc

Zilkha Neurogenetic Institute, University of Southern California, 1501 San Pablo Street, Los Angeles, CA 90089, USA.

Published: November 2011

I seek to comprehensively evaluate the quality of the Genetic Analysis Workshop 17 (GAW17) data set by examining the accuracy of its genotype calls, which were based on the pilot 3 data of the 1000 Genomes Project. Using the samples shared between the 1000 Genomes Project and HapMap, I compared the GAW17 genotype calls for one individual with that individual's HapMap III, release 2, genotype calls. The two sets of calls should be concordant almost everywhere; instead, I found a concordance of only 65.4%. Regarding HapMap as the gold standard, I assumed that the discordance reflects a problem in the GAW17 data and sought to explain it accordingly. A large proportion of the discordance occurred outside the targeted regions, and concordance improved to at least 94.6% when the comparison was restricted to targeted regions, which were sequenced across more samples. Furthermore, in certain individuals high sample counts did little to improve concordance, and I concluded that the quality scores for certain samples' sequence reads were simply incorrect.
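The concordance comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the author's pipeline: the data layout (dictionaries keyed by chromosome and position), the function names, and the toy genotypes and target intervals are all assumptions made for illustration. Concordance is taken here as the fraction of sites called in both data sets whose unordered allele pairs agree, optionally restricted to targeted regions.

```python
from typing import Dict, Iterable, Optional, Tuple

Site = Tuple[str, int]          # (chromosome, 1-based position)
Genotype = Tuple[str, str]      # two alleles; order carries no meaning
Region = Tuple[str, int, int]   # (chromosome, start, end), inclusive


def normalize(gt: Genotype) -> Tuple[str, ...]:
    """Sort the alleles so that ('A', 'G') and ('G', 'A') compare equal."""
    return tuple(sorted(gt))


def in_regions(site: Site, regions: Iterable[Region]) -> bool:
    """Return True if the site falls inside any of the given intervals."""
    chrom, pos = site
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def concordance(
    calls_a: Dict[Site, Genotype],
    calls_b: Dict[Site, Genotype],
    regions: Optional[Iterable[Region]] = None,
) -> Optional[float]:
    """Fraction of shared sites with matching genotypes.

    Only sites present in both call sets are compared. If `regions` is
    given, the comparison is further restricted to sites inside them.
    Returns None when no site is left to compare.
    """
    shared = [s for s in calls_a if s in calls_b]
    if regions is not None:
        region_list = list(regions)
        shared = [s for s in shared if in_regions(s, region_list)]
    if not shared:
        return None
    matches = sum(normalize(calls_a[s]) == normalize(calls_b[s]) for s in shared)
    return matches / len(shared)


# Toy data (hypothetical, not taken from GAW17 or HapMap).
seq_calls = {
    ("1", 100): ("A", "G"),
    ("1", 150): ("C", "C"),
    ("1", 900): ("T", "T"),   # outside the toy target intervals
    ("2", 50): ("G", "G"),
}
array_calls = {
    ("1", 100): ("G", "A"),   # same genotype, alleles listed in reverse
    ("1", 150): ("C", "C"),
    ("1", 900): ("T", "C"),   # discordant call
    ("2", 50): ("G", "G"),
    ("2", 75): ("A", "A"),    # not called in seq_calls, so ignored
}
targets = [("1", 90, 200), ("2", 1, 60)]

print(concordance(seq_calls, array_calls))           # all shared sites
print(concordance(seq_calls, array_calls, targets))  # targeted regions only
```

In this toy example the only discordant site lies outside the target intervals, so restricting the comparison raises the concordance, which mirrors the kind of effect the abstract reports without reproducing its figures.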

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287848
DOI: http://dx.doi.org/10.1186/1753-6561-5-S9-S14

