I seek to comprehensively evaluate the quality of the Genetic Analysis Workshop 17 (GAW17) data set by examining the accuracy of its genotype calls, which were based on the pilot3 data of the 1000 Genomes Project. Taking advantage of the overlap between the 1000 Genomes Project and HapMap samples, I compared the GAW17 genotype calls for an individual with that individual's HapMap III, release 2, genotype calls. These genotype calls should be concordant almost everywhere. Instead, I found an astonishingly low concordance of 65.4%. Regarding HapMap as the gold standard, I assume that this is a GAW17 data problem and seek to explain the discordance accordingly. I found that a large proportion of the discordance occurred outside the targeted regions and that concordance could be improved to at least 94.6% simply by staying within the targeted regions, which were sequenced across more samples. Furthermore, I found that in certain individuals, high sample counts did little to improve concordance, and I concluded that the quality scores for a certain sample's sequence reads were simply incorrect.
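The concordance comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study: the function names (`concordance`, `in_regions`), the dictionary layout of the genotype calls, and the toy genotypes and region coordinates are all assumptions made for the example. It treats two calls as concordant when they contain the same alleles regardless of order, and it optionally restricts the comparison to sites inside a list of targeted regions.

```python
from typing import Dict, Iterable, Optional, Tuple

Site = Tuple[str, int]         # (chromosome, 1-based position)
Region = Tuple[str, int, int]  # (chromosome, start, end), both ends inclusive


def in_regions(site: Site, regions: Iterable[Region]) -> bool:
    """Return True if the site falls inside at least one region."""
    chrom, pos = site
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def concordance(calls_a: Dict[Site, str],
                calls_b: Dict[Site, str],
                regions: Optional[Iterable[Region]] = None) -> float:
    """Fraction of shared sites whose genotype calls agree.

    Genotypes are two-letter strings such as "AG"; allele order is
    ignored, so "AG" and "GA" count as the same call.
    """
    shared = [site for site in calls_a if site in calls_b]
    if regions is not None:
        region_list = list(regions)
        shared = [site for site in shared if in_regions(site, region_list)]
    if not shared:
        raise ValueError("no shared sites to compare")
    matches = sum(sorted(calls_a[s]) == sorted(calls_b[s]) for s in shared)
    return matches / len(shared)


# Hypothetical toy calls for one individual (not real GAW17 or HapMap data).
gaw17_calls = {("1", 100): "AG", ("1", 200): "CC",
               ("1", 300): "TT", ("2", 50): "GA"}
hapmap_calls = {("1", 100): "GA", ("1", 200): "CT",
                ("1", 300): "TT", ("2", 50): "GG", ("3", 10): "AA"}

# Hypothetical targeted regions, again purely illustrative.
targeted = [("1", 50, 150), ("1", 250, 350)]

overall = concordance(gaw17_calls, hapmap_calls)
within_targets = concordance(gaw17_calls, hapmap_calls, targeted)
print(f"overall: {overall:.3f}, within targeted regions: {within_targets:.3f}")
```

On this toy input, restricting the comparison to the targeted regions raises the concordance, which mirrors the direction of the effect reported above; the numbers themselves carry no meaning beyond the example.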
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287848 | PMC |
| http://dx.doi.org/10.1186/1753-6561-5-S9-S14 | DOI Listing |