Genotype imputation in a coalescent model with infinitely-many-sites mutation.

Theor Popul Biol

Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.

Published: August 2013

Empirical studies have identified population-genetic factors as important determinants of genotype-imputation accuracy in imputation-based disease association studies. Here, we develop a simple coalescent model of three sequences that we use to explore the theoretical basis for the influence of these factors on genotype-imputation accuracy, under the assumption of infinitely-many-sites mutation. Employing a demographic model in which two populations diverged at a given time in the past, we derive the approximate expectation and variance of imputation accuracy in a study sequence sampled from one of the two populations, choosing between two reference sequences, one sampled from the same population as the study sequence and the other sampled from the other population. We show that, under this model, imputation accuracy (measured by the proportion of polymorphic sites that are imputed correctly in the study sequence) increases in expectation with the mutation rate, the proportion of the markers in a chromosomal region that are genotyped, and the time to divergence between the study and reference populations. Each of these effects derives largely from an increase in information available for determining the reference sequence that is genetically most similar to the sequence targeted for imputation. We analyze, as a function of divergence time, the expected gain in imputation accuracy in the target from using a reference sequence from the same population as the target rather than from the other population. Together with a growing body of empirical investigations of genotype imputation in diverse human populations, our modeling framework lays a foundation for extending imputation techniques to novel populations that have not yet been extensively examined.
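The setup described in the abstract can be explored with a small Monte Carlo sketch. This is not the authors' analytical derivation; it is an illustrative simulation under assumptions that the abstract does not state: time is measured in coalescent units with pairwise coalescence rate 1, mutations fall on branches at rate theta/2, each polymorphic site is genotyped independently with probability `p_typed`, the reference with fewer mismatches to the study sequence at genotyped sites is chosen (ties broken at random), and accuracy is computed over untyped polymorphic sites only. All function and parameter names are hypothetical.

```python
import random


def poisson(rng, mean):
    """Count arrivals of a unit-rate Poisson process in [0, mean)."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_once(rng, theta, p_typed, t_div):
    """One replicate; returns accuracy in [0, 1], or None if no untyped site exists."""
    # Genealogy: S and R1 share a population; R2 joins them only before t_div.
    t1 = rng.expovariate(1.0)
    if t1 < t_div:
        cherry = ("S", "R1")
        t2 = t_div + rng.expovariate(1.0)
    else:
        # Three lineages enter the ancestral population; any pair may coalesce first.
        t1 = t_div + rng.expovariate(3.0)
        cherry = rng.choice([("S", "R1"), ("S", "R2"), ("R1", "R2")])
        t2 = t1 + rng.expovariate(1.0)

    typed, untyped = {}, {}
    for name in ("S", "R1", "R2"):
        # Total branch length separating `name` from the other two sequences.
        length = t1 if name in cherry else 2.0 * t2 - t1
        n_sites = poisson(rng, 0.5 * theta * length)
        typed[name] = sum(rng.random() < p_typed for _ in range(n_sites))
        untyped[name] = n_sites - typed[name]

    # A site private to R1 is a mismatch between S and R1 (likewise for R2);
    # sites private to S mismatch both references and do not affect the choice.
    if typed["R1"] < typed["R2"]:
        chosen = "R1"
    elif typed["R2"] < typed["R1"]:
        chosen = "R2"
    else:
        chosen = rng.choice(["R1", "R2"])  # assumed tie-breaking rule

    total_untyped = sum(untyped.values())
    if total_untyped == 0:
        return None
    # Copying the chosen reference is correct only at sites private to the other one.
    other = "R2" if chosen == "R1" else "R1"
    return untyped[other] / total_untyped


def mean_accuracy(theta, p_typed, t_div, replicates=2000, seed=1):
    """Average accuracy over replicates that have at least one untyped site."""
    rng = random.Random(seed)
    values = []
    for _ in range(replicates):
        acc = simulate_once(rng, theta, p_typed, t_div)
        if acc is not None:
            values.append(acc)
    if not values:
        raise ValueError("no replicate produced an untyped polymorphic site")
    return sum(values) / len(values)


if __name__ == "__main__":
    for t_div in (0.1, 1.0, 5.0):
        print(t_div, round(mean_accuracy(10.0, 0.5, t_div), 3))
```

Varying `theta`, `p_typed`, and `t_div` in `mean_accuracy` gives a rough numerical check on the qualitative trends stated in the abstract, within the limits of the assumptions listed above.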

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3587719
DOI: http://dx.doi.org/10.1016/j.tpb.2012.09.006

Publication Analysis

Top Keywords

genotype imputation: 8
coalescent model: 8
infinitely-many-sites mutation: 8
genotype-imputation accuracy: 8
imputation accuracy: 8
study sequence: 8
sequence sampled: 8
sampled population: 8
reference sequence: 8
imputation: 6

Similar Publications

One of the major challenges in genomic data sharing is protecting participants' privacy in collaborative studies and when genomic data are outsourced for analysis tasks, e.g., genotype imputation services and federated collaborative genomic analysis.

Genomic selection is a widely used quantitative method of determining the genetic value of an individual from genomic information and phenotypic data. In this study, we used a large, multi-year training population of 3248 individuals from the University of Florida strawberry (Fragaria × ananassa Duchesne) breeding program. We coupled this training population with a test population of 1460 individuals derived from 20 biparental families.

Epidemiological and genetic factors affecting severe epizootic hemorrhagic disease in Spanish Holstein cattle during the Southern Europe outbreak of 2023.

J Dairy Sci

January 2025

Confederación de Asociaciones de Frisona Española (CONAFE), Ctra. de Andalucía km 23600 Valdemoro, 28340 Madrid, Spain.

Epizootic hemorrhagic disease (EHD) is a non-contagious viral infection that can cause important economic losses in dairy farms. This study aimed to identify epidemiological and genetic factors influencing the susceptibility and severity of EHD in Holstein dairy cattle during the 2023 outbreak in Spain. Data from 2852 animals in 7 affected farms from 5 Spanish provinces were used.

Genomic Selection and WssGWAS of Sheep Body Weight and Milk Yield: Imputing Low-Coverage Sequencing Data with Similar Genetic Background Panels.

J Dairy Sci

January 2025

College of Animal Science and Technology, Northwest A&F University, 22 nt, Xinong Road, Yangling, Shaanxi, China.

Low-coverage whole-genome sequencing (LcWGS), a cost-effective genotyping method, offers greater flexibility in variant detection than does single-nucleotide polymorphism (SNP) chips. However, to our knowledge, no studies have explored the application of LcWGS in sheep. This study aimed to evaluate the feasibility of implementing LcWGS and genotype imputation and assess their applicability in genomic studies of body weight and milk yield in sheep.

Genomic analysis of mobility measures on 5-month-old gilts associated with structural soundness.

J Anim Sci

January 2025

USDA-Agricultural Research Service, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.

Sow lameness results in premature culling, causing economic loss and well-being issues. A study, utilizing a pressure-sensing mat (GAIT4) and video monitoring system (NUtrack), was conducted to identify objective measurements on gilts that are predictive of future lameness. Gilts (N = 3656) were categorized to describe their lifetime soundness: SOUND, retained for breeding with no detected mobility issues; LAME_SOW, retained for breeding and detected lame as a sow; CULL_STR, not retained due to poor leg structure; LAME_GILT, not retained due to visible signs of lameness; and CULL, not retained due to reasons other than leg structure.
