Evaluation of sequencing strategies for whole-genome imputation with hybrid peeling.

Genet Sel Evol

The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK.

Published: April 2020

Background: When large whole-genome sequence datasets are assembled for routine use in research and breeding, the sequencing strategy should be adapted to the methods that will be used later for variant discovery and imputation. In this study, we used simulation to explore the impact that the sequencing strategy and level of sequencing investment have on the overall accuracy of imputation using hybrid peeling, a pedigree-based imputation method that is well suited for large livestock populations.

Methods: We simulated marker array and whole-genome sequence data for 15 populations with simulated or real pedigrees that had different structures. In these populations, we evaluated the effect on imputation accuracy of seven methods for selecting which individuals to sequence, the generation of the pedigree to which the sequenced individuals belonged, the use of variable or uniform coverage, and the trade-off between the number of sequenced individuals and their sequencing coverage. For each population, we considered four levels of investment in sequencing that were proportional to the size of the population.

Results: Imputation accuracy depended greatly on pedigree depth. The distribution of the sequenced individuals across the generations of the pedigree underlay the performance of the different methods used to select individuals to sequence, and it was critical for achieving high imputation accuracy in both early and late generations. Imputation accuracy was highest when the sequenced individuals were covered uniformly at 2× rather than at variable coverage. An investment equivalent to the cost of sequencing 2% of the population at 2× provided high imputation accuracy. The gain in imputation accuracy from additional investment decreased with larger populations and higher levels of investment. However, to achieve the same imputation accuracy, smaller populations required a proportionally greater investment than larger ones.
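
To make the phrase "an investment equivalent to the cost of sequencing 2% of the population at 2×" concrete, the following minimal sketch converts that investment into the number of individuals that could be sequenced at other coverages. It rests on assumptions that are not stated in the abstract: cost is taken to be strictly proportional to coverage, with no fixed per-sample cost such as library preparation, and the function name, the population size of 10,000, and the list of coverages are hypothetical choices for illustration only.

```python
def individuals_at_coverage(population_size, percent_of_population=2,
                            reference_coverage=2, coverages=(1, 2, 5, 15, 30)):
    """Translate a sequencing investment into head counts per coverage.

    ASSUMPTION (not from the abstract): cost scales linearly with coverage
    and there is no fixed per-sample cost, so the investment can be
    expressed in "x-units" (one individual at 1x costs one unit).
    """
    # Total budget in x-units, e.g. 2% of 10,000 individuals at 2x = 400 units.
    budget = population_size * percent_of_population * reference_coverage / 100
    # Whole individuals affordable at each coverage under the same budget.
    return {coverage: int(budget // coverage) for coverage in coverages}


# Hypothetical population of 10,000 individuals.
plan = individuals_at_coverage(10_000)
for coverage, n_individuals in plan.items():
    print(f"{coverage:>2}x coverage: {n_individuals} individuals")
```

Under these assumptions the same budget covers either many individuals at low coverage or few at high coverage, which is the trade-off between the number of sequenced individuals and their coverage that the study evaluated by simulation. The sketch only does the budget arithmetic; it says nothing about the imputation accuracy each option would achieve.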

Conclusions: Suitable sequencing strategies for subsequent imputation with hybrid peeling involve sequencing ~2% of the population at a uniform coverage of 2×, preferably distributed across all generations of the pedigree except for the few earliest generations, which lack genotyped ancestors. Such sequencing strategies are beneficial for generating whole-genome sequence data in populations with deep pedigrees of closely related individuals.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7132986
DOI: http://dx.doi.org/10.1186/s12711-020-00537-7


