In the search for genetic associations with complex traits, population isolates offer the advantage of reduced genetic and environmental heterogeneity. In addition, cost-efficient next-generation association approaches have been proposed for these populations, in which only a subsample of representative individuals is sequenced and genotypes are then imputed into the rest of the population. Gene mapping in such populations thus requires high-quality genetic imputation, preceded by phasing. To identify an effective study design, we compare by simulation a range of phasing and imputation software and strategies. We simulated 1,115,604 variants on chromosome 10 for 477 members of the large complex pedigree of Campora, a village within the established isolate of Cilento in southern Italy. We assessed the phasing performance of the identity-by-descent (IBD)-based software ALPHAPHASE and SLRP, the linkage disequilibrium (LD)-based software SHAPEIT2, SHAPEIT3, and BEAGLE, and the new software EAGLE, which combines both methodologies. For imputation, we compared IMPUTE2, IMPUTE4, MINIMAC3, BEAGLE, and the new software PBWT. Genotyping errors and missing genotypes were simulated to observe their effects on the performance of each software. Highly accurate phased data were achieved by all software, with SHAPEIT2, SHAPEIT3, and EAGLE2 providing the most accurate results. MINIMAC3, IMPUTE4, and IMPUTE2 all performed strongly as imputation software, and our study highlights the considerable gain in imputation accuracy provided by a genome-sequenced reference panel specific to the population isolate.
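
The abstract does not say which accuracy measures were used in the comparison. As an illustration only, two measures commonly used in this kind of simulation study are the switch error rate for phasing and the squared correlation between true genotypes and imputed dosages for imputation. The sketch below assumes biallelic variants coded 0/1, one individual per call, and no genotype errors in the phased data; the function names `switch_error_rate` and `dosage_r2` are hypothetical and do not come from the paper or from any of the software named above.

```python
import math


def switch_error_rate(true_h1, true_h2, est_h1, est_h2):
    """Fraction of consecutive heterozygous-site pairs whose phase flips.

    Assumption: the estimated haplotypes carry the same genotypes as the
    true ones, so only the phase (which allele sits on which haplotype)
    can differ.
    """
    orientation = []
    for t1, t2, e1, e2 in zip(true_h1, true_h2, est_h1, est_h2):
        if t1 == t2:
            continue  # homozygous sites carry no phase information
        if (e1, e2) == (t1, t2):
            orientation.append(0)  # same orientation as the truth
        elif (e1, e2) == (t2, t1):
            orientation.append(1)  # flipped relative to the truth
        else:
            raise ValueError("estimated genotype differs from true genotype")
    if len(orientation) < 2:
        return 0.0  # fewer than two heterozygous sites: nothing to switch
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and dosages."""
    n = len(true_genotypes)
    mean_x = sum(true_genotypes) / n
    mean_y = sum(imputed_dosages) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mean_x) ** 2 for x in true_genotypes)
    syy = sum((y - mean_y) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when either vector is constant
    return (sxy * sxy) / (sxx * syy)


# Toy data: one individual, six variants, one phase switch after site 2.
true_h1 = [0, 1, 0, 1, 1, 0]
true_h2 = [1, 0, 1, 0, 1, 1]
est_h1 = [0, 1, 1, 0, 1, 1]
est_h2 = [1, 0, 0, 1, 1, 0]
ser = switch_error_rate(true_h1, true_h2, est_h1, est_h2)

# Toy data: one variant, four individuals, perfectly imputed.
r2 = dosage_r2([0, 1, 2, 1], [0.0, 1.0, 2.0, 1.0])
```

In the toy data, five sites are heterozygous and the estimated phase flips once between them, so one of the four consecutive pairs is a switch error. A real evaluation would average these measures over individuals and variants, often stratified by minor allele frequency.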

Source
http://dx.doi.org/10.1002/gepi.22109

Similar Publications

In statistical genetics, the sequentially Markov coalescent (SMC) is an important family of models for approximating the distribution of genetic variation data under complex evolutionary models. Methods based on SMC are widely used in genetics and evolutionary biology, with significant applications to genotype phasing and imputation, recombination rate estimation, and inferring population history. SMC allows for likelihood-based inference using hidden Markov models (HMMs), where the latent variable represents a genealogy.

Introducing field-programmable gate arrays in genotype phasing and imputation.

Bioinform Adv

July 2024

Institute of Clinical Molecular Biology, Kiel University, Am Botanischen Garten 11, 24108 Kiel, Germany.

Summary: We recently developed , a free software that combines genotype phasing and imputation in a single tool. By introducing algorithmic and technical improvements we accelerated the classical two-step approach using and . Here, we demonstrate how to use field-programmable gate arrays (FPGAs) to accelerate even further by a factor of up to 93% without loss of phasing and imputation quality.

A Tool for the Assessment of HLA-DQ Heterodimer Variation in Hematopoietic Cell Transplantation.

Transplant Cell Ther

November 2024

Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Medicine, University of Washington, Seattle, Washington.

When optimizing transplants, clinical decision-makers consider HLA-A, -B, -C, -DRB1 (8 matched alleles out of 8), and sometimes HLA-DQB1 (10 out of 10) matching between the patient and donor. HLA-DQ is a heterodimer formed by the β chain product of HLA-DQB1 and an α chain product of HLA-DQA1. In addition to molecules defined by the parentally inherited cis haplotypes, α-β trans-dimerization is possible between certain alleles, leading to unique molecules and a potential source of mismatched molecules.

We built a reference panel with 342 million autosomal variants using 78,195 individuals from the Genomics England (GEL) dataset, achieving a phasing switch error rate of 0.18% for European samples and imputation quality of r = 0.75 for variants with minor allele frequencies as low as 2 × 10 in white British samples.

With the rapid and significant cost reduction of next-generation sequencing, low-coverage whole-genome sequencing (lcWGS), followed by genotype imputation, is becoming a cost-effective alternative to single-nucleotide polymorphism (SNP)-array genotyping. The objectives of this study were 2-fold: (1) construct a haplotype reference panel for genotype imputation from lcWGS data in rainbow trout (Oncorhynchus mykiss); and (2) evaluate the concordance between imputed genotypes and SNP-array genotypes in 2 breeding populations. Medium-coverage (12×) whole-genome sequences were obtained from a total of 410 fish representing 5 breeding populations with various spawning dates.
