Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data.

Front Genet

Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany.

Published: September 2021

Despite the widespread use of genotype imputation tools and the availability of different approaches, the latest developments of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions of SHAPEIT, Eagle, Beagle, minimac, PBWT, and IMPUTE, for genetic imputation of completely missing SNPs with an HRC reference panel regarding quality and speed. We used a data set comprising 1,149 fully sequenced individuals from the German population, subsetting the SNPs to approximate the Illumina Infinium-Omni5 array. A total of 553,234 SNPs across two selected chromosomes were used to compare imputed and sequenced genotypes. We found that all tested programs, with the exception of PBWT, impute genotypes with very high accuracy (mean error rate < 0.005). PBWT hardly ever imputes the less frequent allele correctly (mean concordance for genotypes including the minor allele < 0.0002). For all programs, imputation accuracy drops for rare alleles with a frequency < 0.05. Although overall concordance is high, concordance drops with decreasing genotype probability, which indicates that low genotype probabilities are rare. The mean concordance of SNPs with a genotype probability < 95% drops below 0.9, at which point disregarding imputed genotypes might prove favorable. For fast and accurate imputation, a combination of Eagle2.4.1 using a reference panel for phasing and Beagle5.1 for imputation performs best. Replacing Beagle5.1 with minimac3, minimac4, Beagle4.1, or IMPUTE4 results in a small gain in accuracy at a high cost in speed.
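
The quality measures named in the abstract (overall concordance, concordance for genotypes including the minor allele, and a genotype-probability cutoff) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes genotypes coded as minor-allele dosages (0, 1, 2), one best-guess genotype probability per imputed call, and made-up toy values; the function names are hypothetical, and the paper's exact definitions may differ.

```python
from typing import List, Optional, Sequence, Tuple


def concordance(sequenced: Sequence[int], imputed: Sequence[int]) -> Optional[float]:
    """Fraction of imputed genotypes identical to the sequenced genotype.

    Returns None when there is nothing to compare.
    """
    if len(sequenced) != len(imputed):
        raise ValueError("genotype vectors must have the same length")
    if not sequenced:
        return None
    matches = sum(s == i for s, i in zip(sequenced, imputed))
    return matches / len(sequenced)


def minor_allele_concordance(sequenced: Sequence[int], imputed: Sequence[int]) -> Optional[float]:
    """Concordance restricted to calls whose sequenced genotype carries the minor allele.

    Assumption: dosage > 0 in the sequenced data means "includes the minor allele".
    """
    pairs = [(s, i) for s, i in zip(sequenced, imputed) if s > 0]
    if not pairs:
        return None
    return sum(s == i for s, i in pairs) / len(pairs)


def split_by_probability(
    sequenced: Sequence[int],
    imputed: Sequence[int],
    genotype_prob: Sequence[float],
    threshold: float = 0.95,
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Split (sequenced, imputed) pairs into calls at or above and below the cutoff."""
    high, low = [], []
    for s, i, p in zip(sequenced, imputed, genotype_prob):
        (high if p >= threshold else low).append((s, i))
    return high, low


# Toy data (illustrative only, not taken from the study).
sequenced = [0, 0, 1, 2, 0, 1, 0, 0]
imputed = [0, 0, 1, 1, 0, 0, 0, 0]
genotype_prob = [0.99, 0.98, 0.97, 0.60, 0.99, 0.70, 0.96, 0.99]

overall = concordance(sequenced, imputed)
minor = minor_allele_concordance(sequenced, imputed)
high, low = split_by_probability(sequenced, imputed, genotype_prob)
high_conc = concordance([s for s, _ in high], [i for _, i in high])
low_conc = concordance([s for s, _ in low], [i for _, i in low])

print(f"overall concordance:        {overall:.3f}")
print(f"minor-allele concordance:   {minor:.3f}")
print(f"concordance, GP >= 0.95:    {high_conc:.3f}")
print(f"concordance, GP <  0.95:    {low_conc:.3f}")
```

In the toy data the overall concordance looks acceptable while the minor-allele concordance is much lower. This is the pattern the abstract reports for PBWT: most genotypes are homozygous for the major allele, so a program can score well overall without ever imputing the minor allele correctly.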

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493217
http://dx.doi.org/10.3389/fgene.2021.724037

Similar Publications

In statistical genetics, the sequentially Markov coalescent (SMC) is an important family of models for approximating the distribution of genetic variation data under complex evolutionary models. Methods based on SMC are widely used in genetics and evolutionary biology, with significant applications to genotype phasing and imputation, recombination rate estimation, and inferring population history. SMC allows for likelihood-based inference using hidden Markov models (HMMs), where the latent variable represents a genealogy.

Introducing field-programmable gate arrays in genotype phasing and imputation.

Bioinform Adv

July 2024

Institute of Clinical Molecular Biology, Kiel University, Am Botanischen Garten 11, 24108 Kiel, Germany.

Summary: We recently developed , a free software that combines genotype phasing and imputation in a single tool. By introducing algorithmic and technical improvements we accelerated the classical two-step approach using and . Here, we demonstrate how to use field-programmable gate arrays (FPGAs) to accelerate even further by a factor of up to 93% without loss of phasing and imputation quality.

A Tool for the Assessment of HLA-DQ Heterodimer Variation in Hematopoietic Cell Transplantation.

Transplant Cell Ther

November 2024

Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Medicine, University of Washington, Seattle, Washington.

When optimizing transplants, clinical decision-makers consider HLA-A, -B, -C, -DRB1 (8 matched alleles out of 8), and sometimes HLA-DQB1 (10 out of 10) matching between the patient and donor. HLA-DQ is a heterodimer formed by the β chain product of HLA-DQB1 and an α chain product of HLA-DQA1. In addition to molecules defined by the parentally inherited cis haplotypes, α-β trans-dimerization is possible between certain alleles, leading to unique molecules and a potential source of mismatched molecules.

We built a reference panel with 342 million autosomal variants using 78,195 individuals from the Genomics England (GEL) dataset, achieving a phasing switch error rate of 0.18% for European samples and imputation quality of r = 0.75 for variants with minor allele frequencies as low as 2 × 10 in white British samples.

With the rapid and significant cost reduction of next-generation sequencing, low-coverage whole-genome sequencing (lcWGS), followed by genotype imputation, is becoming a cost-effective alternative to single-nucleotide polymorphism (SNP)-array genotyping. The objectives of this study were 2-fold: (1) construct a haplotype reference panel for genotype imputation from lcWGS data in rainbow trout (Oncorhynchus mykiss); and (2) evaluate the concordance between imputed genotypes and SNP-array genotypes in 2 breeding populations. Medium-coverage (12×) whole-genome sequences were obtained from a total of 410 fish representing 5 breeding populations with various spawning dates.
