Despite the widespread use of genotype imputation tools and the availability of different approaches, recent developments of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions of SHAPEIT, Eagle, Beagle, minimac, PBWT, and IMPUTE, for the imputation of completely missing SNPs with a Haplotype Reference Consortium (HRC) reference panel, with respect to both quality and speed. We used a data set comprising 1,149 fully sequenced individuals from the German population, subsetting the SNPs to approximate the Illumina Infinium-Omni5 array. A total of 553,234 SNPs across two selected chromosomes were used to compare imputed and sequenced genotypes. We found that all tested programs, with the exception of PBWT, impute genotypes with very high accuracy (mean error rate < 0.005). PBWT hardly ever imputes the less frequent allele correctly (mean concordance for genotypes including the minor allele < 0.0002). For all programs, imputation accuracy drops for rare alleles with a frequency < 0.05. Concordance also drops with decreasing genotype probability; because overall concordance is nevertheless high, this indicates that low genotype probabilities are rare. The mean concordance of SNPs with a genotype probability < 95% drops below 0.9, at which point disregarding imputed genotypes might prove favorable. For fast and accurate imputation, a combination of Eagle2.4.1 using a reference panel for phasing and Beagle5.1 for imputation performs best. Replacing Beagle5.1 with minimac3, minimac4, Beagle4.1, or IMPUTE4 yields a small gain in accuracy at a high cost in speed.
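The accuracy measures above (error rate, concordance for genotypes carrying the minor allele, and concordance stratified by genotype probability) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the `Call` record and the functions `concordance`, `summarize`, and `drop_uncertain` are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, 2), "genotypes including the minor allele" is assumed to mean the sequenced genotype carries at least one minor allele, and metrics are pooled over all genotypes rather than averaged per SNP.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One imputed genotype compared with its sequenced counterpart.

    Assumption: genotypes are coded as the number of minor alleles (0, 1, 2)
    and the minor allele of each SNP is already known.
    """
    truth: int    # genotype from sequencing
    imputed: int  # imputed hard call (most likely genotype)
    gp: float     # genotype probability of the imputed hard call


def concordance(calls):
    """Fraction of calls whose imputed genotype equals the sequenced one."""
    if not calls:
        return float("nan")
    return sum(c.truth == c.imputed for c in calls) / len(calls)


def summarize(calls, gp_threshold=0.95):
    """Pooled accuracy metrics (hypothetical helper, not from the study)."""
    minor = [c for c in calls if c.truth > 0]  # true genotype carries the minor allele
    low_gp = [c for c in calls if c.gp < gp_threshold]
    high_gp = [c for c in calls if c.gp >= gp_threshold]
    return {
        "error_rate": 1.0 - concordance(calls),
        "minor_allele_concordance": concordance(minor),
        "low_gp_concordance": concordance(low_gp),
        "high_gp_concordance": concordance(high_gp),
    }


def drop_uncertain(calls, gp_threshold=0.95):
    """Keep only confident calls; the rest would be treated as missing."""
    return [c for c in calls if c.gp >= gp_threshold]


# Toy data, invented for illustration only.
calls = [Call(0, 0, 0.99), Call(1, 1, 0.97), Call(2, 1, 0.60),
         Call(1, 0, 0.80), Call(0, 0, 0.99)]
print(summarize(calls))
```

In the toy data, both errors occur in calls with a genotype probability below 0.95, which mirrors the observation that discarding low-probability imputed genotypes can improve the accuracy of the retained calls.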
Source | Link type |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493217 | PMC |
http://dx.doi.org/10.3389/fgene.2021.724037 | DOI Listing |
J Am Stat Assoc
October 2023
Department of Statistics, University of Michigan.
In statistical genetics, the sequentially Markov coalescent (SMC) is an important family of models for approximating the distribution of genetic variation data under complex evolutionary models. Methods based on SMC are widely used in genetics and evolutionary biology, with significant applications to genotype phasing and imputation, recombination rate estimation, and inferring population history. SMC allows for likelihood-based inference using hidden Markov models (HMMs), where the latent variable represents a genealogy.
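Likelihood-based inference with an HMM rests on the forward algorithm. The Python sketch below computes a log-likelihood with per-step scaling; it is a generic HMM routine, not the SMC method of this article. In an SMC-style setting the latent states would be discretized genealogies (for example, time-to-most-recent-common-ancestor bins for a pair of haplotypes), but the two-state model, its probabilities, and the function name `forward_log_likelihood` are hypothetical choices for illustration.

```python
import math


def forward_log_likelihood(init, trans, emit, observations):
    """Log-likelihood of an observation sequence under a discrete HMM.

    init[k]      : P(first latent state = k)
    trans[j][k]  : P(next latent state = k | current latent state = j)
    emit[k][o]   : P(observation o | latent state k)
    observations : sequence of observation indices
    """
    n = len(init)
    # Initialise with the first observation, then rescale to avoid underflow.
    alpha = [init[k] * emit[k][observations[0]] for k in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for obs in observations[1:]:
        # Propagate through the transition matrix, then weight by the emission.
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(n)) * emit[k][obs]
            for k in range(n)
        ]
        scale = sum(alpha)
        log_lik += math.log(scale)  # the product of the scales is the likelihood
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state example: state 0 = recent TMRCA, state 1 = ancient TMRCA.
# Older genealogies carry more mutations, so heterozygous sites (obs 1) are likelier in state 1.
init = [0.5, 0.5]
trans = [[0.99, 0.01], [0.01, 0.99]]  # recombination occasionally changes the local genealogy
emit = [[0.999, 0.001], [0.99, 0.01]]
print(forward_log_likelihood(init, trans, emit, [0, 0, 1, 0, 1]))
```

Scaling at every step keeps the forward variables in a numerically safe range. Summing the log scale factors recovers the exact log-likelihood.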
Bioinform Adv
July 2024
Institute of Clinical Molecular Biology, Kiel University, Am Botanischen Garten 11, 24108 Kiel, Germany.
Summary: We recently developed , free software that combines genotype phasing and imputation in a single tool. By introducing algorithmic and technical improvements, we accelerated the classical two-step approach using and . Here, we demonstrate how to use field-programmable gate arrays (FPGAs) to accelerate even further, by up to 93%, without loss of phasing and imputation quality.
Transplant Cell Ther
November 2024
Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Medicine, University of Washington, Seattle, Washington.
When optimizing transplants, clinical decision-makers consider matching between patient and donor at HLA-A, -B, -C, and -DRB1 (8 of 8 alleles matched) and sometimes also at HLA-DQB1 (10 of 10). HLA-DQ is a heterodimer formed by a β chain encoded by HLA-DQB1 and an α chain encoded by HLA-DQA1. In addition to the molecules defined by the parentally inherited cis haplotypes, α-β trans-dimerization is possible between certain alleles, producing additional unique molecules that are a potential source of mismatches.
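To illustrate how trans-dimerization enlarges the set of DQ molecules that can differ between patient and donor, the following Python sketch enumerates the cis and permitted trans α-β pairs for each person and compares the resulting sets. The pairing rule in `can_pair` is a simplified assumption (DQA1*01 chains pair with DQB1*05/*06, and other DQA1 groups pair with DQB1*02/*03/*04). It should not be read as the rule used in the cited study. The function names are hypothetical, and the allele assignments in the example are chosen only for illustration.

```python
from itertools import product


def can_pair(alpha: str, beta: str) -> bool:
    """Simplified, assumed compatibility rule for DQ alpha/beta chains."""
    a_group = alpha.split("*")[1].split(":")[0]  # "DQA1*05:01" -> "05"
    b_group = beta.split("*")[1].split(":")[0]   # "DQB1*02:01" -> "02"
    if a_group == "01":
        return b_group in {"05", "06"}
    return b_group in {"02", "03", "04"}


def dq_molecules(haplotypes):
    """All DQ heterodimers one person is assumed to express.

    haplotypes: two (DQA1, DQB1) allele pairs, one per parental haplotype.
    Cis pairs are always included; trans pairs only if can_pair allows them.
    """
    molecules = set(haplotypes)  # cis heterodimers
    for (alpha, _), (_, beta) in product(haplotypes, repeat=2):
        if can_pair(alpha, beta):
            molecules.add((alpha, beta))
    return molecules


def mismatched_molecules(patient, donor):
    """Return (molecules only in the donor, molecules only in the patient)."""
    p, d = dq_molecules(patient), dq_molecules(donor)
    return d - p, p - d


patient = [("DQA1*05:01", "DQB1*02:01"), ("DQA1*03:01", "DQB1*03:02")]
donor = [("DQA1*01:02", "DQB1*06:02"), ("DQA1*05:01", "DQB1*02:01")]
only_donor, only_patient = mismatched_molecules(patient, donor)
print(len(only_donor), len(only_patient))
```

In this toy case, the patient shares one cis haplotype with the donor, yet trans pairing gives the patient two extra molecules. This widens the difference between the two molecule sets beyond what a comparison of DQB1 alleles alone would show. The clinical direction of each mismatch is not modeled here.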
Nat Genet
September 2024
Department of Statistics, University of Oxford, Oxford, UK.
We built a reference panel with 342 million autosomal variants using 78,195 individuals from the Genomics England (GEL) dataset, achieving a phasing switch error rate of 0.18% for European samples and imputation quality of r = 0.75 for variants with minor allele frequencies as low as 2 × 10 in white British samples.
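The phasing switch error rate quoted above is a standard phasing metric. A minimal Python sketch for a single individual follows. It assumes both inputs are restricted to sites that are heterozygous in the true and in the inferred phase, and that `true_hap` and `inferred_hap` (hypothetical names) list the allele carried on the first haplotype at each such site, in genomic order. This is an illustration of the metric, not the evaluation pipeline used for the GEL panel.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    A switch error is counted whenever the inferred phase flips relative to
    the truth between two consecutive heterozygous sites. A haplotype that
    is consistently inverted therefore has zero switch errors.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same heterozygous sites")
    if len(true_hap) < 2:
        return 0.0  # no adjacent pair of sites to compare
    same = [t == i for t, i in zip(true_hap, inferred_hap)]
    switches = sum(a != b for a, b in zip(same, same[1:]))
    return switches / (len(same) - 1)


print(switch_error_rate([0, 1, 1, 0], [0, 1, 0, 1]))
```

Counting orientation changes rather than per-site disagreements is what distinguishes the switch error rate from a simple mismatch rate. A single switch in the middle of a chromosome can make half of the sites disagree, yet it counts as only one error.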
G3 (Bethesda)
September 2024
United States Department of Agriculture, National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, Kearneysville, WV 25430, USA.
With the rapid and significant cost reduction of next-generation sequencing, low-coverage whole-genome sequencing (lcWGS) followed by genotype imputation is becoming a cost-effective alternative to single-nucleotide polymorphism (SNP)-array genotyping. The objectives of this study were twofold: (1) to construct a haplotype reference panel for genotype imputation from lcWGS data in rainbow trout (Oncorhynchus mykiss); and (2) to evaluate the concordance between imputed genotypes and SNP-array genotypes in two breeding populations. Medium-coverage (12×) whole-genome sequences were obtained from a total of 410 fish representing five breeding populations with various spawning dates.