AI Article Synopsis

  • Genotype imputation is a cost-effective way to obtain sequence genotypes for analyses such as genome-wide association studies (GWAS), but low imputation accuracy can increase the risk of false positives, so accuracy should be assessed before downstream use.
  • The study compared three imputation programs (Beagle 5.2, Minimac4, and IMPUTE5) and found that imputing from high-density genotypes is more accurate than imputing from low-density genotypes. All three programs performed well, but the relationship between the software-estimated accuracy (Rsq) and empirical accuracy differed considerably for Minimac4.
  • The findings highlight the need for software-specific Rsq thresholds when filtering data. INDEL accuracy was on average approximately 6% lower than SNP accuracy, and on the X chromosome the pseudo-autosomal region (PAR) was imputed substantially less accurately than the non-PAR, particularly from low-density genotypes.

Article Abstract

Background: Genotype imputation is a cost-effective method for obtaining sequence genotypes for downstream analyses such as genome-wide association studies (GWAS). However, low imputation accuracy can increase the risk of false positives, so it is important to pre-filter data or at least assess the potential limitations due to imputation accuracy. In this study, we benchmarked three imputation programs (Beagle 5.2, Minimac4 and IMPUTE5) and compared the empirical accuracy of imputation with the software-estimated accuracy of imputation (Rsq). We also tested the accuracy of imputation in cattle for autosomes and the X chromosome, and for SNPs and INDELs, when imputing from either low-density or high-density genotypes.
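As a rough illustration of what comparing empirical accuracy with a software-reported Rsq involves, the sketch below computes a per-variant empirical accuracy as the squared Pearson correlation between known genotypes and imputed dosages. This metric is an assumption made here for illustration; the abstract does not state how the study defined empirical accuracy. The function name `empirical_r2` and the toy arrays are hypothetical.

```python
import numpy as np

def empirical_r2(true_genotypes, imputed_dosages):
    """Per-variant squared Pearson correlation.

    Assumed layout: rows are variants, columns are animals.
    true_genotypes holds allele counts (0, 1, 2); imputed_dosages holds
    expected allele counts in [0, 2]. Variants with no variation in either
    input get NaN because the correlation is undefined.
    """
    t = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(imputed_dosages, dtype=float)
    t_c = t - t.mean(axis=1, keepdims=True)
    d_c = d - d.mean(axis=1, keepdims=True)
    cov = (t_c * d_c).sum(axis=1)
    denom = np.sqrt((t_c ** 2).sum(axis=1) * (d_c ** 2).sum(axis=1))
    r = np.full(t.shape[0], np.nan)
    ok = denom > 0
    r[ok] = cov[ok] / denom[ok]
    return r ** 2

# Toy data (invented): two variants, four animals.
true_g = [[0, 1, 2, 1], [0, 0, 0, 0]]
dosage = [[0.1, 1.0, 1.9, 1.0], [0.0, 0.1, 0.0, 0.0]]
r2 = empirical_r2(true_g, dosage)
```

Plotting such per-variant values against the Rsq reported by each program would be one way to see whether the two measures track each other in the same way across software.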

Results: The accuracy of imputing sequence variants from real high-density genotypes was higher than from low-density genotypes. In our software benchmark, all programs performed well with only minor differences in accuracy. While there was a close relationship between empirical imputation accuracy and the imputation Rsq, this relationship differed considerably for Minimac4 compared to Beagle 5.2 and IMPUTE5. We found that the Rsq threshold for removing poorly imputed variants must be customised according to the software, and this should be accounted for when merging data from multiple studies, such as in a meta-GWAS. We also found that imposing an Rsq filter has a positive impact on genomic regions with poor imputation accuracy due to large segmental duplications that are susceptible to error-prone alignment. Overall, our results showed that, on average, the imputation accuracy for INDELs was approximately 6% lower than for SNPs for all software programs. Importantly, the imputation accuracy for the non-PAR (non-pseudo-autosomal region) of the X chromosome was comparable to autosomal imputation accuracy, while for the PAR it was substantially lower, particularly when starting from low-density genotypes.

Conclusions: This study provides an empirically derived approach for applying customised, software-specific Rsq thresholds in downstream analyses of imputed variants, such as those needed for a meta-GWAS. The very poor empirical imputation accuracy for variants on the PAR when starting from low-density genotypes demonstrates that this region should be imputed starting from a higher density of real genotypes.
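A minimal sketch of how software-specific Rsq thresholds might be applied before merging studies is shown below. The threshold values are arbitrary placeholders, not values reported by the study, and the record layout and the helper names `filter_variants` and `merge_studies` are hypothetical.

```python
# Placeholder cutoffs for illustration only; NOT taken from the study.
RSQ_THRESHOLDS = {"Beagle 5.2": 0.8, "Minimac4": 0.6, "IMPUTE5": 0.8}

def filter_variants(variants, software, thresholds=RSQ_THRESHOLDS):
    """Keep variants whose Rsq meets the cutoff for the given software."""
    cutoff = thresholds[software]
    return [v for v in variants if v["rsq"] >= cutoff]

def merge_studies(studies, thresholds=RSQ_THRESHOLDS):
    """Return variant ids that pass their own study's cutoff in every study.

    studies is a list of (software_name, variants) pairs, where each
    variant is a dict with "id" and "rsq" keys (an assumed layout).
    """
    kept = None
    for software, variants in studies:
        ids = {v["id"] for v in filter_variants(variants, software, thresholds)}
        kept = ids if kept is None else kept & ids
    return sorted(kept or [])

# Toy example (invented data): two studies imputed with different software.
studies = [
    ("Beagle 5.2", [{"id": "v1", "rsq": 0.90}, {"id": "v2", "rsq": 0.70}]),
    ("Minimac4", [{"id": "v1", "rsq": 0.65}, {"id": "v2", "rsq": 0.90}]),
]
shared = merge_studies(studies)
```

Keeping the cutoffs in a per-software table, rather than applying a single global value, reflects the abstract's point that the same Rsq does not correspond to the same empirical accuracy across programs.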


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566673
DOI: http://dx.doi.org/10.1186/s12711-024-00942-2

Publication Analysis

Top Keywords

imputation accuracy: 32
accuracy imputation: 20
imputation: 16
accuracy: 14
estimated accuracy: 8
downstream analyses: 8
imputation rsq: 8
empirical imputation: 8
imputed variants: 8
genotypes: 5
