Imputation accuracy across global human populations.

Jordan L Cahoon, Xinyue Rui, Echo Tang, Christopher Simons, Jalen Langie, Minhui Chen, Ying-Chu Lo, Charleston W K Chiang

Am J Hum Genet

Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA; Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA.

Published: May 2024

Genotype imputation is now fundamental for genome-wide association studies but lacks fairness due to the underrepresentation of references from non-European ancestries. The state-of-the-art imputation reference panel released by the Trans-Omics for Precision Medicine (TOPMed) initiative improved the imputation of admixed African-ancestry and Hispanic/Latino samples, but imputation for populations primarily residing outside of North America may still fall short in performance due to persisting underrepresentation. To illustrate this point, we imputed the genotypes of over 43,000 individuals across 123 populations around the world and identified numerous populations where imputation accuracy paled in comparison to that of European-ancestry populations. For instance, the mean imputation r-squared (Rsq) values for variants with minor allele frequencies between 1% and 5% in Saudi Arabians (n = 1,061), Vietnamese (n = 1,264), Thai (n = 2,435), and Papua New Guineans (n = 776) were 0.79, 0.78, 0.76, and 0.62, respectively, compared to 0.90-0.93 for comparable European populations matched in sample size and SNP array content. Outside of Africa and Latin America, Rsq appeared to decrease as genetic distances to European-ancestry reference increased, as predicted. Using sequencing data as ground truth, we also showed that Rsq may over-estimate imputation accuracy for non-European populations more than European populations, suggesting further disparity in accuracy between populations. Using 1,496 sequenced individuals from Taiwan Biobank as a second reference panel to TOPMed, we also assessed a strategy to improve imputation for non-European populations with meta-imputation, but this design did not improve accuracy across frequency spectra. Taken together, our analyses suggest that we must ultimately strive to increase the diversity and size of imputation reference panels to promote equity within genetics research.
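The abstract contrasts two accuracy measures: the Rsq that imputation software estimates from the imputed dosages alone, and the accuracy measured against sequencing data as ground truth. The sketch below illustrates that distinction for a single variant and shows how per-variant values might be averaged within a minor allele frequency (MAF) bin such as 1% to 5%. It is a minimal illustration, not the authors' pipeline: the estimated Rsq uses a MaCH-style variance ratio on diploid dosages, which is an assumption about the formula, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def estimated_rsq(dosages):
    """Estimated Rsq from imputed diploid dosages (0..2) at one variant.

    Assumption: MaCH-style ratio of the observed dosage variance to the
    variance expected under Hardy-Weinberg equilibrium, 2p(1-p).
    No ground-truth genotypes are needed.
    """
    d = np.asarray(dosages, dtype=float)
    p = d.mean() / 2.0                      # estimated allele frequency
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:
        return float("nan")                 # monomorphic: undefined
    return float(d.var() / expected_var)

def empirical_r2(dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages and
    sequenced genotypes (0, 1, 2) at one variant."""
    d = np.asarray(dosages, dtype=float)
    g = np.asarray(true_genotypes, dtype=float)
    if d.std() == 0.0 or g.std() == 0.0:
        return float("nan")                 # correlation undefined
    r = np.corrcoef(d, g)[0, 1]
    return float(r * r)

def mean_in_maf_bin(maf, values, lo=0.01, hi=0.05):
    """Mean of per-variant values for variants with lo <= MAF < hi."""
    maf = np.asarray(maf, dtype=float)
    values = np.asarray(values, dtype=float)
    mask = (maf >= lo) & (maf < hi)
    if not mask.any():
        return float("nan")
    return float(np.nanmean(values[mask]))

# Toy data (hypothetical): one variant, four individuals.
imputed = [0.1, 0.9, 1.2, 1.8]
sequenced = [0, 1, 1, 2]
est = estimated_rsq(imputed)
emp = empirical_r2(imputed, sequenced)

# Toy per-variant table (hypothetical): average Rsq in the 1%-5% MAF bin.
binned = mean_in_maf_bin([0.005, 0.02, 0.04, 0.2], [0.3, 0.8, 0.6, 0.95])
```

Comparing `est` with `emp` across many variants, separately for each population, is one way to check whether the software-reported Rsq overstates accuracy more in some populations than in others.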

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11080279	PMC
http://dx.doi.org/10.1016/j.ajhg.2024.03.011	DOI Listing

