Genotype imputation is now fundamental for genome-wide association studies but lacks fairness due to the underrepresentation of references from non-European ancestries. The state-of-the-art imputation reference panel released by the Trans-Omics for Precision Medicine (TOPMed) initiative improved the imputation of admixed African-ancestry and Hispanic/Latino samples, but imputation for populations primarily residing outside of North America may still fall short in performance due to persisting underrepresentation. To illustrate this point, we imputed the genotypes of over 43,000 individuals across 123 populations around the world and identified numerous populations where imputation accuracy paled in comparison to that of European-ancestry populations. For instance, the mean imputation r-squared (Rsq) for variants with minor allele frequencies between 1% and 5% in Saudi Arabians (n = 1,061), Vietnamese (n = 1,264), Thai (n = 2,435), and Papua New Guineans (n = 776) were 0.79, 0.78, 0.76, and 0.62, respectively, compared to 0.90-0.93 for comparable European populations matched in sample size and SNP array content. Outside of Africa and Latin America, Rsq appeared to decrease as genetic distances to European-ancestry reference increased, as predicted. Using sequencing data as ground truth, we also showed that Rsq may over-estimate imputation accuracy for non-European populations more than European populations, suggesting further disparity in accuracy between populations. Using 1,496 sequenced individuals from Taiwan Biobank as a second reference panel to TOPMed, we also assessed a strategy to improve imputation for non-European populations with meta-imputation, but this design did not improve accuracy across frequency spectra. Taken together, our analyses suggest that we must ultimately strive to increase diversity and size to promote equity within genetics research.
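The gap between estimated Rsq and accuracy measured against sequencing can be illustrated with a minimal sketch. It assumes one commonly used form of the estimated Rsq (the observed variance of imputed haploid dosages divided by p(1 - p)) and takes "true" accuracy to be the squared Pearson correlation between imputed dosages and sequenced genotypes. The function names are hypothetical and are not taken from the study's pipeline.

```python
from math import nan


def estimated_rsq(hap_dosages):
    """Estimated Rsq from imputed haploid dosages (each value in [0, 1]).

    Assumed form: observed dosage variance divided by p * (1 - p), the
    variance expected if every allele were imputed with certainty.
    """
    d = [float(v) for v in hap_dosages]
    p = sum(d) / len(d)  # estimated alternate-allele frequency
    if p <= 0.0 or p >= 1.0:
        return nan  # monomorphic site: the ratio is undefined
    var = sum((v - p) ** 2 for v in d) / len(d)
    return var / (p * (1.0 - p))


def true_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages and sequenced genotypes."""
    x = [float(v) for v in imputed_dosages]
    y = [float(v) for v in true_genotypes]
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return nan  # no variation: correlation is undefined
    return (sxy * sxy) / (sxx * syy)
```

Because the estimated Rsq depends only on how confident the imputed dosages are, a site imputed confidently but incorrectly can score high on it while `true_r2` stays low. That is one way an estimated metric can overstate accuracy when the reference panel matches the target population poorly.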
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11080279 | PMC |
| http://dx.doi.org/10.1016/j.ajhg.2024.03.011 | DOI Listing |
Animals (Basel)
January 2025
Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.
Goats are essential to the dairy industry in Shaanxi, China, with udder traits playing a critical role in determining milk production and economic value for breeding programs. However, the direct measurement of these traits in dairy goats is challenging and resource-intensive. This study leveraged genotyping imputation to explore the genetic parameters and architecture of udder traits and assess the efficiency of genomic prediction methods.
J Pediatr Surg
January 2025
McGill University Faculty of Medicine and Health Sciences, Canada; Harvey E. Beardmore Division of Pediatric Surgery, The Montreal Children's Hospital, McGill University Health Centre, Montreal, Qc, Canada.
Purpose: This study evaluates the effectiveness of machine learning (ML) algorithms for improving the preoperative diagnosis of acute appendicitis in children, focusing on the accurate prediction of the severity of disease.
Methods: An anonymized clinical and operative dataset was retrieved from the medical records of children undergoing emergency appendectomy between 2014 and 2021. We developed an ML pipeline that pre-processed the dataset and developed algorithms to predict 5 appendicitis grades (1 - non-perforated, 2 - localized perforation, 3 - abscess, 4 - generalized peritonitis, and 5 - generalized peritonitis with abscess).
Bioengineering (Basel)
January 2025
Division of Biostatistics, Data Science Institute, Medical College of Wisconsin (MCW), Milwaukee, WI 53226, USA.
Single-cell RNA sequencing (scRNA-seq) is a cutting-edge technique in molecular biology and genomics that reveals cellular heterogeneity. However, scRNA-seq data often suffer from dropout events, meaning that certain genes exhibit very low or even zero expression levels due to technical limitations. Existing imputation methods for dropout events lack comprehensive evaluations in downstream analyses and do not demonstrate robustness across various scenarios.
G3 (Bethesda)
January 2025
Division of Scientific Computing, Department of Information Technology, Uppsala University, SE-751 05 Uppsala, Sweden.
Conducting genomic selection in plant breeding programs can substantially speed up the development of new varieties. Genomic selection provides more reliable insights when it is based on dense marker data, in which the rare variants can be particularly informative. Despite the availability of new technologies, the cost of large-scale genotyping remains a major limitation to the implementation of genomic selection.
PLoS One
January 2025
Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China.
As education increasingly relies on data-driven methodologies, accurately predicting student performance is essential for implementing timely and effective interventions. The California Student Performance Dataset offers a distinctive basis for analyzing complex elements that affect educational results, such as student demographics, academic behaviours, and emotional health. This study presents the GNN-Transformer-InceptionNet (GNN-TINet) model to overcome the constraints of prior models that fail to effectively capture intricate interactions in multi-label contexts, where students may display numerous performance categories concurrently.