Rapid and accurate multi-phenotype imputation for millions of individuals.

Nat Commun

Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs & Fisheries College, Jimei University, Xiamen, Fujian, People's Republic of China.

Published: January 2025

Deep phenotyping can enhance the power of genetic analysis, including genome-wide association studies (GWAS), but missing phenotypes compromise the potential of such resources. Although many phenotypic imputation methods have been developed, accurate imputation for millions of individuals remains challenging. In the present study, we have developed a multi-phenotype imputation method based on mixed fast random forest (PIXANT) by leveraging efficient machine learning (ML)-based algorithms. We demonstrate through extensive simulations that PIXANT is reliable, robust and highly resource-efficient. We then apply PIXANT to the UK Biobank (UKB) data of 277,301 unrelated White British citizens and 425 traits. When GWAS is subsequently performed on the imputed phenotypes, 18.4% more GWAS loci are identified than before imputation (8710 vs 7355). The increased statistical power of GWAS identified additional candidate genes affecting heart rate, such as RNF220, SCN10A, and RGS6, suggesting that the use of imputed phenotype data from a large cohort may lead to the discovery of additional candidate genes for complex traits.
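
The 18.4% figure follows from the two locus counts quoted in the abstract. A minimal Python check, with variable names chosen here purely for illustration:

```python
# Locus counts as quoted in the abstract (before vs after imputation).
loci_before = 7355
loci_after = 8710

# Absolute and relative gain in GWAS loci.
additional_loci = loci_after - loci_before
relative_increase_pct = 100 * additional_loci / loci_before

print(f"{additional_loci} additional loci, {relative_increase_pct:.1f}% increase")
```

The increase is measured relative to the pre-imputation count, which is why 7355 is the denominator.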

Source
DOI: http://dx.doi.org/10.1038/s41467-024-55496-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11700122

Publication Analysis

Top Keywords

multi-phenotype imputation: 8
imputation millions: 8
millions individuals: 8
additional candidate: 8
candidate genes: 8
imputation: 5
rapid accurate: 4
accurate multi-phenotype: 4
individuals deep: 4
deep phenotyping: 4

Similar Publications

Glycated hemoglobin, fasting glucose, glycated albumin, and fructosamine are biomarkers that reflect different aspects of the glycemic process. Genetic studies of these glycemic biomarkers can shed light on unknown aspects of type 2 diabetes genetics and biology. While there exist several GWAS of glycated hemoglobin and fasting glucose, very few GWAS have focused on glycated albumin or fructosamine.

A multi-phenotype genome-wide association study of clades causing tuberculosis in a Ghanaian- and South African cohort.

Genomics

July 2021

DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.

Despite decades of research and advancements in diagnostics and treatment, tuberculosis remains a major public health concern. New computational methods are needed to interrogate the intersection of host- and bacterial genomes. Paired host genotype datum and infecting bacterial isolate information were analysed for associations using a multinomial logistic regression framework implemented in SNPTest.

Using GWAS to identify candidate genes associated with cattle morphology traits at a functional level is challenging. The main difficulty of identifying candidate genes and gene interactions associated with such complex traits is the long-range linkage disequilibrium (LD) phenomenon reported widely in dairy cattle. Systems biology approaches, such as combining the Association Weight Matrix (AWM) with a Partial Correlation in an Information Theory (PCIT) algorithm, can assist in overcoming this LD.

Genome-wide association studies have facilitated the discovery of thousands of loci for hundreds of phenotypes. However, the issue of missing heritability remains unsolved for most complex traits. Locus discovery could be enhanced with both improved power through multi-phenotype analysis (MPA) and use of a wider allele frequency range, including rare variants (RVs).
