All of Us diversity and scale improve polygenic prediction contextually with greatest improvements for under-represented populations.

Kristin Tsuo Zhuozheng Shi Tian Ge Ravi Mandla Kangcheng Hou Yi Ding Bogdan Pasaniuc Ying Wang Alicia R Martin

bioRxiv

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.

Published: August 2024

Recent studies have demonstrated that polygenic risk scores (PRS) trained on multi-ancestry data can improve prediction accuracy in groups historically underrepresented in genomic studies, but linked health and genetic data from large-scale cohorts representing a wide spectrum of human diversity remain limited. To address this need, the All of Us Research Program (AoU) generated whole-genome sequences of 245,388 individuals who collectively reflect the diversity of the USA. Leveraging this resource and another widely used population-scale biobank, the UK Biobank (UKB), with half a million participants, we developed PRS trained on multi-ancestry and multi-biobank data with up to ~750,000 participants for 32 common, complex traits and diseases across a range of genetic architectures. We then compared the effects of ancestry, PRS methodology, and genetic architecture on PRS accuracy across a held-out subset of ancestrally diverse AoU participants. Due to the more heterogeneous study design of AoU, we found lower heritability on average compared to UKB (0.075 vs. 0.165), which limited the maximal achievable PRS accuracy in AoU. Overall, we found that the increased diversity of AoU significantly improved PRS performance in some participants in AoU, especially underrepresented individuals, across multiple phenotypes. Notably, maximizing sample size by combining discovery data across AoU and UKB was not the optimal approach for predicting some phenotypes in African ancestry populations; rather, using data from only AoU for these traits resulted in the greatest accuracy. This was especially true for less polygenic traits with large ancestry-enriched effects, such as neutrophil count (accuracy: 0.055 vs. 0.035 using AoU vs. cross-biobank meta-analysis, respectively). Lastly, we calculated individual-level PRS accuracies rather than grouping by continental ancestry, a critical step towards interpretability in precision medicine. Individualized PRS accuracy decayed linearly as a function of ancestry divergence, but the slope was smaller using multi-ancestry GWAS than using European GWAS. Our results highlight the potential of biobanks with more balanced representations of human diversity to facilitate more accurate PRS for the individuals least represented in genomic studies.
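
The comparison of decay slopes described in the abstract can be illustrated with a minimal sketch. It fits an ordinary least squares line to individual-level accuracy as a function of ancestry divergence, once per GWAS source, and compares the slopes. The function name, variable names, and all numeric values are hypothetical toy inputs chosen for illustration; they are not results from the study, and this is not the authors' analysis code.

```python
from typing import Sequence, Tuple


def ols_slope_intercept(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy values (assumptions for illustration, NOT study results).
# genetic_distance: each individual's divergence from the GWAS training data.
genetic_distance = [0.0, 1.0, 2.0, 3.0, 4.0]
accuracy_european_gwas = [0.10, 0.08, 0.06, 0.04, 0.02]
accuracy_multi_ancestry_gwas = [0.10, 0.09, 0.08, 0.07, 0.06]

slope_eur, _ = ols_slope_intercept(genetic_distance, accuracy_european_gwas)
slope_multi, _ = ols_slope_intercept(genetic_distance, accuracy_multi_ancestry_gwas)

# A shallower (less negative) slope means accuracy degrades more slowly
# as individuals become more genetically distant from the training data.
print(f"European GWAS slope: {slope_eur:.3f}")
print(f"Multi-ancestry GWAS slope: {slope_multi:.3f}")
```

In this toy setup the multi-ancestry slope is smaller in magnitude, which mirrors the direction of the reported finding only because the inputs were constructed that way.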

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11326295
DOI: http://dx.doi.org/10.1101/2024.08.06.606846

Similar Publications

Leveraging genetic ancestry continuum information to interpolate PRS for admixed populations.

medRxiv

January 2025

Yunfeng Ruan Rohan Bhukar Aniruddh Patel Satoshi Koyama Leland Hull

The relatively low representation of admixed populations in both discovery and fine-tuning individual-level datasets limits polygenic risk score (PRS) development and equitable clinical translation for admixed populations. Under the assumption that the most informative PRS weight for a homogeneous sample varies linearly in an ancestry continuum space, we introduce a Genetic Distance-assisted PRS Combination Pipeline for Diverse Genetic Ancestries (DiscoDivas) to interpolate a harmonized PRS for diverse, especially admixed, ancestries, leveraging multiple PRS weights fine-tuned within single-ancestry samples and genetic distance. DiscoDivas treats ancestry as a continuous variable and does not require shifting between different models when calculating PRS for different ancestries.
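
The linearity assumption stated above can be pictured with a deliberately simplified sketch: plain linear interpolation between two per-variant weight vectors, each fine-tuned in a single-ancestry sample. This is only an illustration of what "varies linearly in an ancestry continuum space" could mean in one dimension. It is not the DiscoDivas algorithm, and the function name, the position parameter `t`, and all values are hypothetical.

```python
from typing import List, Sequence


def interpolate_weights(
    weights_a: Sequence[float], weights_b: Sequence[float], t: float
) -> List[float]:
    """Linearly interpolate per-variant PRS weights.

    t is a hypothetical position on a one-dimensional ancestry axis:
    t = 0 returns weights_a, t = 1 returns weights_b.
    """
    if len(weights_a) != len(weights_b):
        raise ValueError("weight vectors must cover the same variants")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * wa + t * wb for wa, wb in zip(weights_a, weights_b)]


# Hypothetical weights for three variants (assumptions, not real estimates).
weights_ancestry_a = [0.10, 0.00, -0.20]
weights_ancestry_b = [0.30, 0.10, -0.10]

# An individual placed halfway between the two reference samples.
halfway = interpolate_weights(weights_ancestry_a, weights_ancestry_b, 0.5)
print(halfway)
```

Because the position `t` is continuous, the same function serves every individual without switching between ancestry-specific models, which is the property the abstract emphasizes.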

Polygenic risk discriminates Lewy body dementia from Alzheimer's disease.

Alzheimers Dement

January 2025

Department of Psychiatry, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, UK.

Anna McKeever Peter Swann Maura Malpetti Paul C Donaghy Alan Thomas

Introduction: Lewy body dementia (LBD) shares genetic risk factors with Alzheimer's disease (AD), including apolipoprotein E (APOE), but is distinguishable at the genome-wide level. Polygenic risk scores (PRS) may therefore improve diagnostic classification.

Methods: We assessed diagnostic classification using AD-PRS excluding APOE (AD-PRS ), APOE risk score (APOE-RS), and plasma phosphorylated tau 181 (p-tau181) in 83 participants with LBD, 27 with positron emission tomography amyloid beta (Aβ)-positive mild cognitive impairment or AD (MCI+/AD), and 57 controls.

Deep learning captures the effect of epistasis in multifactorial diseases.

Front Med (Lausanne)

January 2025

International Laboratory of Bioinformatics, AI and Digital Sciences Institute, Faculty of Computer Science, HSE University, Moscow, Russia.

Vladislav Perelygin Alexey Kamelin Nikita Syzrantsev Layal Shaheen Anna Kim

Background: Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by a linear regression model that assumes an independent and linear contribution of each SNP to the phenotype. However, for complex multifactorial diseases such as Alzheimer's disease, diabetes, cardiovascular disease, cancer, and others, the association between individual SNPs and disease could be non-linear due to epistatic interactions.
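
The additive model described above can be written as a weighted sum of allele dosages. The following is a minimal sketch under that assumption; the function name and the toy dosages and weights are hypothetical and are not taken from any real scoring file.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive PRS: sum over variants of (allele dosage * per-variant weight).

    Each variant contributes independently and linearly, so this form
    cannot represent interaction (epistatic) effects between variants.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical toy inputs: dosages are effect-allele counts (0, 1, or 2).
toy_dosages = [2, 1, 0]
toy_weights = [0.1, 0.2, -0.05]

toy_score = polygenic_score(toy_dosages, toy_weights)
print(f"PRS = {toy_score:.2f}")
```

Here the score is 2 × 0.1 + 1 × 0.2 + 0 × (−0.05) = 0.4; a model that captures epistasis would need terms depending on more than one variant at a time.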

Unsupervised Ensemble Learning for Efficient Integration of Pre-trained Polygenic Risk Scores.

medRxiv

January 2025

Chenyin Gao Justin D Tubbs Yi Han Min Guo Sijia Li

The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, data heterogeneity, and the scarcity of observed phenotypes in real-world settings. Ensemble learning offers a promising avenue to enhance the predictive accuracy of genetic risk assessments, but most existing methods rely on observed phenotype data or additional genome-wide association studies (GWAS) from the target population to optimize ensemble weights, limiting their utility in real-time implementation.

Assessing pain in multiple sclerosis: Test-retest reliability of patient-reported outcome measures and accuracy of screening tools.

Mult Scler

January 2025

REVAL Rehabilitation Research Center, Faculty of Rehabilitation Sciences, Hasselt University, Hasselt, Belgium.

Cigdem Yilmazer Miguel D'haeseleer Bernardita Soler Bart Van Wijmeersch Claudio Solaro

Background: Pain is a common symptom of multiple sclerosis (MS). The reliability of outcome measures for pain and the accuracy of screening tools are essential for treatment purposes.

Objectives: This study investigated the test-retest reliability of Neuropathic Pain Scale (NPS), Neuropathic Pain Symptom Inventory (NPSI), Brief Pain Inventory-Short Form (BPI-SF), Nordic Musculoskeletal Questionnaire (NMQ), Douleur Neuropathique 4 (DN4), and painDETECT, and the accuracy of DN4 and painDETECT.
