The GenoPred pipeline: a comprehensive and scalable pipeline for polygenic scoring.

Oliver Pain, Ammar Al-Chalabi, Cathryn M Lewis

Bioinformatics

Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, United Kingdom.

Published: October 2024

Motivation: Polygenic scoring is an approach for estimating an individual's likelihood of a given outcome. Polygenic scores are typically calculated from genome-wide association study (GWAS) summary statistics and individual-level genotype data for the target sample. Going from genotype to interpretable polygenic scores involves many steps and there are many methods available, limiting the accessibility of polygenic scores for research and clinical application. Additional challenges exist for studies in ancestrally diverse populations. We have implemented the leading polygenic scoring methodologies within an easy-to-use pipeline called GenoPred.
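As a rough illustration of the calculation described above, a polygenic score can be sketched as an effect-size-weighted sum of an individual's allele dosages. The function name, variant IDs, and weights below are hypothetical and are not taken from GenoPred; the leading methods the pipeline implements adjust the GWAS effect sizes before this summation step.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum weight * dosage over variants present in both inputs.

    dosages: effect-allele counts (0, 1, or 2) per variant for one individual.
    weights: per-variant effect sizes, e.g. derived from GWAS summary statistics.
    """
    shared = sorted(dosages.keys() & weights.keys())  # ignore unmatched variants
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical toy data: three variants, one individual.
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}
person = {"rs1": 2, "rs2": 1, "rs3": 0}
score = polygenic_score(person, weights)
print(round(score, 6))
```

Variants missing from either input are skipped here for simplicity; a real workflow would also need to align alleles and genome builds between the GWAS and the target genotypes.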

Results: Here, we present the GenoPred pipeline, an easy-to-use, high-performance, reference-standardized, and reproducible workflow for polygenic scoring. It requires minimal inputs and offers various configuration options to cater to a range of use cases. GenoPred implements a comprehensive set of analyses, including genotype and GWAS quality control, target sample ancestry inference, polygenic score file generation using a range of leading methods, and target sample scoring. GenoPred standardizes the polygenic scoring process using reference genetic data, providing interpretable polygenic scores. The pipeline is applicable to GWAS and target data from any population within the reference, facilitating studies of diverse ancestry. GenoPred is a Snakemake pipeline with associated Conda software environments, ensuring reproducibility. We apply the pipeline to UK Biobank data, demonstrating the pipeline's simplicity, efficiency, and performance. The GenoPred pipeline provides a novel resource for polygenic scoring, integrating a range of complex processes within an easy-to-use framework. GenoPred widens access to the leading polygenic scoring methodologies and their application to studies of diverse ancestry.
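The abstract does not spell out how reference standardization is computed. One common reading, sketched here as an assumption rather than as GenoPred's implementation, is that a target individual's raw score is expressed as a z-score relative to the score distribution in a reference sample. The function name and the numbers are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize_to_reference(
    target_scores: Sequence[float], reference_scores: Sequence[float]
) -> List[float]:
    """Express target scores as z-scores relative to a reference sample.

    Assumption: the reference mean and standard deviation define the scale,
    so a value of 1.0 means one reference SD above the reference mean.
    """
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)  # population SD of the reference scores
    if sd == 0:
        raise ValueError("reference scores have zero variance")
    return [(s - mu) / sd for s in target_scores]


# Hypothetical raw scores for a reference sample and two target individuals.
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
target = [2.0, 4.0]
z = standardize_to_reference(target, reference)
print(z)
```

Scaling against a reference rather than against the target sample itself keeps a score comparable across target datasets, and allows a single individual to be scored.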

Availability And Implementation: Freely available on the web at https://github.com/opain/GenoPred.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11462442
DOI: http://dx.doi.org/10.1093/bioinformatics/btae551

Similar Publications

Polygenic score distribution differences across European ancestry populations: implications for breast cancer risk prediction.

Breast Cancer Res

December 2024

Biostatistics Unit, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus.

Kristia Yiangou, Nasim Mavaddat, Joe Dennis, Maria Zanti, Qin Wang

Background: The 313-variant polygenic risk score (PRS) provides a promising tool for clinical breast cancer risk prediction. However, evaluation of the PRS across different European populations, which could influence risk estimation, has not been performed.

Methods: We explored the distribution of the PRS across European populations using genotype data from 94,072 females of European ancestry without a breast cancer diagnosis from 21 countries participating in the Breast Cancer Association Consortium (BCAC) and 223,316 females without a breast cancer diagnosis from the UK Biobank.

Elevated cerebrospinal fluid biomarkers of neuroinflammation and neuronal damage in essential hypertension with secondary insomnia: Implications for Alzheimer's disease risk.

Brain Behav Immun

December 2024

Beijing Hui-Long-Guan Hospital, Peking University, Beijing 100096, China.

Feng Zhang, Xiaoli Han, Qingshuang Mu, Halliru Zailani, Wen-Chun Liu

Essential hypertension (EH) with secondary insomnia is associated with increased risks of neuroinflammation, neuronal damage, and Alzheimer's disease (AD). However, its relationship with specific cerebrospinal fluid (CSF) biomarkers of neuronal damage and neuroinflammation remains unclear. This case-control study compared CSF biomarker levels across three groups: healthy controls (HC, n = 64), hypertension-controlled (HTN-C, n = 54), and hypertension-uncontrolled (HTN-U, n = 107) groups, all EH participants experiencing secondary insomnia.

Evaluating the Use of Environmental and Polygenic Risk Scores to Inform Colorectal Cancer Risk-Based Surveillance Intervals.

Clin Transl Gastroenterol

December 2024

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA.

Rebecca Landy, Hormuzd A Katki, Wen-Yi Huang, Difei Wang, Minta Thomas

Introduction: United States Multi-Society Task Force colonoscopy surveillance intervals are based solely on adenoma characteristics, without accounting for other risk factors. We investigated whether a risk model including demographic, environmental, and genetic risk factors could individualize surveillance intervals under an "equal management of equal risks" framework.

Methods: Using 14,069 individuals from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial who had a diagnostic colonoscopy following an abnormal flexible sigmoidoscopy, we modeled the risk of colorectal cancer, considering the diagnostic colonoscopy finding, baseline risk factors (e.

Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models.

Genet Med

December 2024

Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN.

Theodore J Morley, Drew Willimitis, Michael Ripperger, Hyunjoon Lee, Yu Zhou

Purpose: Studies of the value of genetic information for improving the performance of clinical risk prediction models have yielded variable conclusions. Many methodological decisions have the potential to contribute to differential results. We performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) with genetic data to understand which decisions may affect performance.

Estimating the Genetic Risk of First-Degree Relatives for Chronic Diseases Using the Short Tandem Repeat Score as Model of Polygenic Inheritance.

Biochem Genet

December 2024

College of Medical Laboratory, Dalian Medical University, Dalian, 116044, People's Republic of China.

Xia Qi, Anwar Ullah, Weijian Yu, Xiaojun Jin, Hui Liu

This study aims to establish a genetic risk assessment model based on a score of short tandem repeats (STRs) of polygenic inheritance. A total of 396 children and their biological parents were collected for STR genotyping. The numbers of tandem repeats of two alleles in one STR locus were assumed to be a quantitative genetic strength for disease incidence.
