The GenoPred pipeline: a comprehensive and scalable pipeline for polygenic scoring.

Bioinformatics

Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, United Kingdom.

Published: October 2024

Motivation: Polygenic scoring is an approach for estimating an individual's likelihood of a given outcome. Polygenic scores are typically calculated from genome-wide association study (GWAS) summary statistics and individual-level genotype data for the target sample. Going from genotype data to interpretable polygenic scores involves many steps, and many methods are available, which limits the accessibility of polygenic scores for research and clinical application. Additional challenges exist for studies in ancestrally diverse populations. We have implemented the leading polygenic scoring methodologies within an easy-to-use pipeline called GenoPred.
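As a rough illustration of the calculation described above, a polygenic score can be sketched as a weighted sum of an individual's effect-allele counts, with weights taken from GWAS summary statistics. The function name `polygenic_score` and the toy numbers below are hypothetical and are not taken from GenoPred; real pipelines also handle allele matching, missing genotypes, and method-specific weight adjustment, none of which is shown here.

```python
def polygenic_score(dosages, effect_sizes):
    """Return one score per individual: sum over variants of dosage * effect size.

    dosages: list of rows, one per individual, each holding effect-allele
             counts (0, 1 or 2) for the same ordered set of variants.
    effect_sizes: list of per-variant weights (hypothetical GWAS effect sizes).
    """
    scores = []
    for row in dosages:
        if len(row) != len(effect_sizes):
            raise ValueError("each individual needs one dosage per variant")
        scores.append(sum(d * b for d, b in zip(row, effect_sizes)))
    return scores


# Toy example: two individuals, three variants (made-up values).
toy_dosages = [[0, 1, 2], [2, 0, 1]]
toy_effects = [0.5, -0.25, 1.0]
toy_scores = polygenic_score(toy_dosages, toy_effects)
print(toy_scores)  # [1.75, 2.0]
```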

Results: Here, we present the GenoPred pipeline, an easy-to-use, high-performance, reference-standardized, and reproducible workflow for polygenic scoring. It requires minimal inputs and offers various configuration options to cater to a range of use cases. GenoPred implements a comprehensive set of analyses, including genotype and GWAS quality control, target sample ancestry inference, polygenic score file generation using a range of leading methods, and target sample scoring. GenoPred standardizes the polygenic scoring process using reference genetic data, providing interpretable polygenic scores. The pipeline is applicable to GWAS and target data from any population within the reference, facilitating studies of diverse ancestry. GenoPred is a Snakemake pipeline with associated Conda software environments, ensuring reproducibility. We apply the pipeline to UK Biobank data, demonstrating the pipeline's simplicity, efficiency, and performance. The GenoPred pipeline provides a novel resource for polygenic scoring, integrating a range of complex processes within an easy-to-use framework. GenoPred widens access to the leading polygenic scoring methodologies and their application to studies of diverse ancestry.
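One way to picture reference standardization is to express each target score relative to the distribution of the same score in a reference sample. The sketch below assumes this means z-scoring against the reference mean and standard deviation; the abstract does not spell out GenoPred's exact procedure, so the function `standardize_to_reference` and the toy values are illustrative assumptions only.

```python
from statistics import mean, pstdev


def standardize_to_reference(target_scores, reference_scores):
    """Z-score target scores against the mean and SD of reference scores.

    Assumption: the reference scores come from a reference sample that matches
    the target individual's inferred ancestry. This is a sketch, not the
    pipeline's own code.
    """
    ref_mean = mean(reference_scores)
    ref_sd = pstdev(reference_scores)  # population SD of the reference scores
    if ref_sd == 0:
        raise ValueError("reference scores have zero variance")
    return [(score - ref_mean) / ref_sd for score in target_scores]


# Toy example with made-up raw scores.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = [3.0, 5.0]
z_scores = standardize_to_reference(targets, reference)
print(z_scores)
```

Under this assumption, a standardized score of 0 sits at the reference mean, and a score of 1 sits one reference standard deviation above it, which is what makes scores comparable across target samples.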

Availability And Implementation: Freely available on the web at https://github.com/opain/GenoPred.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11462442
DOI: http://dx.doi.org/10.1093/bioinformatics/btae551

Similar Publications

Background: The 313-variant polygenic risk score (PRS) provides a promising tool for clinical breast cancer risk prediction. However, evaluation of the PRS across different European populations, which could influence risk estimation, has not been performed.

Methods: We explored the distribution of the PRS across European populations using genotype data from 94,072 females of European ancestry without a breast cancer diagnosis, from 21 countries participating in the Breast Cancer Association Consortium (BCAC), and 223,316 females without a breast cancer diagnosis from the UK Biobank.

Essential hypertension (EH) with secondary insomnia is associated with increased risks of neuroinflammation, neuronal damage, and Alzheimer's disease (AD). However, its relationship with specific cerebrospinal fluid (CSF) biomarkers of neuronal damage and neuroinflammation remains unclear. This case-control study compared CSF biomarker levels across three groups: healthy controls (HC, n = 64), hypertension-controlled (HTN-C, n = 54), and hypertension-uncontrolled (HTN-U, n = 107) groups, all EH participants experiencing secondary insomnia.

Introduction: United States Multi-Society Task Force colonoscopy surveillance intervals are based solely on adenoma characteristics, without accounting for other risk factors. We investigated whether a risk model including demographic, environmental, and genetic risk factors could individualize surveillance intervals under an "equal management of equal risks" framework.

Methods: Using 14,069 individuals from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial who had a diagnostic colonoscopy following an abnormal flexible sigmoidoscopy, we modeled the risk of colorectal cancer, considering the diagnostic colonoscopy finding, baseline risk factors (e.

Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models.

Genet Med

December 2024

Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN.

Purpose: Studies of the value of genetic information for improving the performance of clinical risk prediction models have yielded variable conclusions. Many methodological decisions have the potential to contribute to differential results. We performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) with genetic data to understand which decisions may affect performance.

This study aims to establish a genetic risk assessment model based on a score of short tandem repeats (STRs) of polygenic inheritance. A total of 396 children and their biological parents were collected for STR genotyping. The numbers of tandem repeats of two alleles in one STR locus were assumed to be a quantitative genetic strength for disease incidence.
