PEPS: Polygenic Epistatic Phenotype Simulation.

Stud Health Technol Inform

Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney, Australia.

Published: January 2024

Genetic data is limited, and generating new datasets is often an expensive, time-consuming process involving many moving parts to genotype and phenotype individuals. While sharing data is beneficial for quality control and software development, privacy and security are of utmost importance. Generating synthetic data is a practical way to mitigate the cost, time and sensitivities that hamper developers and researchers in producing and validating novel biotechnological solutions to data-intensive problems. Existing methods focus on mutation frequencies at specific loci while ignoring epistatic interactions. Programs that do consider epistasis, by contrast, are limited to two-way interactions or apply genomic constraints that make synthetic data generation arduous or computationally intensive. To address this, we developed the Polygenic Epistatic Phenotype Simulator (PEPS), a probabilistic model that can generate synthetic phenotypes with a controllable level of complexity.
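The abstract does not describe how PEPS works internally. As a rough illustration of what simulating a phenotype with higher-order epistasis involves, the following minimal sketch draws genotypes and then builds a quantitative phenotype from additive effects, k-way interaction terms and Gaussian noise. It is not the PEPS model. It assumes Hardy-Weinberg equilibrium, independent (unlinked) loci and a hypothetical "all-or-nothing" interaction rule; the function names, parameters and allele frequencies are all illustrative.

```python
import random

def simulate_genotypes(n_individuals, mafs, rng):
    """Genotypes coded 0/1/2 (minor-allele count), assuming Hardy-Weinberg
    equilibrium and independent loci: each of the two alleles at SNP j is
    the minor allele with probability mafs[j]."""
    return [
        [sum(rng.random() < p for _ in range(2)) for p in mafs]
        for _ in range(n_individuals)
    ]

def simulate_phenotype(genotypes, additive, interactions, interaction_effect,
                       noise_sd, rng):
    """Quantitative phenotype = additive SNP effects + epistatic terms + noise.

    `interactions` is a list of tuples of SNP indices. A tuple's effect is added
    only when every SNP in it carries at least one minor allele (a hypothetical
    'all-or-nothing' interaction). Larger tuples give higher-order epistasis,
    and more tuples give a more complex genetic architecture."""
    phenotypes = []
    for g in genotypes:
        # Additive part: effect size times minor-allele count at each SNP.
        y = sum(beta * x for beta, x in zip(additive, g))
        # Epistatic part: fires only when all SNPs in the set are carried.
        for snps in interactions:
            if all(g[i] > 0 for i in snps):
                y += interaction_effect
        # Environmental / residual noise.
        y += rng.gauss(0.0, noise_sd)
        phenotypes.append(y)
    return phenotypes

rng = random.Random(1)
mafs = [0.3, 0.2, 0.4, 0.1, 0.25]          # illustrative minor-allele frequencies
genotypes = simulate_genotypes(1000, mafs, rng)
phenotypes = simulate_phenotype(
    genotypes,
    additive=[0.1, 0.0, 0.0, 0.0, 0.05],   # small marginal effects
    interactions=[(1, 2, 3)],              # one three-way interaction
    interaction_effect=2.0,
    noise_sd=1.0,
    rng=rng,
)
```

In this sketch, complexity is controlled by the size and number of tuples in `interactions`; SNPs 2 and 3 have no marginal effect, so their influence is visible only through the three-way term.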

Source: http://dx.doi.org/10.3233/SHTI231077

Similar Publications

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs.
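To make the combinatorial explosion concrete: with n SNPs, the number of distinct k-way interactions is the binomial coefficient C(n, k). The short sketch below evaluates it for an illustrative array of 500,000 SNPs (an assumed size, not taken from the cited work). Pairs alone number about 1.25 × 10^11, and three-way combinations about 2.08 × 10^16.

```python
from math import comb

def n_interactions(n_snps: int, order: int) -> int:
    """Number of distinct SNP combinations of a given interaction order."""
    return comb(n_snps, order)

n_snps = 500_000  # illustrative genotyping-array size (assumption)
for k in (2, 3, 4):
    # Format the exact integer count in scientific notation for readability.
    print(f"order {k}: {n_interactions(n_snps, k):.3e} candidate interactions")
```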

Genome-wide association studies have enabled the identification of important genetic factors in many trait studies. However, only a fraction of the heritability can be explained by known genetic factors, even in the most common diseases. Genetic loci combinations, or epistatic contributions expressed by combinations of single nucleotide polymorphisms (SNPs), have been argued to be one of the critical factors explaining some of the missing heritability, especially in oligogenic/polygenic diseases.

Background: Polygenic risk scores (PRS) are linear combinations of genetic markers weighted by effect size that are commonly used to predict disease risk. For complex heritable diseases such as late-onset Alzheimer's disease (LOAD), PRS models fail to capture much of the heritability. Additionally, PRS models are highly dependent on the population structure of the data on which effect sizes are assessed and have poor generalizability to new data.
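The linear form described above can be written as PRS_i = Σ_j β_j · x_ij, where x_ij is individual i's allele dosage (0, 1 or 2) at SNP j and β_j is that SNP's estimated effect size. A minimal sketch follows; the effect sizes are hypothetical. Because the score is a plain weighted sum with no product terms between SNPs, it cannot represent epistatic interactions by construction.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Linear PRS: sum over SNPs of effect size times allele dosage (0, 1 or 2)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * x for beta, x in zip(effect_sizes, dosages))

# Illustrative values only: three SNPs with hypothetical effect sizes.
print(polygenic_risk_score([2, 1, 0], [0.5, 0.25, 1.0]))  # prints 1.25
```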

The genetic complexity of polygenic traits represents a captivating and intricate facet of biological inheritance. Unlike Mendelian traits controlled by a single gene, polygenic traits are influenced by multiple genetic loci, each exerting a modest effect on the trait. This cumulative impact of numerous genes, interactions among them, environmental factors, and epigenetic modifications results in a multifaceted architecture of genetic contributions to complex traits.

Article Synopsis
  • The text discusses the challenges of detecting complex genetic interactions (epistasis) that influence human traits, pointing out that traditional regression methods struggle with high-order interactions in large genomic datasets because of computational limitations and an inability to model biological interactions properly.
  • It introduces the epiTree pipeline, built on the Predictability, Computability, Stability (PCS) framework, which uses tree-based models to identify higher-order interactions in genomic data by selecting relevant variants based on tissue-specific gene expression and employing iterative random forests.
  • The efficacy of the epiTree pipeline is validated through two case studies from the UK Biobank, demonstrating its ability to reveal both known and novel genetic interactions in predicting traits like red hair and multiple sclerosis, thus potentially