Genetic data is limited, and generating new datasets is often an expensive, time-consuming process that involves many moving parts to genotype and phenotype individuals. While sharing data is beneficial for quality control and software development, privacy and security are of utmost importance. Generating synthetic data is a practical way to mitigate the cost, time and sensitivities that hamper developers and researchers in producing and validating novel biotechnological solutions to data-intensive problems. Existing methods focus on mutation frequencies at specific loci while ignoring epistatic interactions. Programs that do consider epistasis are either limited to two-way interactions or apply genomic constraints that make synthetic data generation arduous or computationally intensive. To address these limitations, we developed the Polygenic Epistatic Phenotype Simulator (PEPS). Our tool is a probabilistic model that can generate synthetic phenotypes with a controllable level of complexity.
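The abstract does not describe the internals of PEPS, so the following is only a minimal sketch of the general idea of a probabilistic phenotype generator with interaction terms of adjustable order. It is not the PEPS model. The function names, the logistic link, the independent-loci assumption and every numeric value (minor allele frequency, effect sizes, baseline) are hypothetical and chosen purely for illustration.

```python
import math
import random


def simulate_genotypes(n_individuals, n_snps, maf, rng):
    """Draw minor-allele counts (0, 1 or 2) per SNP, assuming independent loci."""
    return [
        [sum(rng.random() < maf for _ in range(2)) for _ in range(n_snps)]
        for _ in range(n_individuals)
    ]


def simulate_phenotypes(genotypes, interactions, baseline, rng):
    """Sample binary case/control labels from a logistic model.

    `interactions` maps a tuple of SNP indices to a hypothetical effect size.
    A term contributes only when the individual carries at least one minor
    allele at every SNP in the tuple, so the tuple length sets the order of
    the interaction (1 = single-SNP effect, 2 = pairwise, 3 = three-way, ...).
    """
    labels = []
    for row in genotypes:
        score = baseline
        for snps, effect in interactions.items():
            if all(row[i] > 0 for i in snps):
                score += effect
        prob = 1.0 / (1.0 + math.exp(-score))
        labels.append(1 if rng.random() < prob else 0)
    return labels


rng = random.Random(0)  # fixed seed so the sketch is reproducible
genotypes = simulate_genotypes(n_individuals=500, n_snps=10, maf=0.3, rng=rng)

# Hypothetical architecture: one main effect, one pairwise and one three-way term.
interactions = {(0,): 0.5, (1, 2): 1.0, (3, 4, 5): 2.0}
phenotypes = simulate_phenotypes(genotypes, interactions, baseline=-1.0, rng=rng)
```

In this sketch, complexity is controlled by adding longer tuples to `interactions`; a real simulator would also need to handle linkage between loci and realistic allele frequencies, which are ignored here.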
DOI: http://dx.doi.org/10.3233/SHTI231077
Nucleic Acids Res
September 2024
Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany.
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs.
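To make the combinatorial explosion concrete, the short sketch below counts how many candidate SNP sets an exhaustive search would have to consider at each interaction order. The panel size of 1,000 SNPs is an arbitrary illustrative number, not a figure from the article.

```python
import math


def candidate_interactions(n_snps, order):
    """Number of distinct SNP sets of the given size (n choose k)."""
    return math.comb(n_snps, order)


n_snps = 1000  # hypothetical panel size, for illustration only
counts = {order: candidate_interactions(n_snps, order) for order in (2, 3, 4)}

for order, count in counts.items():
    print(f"order {order}: {count:,} candidate SNP sets")
```

For 1,000 SNPs this gives 499,500 pairs, 166,167,000 triples and 41,417,124,750 four-way sets, which is why exhaustive testing rarely goes beyond pairs.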
J Hum Genet
October 2024
Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Genome-wide association studies have enabled the identification of important genetic factors in many trait studies. However, only a fraction of the heritability can be explained by known genetic factors, even in the most common diseases. Genetic loci combinations, or epistatic contributions expressed by combinations of single nucleotide polymorphisms (SNPs), have been argued to be one of the critical factors explaining some of the missing heritability, especially in oligogenic/polygenic diseases.
J Alzheimers Dis
June 2024
Parabon NanoLabs, Inc., Reston, VA, USA.
Background: Polygenic risk scores (PRS) are linear combinations of genetic markers weighted by effect size that are commonly used to predict disease risk. For complex heritable diseases such as late-onset Alzheimer's disease (LOAD), PRS models fail to capture much of the heritability. Additionally, PRS models are highly dependent on the population structure of the data on which effect sizes are assessed and have poor generalizability to new data.
PLoS One
May 2024
Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
The genetic complexity of polygenic traits represents a captivating and intricate facet of biological inheritance. Unlike Mendelian traits controlled by a single gene, polygenic traits are influenced by multiple genetic loci, each exerting a modest effect on the trait. This cumulative impact of numerous genes, interactions among them, environmental factors, and epigenetic modifications results in a multifaceted architecture of genetic contributions to complex traits.
PLoS One
April 2024
Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America.