Genome-wide association studies (GWAS) traditionally analyze single traits, e.g., disease diagnoses or biomarkers.
Telomere maintenance in neuroblastoma is linked to poor outcome and is caused either by telomerase reverse transcriptase (TERT) activation or by alternative lengthening of telomeres (ALT). In contrast to TERT activation, which is commonly caused by genomic rearrangements or MYCN amplification, ALT is less well understood. Alterations at the ATRX locus are key drivers of ALT but are present in only ∼50% of ALT tumors.
With the development of high-throughput technologies, genomics datasets, including functional genomics data, are rapidly growing in size. This has allowed the training of large deep learning (DL) models to predict epigenetic readouts, such as protein binding or histone modifications, from genome sequences. However, large datasets come at the price of data consistency, as they often aggregate results from many studies conducted under varying experimental conditions.
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.
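PGS methods such as those compared here differ largely in how they estimate per-variant weights; once the weights exist, scoring an individual typically reduces to a weighted sum of effect-allele dosages. The minimal sketch below shows only that final scoring step, under the assumption that weights are already available. The variant IDs and effect sizes are hypothetical, and this is not any of the seven evaluated methods.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs.

    dosages: variant ID -> number of effect alleles carried (0-2; fractional if imputed)
    weights: variant ID -> per-allele effect size produced by some PGS method
    Variants missing from either dictionary are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)


if __name__ == "__main__":
    # Hypothetical weights and genotypes, for illustration only.
    weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}
    person = {"var1": 2, "var2": 1, "var3": 0}
    print(round(polygenic_score(person, weights), 4))  # 0.19
```

Skipping unmatched variants silently is a simplification; real scoring pipelines usually also report how many weighted variants were found in the genotype data.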
Polygenic scores (PGSs) offer the ability to predict genetic risk for complex diseases across the life course, a key benefit over short-term prediction models. To produce risk estimates relevant to clinical and public health decision-making, it is important to account for effects that vary with age and sex. Here, we develop a novel framework for estimating country-, age-, and sex-specific cumulative incidence, stratified by PGS, for 18 high-burden diseases.
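A common way to turn age-specific incidence rates into a cumulative risk is CI(t) = 1 - exp(-Σ rate × band width), assuming a constant rate within each age band and ignoring competing risks such as death from other causes. The sketch below applies that formula to hypothetical rates for a baseline group and for a high-PGS stratum obtained with a hypothetical hazard ratio (a proportional-hazards assumption). It is not the authors' framework, whose exact estimation steps are not described here.

```python
import math
from typing import List, Sequence


def cumulative_incidence(rates_per_year: Sequence[float], band_years: float = 5.0) -> List[float]:
    """Cumulative incidence at the end of each consecutive age band.

    Assumes a constant incidence rate within each band and ignores competing risks:
    CI = 1 - exp(-cumulative hazard).
    """
    result = []
    cumulative_hazard = 0.0
    for rate in rates_per_year:
        cumulative_hazard += rate * band_years
        result.append(1.0 - math.exp(-cumulative_hazard))
    return result


if __name__ == "__main__":
    # Hypothetical annual rates for consecutive 5-year age bands.
    baseline = [0.0005, 0.001, 0.002, 0.004]
    # Hypothetical hazard ratio of 2 for a high-PGS stratum, applied to every band.
    high_pgs = [rate * 2.0 for rate in baseline]
    print([round(x, 4) for x in cumulative_incidence(baseline)])
    print([round(x, 4) for x in cumulative_incidence(high_pgs)])
```

Because the hazard accumulates across bands, the absolute gap between strata widens with age even under a constant hazard ratio, which is one reason age-specific estimates matter.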
Motivation: Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking.
Results: We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data.
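For readers unfamiliar with synthetic genotype-phenotype data, the sketch below shows the simplest version of the idea: independent biallelic variants drawn under Hardy-Weinberg proportions, and a quantitative phenotype built from an additive genetic component plus Gaussian noise scaled to a target heritability. It is a hypothetical illustration, not HAPNEST's method; the function name and parameters are invented here, and it ignores linkage disequilibrium and population structure, which realistic generators need to capture.

```python
import random
from typing import List, Sequence, Tuple


def simulate_additive(
    n_individuals: int,
    allele_freqs: Sequence[float],
    effects: Sequence[float],
    heritability: float,
    seed: int = 0,
) -> Tuple[List[List[int]], List[float]]:
    """Simulate genotypes (0/1/2 allele counts) and an additive quantitative phenotype."""
    if n_individuals < 1:
        raise ValueError("need at least one individual")
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    if len(allele_freqs) != len(effects):
        raise ValueError("allele_freqs and effects must have the same length")
    rng = random.Random(seed)
    # Two independent allele draws per variant give Hardy-Weinberg genotype counts.
    genotypes = [
        [int(rng.random() < p) + int(rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]
    genetic = [sum(g * b for g, b in zip(row, effects)) for row in genotypes]
    mean_g = sum(genetic) / n_individuals
    var_g = sum((x - mean_g) ** 2 for x in genetic) / n_individuals
    # Noise variance chosen so that var_g / (var_g + var_e) equals the target heritability,
    # with var_g measured in this sample; falls back to unit noise if var_g is zero.
    var_e = var_g * (1.0 - heritability) / heritability if var_g > 0 else 1.0
    phenotypes = [x + rng.gauss(0.0, var_e ** 0.5) for x in genetic]
    return genotypes, phenotypes


if __name__ == "__main__":
    geno, pheno = simulate_additive(1000, [0.1, 0.3, 0.5], [0.5, -0.2, 0.1], heritability=0.4)
    print(len(geno), len(pheno))
```

Drawing each variant independently keeps the example short, but it is exactly the simplification that makes such toy data unsuitable as a benchmark for PGS methods that model LD.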
Annu Rev Biomed Data Sci
August 2023
Understanding the noncoding part of the genome, which encodes gene regulation, is necessary to identify genetic mechanisms of disease and translate findings from genome-wide association studies into actionable results for treatments and personalized care. Here we provide an overview of the computational analysis of noncoding regions, starting from gene-regulatory mechanisms and their representation in data. Deep learning methods, when applied to these data, highlight important regulatory sequence elements and predict the functional effects of genetic variants.
Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~ 4.
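Gene-based tests pool rare variants within a gene because individual rare variants are usually too infrequent to test one at a time. One of the simplest such tests is a burden test: count each person's qualifying rare alleles in the gene (for example, only variants from one functional category) and regress the biomarker on that count. The sketch below is a minimal, unadjusted version with a normal-approximation p-value; the study's actual tests and covariates are not described in this excerpt, and the function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def burden_scores(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Per-individual count of qualifying rare alleles in one gene (rows = individuals)."""
    return [sum(row) for row in genotypes]


def burden_test(burden: Sequence[float], trait: Sequence[float]) -> Tuple[float, float, float]:
    """Simple linear regression of trait on burden; returns (beta, standard error, two-sided p).

    No covariates; the p-value uses a normal approximation, so it is only rough for small n.
    """
    n = len(trait)
    if n != len(burden) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_x = sum(burden) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in burden)
    if sxx == 0:
        raise ValueError("burden does not vary (e.g. no carriers)")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(burden, trait))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(burden, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: any nonzero slope is exact, a zero slope carries no signal.
        return beta, 0.0, (0.0 if beta != 0 else 1.0)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, p


if __name__ == "__main__":
    # Hypothetical data: rows = individuals, columns = rare variants in one gene.
    genotypes = [[0, 0], [0, 0], [0, 0], [1, 0], [0, 1], [1, 1]]
    biomarker = [1.0, 1.2, 0.9, 2.1, 1.9, 3.2]
    print(burden_test(burden_scores(genotypes), biomarker))
```

Running the test separately per functional category (for example, loss-of-function versus missense) only changes which columns enter `burden_scores`, which is one way such category-specific comparisons can aid interpretation.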
In recent years, numerous applications have demonstrated the potential of deep learning for improving our understanding of biological processes. However, most deep learning tools developed so far are designed to address a specific question on a fixed dataset and/or with a fixed model architecture. Here we present Janggu, a Python library that facilitates deep learning for genomics applications, aiming to ease data acquisition and model evaluation.
Filamentous fungi, such as , are very efficient in deconstructing plant biomass by the secretion of an arsenal of plant cell wall-degrading enzymes, by remodeling metabolism to accommodate production of secreted enzymes, and by enabling transport and intracellular utilization of plant biomass components. Although a number of enzymes and transcriptional regulators involved in plant biomass utilization have been identified, how filamentous fungi sense and integrate nutritional information encoded in the plant cell wall into a regulatory hierarchy for optimal utilization of complex carbon sources is not understood. Here, we performed transcriptional profiling of on 40 different carbon sources, including plant biomass, to provide data on how fungi sense simple to complex carbohydrates.
Tumor initiation is often linked to a loss of cellular identity. Transcriptional programs determining cellular identity are preserved by epigenetically acting chromatin factors. Although such regulators are among the most frequently mutated genes in cancer, it is not well understood how an abnormal epigenetic condition contributes to tumor onset.
Epigenomic mapping of enhancer-associated chromatin modifications facilitates the genome-wide discovery of tissue-specific enhancers in vivo. However, reliance on single chromatin marks leads to high rates of false-positive predictions. More sophisticated, integrative methods have been described, but commonly suffer from limited accessibility to the resulting predictions and reduced biological interpretability.