The massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limit our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. When applied to the lymphoblastoid cell line, this model achieves high discriminative performance (AUROC = 0.85) in differentiating MPRA positives from a set of control variants that mimic the background genome. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observe that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed in previous MPRAs.
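For readers unfamiliar with the AUROC figure quoted in the abstract, the metric can be read as the probability that a randomly chosen positive receives a higher score than a randomly chosen control. The following is a minimal sketch of that definition only; the function name `auroc` and the toy scores and labels are illustrative assumptions, not MpraNet code or data from the paper.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which the
    positive has the higher score; tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up classifier scores (hypothetical, not results from the paper).
# Label 1 stands for an MPRA positive, label 0 for a control variant.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auroc(toy_scores, toy_labels)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.85 means a positive outranks a control in roughly 85% of such pairs.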
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9723615 | PMC |
| http://dx.doi.org/10.1093/nar/gkac990 | DOI Listing |
iScience
January 2025
Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
The regulation of gene expression relies on the coordinated action of transcription factors (TFs) at enhancers, including both activator and repressor TFs. We employed deep learning (DL) to dissect HepG2 enhancers into positive (PAR), negative (NAR), and neutral activity regions. Sharpr-MPRA and STARR-seq highlight the dichotomous impact of NARs and PARs on modulating and catalyzing the activity of enhancers, respectively.
Cell
January 2025
Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
A meta-genome-wide association study across eight psychiatric disorders has highlighted the genetic architecture of pleiotropy in major psychiatric disorders. However, mechanisms underlying pleiotropic effects of the associated variants remain to be explored. We conducted a massively parallel reporter assay to decode the regulatory logic of variants with pleiotropic and disorder-specific effects.
Genome-wide association studies (GWAS) of melanoma risk have identified 68 independent signals at 54 loci. For most loci, specific functional variants and their respective target genes remain to be established. Capture-HiC is an assay that links fine-mapped risk variants to candidate target genes by comprehensively mapping cell-type specific chromatin interactions.
bioRxiv
December 2024
Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA.
Mammalian genomes contain millions of regulatory elements that control the complex patterns of gene expression. Previously, the ENCODE Consortium mapped biochemical signals across many cell types and tissues and integrated these data to develop a Registry of 0.9 million human and 300 thousand mouse candidate cis-Regulatory Elements (cCREs) annotated with potential functions.
bioRxiv
December 2024
Biophysics Graduate Group, University of California at Berkeley, Berkeley, CA, USA.
Despite the sequencing revolution, large swaths of the genomes sequenced to date lack any information about the arrangement of transcription factor binding sites on regulatory DNA. Massively Parallel Reporter Assays (MPRAs) have the potential to dramatically accelerate our genomic annotations by making it possible to measure the gene expression levels driven by thousands of mutational variants of a regulatory region. However, the interpretation of such data often assumes that each base pair in a regulatory sequence contributes independently to gene expression.