Deep learning is a promising strategy for modeling cis-regulatory elements. However, models trained on genomic sequences often fail to explain why the same transcription factor can activate or repress transcription in different contexts. To address this limitation, we developed an active learning approach to train models that distinguish between enhancers and silencers composed of binding sites for the photoreceptor transcription factor cone-rod homeobox (CRX).
The transcription factor (TF) cone-rod homeobox (CRX) is essential for the differentiation and maintenance of photoreceptor cell identity. Several human CRX variants cause degenerative retinopathies, but most are variants of uncertain significance. We performed a deep mutational scan (DMS) of nearly all possible single amino acid substitutions in CRX using a cell-based transcriptional reporter assay, curating a high-confidence list of nearly 2000 variants with altered transcriptional activity.
Cone-Rod Homeobox, encoded by CRX, is a transcription factor (TF) essential for the terminal differentiation and maintenance of mammalian photoreceptors. Structurally, CRX comprises an ordered DNA-binding homeodomain and an intrinsically disordered transcriptional effector domain. Although a handful of human variants in CRX have been shown to cause several different degenerative retinopathies with varying cone and rod predominance, as with most human disease genes, the vast majority of observed genetic variants are uncharacterized variants of uncertain significance (VUS).
Dozens of variants in the gene for the homeodomain transcription factor (TF) cone-rod homeobox (CRX) are linked with human blinding diseases that vary in their severity and age of onset. How different variants in this single TF alter its function in ways that lead to a range of phenotypes is unclear. We characterized the effects of human disease-causing variants on CRX cis-regulatory function by deploying massively parallel reporter assays (MPRAs) in mouse retina explants carrying knock-ins of two variants, one in the DNA-binding domain (p.
Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome.
Dozens of variants in the photoreceptor-specific transcription factor (TF) CRX are linked with human blinding diseases that vary in their severity and age of onset. It is unclear how different variants in this single TF alter its function in ways that lead to a range of phenotypes. We examined the effects of human disease-causing variants on CRX cis-regulatory function by deploying massively parallel reporter assays (MPRAs) in live mouse retinas carrying knock-ins of two variants, one in the DNA-binding domain (p.
Post-transcriptional autoregulation of gene expression is common in bacteria, but far fewer examples are known in eukaryotes. We used the yeast collection of genes fused to GFP as a rapid screen for examples of feedback regulation in ribosomal proteins by overexpressing a non-regulatable version of a gene and observing the effects on the expression of the GFP-fused version. We tested 95 ribosomal protein genes and found a wide continuum of effects, with 30% showing at least a 3-fold reduction in expression.
In embryonic stem cells (ESCs), a core transcription factor (TF) network establishes the gene expression program necessary for pluripotency. To address how interactions between four key TFs contribute to regulation in mouse ESCs, we assayed two massively parallel reporter assay (MPRA) libraries composed of binding sites for SOX2, POU5F1 (OCT4), KLF4, and ESRRB. Comparisons between synthetic cis-regulatory elements and genomic sequences with comparable binding site configurations revealed some aspects of a regulatory grammar.
Methylation of CpG (cytosine-phosphate-guanine) dinucleotides is a common epigenetic mark that influences gene expression. The effects of methylation on transcription factor (TF) binding are unknown for most TFs and, even when known, such knowledge is often only qualitative. In reality, methylation sensitivity is a quantitative effect, just as changes to the DNA sequence have quantitative effects on TF binding affinity.
Whole genome sequencing is a powerful tool for discovering single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) among mutant strains, which simplifies forward genetics approaches. However, identifying the causative mutation among a large number of non-causative SNPs in a mutant strain remains a major challenge. In the unicellular biflagellate green alga Chlamydomonas reinhardtii, we generated a SNP/indel library that contains over 2 million polymorphisms from four wild-type strains, one highly polymorphic strain that is frequently used in meiotic mapping, ten mutant strains that have flagellar assembly or motility defects, and one mutant strain, imp3, which has a mating defect.
Cilia are microtubule-based organelles that project from cells. Cilia are found on almost every cell type of the human body, and numerous diseases, collectively termed ciliopathies, are associated with defects in cilia, including respiratory infections, male infertility, situs inversus, polycystic kidney disease, retinal degeneration, and Bardet-Biedl Syndrome. Here we show that Illumina-based whole-genome transcriptome analysis in the biflagellate green alga Chlamydomonas reinhardtii identifies 1850 genes up-regulated during ciliogenesis, 4392 genes down-regulated, and 4548 genes with no change in expression during ciliogenesis.
We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation.
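
As a rough illustration of the kind of non-linear relationship described in this abstract, the sketch below uses a generic two-state thermodynamic occupancy function. The abstract gives no formula, so the functional form, the function names (`binding_probability`, `additive_energy`), and the toy energy matrix are all assumptions made for illustration, not the authors' parameterization. Energies and the chemical potential are expressed in units of kT, with lower energy meaning stronger binding.

```python
import math


def additive_energy(site, energy_matrix):
    """Sum per-position base energies (hypothetical independent-base model)."""
    return sum(energy_matrix[pos][base] for pos, base in enumerate(site))


def binding_probability(energy, mu, nonspecific_energy=None):
    """Probability that a site is bound, in a generic two-state model.

    energy: sequence-specific binding energy in kT (assumed form).
    mu: chemical potential of the transcription factor in kT.
    nonspecific_energy: optional sequence-independent binding energy in kT.
    """
    # Boltzmann weight of the specifically bound state relative to unbound.
    weight = math.exp(-(energy - mu))
    if nonspecific_energy is not None:
        # Non-specific binding adds a second, sequence-independent bound state.
        weight += math.exp(-(nonspecific_energy - mu))
    return weight / (1.0 + weight)


# Toy 2-bp energy matrix (made-up values, in kT).
toy_matrix = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 1.0},
    {"A": 1.5, "C": 0.0, "G": 2.5, "T": 2.0},
]

for site in ("AC", "TC", "GG"):
    e = additive_energy(site, toy_matrix)
    p = binding_probability(e, mu=1.0, nonspecific_energy=4.0)
    print(site, e, round(p, 3))
```

In this form, occupancy saturates for sites whose energy lies well below the chemical potential and flattens toward the non-specific floor for weak sites, so equal steps in energy do not produce equal steps in how often a site is selected. The abstract also mentions non-independent base contributions, which this additive toy matrix does not capture.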