Publications by authors named "Carl de Boer"

The regulatory mechanisms of silencers have remained poorly understood. In this issue, Hofbauer et al. conduct a genome-wide screen in Drosophila melanogaster and reveal three silencer types that appear to work alone, without the need for combinatorial action, traditional chromatin marks, or open chromatin regions.

Article Synopsis
  • Researchers found that genetic variants linked to autoimmune diseases are often located in areas that regulate gene activity in CD4 T cells, impacting disease risk through gene regulation changes.
  • They analyzed over 18,000 variants associated with autoimmune diseases and identified 545 that influence gene expression, showing a strong connection to causal variants.
  • The study demonstrates that these variants work through common regulatory pathways and that they affect gene networks crucial for T cell activation and proliferation, offering insights into how they may contribute to autoimmune disease risk.
Article Synopsis
  • A systematic evaluation is necessary to understand how different model architectures and training strategies affect the performance of genomics models, prompting the organization of a DREAM Challenge.
  • In the challenge, competitors used a vast dataset of yeast DNA sequences and expression levels to train models, with the best models employing various neural network architectures and training approaches.
  • The development of the Prix Fixe framework allowed for an in-depth analysis of these models, leading to improved performance, and demonstrating that top models not only excelled on yeast data but also outperformed existing benchmarks in Drosophila and human datasets.

DNA libraries are critical components of many biological assays. These libraries are often kept in plasmids that are amplified in E. coli to generate sufficient material for an experiment. Library uniformity is critical for ensuring that every element in the library is tested similarly and is thought to be influenced by the culture approach used during library amplification.


Signaling pathways that drive gene expression are typically depicted as having a dozen or so landmark phosphorylation and transcriptional events. In reality, thousands of dynamic post-translational modifications (PTMs) orchestrate nearly every cellular function, and we lack technologies to find causal links between these vast biochemical pathways and genetic circuits at scale. Here we describe the high-throughput, functional assessment of phosphorylation sites through the development of PTM-centric base editing coupled to phenotypic screens, directed by temporally resolved phosphoproteomics.


Genomes encode genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans.


Neural networks have emerged as immensely powerful tools in predicting functional genomic regions, notably evidenced by recent successes in deciphering gene regulatory logic. However, a systematic evaluation of how model architectures and training strategies impact genomics model performance is lacking. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast, to best capture the relationship between regulatory DNA and gene expression.


Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code.


Signaling pathways that drive gene expression are typically depicted as having a dozen or so landmark phosphorylation and transcriptional events. In reality, thousands of dynamic post-translational modifications (PTMs) orchestrate nearly every cellular function, and we lack technologies to find causal links between these vast biochemical pathways and genetic circuits at scale. Here, we describe "signaling-to-transcription network" mapping through the development of PTM-centric base editing coupled to phenotypic screens, directed by temporally resolved phosphoproteomics.


Motivation: The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar.

Results: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.
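
One way to read "soft classification" for a regression target is to spread each scalar expression value over a fixed set of bins as a probability vector, train against that vector, and recover a scalar prediction as the expected bin index. The sketch below only illustrates that idea and is not taken from LegNet itself: the bin count of 18, the Gaussian smoothing width, and the function names `soft_labels` and `expected_value` are all assumptions made for this example.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def soft_labels(value, n_bins=18, sigma=1.0):
    """Spread a scalar over unit-width bins centred on 0..n_bins-1.

    Each bin receives the Gaussian probability mass that falls inside it;
    the first and last bins absorb the tails (assumed scheme).
    """
    probs = []
    for b in range(n_bins):
        lo = -math.inf if b == 0 else b - 0.5
        hi = math.inf if b == n_bins - 1 else b + 0.5
        probs.append(normal_cdf(hi, value, sigma) - normal_cdf(lo, value, sigma))
    return probs


def expected_value(probs):
    """Collapse a bin-probability vector back to a scalar prediction."""
    return sum(b * p for b, p in enumerate(probs))


# Usage: encode a measured expression of 7.3, then decode it again.
target = soft_labels(7.3)
decoded = expected_value(target)
```

A model trained this way would output a probability vector of the same shape, and `expected_value` would turn it into the final expression estimate.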


Summary: Generate Indexes for Libraries (GIL) is a software tool for generating primers to be used in the production of multiplexed sequencing libraries. GIL can be customized in numerous ways to meet user specifications, including length, sequencing modality, color balancing, and compatibility with existing primers, and produces ordering and demultiplexing-ready outputs.

Availability And Implementation: GIL is written in Python and is freely available on GitHub under the MIT license: https://github.


Genome-wide association studies (GWASs) have uncovered hundreds of autoimmune disease-associated loci; however, the causal genetic variants within each locus are mostly unknown. Here, we perform high-throughput allele-specific reporter assays to prioritize disease-associated variants for five autoimmune diseases. By examining variants that both promote allele-specific reporter expression and are located in accessible chromatin, we identify 60 putatively causal variants that enrich for statistically fine-mapped variants by up to 57.


Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution.


Background: FCGR2A binds antibody-antigen complexes to regulate the abundance of circulating and deposited complexes along with downstream immune and autoimmune responses. Although the abundance of FCGR2A may be critical in immune-mediated diseases, little is known about whether its surface expression is regulated through cis genomic elements and non-coding variants. In the current study, we aimed to characterize the regulation of FCGR2A expression, the impact of genetic variation and its association with autoimmune disease.


Genome-wide association studies of Systemic Lupus Erythematosus (SLE) nominate 3073 genetic variants at 91 risk loci. To systematically screen these variants for allelic transcriptional enhancer activity, we construct a massively parallel reporter assay (MPRA) library comprising 12,396 DNA oligonucleotides containing the genomic context around every allele of each SLE variant. Transfection into the Epstein-Barr virus-transformed B cell line GM12878 reveals 482 variants with enhancer activity, with 51 variants showing genotype-dependent (allelic) enhancer activity at 27 risk loci.


Improved methods are needed to model CRISPR screen data for interrogation of genetic elements that alter reporter gene expression readout. We create MAUDE (Mean Alterations Using Discrete Expression) for quantifying the impact of guide RNAs on a target gene's expression in a pooled, sorting-based expression screen. MAUDE quantifies guide-level effects by modeling the distribution of cells across sorting expression bins.
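
To make "modeling the distribution of cells across sorting expression bins" concrete, here is a minimal sketch under stated assumptions rather than MAUDE's actual implementation. It assumes that expression in unperturbed cells is standard normal, that a guide shifts the mean without changing the spread, and that the shift is found by a simple grid search over a multinomial likelihood. The function names, the quartile bin edges, and the grid range are hypothetical.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bin_log_likelihood(mu, counts, edges):
    """Multinomial log-likelihood (up to a constant) of per-bin cell counts
    for a guide whose cells are assumed to follow N(mu, 1)."""
    ll = 0.0
    for n, lo, hi in zip(counts, edges[:-1], edges[1:]):
        p = normal_cdf(hi, mu) - normal_cdf(lo, mu)
        ll += n * math.log(max(p, 1e-300))  # guard against log(0)
    return ll


def estimate_mean_shift(counts, edges, grid_lo=-3.0, grid_hi=3.0, steps=601):
    """Grid-search the mean shift that best explains the observed bin counts."""
    best_mu, best_ll = grid_lo, -math.inf
    for i in range(steps):
        mu = grid_lo + (grid_hi - grid_lo) * i / (steps - 1)
        ll = bin_log_likelihood(mu, counts, edges)
        if ll > best_ll:
            best_mu, best_ll = mu, ll
    return best_mu


# Usage: four sorting bins at the quartiles of the unperturbed distribution.
quartile_edges = [-math.inf, -0.6745, 0.0, 0.6745, math.inf]
shift_up = estimate_mean_shift([100, 200, 300, 400], quartile_edges)
no_shift = estimate_mean_shift([250, 250, 250, 250], quartile_edges)
```

A guide whose cells pile up in the high-expression bins gets a positive estimated shift, while a guide spread evenly across quartile bins gets a shift near zero.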


Genome-wide association studies have associated thousands of genetic variants with complex traits and diseases, but pinpointing the causal variant(s) among those in tight linkage disequilibrium with each associated variant remains a major challenge. Here, we use seven experimental assays to characterize all common variants at the multiple disease-associated TNFAIP3 locus in five disease-relevant immune cell lines, based on a set of features related to regulatory potential. Trait/disease-associated variants are enriched among SNPs prioritized based on either: (1) residing within CRISPRi-sensitive regulatory regions, or (2) localizing in a chromatin accessible region while displaying allele-specific reporter activity.


How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites.
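
The "chance inclusion" argument can be illustrated with a back-of-the-envelope calculation. The sketch below assumes uniformly random bases and exact matching to a single fixed site. The 80 bp sequence length and 8 bp site length are illustrative assumptions, not values stated above, and the function name is hypothetical.

```python
def expected_site_count(seq_len, site_len, both_strands=False):
    """Expected number of exact matches to one fixed site in a uniformly
    random DNA sequence (each base drawn with probability 1/4)."""
    positions = max(seq_len - site_len + 1, 0)  # possible start positions
    per_position = 0.25 ** site_len             # chance of a match at one start
    # Doubling for the reverse strand ignores palindromic sites (simplification).
    strands = 2 if both_strands else 1
    return positions * per_position * strands


# Usage: one 8 bp site in an 80 bp random sequence, then scaled to a
# library of 100 million sequences.
per_sequence = expected_site_count(80, 8)
across_library = per_sequence * 100_000_000
```

Under these assumptions a given 8 bp site is rare in any single sequence (about 0.0011 expected matches), but a library of this size would still contain it on the order of a hundred thousand times, which is why fully random sequences can sample many binding sites in many contexts.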


Long-term hematopoietic stem cells (LT-HSCs) maintain hematopoietic output throughout an animal's lifespan. However, with age, the balance is disrupted, and LT-HSCs produce a myeloid-biased output, resulting in poor immune responses to infectious challenge and the development of myeloid leukemias. Here, we show that young and aged LT-HSCs respond differently to inflammatory stress, such that aged LT-HSCs produce a cell-intrinsic, myeloid-biased expression program.


Treatment of cancer has been revolutionized by immune checkpoint blockade therapies. Despite the high rate of response in advanced melanoma, the majority of patients succumb to disease. To identify factors associated with success or failure of checkpoint therapy, we profiled transcriptomes of 16,291 individual immune cells from 48 tumor samples of melanoma patients treated with checkpoint inhibitors.


Background: Variation in chromatin organization across single cells can help shed important light on the mechanisms controlling gene expression, but scale, noise, and sparsity pose significant challenges for interpretation of single cell chromatin data. Here, we develop BROCKMAN (Brockman Representation Of Chromatin by K-mers in Mark-Associated Nucleotides), an approach to infer variation in transcription factor (TF) activity across samples through unsupervised analysis of the variation in DNA sequences associated with an epigenomic mark.

Results: BROCKMAN represents each sample as a vector of epigenomic-mark-associated DNA word frequencies, and decomposes the resulting matrix to find hidden structure in the data, followed by unsupervised grouping of samples and identification of the TFs that distinguish groups.
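
The pipeline described above (DNA word frequencies per sample, followed by matrix decomposition) can be sketched in a few lines. This is a simplified illustration, not BROCKMAN's code: it assumes plain contiguous k-mers with k=2, uses power iteration to obtain only the first principal component, and all function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k=2):
    """Frequency vector over all 4**k DNA words for one sample's sequences."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def first_component_scores(matrix, iterations=200):
    """Project samples onto the first principal component of the
    column-centred sample-by-k-mer matrix, found by power iteration."""
    n, d = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in matrix]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary deterministic start
    for _ in range(iterations):
        scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        v = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in v) ** 0.5 or 1.0
        v = [x / norm for x in v]
    return [sum(r[j] * v[j] for j in range(d)) for r in centered]


# Usage: two AT-rich and two GC-rich toy samples.
samples = {
    "at_1": ["ATATATTA", "TTAATAAT"],
    "at_2": ["AATTATAT"],
    "gc_1": ["GCGCGGCC"],
    "gc_2": ["CCGGCGCG", "GGCCGCGC"],
}
matrix = [kmer_frequencies(seqs) for seqs in samples.values()]
scores = first_component_scores(matrix)
```

In this toy input the first component separates the AT-rich samples from the GC-rich ones; samples could then be grouped on such scores, and the k-mers with the largest loadings inspected for matches to known TF motifs.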


Leukemia stem cells (LSCs) have the capacity to self-renew and propagate disease upon serial transplantation in animal models, and elimination of this cell population is required for curative therapies. Here, we describe a series of pooled, in vivo RNAi screens to identify essential transcription factors (TFs) in a murine model of acute myeloid leukemia (AML) with genetically and phenotypically defined LSCs. These screens reveal the heterodimeric, circadian rhythm TFs Clock and Bmal1 as genes required for the growth of AML cells in vitro and in vivo.
