Combinatorial binding of transcription factors to regulatory DNA underpins gene regulation in all organisms. Genetic variation in regulatory regions has been connected with diseases and diverse phenotypic traits, but it remains challenging to distinguish variants that affect regulatory function. Genomic DNase I footprinting enables the quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin.
The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development.
DNase I hypersensitive sites (DHSs) are generic markers of regulatory DNA and contain genetic variations associated with diseases and phenotypic traits. We created high-resolution maps of DHSs from 733 human biosamples encompassing 438 cell and tissue types and states, and integrated these to delineate and numerically index approximately 3.6 million DHSs within the human genome sequence, providing a common coordinate system for regulatory DNA.
Background: Transcriptional dysregulation drives cancer formation but the underlying mechanisms are still poorly understood. Renal cell carcinoma (RCC) is the most common malignant kidney tumor which canonically activates the hypoxia-inducible transcription factor (HIF) pathway. Despite intensive study, novel therapeutic strategies to target RCC have been difficult to develop.
Background: Linking genetic risk loci identified by genome-wide association studies (GWAS) to their causal genes remains a major challenge. Disease-associated genetic variants are concentrated in regions containing regulatory DNA elements, such as promoters and enhancers. Although researchers have previously published DNA maps of these regulatory regions for kidney tubule cells and glomerular endothelial cells, maps for podocytes and mesangial cells have not been available.
Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms. Despite their importance, the identification of SVs in cancer genomes remains challenging. Here, we present a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines.
The function of human regulatory regions depends exquisitely on their local genomic environment and on cellular context, complicating experimental analysis of common disease- and trait-associated variants that localize within regulatory DNA. We use allelically resolved genomic DNase I footprinting data encompassing 166 individuals and 114 cell types to identify >60,000 common variants that directly influence transcription factor occupancy and regulatory DNA accessibility in vivo. The unprecedented scale of these data enables systematic analysis of the impact of sequence variation on transcription factor occupancy in vivo.
The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
To study the evolutionary dynamics of regulatory DNA, we mapped >1.3 million deoxyribonuclease I-hypersensitive sites (DHSs) in 45 mouse cell and tissue types, and systematically compared these with human DHS maps from orthologous compartments. We found that the mouse and human genomes have undergone extensive cis-regulatory rewiring that combines branch-specific evolutionary innovation and loss with widespread repurposing of conserved DHSs to alternative cell fates, and that this process is mediated by turnover of transcription factor (TF) recognition elements.
The basic body plan and major physiological axes have been highly conserved during mammalian evolution, yet only a small fraction of the human genome sequence appears to be subject to evolutionary constraint. To quantify cis- versus trans-acting contributions to mammalian regulatory evolution, we performed genomic DNase I footprinting of the mouse genome across 25 cell and tissue types, collectively defining ∼8.6 million transcription factor (TF) occupancy sites at nucleotide resolution.
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization.
Pseudomonas aeruginosa can develop resistance to polymyxin as a consequence of mutations in the PhoPQ regulatory system, mediated by covalent lipid A modification. Transposon mutagenesis of a polymyxin-resistant phoQ mutant defined 41 novel loci required for resistance, including two regulatory systems, ColRS and CprRS. Deletion of the colRS genes, individually or in tandem, abrogated the polymyxin resistance of a ΔphoQ mutant, as did individual or tandem deletion of cprRS.
Two groups independently sequenced the Agrobacterium tumefaciens C58 genome in 2001. We report here consolidation of these sequences, updated annotation, and additional analysis of the evolutionary history of the linear chromosome, which is apparently limited to the biovar I group of Agrobacterium.
Here we report the complete, accurate 1.89-Mb genome sequence of Francisella tularensis subsp. holarctica strain FSC200, isolated in 1998 in the Swedish municipality Ljusdal, which is in an area where tularemia is highly endemic.
CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environments. Here we analyze genome-wide occupancy patterns of CTCF by ChIP-seq in 19 diverse human cell types, including normal primary cells and immortal lines.
Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure-related phenotypes.
Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements.
DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
Burkholderia pseudomallei, the etiologic agent of human melioidosis, is capable of causing severe acute infection with overwhelming septicemia leading to death. A high rate of recurrent disease occurs in adult patients, most often due to recrudescence of the initial infecting strain. Pathogen persistence and evolution during such relapsing infections are not well understood.
The genome sequence of the aceticlastic methanoarchaeon Methanosaeta concilii GP6, comprised of a 3,008,626-bp chromosome and an 18,019-bp episome, has been determined and exhibits considerable differences in gene content from that of Methanosaeta thermophila.
In Mus spretus, the chloride channel 4 gene Clcn4-2 is X-linked and dosage compensated by X up-regulation and X inactivation, while in the closely related mouse species Mus musculus, Clcn4-2 has been translocated to chromosome 7. We sequenced Clcn4-2 in M. spretus and identified the breakpoints of the evolutionary translocation in the Mus lineage.
Understanding the prevailing mutational mechanisms responsible for human genome structural variation requires uniformity in the discovery of allelic variants and precision in terms of breakpoint delineation. We develop a resource based on capillary end sequencing of 13.8 million fosmid clones from 17 human genomes and characterize the complete sequence of 1054 large structural variants corresponding to 589 deletions, 384 insertions, and 81 inversions.