While context-type-specific regulation of genes is largely determined by cis-regulatory regions, attempts to identify cell type-specific eQTLs are complicated by the nested nature of cell types. We present hierarchical eQTL (H-eQTL), a network-based model that annotates bulk-derived eQTLs to levels of a cell type tree using single-cell chromatin accessibility data, without clustering cells into discrete cell types. Using our model, we annotate bulk-derived eQTLs from the developing brain with high specificity to levels of a cell type hierarchy, which allows sensitive detection of genes with multiple distinct non-coding elements regulating their expression in different cell types.
Three-dimensional genome organization plays a critical role in gene regulation, and disruptions can lead to developmental disorders by altering the contact between genes and their distal regulatory elements. Structural variants (SVs) can disturb local genome organization, for example by merging topologically associating domains when a boundary is deleted. Experimentally testing large numbers of SVs for their effects on chromatin structure and gene expression is prohibitively time-consuming and costly.
The dynamic three-dimensional (3D) organization of the human genome (the "4D Nucleome") is closely linked to genome function. Here, we integrate a wide variety of genomic data generated by the 4D Nucleome Project to provide a detailed view of human 3D genome organization in widely used embryonic stem cells (H1-hESCs) and immortalized fibroblasts (HFFc6). We provide extensive benchmarking of 3D genome mapping assays and integrate these diverse datasets to annotate spatial genomic features across scales.
Chronic inflammation and tissue fibrosis are common responses that worsen organ function, yet the molecular mechanisms governing their cross-talk are poorly understood. In diseased organs, stress-induced gene expression changes fuel maladaptive cell state transitions and pathological interactions between cellular compartments. Although chronic fibroblast activation worsens dysfunction in the lungs, liver, kidneys and heart, and exacerbates many cancers, the stress-sensing mechanisms that initiate transcriptional activation of fibroblasts remain unclear.
Oxytocin receptor (Oxtr) signaling influences complex social behaviors in diverse species, including social monogamy in prairie voles. How Oxtr regulates specific components of social attachment behaviors, and the neural mechanisms mediating them, remain unknown. Here, we examine prairie voles lacking Oxtr and demonstrate that pair bonding comprises distinct behavioral modules: the preference for a bonded partner and the rejection of novel potential mates.
The 3D structure of the genome is an important mediator of gene expression. As phenotypic divergence is largely driven by gene regulatory variation, comparing genome 3D contacts across species can further our understanding of the molecular basis of species differences. However, while experimental data on genome 3D contacts in humans are increasingly abundant, only a handful of 3D genome contact maps exist for other species.
The genetic diversity of the gut microbiota has a central role in host health. Here, we created pangenomes for 728 human gut prokaryotic species, quadrupling the number of genes relative to strain-specific genomes. Each of these species has a core set of a thousand genes, differing even between closely related species, and an accessory set of genes unique to the different strains.
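The core/accessory split described above can be illustrated with a minimal Python sketch. It assumes genes have already been clustered into gene families shared across strains, and it uses a hypothetical presence threshold (`core_fraction = 0.9`) that is an illustrative choice, not the criterion used to build these pangenomes; `partition_pangenome` and the toy strain data are likewise hypothetical.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    strain_genes: Dict[str, Set[str]], core_fraction: float = 0.9
) -> Tuple[Set[str], Set[str]]:
    """Split a species pangenome into core and accessory gene families.

    strain_genes maps each strain ID to the set of gene-family IDs it carries.
    A family is "core" if it occurs in at least `core_fraction` of strains
    (0.9 is an illustrative assumption); everything else is "accessory".
    """
    if not strain_genes:
        return set(), set()
    n_strains = len(strain_genes)
    counts: Dict[str, int] = {}
    for genes in strain_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c / n_strains >= core_fraction}
    accessory = set(counts) - core
    return core, accessory


strains = {  # hypothetical strains -> gene families they carry
    "strain_1": {"a", "b", "c"},
    "strain_2": {"a", "b", "d"},
    "strain_3": {"a", "b"},
}
core, accessory = partition_pangenome(strains)
print(sorted(core), sorted(accessory))  # ['a', 'b'] ['c', 'd']
```

Lowering `core_fraction` enlarges the core set; families carried by a single strain, such as `c` and `d` here, stay accessory whenever the species has more than one strain.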
Fecal microbial transplantation (FMT) offers promise for treating ulcerative colitis (UC), though the mechanisms underlying treatment failure are unknown. This study harnessed longitudinally collected colonic biopsies (n = 38) and fecal samples (n = 179) from 19 adults with mild-to-moderate UC undergoing serial FMT, in which antimicrobial pre-treatment and delivery mode (capsules versus enema) were assessed for clinical response (a decrease of ≥3 points from the pre-treatment Mayo score). Colonic biopsies underwent dual RNA-seq; fecal samples underwent parallel 16S rRNA and shotgun metagenomic sequencing as well as untargeted metabolomic analyses.
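The response criterion stated above (a decrease of at least 3 points from the pre-treatment Mayo score) reduces to a simple comparison. The sketch below encodes only that criterion; the function names, patient IDs, and scores are hypothetical, and it neither validates the score range nor decides which post-treatment time point is used.

```python
from typing import Dict, Tuple


def is_clinical_responder(baseline_mayo: int, followup_mayo: int,
                          min_drop: int = 3) -> bool:
    """True if the Mayo score fell by at least `min_drop` points from baseline."""
    return baseline_mayo - followup_mayo >= min_drop


def response_rate(scores: Dict[str, Tuple[int, int]]) -> float:
    """Fraction of patients meeting the response criterion.

    scores maps a (hypothetical) patient ID to its
    (pre-treatment, post-treatment) Mayo scores.
    """
    if not scores:
        return 0.0
    responders = sum(is_clinical_responder(pre, post) for pre, post in scores.values())
    return responders / len(scores)


cohort = {"p01": (9, 5), "p02": (8, 6), "p03": (7, 4), "p04": (10, 10)}
print(response_rate(cohort))  # 0.5
```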
Recent studies have highlighted the impact of both transcription and transcripts on 3D genome organization, particularly its dynamics. Here, we propose a deep learning framework, called AkitaR, that leverages both genome sequences and genome-wide RNA-DNA interactions to investigate the roles of chromatin-associated RNAs (caRNAs) in genome folding in HFFc6 cells. To disentangle the cis- and trans-regulatory roles of caRNAs, we compared models with nascent transcripts, trans-located caRNAs, open chromatin data, or DNA sequence alone.
The evolution of the modern human brain was accompanied by distinct molecular and cellular specializations, which underpin our diverse cognitive abilities but also increase our susceptibility to neurological diseases. These features, some specific to humans and others shared with related species, manifest during different stages of brain development. In this multi-stage process, neural stem cells proliferate to produce a large and diverse progenitor pool, giving rise to excitatory or inhibitory neurons that integrate into circuits during further maturation.
CellWalker2 is a graph diffusion-based method for single-cell genomics data integration. It extends the CellWalker model by incorporating hierarchical relationships between cell types, providing estimates of statistical significance, and adding data structures for analyzing multi-omics data so that gene expression and open chromatin can be jointly modeled. Our open-source software enables users to annotate cells using existing ontologies and to probabilistically match cell types between two or more contexts, including across species.
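Graph diffusion, the idea CellWalker2 is built on, can be sketched in one common form: a random walk with restart over a graph whose nodes are cell type labels and cells. This is a generic illustration under stated assumptions, not CellWalker2's implementation or API. Label-to-cell edge weights (for example, marker-gene scores) and cell-to-cell weights (for example, expression or accessibility similarity) are assumed to be given, the restart probability of 0.5 is arbitrary, and the hierarchy, significance estimates, and multi-omic data structures described above are not modeled. The function names and toy graph are hypothetical.

```python
import numpy as np


def random_walk_with_restart(adj: np.ndarray, restart_prob: float = 0.5) -> np.ndarray:
    """Steady-state influence matrix of a random walk with restart.

    adj: symmetric, non-negative weighted adjacency matrix over all nodes
    (here: label nodes first, then cell nodes). Column j of the result is the
    long-run visiting distribution of a walker that restarts at node j.
    """
    degree = adj.sum(axis=0).astype(float)
    degree[degree == 0] = 1.0  # isolated nodes: avoid division by zero
    W = adj / degree  # column-stochastic transition matrix
    I = np.eye(adj.shape[0])
    # Closed form of the fixed point P = (1 - r) W P + r I
    return np.linalg.solve(I - (1 - restart_prob) * W, restart_prob * I)


def assign_labels(P: np.ndarray, n_labels: int) -> np.ndarray:
    """Label each cell with the label node that exerts the most influence on it."""
    influence = P[n_labels:, :n_labels]  # rows: cells, columns: labels
    return influence.argmax(axis=1)


# Toy graph (hypothetical): nodes 0-1 are labels, nodes 2-5 are cells.
edges = {(0, 2): 1.0, (0, 3): 1.0, (1, 4): 1.0, (1, 5): 1.0,
         (2, 3): 1.0, (4, 5): 1.0, (3, 4): 0.1}
adj = np.zeros((6, 6))
for (i, j), w in edges.items():
    adj[i, j] = adj[j, i] = w

P = random_walk_with_restart(adj)
labels = assign_labels(P, n_labels=2)
print(labels)  # [0 0 1 1]
```

Solving the linear system gives the fixed point directly; iterating the update P = (1 - r) W P + r I until convergence reaches the same matrix more slowly but avoids a dense solve on large graphs.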
The growing number of sequence-based machine learning models has increased the demand for tools that manipulate sequences for use as model inputs. However, existing approaches to editing genome sequences and evaluating them with models have limitations, such as incompatibility with structural variants, difficulty identifying the responsible sequence perturbations, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing and supporting in silico mutagenesis experiments.
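A basic step in this kind of in silico mutagenesis is turning a variant into a pair of equal-length reference and alternate sequences that a model can score. The sketch below is a simplified illustration, not SuPreMo's actual procedure or interface: `make_ref_alt_windows` is hypothetical, coordinates are assumed 0-based, alleles are given VCF-style with a shared anchor base for indels, and the window is anchored upstream of the variant so that indels shift only downstream sequence.

```python
from typing import Tuple


def make_ref_alt_windows(chrom_seq: str, pos: int, ref: str, alt: str,
                         window: int) -> Tuple[str, str]:
    """Build equal-length reference and alternate model inputs around a variant.

    chrom_seq: reference chromosome sequence, 0-based coordinates.
    pos: 0-based position of the first base of the REF allele.
    Both windows start `window // 2` bases upstream of the variant. ALT is
    spliced in place of REF and the alternate sequence is cut to the same
    length, so deletions pull in extra downstream bases and insertions push
    some out.
    """
    if chrom_seq[pos:pos + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")
    start = pos - window // 2
    if start < 0:
        raise ValueError("window extends past the start of the sequence")
    alt_seq = chrom_seq[:pos] + alt + chrom_seq[pos + len(ref):]
    ref_window = chrom_seq[start:start + window]
    alt_window = alt_seq[start:start + window]
    if len(ref_window) != window or len(alt_window) != window:
        raise ValueError("window extends past the end of the sequence")
    return ref_window, alt_window


seq = "ACGTACGTAC"
print(make_ref_alt_windows(seq, 4, "A", "G", 6))    # ('CGTACG', 'CGTGCG')  SNV
print(make_ref_alt_windows(seq, 4, "AC", "A", 6))   # ('CGTACG', 'CGTAGT')  deletion
print(make_ref_alt_windows(seq, 4, "A", "ATT", 6))  # ('CGTACG', 'CGTATT')  insertion
```

Keeping the left flank fixed is only one possible choice; centering the window on the variant, or on the midpoint of a deletion, changes which flanking bases enter the model and can affect the predicted difference.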
Nucleotide changes in gene regulatory elements are important determinants of neuronal development and diseases. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 open chromatin regions, including thousands of sequences with cell type-specific accessibility and variants associated with brain gene regulation. In primary cells, we identified 46,802 active enhancer sequences and 164 variants that alter enhancer activity.
Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts.
Ticks are increasingly important vectors of human and agricultural diseases. While many studies have focused on tick-borne bacteria, far less is known about tick-associated viruses and their roles in public health or tick physiology. To address this, we investigated patterns of bacterial and viral communities across two field populations of western black-legged ticks (Ixodes pacificus).
The most prevalent microbial eukaryote in the human gut is Blastocystis, an obligate commensal protist also common in many other vertebrates. Blastocystis is descended from free-living stramenopile ancestors; how it has adapted to thrive within humans and a wide range of other hosts is unclear. Here, we cultivated six strains spanning the diversity of the genus and generated highly contiguous, annotated genomes with long-read DNA-seq, Hi-C, and RNA-seq.
The investigation of chromatin organization in single cells holds great promise for identifying causal relationships between genome structure and function. However, analysis of single-molecule data is hampered by extreme yet inherent heterogeneity, making it challenging to determine the contributions of individual chromatin fibers to bulk trends. To address this challenge, we propose ChromaFactor, a novel computational approach based on non-negative matrix factorization that deconvolves single-molecule chromatin organization datasets into their most salient primary components.
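Non-negative matrix factorization, the technique ChromaFactor is based on, can be sketched with the classic Lee-Seung multiplicative updates. This is a generic NMF, not ChromaFactor's implementation: it assumes each column of `V` is one chromatin fiber described by non-negative features (for instance, flattened pairwise distances between imaged loci) with no missing values, and the rank `k`, iteration count, and toy data are hypothetical choices.

```python
import numpy as np


def nmf(V: np.ndarray, k: int, n_iter: int = 1000, eps: float = 1e-10, seed: int = 0):
    """Factorize a non-negative matrix V (features x molecules) as V ~ W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    W (features x k) holds k non-negative components; H (k x molecules)
    holds each molecule's non-negative loading on those components.
    """
    rng = np.random.default_rng(seed)
    n_features, n_molecules = V.shape
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_molecules)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: 6 distance features, 40 molecules, built from two
# non-overlapping components so that an exact rank-2 factorization exists.
rng = np.random.default_rng(1)
W_true = np.zeros((6, 2))
W_true[:3, 0] = 1.0
W_true[3:, 1] = 1.0
H_true = rng.uniform(0.5, 1.5, size=(2, 40))
V = W_true @ H_true

W, H = nmf(V, k=2, n_iter=2000)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
dominant = H.argmax(axis=0)  # component with the largest loading per molecule
```

Columns of `W` play the role of the salient components, and `dominant` gives a crude per-molecule assignment to the component that contributes most. Choosing `k`, handling missing measurements, and normalizing features are separate decisions a real analysis would have to make.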