Precise control of gene expression levels is essential for normal cell functions, yet how they are defined and tightly maintained, particularly at intermediate levels, remains elusive. Here, using a series of newly developed sequencing, imaging, and functional assays, we uncover a class of transcription factors with dual roles as activators and repressors, referred to as condensate-forming level-regulating dual-action transcription factors (TFs). They reduce high expression but increase low expression to achieve stable intermediate levels.
Three-dimensional genome organization influences diverse nuclear processes. Here we present Chromatin Interaction Predictor (ChIPr), a suite of regression models based on deep neural networks, random forest, and gradient boosting that predict cohesin-mediated chromatin interaction strength between any two loci in the genome. ChIPr predictions correlate well with ChIA-PET data in four cell lines.
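Gradient boosting, one of the three model families named above, fits a sequence of weak learners to the residuals of the current prediction. The sketch below is a minimal least-squares version using depth-1 trees (stumps) in plain Python. It is not ChIPr's implementation: the abstract does not state which genomic features ChIPr uses, so the feature columns here are hypothetical toy numbers, and the function names are illustrative only.

```python
def fit_stump(X, residuals):
    """Best single-feature threshold split for the residuals (least squares)."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint keeps both sides non-empty
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    return None if best is None else best[1:]


def predict_stump(stump, row):
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


def fit_gradient_boosting(X, y, n_rounds=300, learning_rate=0.1):
    """Least-squares boosting: each new stump is fit to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values to split on
            break
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

A small learning rate with many rounds trades training speed for smoother fits; in practice a library implementation (for example scikit-learn's gradient boosting regressor) would replace this toy.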
Properly integrating spatially resolved transcriptomics (SRT) data generated in different batches into a unified gene-spatial coordinate system could enable the construction of a comprehensive spatial transcriptome atlas. Here, we propose SPIRAL, which consists of two consecutive modules: SPIRAL-integration, which performs graph domain adaptation-based data integration, and SPIRAL-alignment, which performs cluster-aware optimal transport-based coordinate alignment. We validate SPIRAL on both synthetic and real SRT datasets.
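Optimal transport-based alignment can be illustrated with the standard entropy-regularized solver (Sinkhorn iterations): compute a transport plan between two sets of spots, then move each source spot to the plan-weighted average of target coordinates. This is a minimal sketch of generic Sinkhorn OT, not SPIRAL's cluster-aware variant; the names `sinkhorn` and `align`, the squared-Euclidean cost, and `eps=0.1` are assumptions made for illustration.

```python
import math


def sinkhorn(cost, a, b, eps=0.1, n_iter=500):
    """Entropy-regularized optimal transport plan between weight vectors a and b.

    cost[i][j] is the cost of moving mass from source spot i to target spot j;
    a and b each sum to 1. (Real data would need a log-domain version to avoid underflow.)
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so plan row sums match a, then columns so column sums match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def align(source, target, eps=0.1):
    """Map each source spot to the plan-weighted (barycentric) average of target coordinates."""
    cost = [[(sx - tx) ** 2 + (sy - ty) ** 2 for tx, ty in target] for sx, sy in source]
    a = [1.0 / len(source)] * len(source)
    b = [1.0 / len(target)] * len(target)
    plan = sinkhorn(cost, a, b, eps)
    aligned = []
    for row in plan:
        mass = sum(row)
        x = sum(p * tx for p, (tx, _) in zip(row, target)) / mass
        y = sum(p * ty for p, (_, ty) in zip(row, target)) / mass
        aligned.append((x, y))
    return plan, aligned
```

For example, aligning the spots (0, 0), (1, 0), (0, 1) to the same pattern shifted by (0.1, 0.1) yields a near-diagonal plan and moves each spot onto its shifted counterpart. A cluster-aware variant might, as one hypothetical choice, add a penalty to the cost of pairs with different cluster labels.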
Genomic islands are fragments of foreign DNA found in bacterial and archaeal genomes and are typically associated with symbiosis or pathogenesis. Although numerous genomic island detection methods have been proposed, there has been limited evaluation of the efficiency of their genome information processing and boundary recognition tools. In this study, we review the statistical methods used in genomic island prediction, covering genomic signatures, host signature extraction, informative signature selection, divergence measures, and boundary detection.
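The signature-based steps listed above can be sketched as follows: compute a k-mer frequency profile (the genomic signature) for the whole genome as the host signature, compute the same profile for sliding windows, and score each window by a divergence measure. This is a minimal sketch assuming dinucleotide frequencies, a pseudocount of 1, and Kullback-Leibler divergence; the window size, step, and function names are illustrative choices, and boundary detection is not shown.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2, pseudocount=1.0):
    """Relative k-mer frequencies over the ACGT alphabet, smoothed with a pseudocount."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other symbols (e.g. N) are ignored.
    total = sum(counts[km] for km in kmers) + pseudocount * len(kmers)
    return {km: (counts[km] + pseudocount) / total for km in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); q must be strictly positive."""
    return sum(p[km] * math.log(p[km] / q[km]) for km in p)


def scan_windows(genome, window=100, step=50, k=2):
    """Score each sliding window by its divergence from the whole-genome (host) signature."""
    host = kmer_profile(genome, k)
    return [
        (start, kl_divergence(kmer_profile(genome[start:start + window], k), host))
        for start in range(0, len(genome) - window + 1, step)
    ]
```

On a toy genome made of an AT-rich repeat with a 200 bp GC-rich insert in the middle, the highest-scoring windows fall inside the insert. Windows are scored against the whole genome here; a real tool would typically exclude candidate islands from the host signature.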
Aged hematopoietic stem cells (HSCs) exhibit compromised reconstitution capacity. The molecular mechanisms behind this phenomenon are not fully understood. Here, we observed that FUS expression is increased in aged HSCs and that enforced FUS expression recapitulates the phenotype of aged HSCs through arginine-glycine-glycine (RGG)-mediated aberrant FUS phase transition.
Enterococcus faecalis is a Gram-positive bacterium that natively colonizes the human gastrointestinal tract and opportunistically causes life-threatening infections. Multidrug-resistant (MDR) E. faecalis strains have emerged that are replete with mobile genetic elements (MGEs).
Aged hematopoietic stem cells (HSCs) exhibit compromised reconstitution capacity and a differentiation bias toward the myeloid lineage; however, the underlying molecular mechanism remains incompletely understood. In this study, we observed that expression of pseudouridine (Ψ) synthase 10 (PUS10) is increased in aged hematopoietic stem and progenitor cells (HSPCs) and that enforced PUS10 expression recapitulates the phenotype of aged HSCs independently of its Ψ synthase activity. Consistent with this, we observed no difference in the pseudouridylation profile of transcribed RNA between young and aged HSPCs.
Background: Gastric cancer is a malignant tumor with high morbidity and mortality. Accurate identification of prognostic molecular markers is therefore key to improving treatment efficacy and prognosis.
Methods: In this study, we developed a stable and robust signature using a series of machine-learning approaches.
Spatial omics technologies generate information-rich but highly complex datasets. Here we present Spatial Omics DataBase (SODB), a web-based platform providing both rich data resources and a suite of interactive data analysis modules. SODB currently maintains >2,400 experiments from >25 spatial omics technologies, which are freely accessible in a unified data format compatible with various computational packages.
Rapidly developing spatial omics technologies generate datasets of diverse scales and modalities. However, most existing methods focus on modeling the dynamics of single cells while ignoring microenvironments (MEs). Here we present SOTIP (Spatial Omics mulTIPle-task analysis), a versatile method that incorporates MEs and their interrelationships into a unified graph.
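The abstract does not specify how SOTIP defines a microenvironment or compares two of them. A common choice, assumed here, is to treat each cell plus its k nearest spatial neighbours as its ME, summarise the ME by its cluster-label composition, and build a spatial kNN graph whose edges are weighted by ME dissimilarity. This sketch uses plain L1 distance between compositions as a stand-in measure; all function names are hypothetical and it is not SOTIP's implementation.

```python
import math
from collections import Counter


def nearest_neighbours(coords, i, k):
    """Indices of the k cells spatially closest to cell i (ties broken by index)."""
    xi, yi = coords[i]
    ranked = sorted((math.hypot(xi - xj, yi - yj), j) for j, (xj, yj) in enumerate(coords) if j != i)
    return [j for _, j in ranked[:k]]


def microenvironment(coords, labels, i, k):
    """Cluster-label composition of cell i together with its k nearest neighbours."""
    members = [i] + nearest_neighbours(coords, i, k)
    counts = Counter(labels[j] for j in members)
    return {label: c / len(members) for label, c in counts.items()}


def l1_distance(p, q):
    """L1 distance between two label-composition dictionaries."""
    return sum(abs(p.get(key, 0.0) - q.get(key, 0.0)) for key in set(p) | set(q))


def me_graph(coords, labels, k=3):
    """Undirected spatial kNN graph; each edge is weighted by the distance between MEs."""
    mes = [microenvironment(coords, labels, i, k) for i in range(len(coords))]
    edges = {}
    for i in range(len(coords)):
        for j in nearest_neighbours(coords, i, k):
            edges[(min(i, j), max(i, j))] = l1_distance(mes[i], mes[j])
    return mes, edges
```

With two spatially separated groups of four cells, each labelled uniformly, every ME is pure, all within-group edges have weight 0, and MEs from different groups are at the maximal L1 distance of 2.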
RNA species act as architectural scaffolds for nuclear structures, including chromatin, in eukaryotic cells. However, the composition and dynamics of tightly bound chromatin-associated RNAs during mitosis remain elusive. Here we report the identification of chromatin-enriched RNAs (cheRNAs) by biochemical nuclear fractionation coupled with RNA sequencing in both interphase and mitotic A549 and HeLa-S3 cells.
Understanding the molecular and cellular mechanisms of human primordial germ cells (hPGCs) is essential for studying infertility and germ cell tumorigenesis. Many RNA-binding proteins (RBPs) and non-coding RNAs are specifically expressed and functional during hPGC development. However, the roles and regulatory mechanisms of these RBPs and non-coding RNAs, such as microRNAs (miRNAs), in hPGCs remain elusive.
Prostate cancer is one of the most heritable human cancers. Genome-wide association studies have identified at least 185 prostate cancer germline risk alleles, most of them noncoding. We used integrative three-dimensional (3D) spatial genomics to identify the chromatin interaction targets of 45 prostate cancer risk alleles, 31 of which were associated with the transcriptional regulation of target genes in 565 localized prostate tumors.
CTCF mediates chromatin insulation and long-distance enhancer-promoter (EP) interactions; however, little is known about how these regulatory functions are partitioned among target genes in key biological processes. Here, we show that Ctcf expression increases progressively during induced pluripotency. In this process, CTCF first functions as a chromatin insulator responsible for directly silencing the somatic gene expression program; subsequently, elevated Ctcf expression ensures chromatin accessibility and contributes to increased EP interactions at a subset of pluripotency-associated genes.
Database (Oxford)
March 2022
Human papillomavirus (HPV) can cause condyloma acuminatum and cervical cancer. Some HPV mutations are closely associated with persistent infection and cervical cancer and are ideal targets for cancer vaccines. Several databases have been developed to collect HPV sequences, but no HPV mutation database has been published.
Innate immunity plays critical antiviral roles. The highly virulent avian influenza viruses (AIVs) H5N1, H7N9, and H5N6 escape host innate immune responses more effectively than the less virulent seasonal H1N1 virus. Here, we report a mechanism by which transcriptional readthrough (TRT)-mediated suppression of innate immunity occurs after AIV infection.
The genome exists as an organized, three-dimensional (3D) dynamic architecture, and each cell type has a unique 3D genome organization that determines its cell identity. An unresolved question is how cell type-specific 3D genome structures are established during development. Here, we analyzed 3D genome structures in muscle cells from mice lacking the muscle lineage transcription factor (TF), MyoD, versus wild-type mice.
Clustering cells and delineating lineage relationships among cell subpopulations are fundamental tasks in single-cell omics studies. However, existing analytical methods face challenges in stratifying cells, tracking cellular trajectories, and identifying critical points of cell transitions. To overcome these challenges, we propose a novel Markov hierarchical clustering algorithm (MarkovHC), a topological clustering method that leverages the metastability of exponentially perturbed Markov chains to systematically reconstruct the cellular landscape.
Nucleic Acids Res
January 2022
Recent developments in single-cell RNA-sequencing technologies have led to exponential growth in single-cell sequencing datasets across different conditions. Combining these datasets helps to better understand cellular identity and function. However, integrating datasets from different laboratories or technologies is challenging because of batch effects, which are intertwined with biological variation.
Combination therapy has long been used in cancer treatment to overcome the drug resistance associated with monotherapy. Growing pharmacological data and the rapid development of deep learning methods have enabled the construction of models that predict and screen drug pairs. However, the size of drug libraries is restricted to hundreds to thousands of compounds.
Spatial metabolomics can reveal intercellular heterogeneity and tissue organization. Here we report on the spatial single nuclear metabolomics (SEAM) method, a flexible platform combining high-spatial-resolution imaging mass spectrometry and a set of computational algorithms that can display multiscale and multicolor tissue tomography together with identification and clustering of single nuclei by their in situ metabolic fingerprints. We first applied SEAM to a range of wild-type mouse tissues, then delineated a consistent pattern of metabolic zonation in mouse liver.
Dynamic models of gene expression are urgently needed. In this paper, we describe the time evolution of gene expression by learning a jump diffusion process that models the biological process directly. Our algorithm takes aggregate gene expression data as input and outputs the parameters of the jump diffusion process.
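To make the model class concrete, the sketch below simulates (rather than learns) one jump diffusion path with the Euler-Maruyama scheme. The abstract does not give the drift or jump form, so this assumes a mean-reverting (Ornstein-Uhlenbeck) drift toward a set-point expression level, Gaussian jump sizes, and Poisson jump arrivals approximated by at most one jump per small time step; the function and parameter names are hypothetical.

```python
import math
import random


def simulate_jump_diffusion(x0, theta, mean, sigma, jump_rate, jump_mean, jump_sd,
                            dt=0.01, n_steps=1000, seed=0):
    """Euler-Maruyama path of dX = theta*(mean - X) dt + sigma dW + J dN.

    N is a Poisson process with intensity jump_rate and J ~ Normal(jump_mean, jump_sd).
    At most one jump per step is allowed (Bernoulli approximation, valid for small dt).
    """
    if jump_rate * dt > 1.0:
        raise ValueError("jump_rate * dt must be <= 1 for the Bernoulli approximation")
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        drift = theta * (mean - x) * dt                           # pull toward the set point
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)   # Brownian increment
        jump = rng.gauss(jump_mean, jump_sd) if rng.random() < jump_rate * dt else 0.0
        x = x + drift + diffusion + jump
        path.append(x)
    return path
```

With `sigma=0` and `jump_rate=0` the path reduces to deterministic exponential relaxation toward `mean`, which is a quick sanity check; fitting such parameters to aggregate expression data, as the paper does, is a separate inference problem not shown here.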