Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types.
The success of machine learning models relies heavily on effectively representing high-dimensional data. However, ensuring data representations capture human-understandable concepts remains difficult, often requiring the incorporation of prior knowledge and decomposition of data into multiple subspaces. Traditional linear methods fall short in modeling more than one space, while more expressive deep learning approaches lack interpretability.
Background: Glioblastoma (GB) remains a formidable challenge in neuro-oncology, with immune checkpoint blockade (ICB) showing limited efficacy in unselected patients. We recently established that MAPK/ERK signaling is associated with overall survival following anti-PD-1 and anti-CTLA-4 treatment in recurrent GB. However, the causal relationship between MAPK/ERK signaling and susceptibility to ICB, as well as the mechanisms underlying this association, remain poorly understood.
Aberrant DNA methylation patterns have been used for cancer detection. However, DNA hemi-methylation, present at about 10% of CpG dinucleotides, has been less well studied. Here we show that a majority of differentially hemi-methylated regions (DHMRs) in liver tumor DNA or plasma cell-free (cf) DNA do not overlap with differentially methylated regions (DMRs) of the same samples, indicating that DHMRs could serve as independent biomarkers.
Transcriptional regulation, involving the complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate in unseen cell types and conditions. Here, we introduce GET, an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types.
The rapid advancement of sequencing technologies has led to the identification of numerous mutations in cancer genomes, many of which are variants of unknown significance (VUS). Computational models are increasingly being used to predict the functional impact of these mutations, in both coding and noncoding regions. Integration of these models with emerging genomic datasets will refine our understanding of mutation effects and guide clinical decision making.
The p53 tumor suppressor protein, a sequence-specific DNA binding transcription factor, regulates the expression of a large number of genes, in response to various forms of cellular stress. Although the protein coding target genes of p53 have been well studied, less is known about its role in regulating long noncoding genes and their functional relevance to cancer. Here we report the genome-wide identification of a large set (>1,000) of long noncoding RNAs (lncRNA), which are putative p53 targets in a colon cancer cell line and in human patient datasets from five different common types of cancer.
Int J Radiat Oncol Biol Phys
July 2024
Purpose: Diffuse midline glioma (DMG) is a fatal tumor traditionally treated with radiation therapy (RT) and previously characterized as having a noninflammatory tumor immune microenvironment (TIME). FLASH is a novel RT technique using ultra-high dose rate that is associated with decreased toxicity and effective tumor control. However, the effect of FLASH and conventional (CONV) RT on the DMG TIME has not yet been explored.
Recent advances in single-cell RNA-sequencing (scRNA-seq) technology have facilitated studies of cell states and plasticity in tissue maintenance and cancer, including in the prostate. Here we present meta-analyses of multiple new and published scRNA-seq datasets to establish reference cell type classifications for the normal mouse and human prostate. Our analyses demonstrate transcriptomic similarities between epithelial cell states in the normal prostate, in the regressed prostate after androgen-deprivation, and in primary prostate tumors.
Viral respiratory infections are an important public health concern due to their prevalence, transmissibility, and potential to cause serious disease. Disease severity is the product of several factors beyond the presence of the infectious agent, including specific host immune responses, host genetic makeup, and bacterial coinfections. To understand these interactions within natural infections, we designed a longitudinal cohort study actively surveilling respiratory viruses over the course of 19 months (2016 to 2018) in a diverse cohort in New York City.
Purpose: The CXCL12-CXCR4 chemokine axis plays a significant role in modulating T-cell infiltration into the pancreatic tumor microenvironment. Despite promising preclinical findings, clinical trials combining inhibitors of CXCR4 (AMD3100/BL-8040) and anti-programmed death 1/ligand 1 (anti-PD1/PD-L1) have failed to improve outcomes.
Experimental Design: We utilized a novel ex vivo autologous patient-derived immune/organoid (PDIO) co-culture system using human peripheral blood mononuclear cells and patient-derived tumor organoids, and in vivo the autochthonous LSL-KrasG12D/+; LSL-Trp53R172H/+; Pdx-1-Cre (KPC) pancreatic cancer mouse model, to interrogate the effects of either monotherapy or all combinations of gemcitabine, AMD3100, and anti-PD1 on CD8+ T cell activation and survival.
Spatial omics technologies can help identify spatially organized biological processes, but existing computational approaches often overlook structural dependencies in the data. Here, we introduce Smoother, a unified framework that integrates positional information into non-spatial models via modular priors and losses. In simulated and real datasets, Smoother enables accurate data imputation, cell-type deconvolution, and dimensionality reduction with remarkable efficiency.
Proc Natl Acad Sci U S A
November 2023
The NF-κB family of transcription factors and the Ras family of small GTPases are important mediators of proproliferative signaling that drives tumorigenesis and carcinogenesis. The κB-Ras proteins were previously shown to inhibit both NF-κB and Ras activation through independent mechanisms, implicating them as tumor suppressors with potentially broad relevance to human cancers. In this study, we have used two mouse models to establish the relevance of the κB-Ras proteins for tumorigenesis.
Adults and children afflicted with the 22q11.2 deletion syndrome (22q11.2DS) exhibit cognitive, social, and emotional impairments, and are at significantly heightened risk for schizophrenia (SCZ).
Insertions and deletions (indels) are common sources of structural variation, and insertions originating from spontaneous DNA lesions are frequent in cancer. We developed a highly sensitive assay called insertion and deletion sequencing (Indel-seq) to monitor rearrangements in human cells at the TRIM37 acceptor locus that reports indels stemming from experimentally induced and spontaneous genome instability. Templated insertions, which derive from sequences genome wide, require contact between donor and acceptor loci, require homologous recombination, and are stimulated by DNA end-processing.
The ATR kinase, which coordinates cellular responses to DNA replication stress, is also essential for the proliferation of normal unstressed cells. Although its role in the replication stress response is well defined, the mechanisms by which ATR supports normal cell proliferation remain elusive. Here, we show that ATR is dispensable for the viability of G0-arrested naïve B cells.
DNA transposable elements and transposase-derived genes are present in most living organisms, including vertebrates, but their function is largely unknown. PiggyBac Transposable Element Derived 5 (PGBD5) is an evolutionarily conserved vertebrate DNA transposase-derived gene with retained nuclease activity in human cells. Vertebrate brain development is known to be associated with prominent neuronal cell death and DNA breaks, but their causes and functions are not well understood.
Background: Adenoid cystic carcinoma (ACC) is a lethal malignancy of exocrine glands, characterized by the coexistence within tumor tissues of 2 distinct populations of cancer cells, phenotypically similar to the myoepithelial and ductal lineages of normal salivary epithelia. The developmental relationship linking these 2 cell types, and their differential vulnerability to antitumor treatments, remain unknown.
Methods: Using single-cell RNA sequencing, we identified cell-surface markers (CD49f, KIT) that enabled the differential purification of myoepithelial-like (CD49fhigh/KITneg) and ductal-like (CD49flow/KIT+) cells from patient-derived xenografts (PDXs) of human ACCs.
Motivation: Here, we performed a benchmarking analysis of five tools for microbe sequence detection using transcriptomics data (Kraken2, MetaPhlAn2, PathSeq, DRAC and Pandora). We built a synthetic database mimicking real-world structure with tuned conditions accounting for microbe species prevalence, base calling quality and sequence length. Sensitivity and positive predictive value (PPV) parameters, as well as computational requirements, were used for tool ranking.