Molecular subtypes, such as those defined by The Cancer Genome Atlas (TCGA), delineate a cancer's underlying biology and offer hope of informing a patient's prognosis and treatment plan. However, most approaches used in the discovery of subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples comprising 106 subtypes from 26 different cancer cohorts to build models based upon small numbers of features that can classify new samples into previously defined TCGA molecular subtypes, a step toward molecular subtype application in the clinic.
Inferring gene regulatory networks from single-cell RNA-sequencing trajectories has been an active area of research, yet methods are still needed to identify regulators governing cell transitions. We developed DREAMIT (Dynamic Regulation of Expression Across Modules in Inferred Trajectories) to annotate transcription-factor activity along single-cell trajectory branches, using ensembles of relations to target genes. Using a benchmark representing several different tissues, as well as external validation with ATAC-Seq and Perturb-Seq data on hematopoietic cells, the method was found to have higher tissue-specific sensitivity and specificity over competing approaches.
The cellular components of tumors and their microenvironment play pivotal roles in tumor progression, patient survival, and the response to cancer treatments. Building a comprehensive cellular profile of bulk tumors from single-cell RNA sequencing (scRNA-seq) data is crucial, as it reveals intrinsic tumor cellular traits that elude identification through conventional cancer subtyping methods. Our contribution, scBeacon, is a tool that derives cell-type signatures by integrating and clustering multiple scRNA-seq datasets, and uses those signatures to deconvolve unrelated bulk tumor datasets.
Human pluripotent stem cell-derived tissue engineering offers great promise in designer cell-based personalized therapeutics. To harness such potential, a broader approach requires a deeper understanding of tissue-level interactions. We previously developed a manufacturing system for the ectoderm-derived skin epithelium for cell replacement therapy.
The SARS-CoV-2 pandemic has challenged humankind's ability to quickly determine the cascade of health effects caused by a novel infection. Even with the unprecedented speed at which vaccines were developed and introduced into society, identifying therapeutic interventions and drug targets for patients infected with the virus remains important, because new strains of the virus evolve and future coronaviruses may emerge that are resistant to current vaccines. The application of transcriptomic RNA sequencing of infected samples may shed new light on the pathways involved in viral mechanisms and host responses.
The characteristic ionic currents of nucleotide kmers are commonly used in analyzing nanopore sequencing readouts. We present a graph convolutional network-based deep learning framework for predicting kmer characteristic ionic currents from corresponding chemical structures. We show such a framework can generalize the chemical information of the 5-methyl group from thymine to cytosine by correctly predicting 5-methylcytosine-containing DNA 6mers, thus shedding light on the de novo detection of nucleotide modifications.
Deep learning architectures such as variational autoencoders have revolutionized the analysis of transcriptomics data. However, the latent space of these variational autoencoders offers little to no interpretability. To provide further biological insights, we introduce a novel sparse Variational Autoencoder architecture, VEGA (VAE Enhanced by Gene Annotations), whose decoder wiring mirrors user-provided gene modules, providing direct interpretability to the latent variables.
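The idea of a decoder whose wiring mirrors gene modules can be illustrated with a masked linear layer. The following is only a minimal sketch under stated assumptions, not the VEGA implementation: the function names (`build_mask`, `masked_decode`), the gene and module names, and the choice of a bias-free linear decoder with random weights are all hypothetical.

```python
import random

def build_mask(genes, modules):
    """Return a modules-by-genes 0/1 mask: 1.0 where the gene belongs to the module."""
    return [[1.0 if g in members else 0.0 for g in genes]
            for members in modules.values()]

def masked_decode(z, weights, mask):
    """Linear decoder in which latent variable k only reaches genes in module k.

    Weights outside a module are multiplied by zero, so each latent variable
    can be read as the activity of exactly one gene module.
    """
    n_genes = len(mask[0])
    return [sum(z[k] * weights[k][j] * mask[k][j] for k in range(len(z)))
            for j in range(n_genes)]

# Hypothetical toy annotation: two modules over four genes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
modules = {"module_1": {"GENE_A", "GENE_B"}, "module_2": {"GENE_C", "GENE_D"}}

mask = build_mask(genes, modules)
rng = random.Random(0)
weights = [[rng.uniform(0.5, 1.5) for _ in genes] for _ in modules]

# Activating only the first latent variable changes only module_1 genes.
x = masked_decode([1.0, 0.0], weights, mask)
```

Because the mask zeroes every connection outside a module, setting one latent variable and leaving the others at zero reconstructs signal only for that module's genes, which is what makes the latent variables directly readable in this toy setting.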
Advancements in sequencing have led to the proliferation of multi-omic profiles of human cells under different conditions and perturbations. In addition, many databases have amassed information about pathways and gene "signatures", that is, patterns of gene expression associated with specific cellular and phenotypic contexts. An important current challenge in systems biology is to leverage such knowledge about gene coordination to maximize the predictive power and generalization of models applied to high-throughput datasets.
Biological states are controlled by orchestrated transcription factors (TFs) within gene regulatory networks. Here we show TFs responsible for the dynamic changes of biological states can be prioritized with temporal PageRank. We further show such TF prioritization can be extended by integrating gene regulatory networks reverse engineered from multi-omics profiles, e.
Purpose: The purpose of this study was to measure genomic changes that emerge with enzalutamide treatment using analyses of whole-genome sequencing and RNA sequencing.
Experimental Design: One hundred and one tumors from men with metastatic castration-resistant prostate cancer (mCRPC) who had not been treated with enzalutamide (n = 64) or who had enzalutamide-resistant mCRPC (n = 37) underwent whole-genome sequencing. Ninety-nine of these tumors also underwent RNA sequencing.
Objectives: The net oncogenic effect of β2-adrenergic receptor ADRB2, whose downstream elements induce neuroendocrine differentiation and whose expression is regulated by EZH2, is unclear. ADRB2 expression and associated clinical outcomes in metastatic castration-resistant prostate cancer (mCRPC) are unknown.
Methods And Materials: This was a retrospective analysis of a multi-center, prospectively enrolled cohort of mCRPC patients.
The androgen receptor (AR) antagonist enzalutamide is one of the principal treatments for men with castration-resistant prostate cancer (CRPC). However, not all patients respond, and resistance mechanisms are largely unknown. We hypothesized that genomic and transcriptional features from metastatic CRPC biopsies prior to treatment would be predictive of de novo treatment resistance.
Background: Metastatic disease burden out of proportion to serum PSA has been used as a marker of aggressive phenotype prostate cancer but is not well defined as a distinct subgroup. We sought to prospectively characterize the molecular features and clinical outcomes of Low PSA Secretors.
Methods: Eligible metastatic castration-resistant prostate cancer (mCRPC) patients without prior small cell histology underwent metastatic tumor biopsy with molecular characterization.
JCO Clin Cancer Inform
February 2020
Purpose: The analysis of cancer biology data involves extremely heterogeneous data sets, including information from RNA sequencing, genome-wide copy number, DNA methylation data reporting on epigenetic regulation, somatic mutations from whole-exome or whole-genome analyses, pathology estimates from imaging sections or subtyping, drug response or other treatment outcomes, and various other clinical and phenotypic measurements. Bringing these different resources into a common framework, with a data model that allows for complex relationships as well as dense vectors of features, will unlock integrated data set analysis.
Methods: We introduce the BioMedical Evidence Graph (BMEG), a graph database and query engine for discovery and analysis of cancer biology.
The discovery of drivers of cancer has traditionally focused on protein-coding genes. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods.
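To make "combining significance levels from multiple methods" concrete, the sketch below uses Fisher's method, the classical way to pool p-values. This is a stand-in chosen for illustration, not the strategy described in the abstract: Fisher's method assumes the p-values are independent, which generally does not hold when several driver-discovery methods are run on the same mutation data. The function name `fisher_combine` is hypothetical.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has a closed form:
        P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    Every p-value must be strictly greater than 0.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is x / 2
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail

# Two methods each report p = 0.05 for the same candidate element.
combined = fisher_combine([0.05, 0.05])
print(round(combined, 4))
```

With a single p-value the function returns that value unchanged, which is a quick sanity check on the closed form. When the inputs are positively correlated, this independence-based calculation tends to overstate significance, so a dependence-aware combination rule would be needed in practice.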
The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project that was motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes.
Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking.
Cancer genome projects have produced multidimensional datasets on thousands of samples. Yet, depending on the tumor type, 5-50% of samples have no known driving event. We introduce a semi-supervised method called Learning UnRealized Events (LURE) that uses a progressive label learning framework and minimum spanning analysis to predict cancer drivers based on their altered samples sharing a gene expression signature with the samples of a known event.
The maintenance and transition of cellular states are controlled by biological processes. Here we present a gene set-based transformation of single cell RNA-Seq data into biological process activities that provides a robust description of cellular states. Moreover, as these activities represent species-independent descriptors, they facilitate the alignment of single cell states across different organisms.
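One simple way to picture a gene set-based transformation is to score each cell by the average standardized expression of the genes in each set. The sketch below does exactly that; it is an assumed, simplified scoring rule rather than the transformation used in the paper, and the names `zscore_genes`, `process_activity`, the gene labels, and the gene sets are hypothetical.

```python
from statistics import mean, pstdev

def zscore_genes(expr):
    """Standardize each gene across cells. expr maps gene -> list of per-cell values."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # Genes with no variation carry no information; score them as 0.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled

def process_activity(expr, gene_sets):
    """Map a gene-by-cell table to a process-by-cell table of mean z-scores."""
    z = zscore_genes(expr)
    n_cells = len(next(iter(expr.values())))
    activity = {}
    for name, members in gene_sets.items():
        present = [g for g in members if g in z]  # ignore genes not measured
        activity[name] = [mean([z[g][c] for g in present]) for c in range(n_cells)]
    return activity

# Toy data: four cells, three genes, two hypothetical processes.
expr = {"G1": [1, 1, 5, 5], "G2": [2, 2, 6, 6], "G3": [4, 4, 0, 0]}
gene_sets = {"process_up": ["G1", "G2"], "process_down": ["G3"]}

activity = process_activity(expr, gene_sets)
```

Because the output is indexed by process rather than by gene, cells from different species could in principle be compared in this space as long as each species has its own gene-to-process annotation, which is the property the abstract highlights.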