De novo transcriptome assembly from billions of RNA-seq reads is very challenging due to alternative splicing and various levels of expression, which often leads to incorrect, mis-assembled transcripts. BayesDenovo addresses this problem by using both a read-guided strategy to accurately reconstruct splicing graphs from the RNA-seq data and a Bayesian strategy to estimate, from these graphs, the probability of transcript expression without penalizing poorly expressed transcripts. Simulation and cell line benchmark studies demonstrate that BayesDenovo is very effective in reducing false positives and achieves much higher accuracy than other assemblers, especially for alternatively spliced genes and for highly or poorly expressed transcripts.
Transcription factors (TFs) often function as a module including both master factors and mediators binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when inferring TF modules based on co-localization of ChIP-seq peaks, often many weak binding events are missed, especially for mediators, resulting in incomplete identification of modules.
Background: ChIP-seq combines chromatin immunoprecipitation assays with sequencing and identifies genome-wide binding sites for DNA binding proteins. While many binding sites have strong ChIP-seq 'peak' observations and are well captured, there are still regions bound by proteins weakly, with a relatively low ChIP-seq signal enrichment. These weak binding sites, especially those at promoters and enhancers, are functionally important because they also regulate nearby gene expression.
Exploring complex modularization of intracellular signal transduction pathways is critical to understanding aberrant cellular responses during disease development and drug treatment. IMPALA (Inferred Modularization of PAthway LAndscapes) integrates information from high throughput gene expression experiments and genome-scale knowledge databases to identify aberrant pathway modules, thereby providing a powerful sampling strategy to reconstruct and explore pathway landscapes. Here IMPALA identifies pathway modules associated with breast cancer recurrence and Tamoxifen resistance.
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Motivation: High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure.
Somatic inactivating mutations of ARID1A, a SWI/SNF chromatin remodeling gene, are prevalent in human endometrium-related malignancies. To elucidate the mechanisms underlying how ARID1A deleterious mutation contributes to tumorigenesis, we establish genetically engineered murine models with Arid1a and/or Pten conditional deletion in the endometrium. Transcriptomic analyses on endometrial cancers and precursors derived from these mouse models show a close resemblance to human uterine endometrioid carcinomas.
Genome-wide transcription factor (TF) binding signal analyses reveal co-localization of TF binding sites based on inferred cis-regulatory modules (CRMs). CRMs play a key role in understanding the cooperation of multiple TFs under specific conditions. However, the functions of CRMs and their effects on nearby gene transcription are highly dynamic and context-specific and therefore are challenging to characterize.
Background: Spleen tyrosine kinase (SYK) is frequently upregulated in recurrent ovarian carcinomas, for which effective therapy is urgently needed. SYK phosphorylates several substrates, but their translational implications remain unclear. Here, we show that SYK interacts with EGFR and ERBB2, and directly enhances their phosphorylation.
Drawing on concepts from experimental biology, computer science, informatics, mathematics and statistics, systems biologists integrate data across diverse platforms and scales of time and space to create computational and mathematical models of the integrative, holistic functions of living systems. Endocrine-related cancers are well suited to study from a systems perspective because of the signaling complexities arising from the roles of growth factors, hormones and their receptors as critical regulators of cancer cell biology and from the interactions among cancer cells, normal cells and signaling molecules in the tumor microenvironment. Moreover, growth factors, hormones and their receptors are often effective targets for therapeutic intervention, such as estrogen biosynthesis, estrogen receptors or HER2 in breast cancer and androgen receptors in prostate cancer.
Motivation: NGS techniques have been widely applied in genetic and epigenetic studies. Multiple ChIP-seq and RNA-seq profiles can now be jointly used to infer functional regulatory networks (FRNs). However, existing methods suffer from either oversimplified assumptions about transcription factor (TF) regulation or slow convergence of sampling for FRN inference from large-scale ChIP-seq and time-course RNA-seq data.
Motivation: Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of various types of cells. It is important to accurately assemble transcripts or identify isoforms for an improved understanding of molecular mechanisms in biological systems.
Results: We have developed a novel Bayesian method, SparseIso, to reliably identify spliced isoforms from RNA-seq data.
Background: Maternal and paternal high-fat (HF) diet intake before and/or during pregnancy increases mammary cancer risk in several preclinical models. We studied if maternal consumption of a HF diet that began at a time when the fetal primordial germ cells travel to the genital ridge and start differentiating into germ cells would result in a transgenerational inheritance of increased mammary cancer risk.
Methods: Pregnant C57BL/6NTac mouse dams were fed either a control AIN93G or isocaloric HF diet composed of corn oil high in n-6 polyunsaturated fatty acids between gestational days 10 and 20.
One of the important tasks in cancer research is to identify biomarkers and build classification models for clinical outcome prediction. In this paper, we develop a CyNetSVM software package, implemented in Java and integrated with Cytoscape as an app, to identify network biomarkers using network-constrained support vector machines (NetSVM). The Cytoscape app of NetSVM is specifically designed to improve the usability of NetSVM with the following enhancements: (1) user-friendly graphical user interface (GUI), (2) computationally efficient core program and (3) convenient network visualization capability.
Motivation: Whole genome DNA-sequencing (WGS) of paired tumor and normal samples has enabled the identification of somatic DNA changes in unprecedented detail. Large-scale identification of somatic structural variations (SVs) for a specific cancer type will deepen our understanding of driver mechanisms in cancer progression. However, the limited number of WGS samples, insufficient read coverage, and the impurity of tumor samples that contain normal and neoplastic cells limit reliable and accurate detection of somatic SVs.
Motivation: The advent of high-throughput DNA methylation profiling techniques has enabled accurate identification of differentially methylated genes for cancer research. The large number of measured loci facilitates whole genome methylation study, yet poses great challenges for differential methylation detection due to the high variability in tumor samples.
Results: We have developed a novel probabilistic approach, Differential Methylation detection using a hierarchical Bayesian model exploiting Local Dependency (DM-BLD), to detect differentially methylated genes based on a Bayesian framework.
Chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-seq) has greatly improved the reliability with which transcription factor binding sites (TFBSs) can be identified from genome-wide profiling studies. Many computational tools have been developed to detect binding events or peaks; however, the robust detection of weak binding events remains a challenge for current peak calling tools. We have developed a novel Bayesian approach (ChIP-BIT) to reliably detect TFBSs and their target genes by jointly modeling binding signal intensities and binding locations of TFBSs.
Purpose: Statins are among the most frequently prescribed drugs because of their efficacy and low toxicity in treating hypercholesterolemia. Recently, statins have been reported to inhibit the proliferative activity of cancer cells, especially those with TP53 mutations. Because TP53 mutations occur in almost all ovarian high-grade serous carcinoma (HGSC), we determined whether statins suppressed tumor growth in animal models of ovarian cancer.
Background: Identification of protein interaction networks is a very important step for understanding the molecular mechanisms in cancer. Several methods have been developed to integrate protein-protein interaction (PPI) data with gene expression data for network identification. However, they often fail to model the dependency between genes in the network, which leaves many important genes, especially the upstream genes, unidentified.
Scope: Soy flour diet (MS) prevented isoflavones from stimulating MCF-7 tumor growth in athymic nude mice, indicating that other bioactive compounds in soy can negate the estrogenic properties of isoflavones. The underlying signal transduction pathways to explain the protective effects of soy flour consumption were studied here.
Methods And Results: Ovariectomized athymic nude mice inoculated with MCF-7 human breast cancer cells were fed either Soy flour diet (MS) or purified isoflavone mix diet (MI), both with equivalent amounts of genistein.
Identification of protein interaction subnetworks is an important step to help us understand complex molecular mechanisms in cancer. In this paper, we develop a BMRF-Net package, implemented in Java and C++, to identify protein interaction subnetworks based on a bagging Markov random field (BMRF) framework. By integrating gene expression data and protein-protein interaction data, this software tool can be used to identify biologically meaningful subnetworks.
Annu Int Conf IEEE Eng Med Biol Soc
December 2015
High coverage whole genome DNA-sequencing enables identification of somatic structural variation (SSV) more evident in paired tumor and normal samples. Recent studies show that simultaneous analysis of paired samples provides a better resolution of SSV detection than subtracting shared SVs. However, available tools can neither identify all types of SSVs nor provide any rank information regarding their somatic features.
We have developed an integrated molecular network learning method, within a well-grounded mathematical framework, to construct differential dependency networks with significant rewiring. This knowledge-fused differential dependency networks (KDDN) method, implemented as a Java Cytoscape app, can be used to optimally integrate prior biological knowledge with measured data to simultaneously construct both common and differential networks, to quantitatively assign model parameters and significant rewiring p-values and to provide user-friendly graphical results. The KDDN algorithm is computationally efficient and provides users with parallel computing capability using ubiquitous multi-core machines.
Background: Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated due to biological and technical variations in RNA-Seq data.
Results: We systematically study the variation in count data and dissect the sources of variation into between-sample variation and within-sample variation.
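The split into between-sample and within-sample variation can be illustrated with a generic sum-of-squares decomposition. This is only a minimal sketch and not the authors' model: it assumes that each sample contributes a short list of counts for one gene (for example, per-exon counts), and the function name `decompose_variation` and the toy numbers are hypothetical.

```python
from statistics import mean


def decompose_variation(samples):
    """Split the total sum of squares of count data into two parts.

    `samples` is a list of lists: one inner list of counts per sample
    (an assumed layout, chosen only for illustration).
    Returns (within, between, total), where total == within + between.
    """
    all_counts = [c for sample in samples for c in sample]
    grand_mean = mean(all_counts)

    # Within-sample: spread of counts around their own sample mean.
    within = sum((c - mean(sample)) ** 2 for sample in samples for c in sample)

    # Between-sample: spread of sample means around the grand mean,
    # weighted by the number of counts in each sample.
    between = sum(len(sample) * (mean(sample) - grand_mean) ** 2 for sample in samples)

    # Total: spread of every count around the grand mean.
    total = sum((c - grand_mean) ** 2 for c in all_counts)
    return within, between, total


# Toy data: two samples, three counts each (hypothetical values).
within, between, total = decompose_variation([[10, 12, 14], [20, 22, 24]])
print(within, between, total)
```

In this toy case most of the variation lies between the two samples rather than within them, which is the kind of contrast such a decomposition is meant to expose.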