Analysis of gene co-expression networks is a powerful "data-driven" tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of "data-driven" co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters.
Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high fidelity annotated knowledgebase of protein interactions, pathways, and functional ontologies. This knowledgebase has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting.
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data.
Gene coexpression network analysis is a powerful "data-driven" approach essential for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. We generated such a resource by exploring gene coexpression networks in 82 microarray datasets from 9 major human cancer types.
Using a three-dimensional coculture model, we identified significant subtype-specific changes in gene expression, metabolic, and therapeutic sensitivity profiles of breast cancer cells in contact with cancer-associated fibroblasts (CAF). CAF-induced gene expression signatures predicted clinical outcome and immune-related differences in the microenvironment. We found that fibroblasts strongly protect carcinoma cells from lapatinib, attributable to its reduced accumulation in carcinoma cells and an elevated apoptotic threshold.
We analyzed functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40 million single nucleotide polymorphisms (SNPs) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant.
The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas.
The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and the Ket language has no transparent affiliation with any language family. We investigated connections between the Kets and Siberian and North American populations, with emphasis on the Mal'ta and Paleo-Eskimo ancient genomes, using original data from 46 unrelated samples of Kets and 42 samples of their neighboring ethnic groups (Uralic-speaking Nganasans, Enets, and Selkups). We genotyped over 130,000 autosomal SNPs, identified mitochondrial and Y-chromosomal haplogroups, and performed high-coverage genome sequencing of two Ket individuals.
Background: The length of a protein sequence is largely determined by its function. In certain species, it may also be affected by additional factors, such as growth temperature or acidity. In 2002, it was shown that in the bacterium Escherichia coli and in the archaeon Archaeoglobus fulgidus, protein sequences with no homologs were, on average, shorter than those with homologs (BMC Evol Biol 2:20, 2002).
Background: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model.
Results: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays.
Development of drug responsive biomarkers from pre-clinical data is a critical step in drug discovery, as it enables patient stratification in clinical trial design. Such translational biomarkers can be validated in early clinical trial phases and utilized as a patient inclusion parameter in later stage trials. Here we present a study on building accurate and selective drug sensitivity models for Erlotinib or Sorafenib from pre-clinical in vitro data, followed by validation of individual models on corresponding treatment arms from patient data generated in the BATTLE clinical trial.
Background: Despite a growing number of studies evaluating cancer of prostate (CaP) specific gene alterations, oncogenic activation of the ETS Related Gene (ERG) by gene fusions remains the most validated cancer gene alteration in CaP. Prevalent gene fusions have been described between the ERG gene and promoter upstream sequences of androgen-inducible genes, predominantly TMPRSS2 (transmembrane protease serine 2). Despite the extensive evaluations of ERG genomic rearrangements, fusion transcripts and the ERG oncoprotein, the prognostic value of ERG remains to be better understood.
Recurrent mutations in histone-modifying enzymes imply key roles in tumorigenesis, yet their functional relevance is largely unknown. Here, we show that JARID1B, encoding a histone H3 lysine 4 (H3K4) demethylase, is frequently amplified and overexpressed in luminal breast tumors and a somatic mutation in a basal-like breast cancer results in the gain of unique chromatin binding and luminal expression and splicing patterns. Downregulation of JARID1B in luminal cells induces basal gene expression and growth arrest, which is rescued by TGFβ pathway inhibitors.
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats.
Early full-term pregnancy is one of the most effective natural protections against breast cancer. To investigate this effect, we have characterized the global gene expression and epigenetic profiles of multiple cell types from normal breast tissue of nulliparous and parous women and carriers of BRCA1 or BRCA2 mutations. We found significant differences in CD44(+) progenitor cells, where the levels of many stem cell-related genes and pathways, including the cell-cycle regulator p27, are lower in parous women without BRCA1/BRCA2 mutations.
The discovery of novel drug targets is a significant challenge in drug development. Although the human genome comprises approximately 30,000 genes, proteins encoded by fewer than 400 are used as drug targets in the treatment of diseases. Therefore, novel drug targets are extremely valuable as the source for first-in-class drugs.
As is the case with any OMICs technology, the value of proteomics data is defined by the degree of its functional interpretation in the context of phenotype. Functional analysis of proteomics profiles is inherently complex, as each of hundreds of detected proteins can belong to dozens of pathways, be connected in different context-specific groups by protein interactions, and be regulated by a variety of one-step and remote regulators. The knowledge-based approach deals with this complexity by creating a structured database of protein interactions, pathways and protein-disease associations from experimental literature and a set of statistical tools to compare the proteomics profiles with this rich source of accumulated knowledge.
Background: There is a resurgence of interest within the drug and biomarker development communities in the use of primary tumorgraft models as improved predictors of patient tumor response to novel therapeutic strategies. Despite perceived advantages over cell line derived xenograft models, there is limited data comparing the genotype and phenotype of tumorgrafts to the donor patient tumor, limiting the determination of molecular relevance of the tumorgraft model. This report directly compares the genomic characteristics of patient tumors and the derived tumorgraft models, including gene expression and oncogenic mutation status.
The ability to accurately predict the toxicity of drug candidates from their chemical structure is critical for guiding experimental drug discovery toward safer medicines. Under the guidance of the MetaTox consortium (Thomson Reuters, CA, USA), which comprised toxicologists from the pharmaceutical industry and government agencies, we created a comprehensive ontology of toxic pathologies for 19 organs, classifying pathology terms by pathology type and functional organ substructure. By manual annotation of full-text research articles, the ontology was populated with chemical compounds causing specific histopathologies.
Chondrosarcomas are among the most malignant skeletal tumors. Dedifferentiated chondrosarcoma is a highly aggressive subtype of chondrosarcoma, with lung metastases developing within a few months of diagnosis in 90% of patients. In this paper we performed comparative analyses of the transcriptomes of five individual metastatic lung lesions that were surgically resected from a patient with dedifferentiated chondrosarcoma.
The molecular events leading to human embryonic stem cell (hESC) differentiation are the subject of considerable scrutiny. Here, we characterize an in vitro model that permits analysis of the earliest steps in the transition of hESC colonies to squamous epithelium on basic fibroblast growth factor withdrawal. A set of markers (GSC, CK18, Gata4, Eomes, and Sox17) point to a mesendodermal nature of the epithelial cells with subsequent commitment to definitive endoderm (Sox17, Cdx2, nestin, and Islet1).
Background: Successful drug development has been hampered by a limited understanding of how to translate laboratory-based biological discoveries into safe and effective medicines. We have developed a generic method for predicting the effects of drugs on biological processes. Information derived from the chemical structure and experimental omics data from short-term efficacy studies are combined to predict the possible protein targets and cellular pathways affected by drugs.
Intratumor heterogeneity is a major clinical problem because tumor cell subtypes display variable sensitivity to therapeutics and may play different roles in progression. We previously characterized 2 cell populations in human breast tumors with distinct properties: CD44+CD24- cells that have stem cell-like characteristics, and CD44-CD24+ cells that resemble more differentiated breast cancer cells. Here we identified 15 genes required for cell growth or proliferation in CD44+CD24- human breast cancer cells in a large-scale loss-of-function screen and found that inhibition of several of these (IL6, PTGIS, HAS1, CXCL3, and PFKFB3) reduced Stat3 activation.
Differentiation is an epigenetic program that involves the gradual loss of pluripotency and acquisition of cell type-specific features. Understanding these processes requires genome-wide analysis of epigenetic and gene expression profiles, which has been challenging in primary tissue samples due to the limited numbers of cells available. Here we describe the application of high-throughput sequencing technology for profiling histone and DNA methylation, as well as gene expression patterns of normal human mammary progenitor-enriched and luminal lineage-committed cells.