Molecular biomarkers measure discrete components of biological processes that can contribute to disorders when impaired. Great interest exists in discovering early cancer biomarkers to improve outcomes. Biomarkers represented in a standardized data model, integrated with multi-omics data, may improve the understanding and use of novel biomarkers such as glycans and glycoconjugates.
Purpose: The purpose of OncoMX knowledgebase development was to integrate cancer biomarker and relevant data types into a meta-portal, enabling the research of cancer biomarkers side by side with other pertinent multidimensional data types.
Methods: Cancer mutation, cancer differential expression, cancer expression specificity, healthy gene expression from human and mouse, literature mining for cancer mutation and cancer expression, and biomarker data were integrated, unified by relevant biomedical ontologies, and subjected to rule-based automated quality control before ingestion into the database.
Results: OncoMX provides integrated data encompassing more than 1,000 unique biomarker entries (939 from the Early Detection Research Network [EDRN] and 96 from the US Food and Drug Administration) mapped to 20,576 genes that have either mutation or differential expression in cancer.
microRNAs (miRNAs) functioning in gene silencing have been associated with cancer progression. However, common abnormal miRNA expression patterns and their potential roles in cancer have not yet been evaluated. To account for individual differences between patients, we retrieved miRNA sequencing data for 575 patients with both tumor and adjacent non-tumorous tissues from 14 cancer types from The Cancer Genome Atlas (TCGA).
The use of large datasets has become ubiquitous in biomedical sciences. Researchers in the field of cancer genomics have, in recent years, generated large volumes of data from their experiments. Those responsible for producing these data often analyze only a narrow subset, chosen for the research question they are trying to address; this is the case whether they are acting independently or in conjunction with a large-scale cancer genomics project.
Single-nucleotide variation and gene expression of disease samples represent important resources for biomarker discovery. Many databases have been built to host and make available such data to the community, but these databases are frequently limited in scope and/or content. BioMuta, a database of cancer-associated single-nucleotide variations, and BioXpress, a database of cancer-associated differentially expressed genes and microRNAs, differ from other disease-associated variation and expression databases primarily through the aggregation of data across many studies into a single source with a unified representation and annotation of functional attributes.
Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease.
Despite availability of sequence site-specific information resulting from years of sequencing and sequence feature curation, there have been few efforts to integrate and annotate this information. In this study, we update the number of human N-linked glycosylation sequons (NLGs), and we investigate cancer-relatedness of glycosylation-impacting somatic nonsynonymous single-nucleotide variation (nsSNV) by mapping human NLGs to cancer variation data and reporting the expected loss or gain of glycosylation sequon. We find 75.
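The idea of reporting sequon loss or gain can be illustrated with a minimal sketch. It assumes the canonical N-linked sequon definition (N-X-S/T, where X is any residue except proline) and a single amino-acid substitution given as a 0-based position; the function names `find_sequons` and `sequon_effect` and the example peptide are hypothetical and are not taken from the study.

```python
import re

# Canonical N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# A zero-width lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return the 0-based positions of the Asn in every N-X-S/T sequon."""
    return {m.start() for m in SEQUON.finditer(protein)}


def sequon_effect(protein, pos, alt):
    """Report sequons lost or gained by substituting residue `pos` with `alt`.

    `pos` is 0-based; this is an illustrative helper, not the authors' method.
    """
    mutated = protein[:pos] + alt + protein[pos + 1:]
    before = find_sequons(protein)
    after = find_sequons(mutated)
    return {"lost": sorted(before - after), "gained": sorted(after - before)}


# Hypothetical peptide: one sequon (N-G-S) and one blocked by proline (N-P-T).
peptide = "MKNGSAANPTQ"
loss = sequon_effect(peptide, 4, "A")  # S -> A removes the N-G-S sequon
gain = sequon_effect(peptide, 8, "A")  # P -> A creates an N-A-T sequon
```

A real analysis would work from curated protein sequences and variant coordinates; the sketch only shows the sequence-level comparison before and after a substitution.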
Advances in high-throughput sequencing (HTS) technologies have greatly increased the availability of genomic data and potential discovery of clinically significant genomic variants. However, numerous issues still exist with the analysis of these data, including data complexity, the absence of formally agreed upon best practices, and inconsistent reproducibility. Toward a more robust and reproducible variant-calling paradigm, we propose a series of selective noise filtrations and post-alignment quality control (QC) techniques that may reduce the rate of false variant calls.
Post-translational modifications (PTMs) are covalent modifications that proteins might undergo following or sometimes during the process of translation. Together with gene diversity, PTMs contribute to the overall variety of possible protein function for a given organism. Single-nucleotide polymorphisms (SNPs) are the most common form of variations found in the human genome, and have been found to be associated with diseases like Alzheimer's disease (AD) and Parkinson's disease (PD), among many others.
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes.
Database (Oxford)
September 2015
BioXpress is a gene expression and cancer association database in which the expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, International Cancer Genome Consortium, Expression Atlas and publications. The BioXpress database includes expression data from 64 cancer types, 6361 patients and 17 469 genes with 9513 of the genes displaying differential expression between tumor and normal samples. In addition to data directly retrieved from RNA-seq data repositories, manual biocuration of publications supplements the available cancer association annotations in the database.
Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations.
Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner.
Background: We have previously suggested a method for proteome wide analysis of variation at functional residues wherein we identified the set of all human genes with nonsynonymous single nucleotide variation (nsSNV) in the active site residue of the corresponding proteins. 34 of these proteins were shown to have a 1:1:1 enzyme:pathway:reaction relationship, making these proteins ideal candidates for laboratory validation through creation and observation of specific yeast active site knock-outs and downstream targeted metabolomics experiments. Here we present the next step in the workflow toward using yeast metabolic modeling to predict human metabolic behavior resulting from nsSNV.
An enzyme's active site is essential to normal protein activity such that any disruptions at this site may lead to dysfunction and disease. Nonsynonymous single-nucleotide variations (nsSNVs), which alter the amino acid sequence, are one type of disruption that can alter the active site. When this occurs, it is assumed that enzyme activity will vary because of the criticality of the site to normal protein function.