BioWarehouse: a bioinformatics database warehouse toolkit.

BMC Bioinformatics

Bioinformatics Research Group, SRI International, Menlo Park, USA.

Published: March 2006

Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases.

Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system. This enables multi-database queries using the Structured Query Language (SQL) and also facilitates a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, as well as the UniProt, GenBank, NCBI Taxonomy, and CMR databases and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type illustrate the value of the data warehousing approach to bioinformatics research.
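The core idea of the warehouse approach is that once several sources are loaded into one schema, a cross-source question such as the enzyme-sequence gap becomes a single SQL query. The following is a minimal sketch of that idea, not the BioWarehouse implementation. It uses Python's standard-library `sqlite3` module rather than MySQL or Oracle. The tables `enzyme_activity`, `protein` and `protein_ec`, the `normalize_ec` helper, and all data values are hypothetical, and the schema is far simpler than the real BioWarehouse schema. `normalize_ec` stands in loosely for the kind of semantic normalization a loader might apply so that EC numbers written differently by different sources can be joined.

```python
import sqlite3

# Hypothetical, heavily simplified schema: NOT the real BioWarehouse schema.
SCHEMA = """
CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY, source_db TEXT);
CREATE TABLE protein (protein_id INTEGER PRIMARY KEY, source_db TEXT, sequence TEXT);
CREATE TABLE protein_ec (protein_id INTEGER REFERENCES protein, ec_number TEXT);
"""

def normalize_ec(raw: str) -> str:
    """Hypothetical loader-side normalization: drop an 'EC' prefix, spaces and colons."""
    text = raw.strip()
    if text.upper().startswith("EC"):
        text = text[2:]
    return text.strip(" :")

def load_toy_data(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    # Toy enzyme activities, as they might arrive from an ENZYME-like source.
    for ec in ["EC 1.1.1.1", "EC 2.7.1.1", "EC 3.1.1.1", "EC 4.2.1.1"]:
        conn.execute("INSERT INTO enzyme_activity VALUES (?, 'ENZYME')", (normalize_ec(ec),))
    # Toy proteins, as they might arrive from a UniProt-like source;
    # the EC strings are deliberately written in inconsistent formats.
    proteins = [
        (1, "MSTAGKVIKC", "EC:1.1.1.1"),
        (2, "MIASHLLAYF", "2.7.1.1"),
        (3, None, "ec 3.1.1.1"),  # activity annotated, but no sequence loaded
    ]
    for pid, seq, ec in proteins:
        conn.execute("INSERT INTO protein VALUES (?, 'UniProt', ?)", (pid, seq))
        conn.execute("INSERT INTO protein_ec VALUES (?, ?)", (pid, normalize_ec(ec)))

# Count all EC numbers, and those with no linked protein that has a sequence.
GAP_QUERY = """
SELECT COUNT(*) AS total,
       SUM(CASE WHEN NOT EXISTS (
               SELECT 1
               FROM protein_ec pe JOIN protein p ON p.protein_id = pe.protein_id
               WHERE pe.ec_number = e.ec_number AND p.sequence IS NOT NULL)
           THEN 1 ELSE 0 END) AS missing
FROM enzyme_activity e
"""

def sequence_gap_fraction(conn: sqlite3.Connection) -> float:
    """Fraction of EC numbers for which no sequence exists in the warehouse."""
    total, missing = conn.execute(GAP_QUERY).fetchone()
    return (missing or 0) / total if total else 0.0

conn = sqlite3.connect(":memory:")
load_toy_data(conn)
print(sequence_gap_fraction(conn))  # 0.5 on this toy data
```

On the toy data, 3.1.1.1 (linked only to a protein without a sequence) and 4.2.1.1 (linked to no protein) count as gaps, giving 2 of 4. Without the normalization step, `"EC:1.1.1.1"` would not match `"1.1.1.1"` and the query would overstate the gap.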

Conclusion: BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1444936
DOI: http://dx.doi.org/10.1186/1471-2105-7-170
