Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases.
Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, which enables multi-database queries in the Structured Query Language (SQL) and also facilitates a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, as well as the UniProt, GenBank, NCBI Taxonomy, and CMR databases and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. We found that no sequence exists for 36% of the enzyme activities to which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier to metabolic engineering. Complex queries of this type illustrate the value of the data warehousing approach to bioinformatics research.
Conclusion: BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
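The kind of multi-database query described in the Results can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module rather than MySQL or Oracle, and the table names (`enzyme_activity`, `protein_sequence`), column names, and identifiers are hypothetical placeholders, not the actual BioWarehouse schema. The rows are toy data invented for illustration and say nothing about real enzymes.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory database with a hypothetical two-table schema.

    Assumption: these tables only stand in for warehouse tables loaded from
    several source databases; they are not the BioWarehouse schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
        CREATE TABLE protein_sequence (
            id INTEGER PRIMARY KEY,
            ec_number TEXT,
            source_db TEXT
        );
        """
    )
    # Placeholder identifiers, deliberately not real EC numbers.
    conn.executemany(
        "INSERT INTO enzyme_activity (ec_number) VALUES (?)",
        [("toy.1",), ("toy.2",), ("toy.3",), ("toy.4",)],
    )
    # Sequences attributed to two different (hypothetical) source databases.
    conn.executemany(
        "INSERT INTO protein_sequence (ec_number, source_db) VALUES (?, ?)",
        [("toy.1", "db_a"), ("toy.1", "db_b"), ("toy.2", "db_a")],
    )
    return conn


def fraction_without_sequence(conn):
    """Return the fraction of enzyme activities with no sequence in any source."""
    missing, total = conn.execute(
        """
        SELECT
            SUM(CASE WHEN NOT EXISTS (
                    SELECT 1 FROM protein_sequence p
                    WHERE p.ec_number = e.ec_number)
                THEN 1 ELSE 0 END),
            COUNT(*)
        FROM enzyme_activity e
        """
    ).fetchone()
    # An empty table yields total == 0; report 0.0 instead of dividing by zero.
    return missing / total if total else 0.0


if __name__ == "__main__":
    print(fraction_without_sequence(build_toy_warehouse()))
```

Because all sources sit in one database, the question reduces to a single SQL statement with a correlated subquery; no cross-system joins or per-database parsers are needed at query time.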
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1444936 | PMC |
| http://dx.doi.org/10.1186/1471-2105-7-170 | DOI Listing |
J Inflamm Res
December 2024
Department of Nephrology, Blood Purification Research Center, the First Affiliated Hospital, Fujian Medical University, Fuzhou, People's Republic of China.
Objective: A comprehensive bioinformatics analysis was conducted to investigate potential new diagnostic biomarkers and immune infiltration characteristics associated with tubulointerstitial injury in lupus nephritis (LN), and to examine possible correlations between key genes and infiltrating immune cells.
Methods: The GSE32591, GSE113342, and GSE200306 datasets were downloaded from the Gene Expression Omnibus database and differentially expressed genes (DEGs) were identified in the pooled dataset. Support vector machine-recursive feature elimination analysis and the least absolute shrinkage and selection operator regression model were used to screen for possible markers, and the compositional patterns of the 22 types of immune cell fractions in LN were determined using CIBERSORT.
JAMIA Open
February 2025
Medical Oncology, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy.
Objectives: In recent years, the rise of big data and artificial intelligence has led to an increasing expansion of databases and web services in biomedical research. cBioPortal is one of the most widely used platforms for accessing cancer genomic and clinical data. The primary objective of this study was to develop a tool that simplifies programmatic interaction with cBioPortal's web service.
Front Endocrinol (Lausanne)
December 2024
Department of Ultrasound, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China.
Background: Polycystic ovary syndrome (PCOS) is a complex endocrine disorder with various contributing factors. Understanding the molecular mechanisms underlying PCOS is essential for developing effective treatments. This study aimed to identify hub genes and investigate potential molecular mechanisms associated with PCOS through a combination of bioinformatics analysis and Mendelian randomization (MR).
Front Immunol
December 2024
Molecular Pathology & Genetics Division, Kanagawa Cancer Center Research Institute, Yokohama, Japan.
Background: Studies have shown that tumor cell amino acid metabolism is closely associated with lung adenocarcinoma (LUAD) development and progression. However, the comprehensive multi-omics features and clinical impact of the expression of genes associated with amino acid metabolism in the LUAD tumor microenvironment (TME) are yet to be fully understood.
Methods: LUAD patients from The Cancer Genome Atlas (TCGA) database were enrolled in the training cohort.
Front Immunol
December 2024
Intensive Care Unit, Hubei University of Medicine, Renmin Hospital, Shiyan, Hubei, China.
Background: Sepsis is a life-threatening organ dysfunction condition produced by dysregulation of the host response to infection. It is now characterized by a high clinical morbidity and mortality rate, endangering patients' lives and health. The purpose of this study was to determine the value of Long chain non-coding RNA (LncRNA) RP3_508I15.