Ultra-Structure database design methodology for managing systems biology data and analyses.

BMC Bioinformatics

Department of Microbiology and Immunology, UNC Chapel Hill, NC, USA.

Published: August 2009

Background: Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogeneous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping).

Results: We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research.
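The idea of a general procedure driven by user-modifiable rules can be illustrated with a small sketch. This is not the authors' implementation: the table name `network_rule`, its columns, the relationship `contained-in`, and all entity names below are hypothetical, chosen only to show how a deduction routine might read its rules from a database table so that new knowledge is added by inserting rows rather than by changing code or schema.

```python
import sqlite3


def build_db():
    """Create an in-memory database with one generic rule table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE network_rule (parent TEXT, relationship TEXT, child TEXT)"
    )
    # Illustrative rules only; these entities are not taken from the paper.
    conn.executemany(
        "INSERT INTO network_rule VALUES (?, ?, ?)",
        [
            ("exon_E1", "contained-in", "gene_G1"),
            ("gene_G1", "contained-in", "chromosome_1"),
        ],
    )
    return conn


def deduce(conn, relationship):
    """Return the transitive closure of one relationship as (parent, child) pairs.

    The procedure knows nothing about exons or genes; it only reads
    whatever rules are currently stored for the given relationship.
    """
    rows = conn.execute(
        "SELECT parent, child FROM network_rule WHERE relationship = ?",
        (relationship,),
    ).fetchall()
    known = set(rows)
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for c, d in list(known):
                # If a -> b and b -> d are known, a -> d follows.
                if b == c and (a, d) not in known:
                    known.add((a, d))
                    changed = True
    return known


if __name__ == "__main__":
    conn = build_db()
    print(sorted(deduce(conn, "contained-in")))
    # A user adds knowledge by inserting a row; no code or schema changes.
    conn.execute(
        "INSERT INTO network_rule VALUES (?, ?, ?)",
        ("peptide_hit_1", "contained-in", "exon_E1"),
    )
    print(sorted(deduce(conn, "contained-in")))
```

In this sketch, inserting the single `peptide_hit_1` row is enough for the same unchanged procedure to conclude that the peptide hit is also contained in `gene_G1` and `chromosome_1`.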

Conclusion: We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2748085
DOI: http://dx.doi.org/10.1186/1471-2105-10-254
