High dimensional biological data retrieval optimization with NoSQL technology.

Shicai Wang, Ioannis Pandis, Chao Wu, Sijin He, David Johnson, Ibrahim Emam, Florian Guitton, Yike Guo

BMC Genomics

Published: July 2015

Background: High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although data warehouses such as tranSMART support microarray data, queries against their relational databases are slow when retrieving gene expression records for hundreds of different patients. Non-relational data models, such as the key-value model implemented in NoSQL databases, promise better performance. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data.

Results: In this paper we introduce a new data model better suited to high-dimensional data storage and querying, optimized for database scalability and performance. We designed a key-value pair data model to support faster queries over large-scale microarray data and implemented it using HBase, an implementation of Google's BigTable storage system. We compared its performance experimentally against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set on Multiple Myeloma taken from NCBI GEO. The new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase compared to the relational model implemented on MongoDB.
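
To make the key-value idea concrete, the following is a minimal sketch in Python, not the authors' schema and not the HBase API. It assumes a hypothetical composite row key of the form `study|patient|probe` and uses a toy in-memory store whose keys are kept sorted, as row keys are in BigTable-style systems. All identifiers and expression values are made up for illustration. Under these assumptions, every record for one patient sits in a contiguous key range, so it can be read with a single prefix scan rather than a join.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Tuple


class SortedKeyValueStore:
    """Toy in-memory stand-in for a sorted key-value table (not the HBase API)."""

    def __init__(self) -> None:
        self._keys: List[str] = []          # keys kept in lexicographic order
        self._values: Dict[str, float] = {}

    def put(self, key: str, value: float) -> None:
        # Insert the key into the sorted list only the first time it is seen.
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: str) -> List[Tuple[str, float]]:
        # Keys sharing a prefix are contiguous in sorted order, so jump to
        # the first candidate and stop at the first key that does not match.
        start = bisect_left(self._keys, prefix)
        rows: List[Tuple[str, float]] = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            rows.append((key, self._values[key]))
        return rows


def make_key(study: str, patient: str, probe: str) -> str:
    # Hypothetical composite key layout; the abstract does not specify one.
    return f"{study}|{patient}|{probe}"


# Made-up expression values, inserted out of order on purpose.
store = SortedKeyValueStore()
store.put(make_key("STUDY1", "P002", "probe_a"), 6.8)
store.put(make_key("STUDY1", "P001", "probe_b"), 5.1)
store.put(make_key("STUDY1", "P001", "probe_a"), 7.2)

# One contiguous scan returns every probe value for patient P001.
patient_rows = store.scan_prefix("STUDY1|P001|")
```

The choice of which field leads the key decides which queries are cheap: leading with the patient favors per-patient retrieval, whereas leading with the probe would favor per-gene retrieval across patients.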

Conclusions: The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248814
DOI: http://dx.doi.org/10.1186/1471-2164-15-S8-S3
