Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them is the management of the massive amounts of data generated by automatic sequencers. We need to deal with the persistence of genomic data, particularly the storage and analysis of these large-scale processed data. Finding an alternative to the commonly used relational database model has therefore become a compelling task. Other data models may be more effective when dealing with very large amounts of nonconventional data, especially for write and read operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We analyze persistence and I/O operations with real data using the Cassandra database system. We also compare the results with those obtained from a classical relational database system and from another NoSQL database approach, MongoDB.
Full text | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4629038 | PMC
http://dx.doi.org/10.1155/2015/502795 | DOI
BMC Bioinformatics
December 2024
Synthetic Biology and Biotechnology Unit, Department of Biology, University of Padua, Padua, Italy.
Background: Vaccine development in this millennium began with the milestone work on Neisseria meningitidis B, which reported the invention of Reverse Vaccinology (RV), an approach that identifies vaccine candidates (VCs) by screening the genome or proteome of bacterial pathogens through computational analyses. When NERVE (New Enhanced RV Environment), the first RV software integrating tools to perform the selection of VCs, was released, it prompted further development in the field. However, the problem-solving potential of most, if not all, RV programs is still largely unexploited by experimental vaccinologists, who are hindered by somewhat difficult interfaces that require bioinformatics skills.
Sci Rep
November 2024
Pivot Bio, Sheboygan, WI, USA.
Field trials are one of the essential stages in agricultural product development, enabling the validation of products in real-world environments rather than controlled laboratory or greenhouse settings. With advances in technology, field trials often collect a large amount of information with diverse data types from various sources. Managing and organizing extensive datasets can pose challenges for small research teams, especially when data collection processes evolve constantly, involve multiple collaborators, and introduce new data types between studies.
Brief Bioinform
September 2024
Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg.
Graph databases are becoming increasingly popular across scientific disciplines, being highly suitable for storing and connecting complex heterogeneous data. In systems biology, they are used as a backend solution for biological data repositories, ontologies, networks, pathways, and knowledge graph databases. In this review, we analyse all publications using or mentioning graph databases retrieved from PubMed and PubMed Central full-text search, focusing on the top 16 available graph databases. Publications are categorized according to their domain and application, with a focus on pathway and network biology and relevant ontologies and tools.
Rev Esp Cir Ortop Traumatol
November 2024
Servicio de Cirugía Ortopédica y Traumatología, Hospital Universitario Galdakao-Usansolo, Galdakao, Bizkaia, Spain.
Background And Objective: The objective is to develop a model that predicts vital status six months after fracture as accurately as possible. For this purpose we will use five different data sources obtained through the National Hip Fracture Registry, the Health Management Unit and the Economic Management Department.
Material And Methods: The study population is a cohort of patients over 74 years of age who suffered a hip fracture between May 2020 and December 2022.
Rev Sci Instrum
November 2024
General Atomics, San Diego, California 92121, USA.
Many current and upcoming laser facilities used to study high-energy-density (HED) physics and inertial fusion energy (IFE) support operating at high rep-rates (HRRs) of ∼0.1-10 Hz, yet many diagnostics, target-fielding strategies, and data storage methods cannot support this pace of operation. Therefore, established experimental paradigms must change for the community to progress toward rep-rated operation.