XML schemas for common bioinformatic data types and their application in workflow systems.

Philipp N Seibel, Jan Krüger, Sven Hartmeier, Knut Schwarzer, Kai Löwenthal, Henning Mersch, Thomas Dandekar, Robert Giegerich

BMC Bioinformatics

Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany.

Published: November 2006

Background: There is a growing need in bioinformatics to combine available software tools into chains, building complex applications from existing single-task tools. To create such workflows, the tools involved must be able to work with each other's data; a common set of well-defined data formats is therefore needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats.

Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics, developed appropriate format descriptions formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net.

Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
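As a rough illustration of what schema-defined interchange formats offer a workflow, the sketch below validates a sequence record against an XML schema before passing it on to the next tool. It uses only the standard javax.xml.validation API. The schema, the element names (sequence, name, nucleicAcid), and the class name SequenceSchemaSketch are hypothetical stand-ins invented for this example; they are not the HOBIT schemas and this is not the BioDOM API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SequenceSchemaSketch {

    // Hypothetical toy schema for illustration only (NOT a HOBIT schema):
    // a <sequence> with a required id attribute, a name, and a nucleic acid string.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"sequence\">"
      + "<xs:complexType>"
      + "<xs:sequence>"
      + "<xs:element name=\"name\" type=\"xs:string\"/>"
      + "<xs:element name=\"nucleicAcid\">"
      + "<xs:simpleType>"
      + "<xs:restriction base=\"xs:string\">"
      + "<xs:pattern value=\"[ACGTUN]+\"/>"
      + "</xs:restriction>"
      + "</xs:simpleType>"
      + "</xs:element>"
      + "</xs:sequence>"
      + "<xs:attribute name=\"id\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the XML text conforms to the toy schema, false otherwise.
    public static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(TOY_XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<sequence id=\"s1\"><name>example</name>"
                    + "<nucleicAcid>ACGUACGU</nucleicAcid></sequence>";
        // 'X' is outside the alphabet allowed by the toy schema's pattern.
        String bad = "<sequence id=\"s2\"><name>broken</name>"
                   + "<nucleicAcid>ACGX</nucleicAcid></sequence>";
        System.out.println("good record valid: " + isValid(good));
        System.out.println("bad record valid: " + isValid(bad));
    }
}
```

In a workflow setting, a check of this kind could sit at each hand-off between tools, so that a malformed record is rejected at the boundary rather than inside the downstream program.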

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2001303	PMC
http://dx.doi.org/10.1186/1471-2105-7-490	DOI Listing

Similar Publications

Dataset on aquatic ecotoxicity predictions of 2697 chemicals, using three quantitative structure-activity relationship platforms.

Data Brief

December 2023

Department of Biological and Environmental Sciences, University of Gothenburg, PO Box 463, SE-405 30 Gothenburg, Sweden.

Patrik Svedberg, Pedro A Inostroza, Mikael Gustavsson, Erik Kristiansson, Francis Spilsbury

Empirical and in silico data on the aquatic ecotoxicology of 2697 organic chemicals were collected in order to compile a dataset for assessing the predictive power of current Quantitative Structure Activity Relationship (QSAR) models and software platforms. This document presents the dataset and the data pipeline for its creation. Empirical data were collected from the US EPA ECOTOX Knowledgebase (ECOTOX) and the EFSA (European Food Safety Authority) report "Completion of data entry of pesticide ecotoxicology Tier 1 study endpoints in a XML schema - database".

Shape Expressions (ShEx) schemas for the FHIR R5 specification.

J Biomed Inform

December 2023

Mayo Clinic, Rochester, MN, USA.

Deepak K Sharma, Eric Prud'hommeaux, David Booth, Claude Nanjo, Guoqian Jiang

This work continues along a visionary path of using Semantic Web standards such as RDF and ShEx to make healthcare data easier to integrate for research and leading-edge patient care. The work extends the ability to use ShEx schemas to validate FHIR RDF data, thereby enhancing the semantic web ecosystem for working with FHIR and non-FHIR data using the same ShEx validation framework. It updates FHIR's ShEx schemas to fix outstanding issues and reflect changes in the definition of FHIR RDF.

Towards improved FAIRness of the ThermoML Archive.

J Comput Chem

May 2022

Thermodynamics Research Center, Applied Chemicals and Materials Division, National Institute of Standards and Technology, Boulder, CO, USA.

Demian Riccardi, Zachary Trautt, Ala Bazyleva, Eugene Paulechka, Vladimir Diky

The ThermoML Archive is a subset of Thermodynamics Research Center (TRC) data holdings corresponding to cooperation between NIST TRC and five journals: Journal of Chemical & Engineering Data (ISSN: 1520-5134), The Journal of Chemical Thermodynamics (ISSN: 1096-3626), Fluid Phase Equilibria (ISSN: 0378-3812), Thermochimica Acta (ISSN: 0040-6031), and International Journal of Thermophysics (ISSN: 1572-9567). Data from initial cooperation (around 2003) through the 2019 calendar year are included. The archive has undergone a major update with the goal of improving the FAIRness and user experience of the data provided by the service.

Efficient processing of complex XSD using Hive and Spark.

PeerJ Comput Sci

August 2021

Department of Software and Computing Systems, University of Alicante, Alicante, Spain.

Diana Martinez-Mosquera, Rosa Navarrete, Sergio Luján-Mora

eXtensible Markup Language (XML) files are widely used in industry due to their flexibility in representing numerous kinds of data. Multiple applications such as financial records, social networks, and mobile networks use complex XML schemas with nested types, contents, and/or extension bases on existing complex elements or large real-world files. A great number of these files are generated each day, and this has influenced the development of Big Data tools for their parsing and reporting, such as Apache Hive and Apache Spark.

COVID term: a bilingual terminology for COVID-19.

BMC Med Inform Decis Mak

August 2021

Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China.

Hetong Ma, Liu Shen, Haixia Sun, Zidu Xu, Li Hou

Background: The coronavirus disease (COVID-19), a pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is highly contagious and still spreading globally; it has shown its destructiveness with more than one million confirmed cases and tens of thousands of deaths. Worldwide studies have been conducted to understand the COVID-19 mechanism, transmission, clinical features, etc. A cross-language terminology of COVID-19 is essential for improving knowledge sharing and the dissemination of scientific discoveries.
