Background: Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data; therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats.
Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics, developed appropriate format descriptions formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net.
Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2001303 | PMC |
http://dx.doi.org/10.1186/1471-2105-7-490 | DOI Listing |
Data Brief
December 2023
Department of Biological and Environmental Sciences, University of Gothenburg, PO Box 463, SE-405 30 Gothenburg, Sweden.
Empirical and in silico data on the aquatic ecotoxicology of 2697 organic chemicals were collected in order to compile a dataset for assessing the predictive power of current Quantitative Structure Activity Relationship (QSAR) models and software platforms. This document presents the dataset and the data pipeline for its creation. Empirical data were collected from the US EPA ECOTOX Knowledgebase (ECOTOX) and the EFSA (European Food Safety Authority) report "Completion of data entry of pesticide ecotoxicology Tier 1 study endpoints in a XML schema - database".
J Biomed Inform
December 2023
Mayo Clinic, Rochester, MN, USA.
This work continues along a visionary path of using Semantic Web standards such as RDF and ShEx to make healthcare data easier to integrate for research and leading-edge patient care. The work extends the ability to use ShEx schemas to validate FHIR RDF data, thereby enhancing the semantic web ecosystem for working with FHIR and non-FHIR data using the same ShEx validation framework. It updates FHIR's ShEx schemas to fix outstanding issues and reflect changes in the definition of FHIR RDF.
J Comput Chem
May 2022
Thermodynamics Research Center, Applied Chemicals and Materials Division, National Institute of Standards and Technology, Boulder, CO, USA.
The ThermoML Archive is a subset of Thermodynamics Research Center (TRC) data holdings corresponding to cooperation between NIST TRC and five journals: Journal of Chemical & Engineering Data (ISSN: 1520-5134), The Journal of Chemical Thermodynamics (ISSN: 1096-3626), Fluid Phase Equilibria (ISSN: 0378-3812), Thermochimica Acta (ISSN: 0040-6031), and International Journal of Thermophysics (ISSN: 1572-9567). Data from initial cooperation (around 2003) through the 2019 calendar year are included. The archive has undergone a major update with the goal of improving the FAIRness and user experience of the data provided by the service.
PeerJ Comput Sci
August 2021
Department of Software and Computing Systems, University of Alicante, Alicante, Spain.
eXtensible Markup Language (XML) files are widely used in industry due to their flexibility in representing numerous kinds of data. Multiple applications such as financial records, social networks, and mobile networks use complex XML schemas with nested types, contents, and/or extension bases on existing complex elements or large real-world files. A great number of these files are generated each day, and this has influenced the development of Big Data tools for their parsing and reporting, such as Apache Hive and Apache Spark.
BMC Med Inform Decis Mak
August 2021
Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China.
Background: The coronavirus disease (COVID-19), a pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is highly contagious and still spreading globally, and has shown its destructiveness with more than one million confirmed cases and tens of thousands of deaths. Worldwide studies have been conducted aiming to understand the COVID-19 mechanism, transmission, clinical features, etc. A cross-language terminology of COVID-19 is essential for improving knowledge sharing and scientific discovery dissemination.