Towards self-describing and FAIR bulk formats for biomedical data.

PLoS Comput Biol

Center for Translational Data Science, University of Chicago, Chicago, Illinois, United States of America.

Published: March 2023

We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. PFB is based on Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third-party controlled vocabularies. In general, each data element in the data dictionary is associated with a third-party controlled vocabulary, which makes it easier for applications to harmonize two or more PFB files. We also introduce an open-source software development kit (SDK) called PyPFB for creating, exploring, and modifying PFB files. We describe experimental studies showing performance improvements when importing and exporting bulk biomedical data in the PFB format compared with JSON and SQL formats.
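The harmonization idea in the abstract can be illustrated with a minimal sketch. It is not the PFB format or the PyPFB API: the schemas are plain Python dictionaries shaped like Avro record schemas, and the attribute names `term_source` and `term_id`, the vocabulary name `EXAMPLE_VOCAB`, and the identifiers `EX:0001` and `EX:0002` are all hypothetical. The sketch assumes only that each field carries a pointer to a controlled-vocabulary term, so two files that name a field differently can still be aligned through the shared term.

```python
# Hypothetical Avro-style record schemas. The extra attributes "term_source"
# and "term_id" are invented for this sketch; they stand in for the pointers
# to third-party controlled vocabularies described in the abstract.
SCHEMA_A = {
    "type": "record",
    "name": "Subject",
    "fields": [
        {"name": "sex", "type": "string",
         "term_source": "EXAMPLE_VOCAB", "term_id": "EX:0001"},
        {"name": "age_years", "type": "int",
         "term_source": "EXAMPLE_VOCAB", "term_id": "EX:0002"},
    ],
}

SCHEMA_B = {
    "type": "record",
    "name": "Participant",
    "fields": [
        {"name": "gender", "type": "string",
         "term_source": "EXAMPLE_VOCAB", "term_id": "EX:0001"},
        {"name": "age", "type": "int",
         "term_source": "EXAMPLE_VOCAB", "term_id": "EX:0002"},
        # No vocabulary pointer, so this field cannot be aligned automatically.
        {"name": "site", "type": "string"},
    ],
}


def term_index(schema):
    """Map (term_source, term_id) to the field name, for annotated fields only."""
    return {
        (field["term_source"], field["term_id"]): field["name"]
        for field in schema["fields"]
        if "term_source" in field and "term_id" in field
    }


def field_mapping(schema_a, schema_b):
    """Pair fields from two schemas that point at the same vocabulary term."""
    index_a = term_index(schema_a)
    index_b = term_index(schema_b)
    shared_terms = index_a.keys() & index_b.keys()
    return {index_a[term]: index_b[term] for term in shared_terms}


def harmonize(record_b, mapping):
    """Rename a schema-B record's keys to schema-A names.

    Fields without a shared vocabulary term are dropped.
    """
    b_to_a = {b_name: a_name for a_name, b_name in mapping.items()}
    return {b_to_a[key]: value for key, value in record_b.items() if key in b_to_a}


mapping = field_mapping(SCHEMA_A, SCHEMA_B)
harmonized = harmonize({"gender": "female", "age": 54, "site": "X"}, mapping)
```

Here `mapping` pairs `sex` with `gender` and `age_years` with `age`, and `harmonized` carries the second file's values under the first file's field names. Matching on the vocabulary term rather than on the field name is the design choice this sketch is meant to show; how PFB itself stores and resolves these pointers is not specified in the abstract.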

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035862
DOI: http://dx.doi.org/10.1371/journal.pcbi.1010944
