Background: Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, ideally as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions that query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of the available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Finally, the reliability of data resources in terms of access and quality has to be monitored over time. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain.

Methods: The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Finally, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, SIDER).
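
Triple-pattern-wise source selection against a catalogue can be illustrated with a minimal sketch. It is not BioFed's implementation: the catalogue layout, the endpoint URLs and the names `TriplePattern` and `select_sources` are assumptions made for illustration only. The idea shown is that each triple pattern of a query is matched against a per-endpoint index of predicates and classes, so that only endpoints able to contribute results are contacted.

```python
from typing import NamedTuple


class TriplePattern(NamedTuple):
    subject: str
    predicate: str
    obj: str


RDF_TYPE = "rdf:type"

# Hypothetical catalogue: endpoint URL -> indexed predicates and classes.
# The URLs and vocabulary terms are placeholders, not real BioFed entries.
CATALOGUE = {
    "http://example.org/drugbank/sparql": {
        "predicates": {"drugbank:target", "rdfs:label"},
        "classes": {"drugbank:Drug"},
    },
    "http://example.org/sider/sparql": {
        "predicates": {"sider:sideEffect", "rdfs:label"},
        "classes": {"sider:Drug"},
    },
}


def is_variable(term: str) -> bool:
    """SPARQL variables start with '?'."""
    return term.startswith("?")


def select_sources(pattern: TriplePattern, catalogue=CATALOGUE) -> list[str]:
    """Return the endpoints whose index suggests they can match the pattern."""
    if is_variable(pattern.predicate):
        # Unbound predicate: the index cannot prune, so keep every endpoint.
        return sorted(catalogue)
    if pattern.predicate == RDF_TYPE:
        if is_variable(pattern.obj):
            # Any endpoint that declares at least one class is a candidate.
            return sorted(u for u, idx in catalogue.items() if idx["classes"])
        # Bound class: use the type index.
        return sorted(u for u, idx in catalogue.items()
                      if pattern.obj in idx["classes"])
    # Bound predicate: use the property index.
    return sorted(u for u, idx in catalogue.items()
                  if pattern.predicate in idx["predicates"])


if __name__ == "__main__":
    for tp in (TriplePattern("?d", "rdf:type", "drugbank:Drug"),
               TriplePattern("?d", "rdfs:label", "?name")):
        print(tp, "->", select_sources(tp))
```

In this sketch a pattern with a bound class or property is routed only to the endpoints that index it, while a pattern with an unbound predicate falls back to all endpoints.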

Results: BioFed provides a single point of access to a large number of SPARQL endpoints that serve life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and exposes each endpoint's availability through the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection.
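
As a hedged illustration of how SPARQL 1.1 federation can carry provenance with the results, the sketch below assembles a query in which each group of triple patterns is wrapped in a `SERVICE` clause and the endpoint IRI is bound to a `?sourceN` variable. The function name `build_federated_query`, the endpoint URLs, the prefix IRIs and the `?sourceN` convention are assumptions for illustration; the source does not state how BioFed generates its queries.

```python
def build_federated_query(assignments: dict[str, list[str]],
                          prefixes: dict[str, str]) -> str:
    """Build a SPARQL 1.1 query with one SERVICE block per endpoint.

    assignments maps an endpoint URL to the triple patterns routed to it;
    prefixes maps a prefix name to its namespace IRI.
    """
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    lines.append("SELECT * WHERE {")
    for i, (endpoint, patterns) in enumerate(sorted(assignments.items())):
        lines.append(f"  SERVICE <{endpoint}> {{")
        lines.extend(f"    {pattern} ." for pattern in patterns)
        lines.append("  }")
        # Record which endpoint answered this block (basic provenance).
        lines.append(f"  BIND(<{endpoint}> AS ?source{i})")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder endpoints and namespaces, not real BioFed catalogue entries.
    query = build_federated_query(
        {
            "http://example.org/drugbank/sparql": ["?drug a drugbank:Drug",
                                                   "?drug rdfs:label ?name"],
            "http://example.org/sider/sparql": ["?se sider:drugName ?name"],
        },
        {
            "drugbank": "http://example.org/drugbank/vocab#",
            "sider": "http://example.org/sider/vocab#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        },
    )
    print(query)
```

The two `SERVICE` blocks join on the shared variable `?name`, and every result row carries the IRIs of the endpoints that contributed to it.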

Conclusion: Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5353896
DOI: http://dx.doi.org/10.1186/s13326-017-0118-0

Publication Analysis

Top Keywords

data: 19
life sciences: 16
federated query: 12
sparql endpoints: 12
query: 9
query processing: 8
query federation: 8
sparql queries: 8
data resources: 8
life science: 8

