PeerJ Comput Sci
April 2024
FAIR Digital Object (FDO) is an emerging concept that is highlighted by the European Open Science Cloud (EOSC) as a potential candidate for building an ecosystem of machine-actionable research outputs. In this work we systematically evaluate FDO and its implementations as a global distributed object system, using five different conceptual frameworks that cover interoperability, middleware, FAIR principles, EOSC requirements and the FDO guidelines themselves. We compare the FDO approach with established Linked Data practices and the existing Web architecture, and provide a brief history of the Semantic Web while discussing why these technologies may have been difficult to adopt for FDO purposes.
Biological science produces "big data" in varied formats, which necessitates using computational tools to process, integrate, and analyse data. Researchers using computational biology tools range from those using computers for communication to those writing analysis code. We examine differences in how researchers conceptualise the same data, which we call "subjective data models".
Background: The Findable, Accessible, Interoperable and Reusable (FAIR) Principles explicitly require the use of FAIR vocabularies, but what precisely constitutes a FAIR vocabulary remains unclear. Being able to define FAIR vocabularies, identify features of FAIR vocabularies, and provide assessment approaches against those features can guide the development of vocabularies.
Results: We differentiate data, data resources and vocabularies used for FAIR, examine the application of the FAIR Principles to vocabularies, align their requirements with the Open Biomedical Ontologies principles, and propose FAIR Vocabulary Features.
The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness of both existing and future clinical and molecular datasets. We validated the framework in collaboration with several major public-private partnership projects, demonstrating and delivering improvements across all aspects of FAIR and across a variety of datasets and their contexts.
Many trainers and organizations are passionate about sharing their training material. Sharing training material has several benefits, such as providing a record of recognition as an author, offering inspiration to other trainers, enabling researchers to discover training resources for their personal learning path, and improving the training resource landscape using data-driven gap analysis from the bioinformatics community. In this article, we present a series of protocols for using the ELIXIR online training registry Training eSupport System (TeSS).
In this white paper, we describe the founding of a new ELIXIR Community - the Systems Biology Community - and its proposed future contributions to both ELIXIR and the broader community of systems biologists in Europe and worldwide. The Community believes that the infrastructure aspects of systems biology - databases, (modelling) tools and standards development, as well as training and access to cloud infrastructure - are not only appropriate components of the ELIXIR infrastructure, but will prove key components of ELIXIR's future support of advanced biological applications and personalised medicine. By way of a series of meetings, the Community identified seven key areas for its future activities, reflecting both future needs and previous and current activities within ELIXIR Platforms and Communities.
The FAIR (findable, accessible, interoperable and reusable) principles are data management and stewardship guidelines aimed at increasing the effective use of scientific research data. Adherence to these principles in managing data assets in pharmaceutical research and development (R&D) offers pharmaceutical companies the potential to maximise the value of such assets, but the endeavour is costly and challenging. We describe the 'FAIR-Decide' framework, which aims to guide decision-making on the retrospective FAIRification of existing datasets by using business analysis techniques to estimate costs and expected benefits.
Despite the intuitive value of adopting the Findable, Accessible, Interoperable, and Reusable (FAIR) principles in both academic and industrial sectors, challenges exist in resourcing, balancing long- versus short-term priorities, and achieving technical implementation. This situation is exacerbated by the unclear mechanisms by which costs and benefits can be assessed when decisions on FAIR are made. Scientific and research and development (R&D) leadership needs reliable evidence of the potential benefits and information on effective implementation mechanisms and remediating strategies.
Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus.
We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources.
Artificial Intelligence (AI) is increasingly used within plant science, yet it is far from being routinely and effectively implemented in this domain. Particularly relevant to the development of novel food and agricultural technologies is the development of validated, meaningful and usable ways to integrate, compare and visualise large, multi-dimensional datasets from different sources and scientific approaches. After a brief summary of the reasons for the interest in data science and AI within plant science, the paper identifies and discusses eight key challenges in data management that must be addressed to further unlock the potential of AI in crop and agronomic research, and particularly the application of machine learning (ML), which holds much promise for this domain.
Motivation: Since its launch in 2010, Identifiers.org has become an important tool for the annotation and cross-referencing of Life Science data. In 2016, we established the Compact Identifier (CID) scheme (prefix:accession) to generate globally unique identifiers for data resources using their locally assigned accession identifiers.
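The prefix:accession pattern described above can be illustrated with a minimal sketch: splitting a Compact Identifier into its two parts and building the corresponding Identifiers.org resolution URL (Identifiers.org resolves URLs of the form `https://identifiers.org/prefix:accession`). The helper name and validation logic here are illustrative, not part of the Identifiers.org software.

```python
# Minimal sketch: expand a Compact Identifier (CID) of the form
# "prefix:accession" into an Identifiers.org resolution URL.
# expand_cid is a hypothetical helper, not an Identifiers.org API.

RESOLVER = "https://identifiers.org"

def expand_cid(cid: str) -> str:
    """Split a CID at the first colon and build a resolver URL."""
    prefix, sep, accession = cid.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a valid compact identifier: {cid!r}")
    return f"{RESOLVER}/{prefix}:{accession}"

print(expand_cid("taxonomy:9606"))  # https://identifiers.org/taxonomy:9606
```

Keeping the prefix registry centralised while accessions stay locally assigned is what makes the resulting identifiers globally unique without coordination between data resources.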
Summary: Dispersed across the Internet is an abundance of disparate, disconnected training information, making it hard for researchers to find training opportunities that are relevant to them. To address this issue, we have developed a new platform, TeSS, which aggregates geographically distributed information and presents it in a central, feature-rich portal. Data are gathered automatically from content providers via bespoke scripts.
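The harvesting step described above can be sketched in a few lines. This is not TeSS code; the feed format, field names, and de-duplication by URL are assumptions made for illustration of how provider feeds might be merged into one catalogue.

```python
# Hypothetical sketch of TeSS-style harvesting: merge JSON feeds from
# several content providers into one catalogue, de-duplicating by URL.
# Feed structure and field names ("url", "title") are assumptions.

import json

def harvest(feeds: dict) -> list:
    """Merge provider feeds (provider name -> JSON array text)."""
    catalogue, seen = [], set()
    for provider, payload in feeds.items():
        for record in json.loads(payload):
            url = record.get("url")
            if url in seen:
                continue  # same resource listed by multiple providers
            seen.add(url)
            catalogue.append({**record, "provider": provider})
    return catalogue
```

In practice each provider would need its own fetching and field-mapping logic, which is why the abstract describes the scripts as bespoke.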
Background: The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable automation, scaling, adaptation, and provenance support. However, there are still several challenges associated with the effective sharing, publication, and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms.
Computational systems biology involves integrating heterogeneous datasets in order to generate models. These models can assist with understanding and prediction of biological phenomena. Generating datasets and integrating them into models involves a wide range of scientific expertise.
In recent years, the improvement of software and hardware performance has made biomolecular simulations a mature tool for the study of biological processes. Simulation length and the size and complexity of the analyzed systems make simulations both complementary to and compatible with other bioinformatics disciplines. However, the characteristics of the software packages used for simulation have prevented the adoption of the technologies accepted in other bioinformatics fields, such as automated deployment systems, workflow orchestration, or the use of software containers.
A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods.
The microbial production of fine chemicals provides a promising biosustainable manufacturing solution that has led to the successful production of a growing catalog of natural products and high-value chemicals. However, development at industrial levels has been hindered by the large resource investments required. Here we present an integrated Design-Build-Test-Learn (DBTL) pipeline for the discovery and optimization of biosynthetic pathways, which is designed to be compound agnostic and automated throughout.