CORAL: A framework for rigorous self-validated data modeling and integrative, reproducible data analysis.

Gigascience

Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

Published: October 2022

Background: Many organizations face challenges in managing and analyzing data, especially when relevant datasets arise from multiple sources and methods. Analyzing heterogeneous datasets and additional derived data requires rigorous tracking of their interrelationships and provenance. This task has long been a Grand Challenge of data science and has more recently been formalized in the FAIR principles: that all data objects be Findable, Accessible, Interoperable, and Reusable, both for machines and for people. Adherence to these principles is necessary for proper stewardship of information, for testing regulatory compliance, for measuring the efficiency of processes, and for facilitating reuse of data-analytical frameworks.

Findings: We present the Contextual Ontology-based Repository Analysis Library (CORAL), a platform that greatly facilitates adherence to all 4 of the FAIR principles, including the especially difficult challenge of making heterogeneous datasets Interoperable and Reusable across all parts of a large, long-lasting organization. To achieve this, CORAL's data model requires that data generators extensively document the context for all data, and our tools maintain that context throughout the entire analysis pipeline. CORAL also features a web interface for data generators to upload and explore data, as well as a Jupyter notebook interface for data analysts, both backed by a common API.
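The Findings paragraph says two things: data generators must document context for every value, and that context is maintained through the analysis pipeline. Below is a minimal Python sketch of those two ideas. It assumes a simple three-field context (ontology term, units, producing process) that is validated when a record is created, plus a `derive` step that carries input lineage forward to derived data. The names `Context`, `ContextualValue` and `derive`, and the example values, are hypothetical illustrations. They do not reflect CORAL's actual data model or API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical sketch: names and fields are illustrative, not CORAL's API.


@dataclass(frozen=True)
class Context:
    """Context a data generator must supply for every value (assumed fields)."""
    term: str     # ontology term describing what was measured
    units: str    # units of measurement
    process: str  # process or method that produced the value

    def __post_init__(self) -> None:
        # Self-validation: refuse to create a record with incomplete context.
        for name in ("term", "units", "process"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing context field: {name}")


@dataclass(frozen=True)
class ContextualValue:
    value: float
    context: Context
    provenance: Tuple[str, ...] = ()  # upstream processes that led to this value


def derive(inputs: Sequence[ContextualValue], fn: Callable[..., float],
           term: str, units: str, process: str) -> ContextualValue:
    """Compute a derived value and keep the lineage of all its inputs."""
    result = fn(*(x.value for x in inputs))
    # Each input contributes its own lineage followed by its producing process.
    steps = [s for x in inputs for s in (*x.provenance, x.context.process)]
    lineage = tuple(dict.fromkeys(steps))  # de-duplicate, preserving order
    return ContextualValue(result, Context(term, units, process), lineage)


# Usage with made-up example values.
measured = ContextualValue(12.0, Context("nitrate concentration", "mg/L", "ion chromatography"))
dilution = ContextualValue(2.0, Context("dilution factor", "dimensionless", "sample dilution"))
corrected = derive([measured, dilution], lambda x, y: x * y,
                   "undiluted nitrate concentration", "mg/L", "dilution correction")
print(corrected.value)       # 24.0
print(corrected.provenance)  # ('ion chromatography', 'sample dilution')
```

Freezing the dataclasses prevents context from being edited after it has passed validation. A production system would also need to persist the lineage and resolve each term against a real ontology, which this sketch does not attempt.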

Conclusions: CORAL enables organizations to build FAIR data types on the fly as they are needed, avoiding the expense of bespoke data modeling. CORAL provides a uniquely powerful platform to enable integrative cross-dataset analyses, generating deeper insights than are possible using traditional analysis tools.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9575582
DOI: http://dx.doi.org/10.1093/gigascience/giac089
