Facilitating Cancer Epidemiologic Efforts in Cleveland via Creation of Longitudinal De-Duplicated Patient Data Sets.

Cancer Epidemiol Biomarkers Prev

Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio.

Published: April 2020

Background: Cleveland, Ohio, is home to three major hospital systems serving approximately 80% of the Northeast Ohio population. The Cleveland Clinic, University Hospitals Health System, and MetroHealth are direct competitors for primary and specialty care, and patient overlap between these systems is high. Fragmentation of health data that exist in silos at these health systems produces an overestimation of disease burden due to double and sometimes triple counting of patients. As a result, longitudinal population-based studies across the Cleveland patient population are impeded unless accurate and actionable clinically derived health data sets can be created.

Methods: The Cleveland Institute for Computational Biology has developed the De-Duplicate and De-Identify Research Engine (DeDeRE), which de-duplicates patients across two or more health entities without any exchange of personal health identifiers (PHI) between health systems.

Results: The immediate utility of this software for cancer epidemiology is the increased accuracy in measuring cancer burden and the potential to perform longitudinal studies with de-duplicated, de-identified data sets.

Conclusions: The DeDeRE software developed and tested here accomplishes its goals without exposing PHIs using a state-of-the-art, trusted privacy preservation network enabled by a hash-based matching algorithm.

Impact: This paper will guide the reader through the functions currently developed in DeDeRE and how a healthcare organization (HCO) employing the release version of this technology can begin sharing data with one or more additional HCOs in a collaborative and noncompetitive manner to create a regional population health resource for cancer researchers.
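
The abstract does not describe DeDeRE's internals, so the following is only a minimal sketch of the general idea behind hash-based matching, not the authors' implementation. It assumes, hypothetically, that each health system normalizes a few demographic fields and applies a keyed hash (HMAC-SHA256) with a secret agreed among the participants, so that only tokens, never identifiers, are compared. The field choice, the key, and the function names (normalize, patient_token, deduplicated_count) are illustrative assumptions.

```python
import hashlib
import hmac


def normalize(first: str, last: str, dob: str) -> str:
    """Canonicalize identifier fields so trivial formatting differences still match."""
    return "|".join(part.strip().lower() for part in (first, last, dob))


def patient_token(first: str, last: str, dob: str, shared_key: bytes) -> str:
    """Return a keyed hash of the normalized identifiers (hypothetical token scheme)."""
    message = normalize(first, last, dob).encode("utf-8")
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def deduplicated_count(*token_sets: set) -> int:
    """Count distinct patients across systems by taking the union of their tokens."""
    return len(set().union(*token_sets))


# Assumption: the key is agreed out of band and never leaves the participating systems.
SHARED_KEY = b"example-key-agreed-out-of-band"

# Each system hashes its own records locally; only the tokens are shared.
system_a = {
    patient_token(*record, SHARED_KEY)
    for record in [("Ann", "Lee", "1960-01-02"), ("Bob", "Ray", "1955-07-30")]
}
system_b = {
    patient_token(*record, SHARED_KEY)
    for record in [(" ann ", "LEE", "1960-01-02"), ("Cy", "Moss", "1971-11-11")]
}

naive_count = len(system_a) + len(system_b)          # double counts the shared patient
unique_count = deduplicated_count(system_a, system_b)  # counts each patient once
```

In this toy data one patient appears in both systems, so summing the per-system counts overstates the population, while the union of tokens does not. Exact-match hashing of this kind misses records with typos or changed names; how DeDeRE handles such cases is not stated in the abstract.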

Source
http://dx.doi.org/10.1158/1055-9965.EPI-19-0815


