Background: Big data useful for epidemiological research can be obtained by linking records for the same individuals across databases managed by different institutions. Private information must be protected while the matching is performed efficiently and to a high standard.

Objective: Privacy-preserving distributed data integration (PDDI) enables data matching between multiple databases without moving private information; however, a practical implementation must satisfy requirements for security, accuracy, and performance. Moreover, when no unique matching key exists, the optimal data items to use as a key must be identified. We aimed to conduct a basic matching experiment using a model to assess the accuracy of cancer screening.

Methods: As a step toward experiments with actual data, we created a data set mimicking the cancer screening and registration data in Japan and conducted a matching experiment using a PDDI system between geographically distant institutions. Errors similar to those found empirically in data sets recorded in Japanese were artificially introduced into the data set. The matching-key error rate of the data common to both data sets was deliberately set well above the rate expected in the actual databases: 85.0% and 59.0% for the data simulating colorectal and breast cancers, respectively. Various combinations of name, gender, date of birth, and address were used as the matching key. To evaluate matching accuracy, the matching sensitivity and specificity were calculated based on the number of cancer-screening data points, and the effect of matching accuracy on the sensitivity and specificity of cancer screening was estimated from the obtained values. To evaluate performance, we measured central processing unit use, memory use, and network traffic.
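The evaluation of key combinations described above can be illustrated with a minimal plaintext sketch. It is not the PDDI protocol and not the authors' code: it compares records in the clear, and the toy records, field names, and the `evaluate` helper are hypothetical. It only shows how matching sensitivity and specificity, counted over screening records, change with the chosen key when the registry contains recording errors.

```python
from itertools import combinations

# Hypothetical field names; the study also used address, omitted here for brevity.
FIELDS = ("family_name", "first_name", "gender", "date_of_birth")

def rec(pid, family_name, first_name, gender, date_of_birth):
    """Build one toy record; pid is the ground-truth person identifier."""
    return {"pid": pid, "family_name": family_name, "first_name": first_name,
            "gender": gender, "date_of_birth": date_of_birth}

# Toy screening data (all invented).
screening = [
    rec(1, "Sato", "Hanako", "F", "1960-04-01"),
    rec(2, "Suzuki", "Yoko", "F", "1955-09-12"),
    rec(3, "Tanaka", "Keiko", "F", "1962-01-30"),
    rec(4, "Sato", "Akiko", "F", "1960-04-01"),   # not in registry, shares DOB with pid 1
    rec(5, "Ito", "Mari", "F", "1970-07-07"),
    rec(6, "Kato", "Emi", "F", "1958-11-23"),
]

# Toy registry data with artificially introduced errors.
registry = [
    rec(1, "Sato", "Hanako", "F", "1960-04-01"),
    rec(2, "Suzuki", "Youko", "F", "1955-09-12"),  # first-name spelling variant
    rec(3, "Tanaka", "Keiko", "F", "1926-01-30"),  # date-of-birth typo
]

def evaluate(screening, registry, fields):
    """Return (sensitivity, specificity) of exact matching on `fields`,
    counted over screening records."""
    index = {}
    for r in registry:
        index.setdefault(tuple(r[f] for f in fields), set()).add(r["pid"])
    registry_pids = {r["pid"] for r in registry}

    tp = fn = fp = tn = 0
    for s in screening:
        linked_pids = index.get(tuple(s[f] for f in fields), set())
        if s["pid"] in registry_pids:
            # Person really is in the registry: a hit only if linked to themselves.
            if s["pid"] in linked_pids:
                tp += 1
            else:
                fn += 1
        else:
            # Person is not in the registry: any link is a false match.
            if linked_pids:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

results = {}
for size in range(1, len(FIELDS) + 1):
    for fields in combinations(FIELDS, size):
        results[fields] = evaluate(screening, registry, fields)
        sens, spec = results[fields]
        print(f"{'+'.join(fields):55s} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy data, adding fields to the key tends to raise specificity and lower sensitivity, which is the trade-off the key-selection step has to resolve.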

Results: Among key combinations with a specificity ≥99% and high sensitivity, the date of birth and first name were used for the data simulating colorectal cancer, giving a matching sensitivity and specificity of 55.00% and 99.85%, respectively. For the data simulating breast cancer, the date of birth and family name were used, giving a matching sensitivity and specificity of 88.71% and 99.98%, respectively. Assuming a true sensitivity and specificity of cancer screening of 90% each, the apparent values decreased to 74.90% and 89.93%, respectively. A trial calculation was also performed for a key combination that gave 100% matching specificity on the same data set. With a matching sensitivity of 82.26%, the apparent screening sensitivity was maintained at 90%, and the apparent screening specificity decreased to 89.89%. For 214 data points, the execution time was 82 minutes and 26 seconds without parallelization and 11 minutes and 38 seconds with parallelization; 19.33% of the calculation time was spent at the data-holding institutions. Memory use was 3.4 GB for the PDDI server and 2.7 GB for the data-holding institutions.

Conclusions: We demonstrated the basic feasibility of introducing a PDDI system for cancer-screening accuracy assessment. We plan to conduct matching experiments based on actual data and compare the results with existing methods.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9840098
DOI: http://dx.doi.org/10.2196/38922
