AI Article Synopsis

  • Statistics computed on distributed datasets can be skewed by duplicate records, so secure deduplication is needed as a preprocessing step that does not compromise privacy.
  • A secure deduplication protocol was designed, formally analyzed under a semi-honest adversary model, and deployed across three microbiology laboratories in Norway.
  • The protocol deduplicated one million simulated records distributed across 20 data custodians within 45 seconds, and experiments showed it to be more efficient and scalable than previous protocols for the same problem.

Article Abstract

Background: Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.

Methods: We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.
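To make "deterministic record linkage" on a horizontally partitioned dataset concrete, the following is a minimal sketch and not the protocol from the article. It assumes every data custodian shares a secret key, derives a keyed hash from the same identifying fields, and sends the hashes to a single coordinator that flags repeats. The field names (`national_id`, `test_date`), the helper names, and the coordinator role are all hypothetical, and this simplified scheme does not provide the collusion resistance analyzed in the article.

```python
import hashlib
import hmac


def linkage_key(record, secret):
    """Return a keyed hash of the normalized identifying fields.

    Deterministic linkage: two records are treated as duplicates
    exactly when their identifying fields are equal after normalization.
    The field names are hypothetical, chosen for illustration.
    """
    fields = (record["national_id"].strip(), record["test_date"].strip())
    message = "|".join(fields).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def deduplicate(custodians, secret):
    """Keep the first occurrence of each record across all custodians.

    `custodians` maps a custodian name to its list of records. Each
    custodian holds different rows with the same columns (horizontal
    partitioning). Only keyed hashes are compared, never raw identifiers.
    """
    seen = set()
    kept = {}
    for name, records in custodians.items():
        kept[name] = []
        for record in records:
            key = linkage_key(record, secret)
            if key not in seen:
                seen.add(key)
                kept[name].append(record)
    return kept


if __name__ == "__main__":
    shared_secret = b"example-key-agreed-by-custodians"
    data = {
        "lab_a": [
            {"national_id": "111", "test_date": "2015-01-02"},
            {"national_id": "222", "test_date": "2015-01-03"},
        ],
        "lab_b": [
            {"national_id": "111 ", "test_date": "2015-01-02"},  # duplicate of lab_a's first record
            {"national_id": "333", "test_date": "2015-01-04"},
        ],
    }
    for lab, records in deduplicate(data, shared_secret).items():
        print(lab, len(records))
```

In this sketch the keyed hash hides identifiers from anyone who lacks the key, but any custodian holding the key could test guesses against the hashes, which is one reason a real protocol needs a stronger construction.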

Results: The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N - 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.

Conclusions: The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209873
DOI: http://dx.doi.org/10.1186/s12911-016-0389-x
