Background: Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources referring to the same person. Due to the lack of unique personal identifiers across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, the exchange of such identifying data with a third party, as is the case in record linkage, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases.
Methods: We evaluate the linkage quality and performance of the linkage process using several real and near-real datasets that differ in size and in the error rate of matching records. We compare (plaintext) record linkage with PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it with phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve runtime for both standard and privacy-preserving record linkage.
Results: The Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters due to the use of an error-tolerant matching algorithm that can handle variances in names, in particular missing or transposed name compounds. However, due to the absence of blocking, the runtimes are unacceptable for real use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality.
Conclusion: We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. We observe very high linkage quality for both plaintext and encoded data, even in the presence of errors. The blocking methods improve runtime performance by orders of magnitude, thus facilitating use in research projects with large datasets and many participants.
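The two techniques named in the abstract, field-level Bloom filter encoding and LSH-based blocking, can be illustrated with a minimal Python sketch. This is not the Mainzelliste implementation: the filter length, the number of hash functions, the bigram padding, the Hamming-style bit sampling, and all function names below are assumptions chosen purely for illustration.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

BLOOM_BITS = 256      # assumed filter length, not taken from the paper
HASHES_PER_GRAM = 4   # assumed number of hash functions per bigram


def bigrams(value):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, m=BLOOM_BITS, k=HASHES_PER_GRAM):
    """Encode one field as the set of bit positions set in an m-bit Bloom filter."""
    bits = set()
    for gram in bigrams(value):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:8], "big") % m)
    return frozenset(bits)


def dice(a, b):
    """Dice coefficient of two bit-position sets (1.0 means identical filters)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def make_lsh_tables(tables=8, bits_per_key=12, m=BLOOM_BITS, seed=42):
    """Pick random bit positions per table (Hamming-style LSH, assumed variant)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(m), bits_per_key)) for _ in range(tables)]


def candidate_pairs(encoded, lsh_tables):
    """Return record-id pairs that share a blocking key in at least one table."""
    blocks = defaultdict(list)
    for record_id, bits in encoded.items():
        for table_no, positions in enumerate(lsh_tables):
            key = (table_no, tuple(p in bits for p in positions))
            blocks[key].append(record_id)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical records; only the encoded filters would be shared in PPRL.
    names = {"r1": "Anna Schmidt", "r2": "Anna Schmitt", "r3": "Peter Jones"}
    encoded = {rid: bloom_encode(name) for rid, name in names.items()}
    for a, b in sorted(candidate_pairs(encoded, make_lsh_tables())):
        print(a, b, round(dice(encoded[a], encoded[b]), 3))
```

Only pairs that share a blocking key are compared, which is where the runtime saving over an all-pairs comparison comes from. The number of tables and sampled bits trades recall against the number of candidate pairs, so the values here would need tuning on real data.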
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7809773 | PMC |
| http://dx.doi.org/10.1186/s12967-020-02678-1 | DOI Listing |
BMJ Health Care Inform
January 2025
College of Medicine and Veterinary Medic, The University of Edinburgh Usher Institute of Population Health Sciences and Informatics, Edinburgh, UK.
Aim: We aimed to identify enablers and barriers of using primary care routine data for healthcare research, to formulate recommendations for improving efficiency in knowledge discovery.
Background: Data recorded routinely in primary care can be used for estimating the impact of interventions provided within routine care for all people who are clinically eligible. Despite official promotion of 'efficient trial designs', anecdotally researchers in the Asthma UK Centre for Applied Research (AUKCAR) have encountered multiple barriers to accessing and using routine data.
PLoS One
January 2025
Department of Computer Science and Engineering at Hanyang University ERICA, Ansan-si, Gyeonggi-do, South Korea.
Privacy-preserving record linkage (PPRL) technology, crucial for linking records across datasets while maintaining privacy, is susceptible to graph-based re-identification attacks. These attacks compromise privacy and pose significant risks, such as identity theft and financial fraud. This study proposes a zero-relationship encoding scheme that minimizes the linkage between source and encoded records to enhance PPRL systems' resistance to re-identification attacks.
Background: The Enhanced Dementia Surveillance Initiative (EDSI), led by the Public Health Agency of Canada (PHAC), supports the implementation of Canada's first national dementia strategy. To improve the national monitoring of dementia and its health impacts, the EDSI projects focused on priority data gaps: dementia by cause, progression stages and impacts; socio-demographic characteristics, risk and protective factors; and caregivers.
Method: PHAC collaborated on 15 projects with multiple stakeholders (universities/research institutions, health organizations, and federal/provincial government departments).
Alzheimers Dement
December 2024
SAIHST, Sungkyunkwan University, Seoul, Korea, Republic of (South).
Background: The complexity of brain diseases has necessitated advanced research platforms for better understanding, treatment, and prevention strategies. However, existing brain disease registries face limitations such as incomplete variable sets, lack of standardization, insufficient linkage to external databases, absence of integrated platforms for comprehensive data collection, and lack of continuity. To address these challenges, the Korea National Institute of Health initiated the Brain disease Research Infrastructure for Data Gathering and Exploration (BRIDGE), a national prospective platform designed to overcome the shortcomings of current registries.
Alzheimers Dement
December 2024
National Alzheimer's Coordinating Center, University of Washington, Seattle, WA, USA.
Background: Persons with Alzheimer's disease and related dementias (ADRD) have been disproportionally impacted by the COVID-19 pandemic, showing a significantly increased risk of infection and severe illness, including neurocognitive consequences. The National Alzheimer's Coordinating Center collects rich longitudinal, standardized neurocognitive data from populations at high risk of COVID-19 complications, including older adults who were cognitively normal prior to infection and those who had pre-existing ADRD. These data, in combination with Electronic Health Records (EHR) clinic data, will be critical for understanding the complex pathophysiology and cognitive symptoms of COVID-19 and the development of future therapeutics.