A simple heuristic for blindfolded record linkage.

J Am Med Inform Assoc

Center for Clinical Informatics, Stanford University, Stanford, California 94305, USA.

Published: June 2012

Objectives: To address the challenge of balancing patient privacy against the need to create cross-site research registry records on individual patients and to match a given patient's data as he or she moves between participating sites. To evaluate a strategy of generating anonymous identifiers from real identifiers in a way that maximizes the chance of correctly identifying a shared patient and minimizes the chance of incorrectly joining two records that belong to different people.

Methods: We hypothesized that most variation in names occurs after the first two letters and that date of birth is highly reliable. If so, a single match variable, a hashed string built from the first two letters of the patient's first and last names plus the date of birth, would have the desired characteristics. We compared the match characteristics (false-positive rate vs false-negative rate) of this variable with those of Social Security Numbers (SSNs) and of full names.
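A minimal Python sketch of how such a match variable might be built. The abstract does not specify the hash function, how names are normalized, the date format, or whether a secret key is used, so the uppercasing, letter-only filtering, ISO date format, HMAC-SHA256 with a shared secret, and the helper name `match_key` are all assumptions for illustration.

```python
import hashlib
import hmac
import re
from datetime import date


def match_key(first: str, last: str, dob: date, secret: bytes) -> str:
    """Derive an anonymous match variable (hypothetical helper).

    Assumptions, not taken from the paper: names are uppercased and reduced
    to ASCII letters A-Z, the date of birth is written as YYYY-MM-DD, and the
    resulting string is hashed with HMAC-SHA256 under a secret shared by the
    participating sites.
    """
    def prefix(name: str) -> str:
        # Keep only A-Z after uppercasing, then take the first two letters.
        letters = re.sub(r"[^A-Z]", "", name.upper())
        return letters[:2]

    plain = f"{prefix(first)}|{prefix(last)}|{dob.isoformat()}"
    return hmac.new(secret, plain.encode("utf-8"), hashlib.sha256).hexdigest()


# Usage with a hypothetical shared secret and invented patients.
secret = b"shared-site-secret"
k1 = match_key("Jonathan", "Smith", date(1970, 3, 14), secret)
k2 = match_key("Jon", "Smyth", date(1970, 3, 14), secret)
print(k1 == k2)  # True: both reduce to "JO|SM|1970-03-14"
```

Keying the hash with a secret, rather than applying a plain hash, is a design choice assumed here: two-letter prefixes plus a birth date form a small enough input space that an unkeyed digest could be reversed by enumerating all candidates.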

Results: In a data set of 19 000 records, a derived match variable consisting of a 2-character prefix from both the first and last names combined with date of birth had a sensitivity of 97%; by contrast, an anonymized identifier based on the patient's full name and date of birth had a sensitivity of only 87%, and SSN had a sensitivity of 86%.
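Sensitivity here is the fraction of truly matching record pairs whose match variables agree, TP / (TP + FN). The sketch below shows one way to compute it, together with a false-positive count, for the prefix key and a full-name key on a handful of invented record pairs (not the study's data). Hashing is omitted because, barring hash collisions, two keys agree after hashing exactly when they agree before it. All function names are hypothetical.

```python
from datetime import date


def prefix_key(first, last, dob):
    # Two-letter prefixes of first and last name plus date of birth.
    return (first.upper()[:2], last.upper()[:2], dob)


def full_key(first, last, dob):
    # Full first and last name plus date of birth.
    return (first.upper(), last.upper(), dob)


def sensitivity_and_false_positives(pairs, key):
    """Return (sensitivity, false-positive count) of `key` over labeled pairs."""
    tp = fn = fp = 0
    for rec_a, rec_b, same_person in pairs:
        agree = key(*rec_a) == key(*rec_b)
        if same_person:
            if agree:
                tp += 1
            else:
                fn += 1
        elif agree:
            fp += 1
    return tp / (tp + fn), fp


# Invented example pairs: (record A, record B, truly the same person?)
pairs = [
    (("Jonathan", "Smith", date(1970, 3, 14)), ("Jon", "Smith", date(1970, 3, 14)), True),
    (("Katherine", "Olsen", date(1985, 7, 2)), ("Katharine", "Olsen", date(1985, 7, 2)), True),
    (("Maria", "Garcia", date(1992, 11, 30)), ("Maria", "Garcia", date(1992, 11, 30)), True),
    (("Maria", "Garcia", date(1992, 11, 30)), ("Mario", "Garcia", date(1992, 11, 30)), False),
]

print(sensitivity_and_false_positives(pairs, prefix_key))  # (1.0, 1)
print(sensitivity_and_false_positives(pairs, full_key))    # (0.3333333333333333, 0)
```

On this toy input the prefix key recovers all three true matches but also joins Maria and Mario Garcia, who share prefixes and a birth date; that is the false-positive side of the trade-off described in the Methods.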

Conclusion: The approach we describe is most useful where privacy policies preclude the full exchange of the identifiers required by more sophisticated and sensitive linkage algorithms. For data sets of sufficiently high quality, this effective approach, while producing a lower match rate than more complex algorithms, is easy to explain to institutional review boards, adheres to the "minimum necessary" standard of the HIPAA Privacy Rule, and is faster and less cumbersome to implement than full probabilistic linkage.
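The minimum-necessary property comes from what each site transmits: a local record identifier and the derived digest, never names or dates of birth. A minimal sketch of the join a linkage broker might perform, assuming each site has already computed its digests. The `digest` helper (plain SHA-256 over the prefix-plus-birth-date string), the `link` function, the site record IDs, and the patients are all hypothetical; a real deployment would presumably use a keyed hash as discussed above.

```python
import hashlib
from collections import defaultdict


def digest(first, last, dob_iso):
    # Hypothetical site-side step: SHA-256 over "FI|LA|YYYY-MM-DD".
    plain = f"{first.upper()[:2]}|{last.upper()[:2]}|{dob_iso}"
    return hashlib.sha256(plain.encode("utf-8")).hexdigest()


# What each site sends to the broker: (local_id, digest) pairs only.
site_a = [("A-001", digest("Jonathan", "Smith", "1970-03-14")),
          ("A-002", digest("Maria", "Garcia", "1992-11-30"))]
site_b = [("B-117", digest("Jon", "Smith", "1970-03-14")),
          ("B-245", digest("Wei", "Chen", "1988-05-09"))]


def link(*sites):
    """Group local IDs that share a digest; return only groups of size > 1."""
    by_digest = defaultdict(list)
    for site in sites:
        for local_id, d in site:
            by_digest[d].append(local_id)
    return [ids for ids in by_digest.values() if len(ids) > 1]


print(link(site_a, site_b))  # [['A-001', 'B-117']]
```

Because the join is an exact equality lookup on one column, it needs no comparison weights or thresholds, which is the contrast with full probabilistic linkage drawn in the Conclusion.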

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392854
DOI: http://dx.doi.org/10.1136/amiajnl-2011-000329

Similar Publications

Background: Reporting serious adverse events (SAEs) is crucial to reduce or avoid toxicities from treatments tested in clinical trials that can have major consequences for patients' health. Such reporting is often incomplete, and discrepancies are observed between data published by pharmacovigilance organizations and clinical databases.

Objectives: While the process of reconciliation aims to reduce these differences, it remains a very time-consuming and imprecise task.

Background: Aneurysmal subarachnoid hemorrhage (aSAH) causes systemic changes that contribute to delayed cerebral ischemia (DCI) and morbidity. Circulating metabolites reflecting underlying pathophysiological mechanisms warrant investigation as biomarker candidates.

Methods: Blood samples, prospectively collected within 24 hours (T1) of admission and at 7 days (T2) post ictus from patients with acute aSAH at two tertiary care centers, were retrospectively analyzed.

Background: Existing functional connectivity studies of psychosis use population-averaged functional network maps, despite highly variable topographies of these networks across the brain surface. We aimed to define the functional network areas and topographies in the general population and the changes associated with psychotic experiences (PEs) and disorders.

Methods: Maps of 8 functional networks were generated using an individual-specific template-matching procedure for each participant from the Human Connectome Project Young Adult cohort (n = 1003) and from a matched case cohort (schizophrenia [SCZ], n = 27; bipolar disorder, n = 35) scanned identically with the same Connectom scanner.

Rhesus macaques (RMs) are vital models for studying human disease, and are invaluable to pre-clinical pipelines for vaccine discovery and testing. Particularly in this regard, they are often used to study infection and vaccine-associated broadly neutralizing antibody responses. This has resulted in an increasing demand for improved genetic resources for the immunoglobulin (IG) loci, which harbor antibody-encoding genes.

Levayer and colleagues assessed integrity issues in randomized controlled trials (RCTs) in four spine journals using baseline p-values from categorical variables, concluding that there was no evidence of 'systemic fraudulent behaviour'. We used their published dataset to assess the accuracy of reported p-values and whether observed and expected distributions of frequency counts and p-values were consistent. In 51 out of 929 (5.
