Objectives:  Patient data are fragmented across multiple repositories, yielding suboptimal and costly care. Record linkage algorithms are widely accepted solutions for improving completeness of patient records. However, studies often fail to fully describe their linkage techniques. Further, while many frameworks evaluate record linkage methods, few focus on producing gold standard datasets. This highlights a need to assess these frameworks and their real-world performance. We use real-world datasets and expand upon previous frameworks to evaluate a consistent approach to the manual review of gold standard datasets and measure its impact on algorithm performance.

Methods:  We applied the framework, which includes elements for data description, reviewer training and adjudication, and software and reviewer descriptions, to four datasets. Record pairs were formed, and between 15,000 and 16,500 records were randomly sampled from these pairs. After training, two reviewers determined the match status of each record pair. If the reviewers disagreed, a third reviewer made the final adjudication.
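The two-reviewer process with third-reviewer adjudication can be sketched as follows. This is a minimal illustration only, not the authors' software: the function names `adjudicate` and `discordance_rate` and the boolean encoding of match status are assumptions made for the example.

```python
from typing import Optional


def adjudicate(r1: bool, r2: bool, r3: Optional[bool] = None) -> bool:
    """Return the final match status for one record pair.

    r1 and r2 are the two primary reviewers' decisions (True = match).
    r3 is the third reviewer's decision, consulted only on disagreement.
    """
    if r1 == r2:
        return r1
    if r3 is None:
        raise ValueError("reviewers disagree; a third review is required")
    return r3


def discordance_rate(reviews: list[tuple[bool, bool]]) -> float:
    """Fraction of record pairs on which the two primary reviewers disagree."""
    if not reviews:
        raise ValueError("no record pairs reviewed")
    discordant = sum(1 for a, b in reviews if a != b)
    return discordant / len(reviews)
```

Under these assumptions, the third review is needed only for discordant pairs, so the adjudication workload scales with the discordance rate rather than with the full sample size.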

Results:  Across the four datasets, the discordance rate ranged from 1.8% to 13.6%. While reviewers' discordance rates typically ranged between 1% and 5%, one reviewer exhibited a 59% discordance rate, showing the importance of the third reviewer. The original analysis was compared with three sensitivity analyses and most often exhibited the highest predictive values.
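For reference, predictive values of a linkage algorithm against a gold standard can be computed as in the sketch below. The abstract does not specify how the three sensitivity analyses were constructed, so this example only shows the standard definitions of positive and negative predictive value; the function name `predictive_values` and the boolean encoding are assumptions.

```python
from typing import Optional, Sequence


def predictive_values(
    predicted: Sequence[bool], gold: Sequence[bool]
) -> tuple[Optional[float], Optional[float]]:
    """Return (PPV, NPV) of algorithm decisions against a gold standard.

    PPV = TP / (TP + FP); NPV = TN / (TN + FN).
    A value is None when its denominator is zero.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = fp = tn = fn = 0
    for p, g in zip(predicted, gold):
        if p and g:
            tp += 1
        elif p and not g:
            fp += 1
        elif not p and not g:
            tn += 1
        else:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv
```

Re-running the same function with a differently constructed gold standard (for example, one reviewer's labels instead of the adjudicated labels, a hypothetical variant) shows how the choice of gold standard shifts the estimates.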

Conclusion:  Reviewers vary in their assessment of a gold standard, which can lead to variation in estimates of matching performance. Our analysis demonstrates how a multireviewer process can be applied to create gold standards, identify reviewer discrepancies, and evaluate algorithm performance.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11290950
DOI: http://dx.doi.org/10.1055/a-2291-1391

Publication Analysis

Top Keywords

record linkage: 12
gold standard: 12
algorithm performance: 8
performance real-world: 8
real-world datasets: 8
frameworks evaluate: 8
standard datasets: 8
third reviewer: 8
discordance rate: 8
original analysis: 8

Similar Publications

Linking and GenBank to the National Clinical Cohort Collaborative.

Learn Health Syst

January 2025

Department of Biomedical Informatics University of Arkansas for Medical Sciences, College of Medicine Little Rock Arkansas USA.

Objective: This project demonstrates the feasibility of connecting medical imaging data and features, SARS-CoV-2 genome variants, with clinical data in the National Clinical Cohort Collaborative (N3C) repository to accelerate integrative research on detection, diagnosis, and treatment of COVID-19-related morbidities. The N3C curated a rich collection of aggregated and de-identified electronic health records (EHR) data of over 18 million patients, including 7.5 million COVID-positive patients, seen at hospitals across the United States.

Background: We aim to study the potential association between tattoo ink exposure and development of certain types of cancers in the recently established Danish Twin Tattoo Cohort. Tattoo ink is known to transfer from skin to blood and accumulate in regional lymph nodes. We are concerned that tattoo ink induces inflammation at the deposit site, leading to chronic inflammation and increasing risk of abnormal cell proliferation, especially skin cancer and lymphoma.

Background: Tuberculosis (TB) is a leading cause of death worldwide with over 90% of reported cases occurring in low- and middle-income countries (LMICs). Pre-treatment loss to follow-up (PTLFU) is a key contributor to TB mortality and infection transmission.

Objectives: We performed a scoping review to map available evidence on interventions to reduce PTLFU in adults with pulmonary TB, identify gaps in existing knowledge, and develop a conceptual framework to guide intervention implementation.

Alcohol consumption, drinking patterns and cause-specific mortality in an Australian cohort of 181,607 participants aged 45 years and over.

Public Health

December 2024

The Daffodil Centre, The University of Sydney, a Joint Venture with Cancer Council NSW, Postal Address: PO Box 572, KINGS CROSS, NSW, 1340, Australia.

Objectives: Despite relatively high alcohol consumption in Australia, local evidence regarding drinking and cause-specific mortality is limited. We aimed to quantify the risk of alcohol-related causes of death and to calculate contemporary estimates of absolute risk and population attributable fractions for deaths caused by alcohol consumption in Australia.

Study Design: Prospective cohort study.

Piloting a minimum data set for older people living in care homes in England: a developmental study.

Age Ageing

January 2025

Centre for Research in Public Health and Community Care (CRIPACC), University of Hertfordshire, College Lane, Hatfield, UK.

Background: We developed a prototype minimum data set (MDS) for English care homes, assessing feasibility of extracting data directly from digital care records (DCRs) with linkage to health and social care data.

Methods: Through stakeholder development workshops, literature reviews, surveys and public consultation, we developed an aspirational MDS. We identified ways to extract this from existing sources, including DCRs and routine health and social care datasets.
