Objective: Our objective was to evaluate tokens commonly used by clinical research consortia to aggregate clinical data across institutions.

Methods: This study compares the entity resolution performance of tokens alone and of token-based matching algorithms against manual annotation for 20,002 record pairs extracted from the University of Texas Houston's clinical data warehouse (CDW).

Results: The highest precision achieved was 99.9% with a token derived from the first name, last name, gender, and date-of-birth. The highest recall achieved was 95.5% with an algorithm involving tokens that reflected combinations of first name, last name, gender, date-of-birth, and social security number.
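For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how precision and recall can be computed for pairwise record matching against a manually annotated reference. The function name and the boolean-list input format are assumptions made for illustration; the study's own evaluation code is not described here.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for pairwise match decisions.

    predicted: list of bools, True if the tokens declared the pair a match.
    actual:    list of bools, True if manual annotation says the pair is a match.
    Both lists are parallel, one entry per record pair.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)

    # Guard against division by zero when no pairs fall in a category.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall
```

Precision is the share of declared matches that are true matches, and recall is the share of true matches that were found. A token built from many fields tends to favor precision, while combining several alternative tokens tends to raise recall.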

Discussion: To protect patient privacy, identifying information must be removed from a health care dataset so that the individuals from whom the data were derived cannot be identified. However, once identifying information is removed, records belonging to the same entity can no longer be linked for analysis. Tokens are a mechanism for converting patient identifying information into Health Insurance Portability and Accountability Act-compliant deidentified elements that can be used to link clinical records while preserving patient privacy.
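The general idea of a linking token can be illustrated with a short sketch. It assumes a keyed hash (HMAC-SHA256) over normalized first name, last name, gender, and date-of-birth. The key, the normalization rules, and the function names are hypothetical, and the study's actual token construction may differ.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret; a real deployment would manage keys securely.
SECRET_KEY = b"example-key-not-for-production"


def normalize(value: str) -> str:
    """Strip accents, punctuation, and whitespace, then uppercase."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).upper()


def make_token(first_name, last_name, gender, date_of_birth, key=SECRET_KEY):
    """Return a hex token derived from four identifying fields.

    Records with the same normalized fields yield the same token, so they
    can be linked without exchanging the identifying values themselves.
    """
    fields = [normalize(f) for f in (first_name, last_name, gender, date_of_birth)]
    message = "|".join(fields).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

Under these assumptions, `make_token("Ana", "O'Neil", "F", "1980-01-02")` and `make_token(" ana ", "ONEIL", "f", "19800102")` produce the same token, whereas any difference that survives normalization (for example a misspelled name) produces an unrelated token. This is why recall depends on the accuracy of the underlying data.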

Conclusion: Depending on the availability and accuracy of the underlying data, tokens are able to resolve and link entities at a high level of precision and recall for real-world data derived from a CDW.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9474266
DOI: http://dx.doi.org/10.1055/a-1910-4154

Similar Publications

Factors Associated With Semaglutide Initiation Among Adults With Obesity.

JAMA Netw Open

January 2025

Department of Global Health, School of Public Health, Boston University, Boston, Massachusetts.

Importance: Semaglutide, a novel glucagon-like peptide-1 (GLP-1) receptor agonist medication, was approved for weight management in individuals with obesity in June 2021. There is limited evidence on factors associated with uptake among individuals in this subgroup without diabetes.

Objective: To explore factors associated with semaglutide initiation among a population of commercially insured individuals with obesity but no diagnosed diabetes.

Use of Albumin-Adjusted Calcium Measurements in Clinical Practice.

JAMA Netw Open

January 2025

Division of Endocrinology and Metabolism, Department of Medicine, University of Calgary, Calgary, Alberta, Canada.

Importance: Using albumin-adjusted calcium is commonly recommended for measuring calcium, but with little empirical evidence to support the practice.

Objective: To assess the correlation between total calcium measurements (with or without adjustment) vs the ionized calcium level as a reference standard.

Design, Setting, And Participants: This was a population-based cross-sectional study in the province of Alberta, Canada, including adults tested for serum total calcium and ionized calcium simultaneously between January 1, 2013, and October 31, 2019.

Importance: Timely access to care is a key metric for health care systems and is particularly important in conditions that acutely worsen with delays in care, including surgical emergencies. However, the association between travel time to emergency care and risk for complex presentation is poorly understood.

Objective: To evaluate the impact of travel time on disease complexity at presentation among people with emergency general surgery conditions and to evaluate whether travel time was associated with clinical outcomes and measures of increased health resource utilization.

External Validation of a 5-Factor Risk Model for Breast Cancer-Related Lymphedema.

JAMA Netw Open

January 2025

Institute of Medical Science, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.

Importance: Secondary lymphedema is a common, harmful side effect of breast cancer treatment. Robust risk models that are externally validated are needed to facilitate clinical translation. A published risk model used 5 accessible clinical factors to predict the development of breast cancer-related lymphedema; this model included a patient's mammographic breast density as a novel predictive factor.

Age at Menopause and Development of Type 2 Diabetes in Korea.

JAMA Netw Open

January 2025

Department of Family Medicine, Korea University Guro Hospital, Korea University College of Medicine, Seoul, Republic of Korea.

Importance: There is limited evidence regarding the association between age at menopause and incident type 2 diabetes (T2D).

Objective: To investigate whether age at menopause and premature menopause are associated with T2D incidence in postmenopausal Korean women.

Design, Setting, And Participants: This population-based cohort study was conducted among a nationally representative sample from the Korean National Health Insurance Service database of 1 125 378 postmenopausal women without T2D who enrolled in 2009.
