The UK Biobank study contains several sources of diagnostic data, including hospital inpatient data and data on self-reported conditions for approximately 500,000 participants and primary-care data for approximately 177,000 participants (35%). Epidemiologic investigations require a primary disease definition, but it is not clear whether to combine data sources to maximize statistical power or to rely on only 1 source to ensure a consistent outcome definition. The consistency of disease definitions was investigated for venous thromboembolism (VTE) by evaluating the overlap between cases defined from 3 sources: hospital inpatient data, primary-care reports, and self-reported questionnaires. VTE cases showed little overlap between data sources: among persons with primary-care data, only 6% of reported events were identified by all 3 sources (hospital, primary care, and self-report), while 71% appeared in only 1 source. Deep vein thrombosis-only events represented 68% of self-reported VTE cases and 36% of hospital-reported VTE cases, whereas pulmonary embolism-only events represented 20% of self-reported and 50% of hospital-reported VTE cases. The sources also differed in the sociodemographic characteristics of their cases; for example, 46% of hospital-reported VTE cases occurred in female patients, compared with 58% of self-reported VTE cases. These results illustrate how seemingly neutral decisions taken to improve data quality can affect the representativeness of a data set.
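
The overlap figures above (cases found by all 3 sources versus by only 1) can be illustrated with a minimal sketch. It assumes each source has already been reduced to a set of participant IDs with a VTE record, and that every set is restricted to participants who have primary-care data, so that absence from a source is informative. The function `overlap_summary`, the `Overlap` tuple, and the toy IDs are hypothetical; this is not the authors' code, and it simplifies by counting each participant as one case rather than matching multiple dated events.

```python
from collections import Counter
from typing import NamedTuple


class Overlap(NamedTuple):
    n_cases: int               # distinct participants reported by any source
    frac_all_sources: float    # share reported by every source
    frac_single_source: float  # share reported by exactly one source


def overlap_summary(sources: dict[str, set[str]]) -> Overlap:
    """Tabulate how many sources report each case (hypothetical helper).

    Assumes the ID sets are already restricted to participants who
    could appear in every source (e.g., those with primary-care data).
    """
    # Count, for each participant ID, how many sources report it.
    counts = Counter(pid for ids in sources.values() for pid in ids)
    n_cases = len(counts)
    if n_cases == 0:
        raise ValueError("no cases in any source")
    n_sources = len(sources)
    in_all = sum(1 for c in counts.values() if c == n_sources)
    in_one = sum(1 for c in counts.values() if c == 1)
    return Overlap(n_cases, in_all / n_cases, in_one / n_cases)


if __name__ == "__main__":
    # Toy, made-up participant IDs; not UK Biobank data.
    toy = {
        "hospital": {"p1", "p2", "p3", "p4"},
        "primary_care": {"p1", "p2", "p5"},
        "self_report": {"p1", "p6", "p7", "p8"},
    }
    print(overlap_summary(toy))
    # Overlap(n_cases=8, frac_all_sources=0.125, frac_single_source=0.75)
```

Counting sources per participant, rather than intersecting sets pairwise, gives the "all sources" and "single source" shares in one pass and extends unchanged to more than 3 sources.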

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11074710
DOI: http://dx.doi.org/10.1093/aje/kwad232
