Objective: Valid measurement of outcomes such as disease prevalence using health care utilization data is fundamental to the implementation of a "learning health system." Definitions of such outcomes can be complex, based on multiple diagnostic codes. The literature on validating such data demonstrates a lack of awareness of the need for a stratified sampling design and corresponding statistical methods. We propose a method for validating the measurement of diagnostic groups that have: (1) different prevalences of diagnostic codes within the group; and (2) low prevalence.
Methods: We describe an estimation method whereby: (1) low-prevalence diagnostic codes are oversampled, and the positive predictive value (PPV) of the diagnostic group is estimated as a weighted average of the PPV of each diagnostic code; and (2) claims that fall within a low-prevalence diagnostic group are oversampled relative to claims that are not, and bias-adjusted estimators of sensitivity and specificity are generated.
Application: We illustrate our proposed method using an example from population health surveillance in which diagnostic groups are applied to physician claims to identify cases of acute respiratory illness.
Conclusions: Failure to account for the prevalence of each diagnostic code within a diagnostic group leads to the underestimation of the PPV, because low-prevalence diagnostic codes are more likely to be false positives. Failure to adjust for oversampling of claims that fall within the low-prevalence diagnostic group relative to those that do not leads to the overestimation of sensitivity and underestimation of specificity.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5510703 | PMC |
http://dx.doi.org/10.1097/MLR.0000000000000324 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!