Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis.

BMC Nephrol

Outcomes Research Methods & Analytics, US Health Economics & Outcomes Research, Novartis Pharmaceuticals Corporation, One Health Plaza, East Hanover, NJ, 07936-1080, USA.

Published: March 2016

AI Article Synopsis

  • Cluster analysis (CA) was used to explore cost patterns in patients with end-stage renal disease starting hemodialysis, revealing distinct cost clusters based on healthcare expenditures.
  • A study analyzed claims data from over 18,000 patients, applying K-means and hierarchical CA methods to categorize costs into four clusters based on pre- and post-hemodialysis spending.
  • The findings indicated varied patterns of healthcare costs, showing significant increases, decreases, or stability across different patient groups, with implications for managing healthcare resources for ESRD patients.

Article Abstract

Background: Cluster analysis (CA) is a frequently used applied statistical technique that helps to reveal hidden structures and "clusters" found in large data sets. However, this method has not been widely used in large healthcare claims databases where the distribution of expenditure data is commonly severely skewed. The purpose of this study was to identify cost change patterns of patients with end-stage renal disease (ESRD) who initiated hemodialysis (HD) by applying different clustering methods.

Methods: A retrospective, cross-sectional, observational study was conducted using the Truven Health MarketScan® Research Databases. Patients aged ≥18 years with ≥2 ESRD diagnoses who initiated HD between 2008 and 2010 were included. K-means CA and hierarchical CA with various linkage methods were applied to all-cause costs in the baseline (12 months pre-HD) and follow-up (12 months post-HD) periods to identify clusters. Demographic, clinical, and cost information was extracted from both periods and then examined by cluster.
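As a rough illustration of the K-means step described above, the following is a minimal sketch in plain Python that clusters patients on a pair of values: baseline cost and follow-up cost. Everything in it is an assumption made for illustration: the data are synthetic (in thousands of dollars, loosely placed near the medians reported in the Results), the helper names `make_group`, `farthest_first`, and `kmeans` are hypothetical, and the deterministic farthest-first initialization is a convenience chosen here, not something the abstract specifies. It is not the authors' implementation and does not cover the hierarchical methods.

```python
import math
from collections import Counter


def make_group(pre, post, n, spread):
    """Synthetic patients: deterministic jitter around a (pre, post) cost pair."""
    return [
        (pre + spread * ((i * 7) % 5 - 2) / 2,
         post + spread * ((i * 3) % 5 - 2) / 2)
        for i in range(n)
    ]


def farthest_first(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on (pre_cost, post_cost) tuples."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each patient joins the nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


# Hypothetical cohort: four cost-change patterns of very unequal size.
points = (
    make_group(185, 885, 4, 20)     # low-to-very-high costs
    + make_group(911, 158, 4, 20)   # very-high-to-lower costs
    + make_group(15, 13, 40, 5)     # stable, low costs
    + make_group(58, 193, 12, 15)   # increasing costs
)

labels, centers = kmeans(points, k=4)
sizes = Counter(labels)
for j, (pre, post) in enumerate(centers):
    print(f"cluster {j}: n={sizes[j]}, mean pre={pre:.0f}k, mean post={post:.0f}k")
```

With heavily skewed cost data, a few extreme patients can pull centers toward themselves, which is one reason the size of the smallest cluster is worth inspecting alongside the cost patterns.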

Results: A total of 18,380 patients were identified. Meaningful all-cause cost clusters were generated using K-means CA and hierarchical CA with either flexible beta or Ward's methods. Based on cluster sample sizes and change of cost patterns, the K-means CA method with 4 clusters was selected: Cluster 1: Average to High (n = 113); Cluster 2: Very High to High (n = 89); Cluster 3: Average to Average (n = 16,624); and Cluster 4: Increasing Costs, High at Both Points (n = 1554). Between the 12-month pre-HD and post-HD periods, median costs increased from $185,070 to $884,605 for Cluster 1 (Average to High), decreased from $910,930 to $157,997 for Cluster 2 (Very High to High), remained relatively stable and low, moving from $15,168 to $13,026, for Cluster 3 (Average to Average), and increased from $57,909 to $193,140 for Cluster 4 (Increasing Costs, High at Both Points). Relatively stable costs after starting HD were associated with more stable comorbidity index scores between the pre- and post-HD periods, while increasing costs were associated with more sharply increasing comorbidity scores.
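The reported figures can be cross-checked with a little arithmetic. The sketch below uses only the cluster sizes and median costs quoted in the Results; the variable names are illustrative. It confirms that the four cluster sizes sum to the 18,380 patients identified and shows each cluster's share of the cohort together with the ratio of its post-HD to pre-HD median cost. Cluster 3 alone holds roughly 90% of patients.

```python
# (n, median pre-HD cost in $, median post-HD cost in $), from the Results.
clusters = {
    "1 (Average to High)": (113, 185_070, 884_605),
    "2 (Very High to High)": (89, 910_930, 157_997),
    "3 (Average to Average)": (16_624, 15_168, 13_026),
    "4 (Increasing Costs, High at Both Points)": (1_554, 57_909, 193_140),
}

total = sum(n for n, _, _ in clusters.values())
print(f"total patients: {total}")

summary = {}
for name, (n, pre, post) in clusters.items():
    share = 100 * n / total   # percent of the cohort in this cluster
    ratio = post / pre        # >1 means median cost rose after starting HD
    summary[name] = (share, ratio)
    print(f"Cluster {name}: {share:.1f}% of patients, post/pre ratio {ratio:.2f}")
```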

Conclusions: The K-means CA method appeared to be the most appropriate in healthcare claims data with highly skewed cost information when taking into account both change of cost patterns and sample size in the smallest cluster.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776444
DOI: http://dx.doi.org/10.1186/s12882-016-0238-2

Publication Analysis

Top Keywords

  • cluster average (16)
  • cluster (12)
  • healthcare claims (12)
  • k-means method (12)
  • increasing costs (12)
  • cluster analysis (8)
  • claims data (8)
  • end-stage renal (8)
  • renal disease (8)
  • initiated hemodialysis (8)
