Stratifying individuals according to risk is key to clinical disease prevention: individuals assigned to different risk tiers can then benefit from further investigation and intervention appropriate to their tier. However, two persons with the same estimated risk score do not necessarily need the same further investigation, nor do they necessarily have similar health conditions. Moreover, users typically do not know a priori what most of the risk tiers are or how many tiers a stratification should contain. In this paper, the proposed pairwise and size constrained Kmeans (PSCKmeans) method simultaneously integrates limited supervised information and size constraints to screen the high-risk population based on similarity measurement, and obtains a feasible, balanced stratification that avoids clusters with few points. Results on the China Health and Nutrition Survey public dataset and a follow-up dataset show that PSCKmeans naturally grades the risk of diabetes into four tiers and achieves 73.8%, 85.1%, and 0.95% sensitivity, specificity, and ratio of minimum to expected, respectively, on testing data. The proposed method compares favorably with eight previous semisupervised clustering methods, demonstrating that semisupervised clustering that unifies multiple forms of constraints can guide a partition that is more relevant to the domain and can find new categories through prior knowledge. Finally, this model can serve as a tool for risk stratification of clinical disease and support further intervention for people with similar health conditions.
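The three evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes binary labels (1 = high risk, 0 = not high risk) for sensitivity and specificity, and it assumes that "ratio of minimum to expected" means the size of the smallest cluster divided by the expected size n/k of a perfectly balanced partition; the abstract does not state this definition. The function names `sensitivity_specificity` and `ratio_min_to_expected` are hypothetical.

```python
from collections import Counter


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumption: 1 marks the high-risk (positive) class, 0 the rest,
    and both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true high-risk cases detected
    specificity = tn / (tn + fp)  # share of non-high-risk cases cleared
    return sensitivity, specificity


def ratio_min_to_expected(cluster_labels, n_clusters):
    """Smallest cluster size divided by the balanced size n / k.

    Assumption: clusters are labeled 0 .. n_clusters - 1. A value near 1
    indicates a balanced partition; a value near 0 indicates that at
    least one cluster is nearly empty.
    """
    counts = Counter(cluster_labels)
    smallest = min(counts.get(c, 0) for c in range(n_clusters))
    expected = len(cluster_labels) / n_clusters
    return smallest / expected


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(sensitivity_specificity(y_true, y_pred))

    tiers = [0, 0, 0, 1, 1, 2, 2, 3]  # four tiers over eight individuals
    print(ratio_min_to_expected(tiers, n_clusters=4))
```

Under this assumed definition, a size constraint that forbids clusters with few points pushes the ratio toward 1.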
DOI: http://dx.doi.org/10.1109/JBHI.2016.2633403