Heterogeneity is a frequent issue in population data analyses in medicine, biology, and the social sciences. A common approach for handling heterogeneity is to use a clustering algorithm to group similar samples, treating samples within the same group as homogeneous. This approach is known as "subtyping" or "subgrouping." Methods for evaluating the validity of subtyping have yet to be fully established. In this study, we propose the cost of cluster mean-based prediction (CCMP) as a metric for evaluating the accuracy of predictions based on subtyping. By selecting the candidate clustering result with the minimum CCMP, the optimal subtype classification in terms of prediction accuracy can be determined. The computational implementation of the CCMP is validated with numerical experiments. We also examine some properties of the subtype classification selected by the CCMP.
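
The selection step described in the abstract (score each candidate clustering, then keep the one with the lowest cost) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the CCMP formula, so the cost below is a cross-validated squared error of predicting an outcome `y` by the mean of its cluster, used as a stand-in and not as the authors' criterion. The function names (`kmeans`, `cv_cluster_mean_cost`, `select_k`), the use of k-means, the separate outcome `y`, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def sq_dists(X, centroids):
    # Pairwise squared Euclidean distances, shape (n_samples, n_clusters).
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_iter=50, seed=0):
    # Plain Lloyd iterations started from k distinct random samples.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = sq_dists(X, centroids).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids


def cv_cluster_mean_cost(X, y, k, n_folds=5, seed=0):
    # Stand-in cost (NOT the paper's CCMP): mean squared error of predicting
    # each held-out y by the training mean of y in its assigned cluster.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(X)), test)
        centroids = kmeans(X[train], k, seed=seed)
        train_labels = sq_dists(X[train], centroids).argmin(axis=1)
        # Cluster means of y; fall back to the overall training mean if empty.
        means = np.array([
            y[train][train_labels == j].mean() if np.any(train_labels == j)
            else y[train].mean()
            for j in range(k)
        ])
        test_labels = sq_dists(X[test], centroids).argmin(axis=1)
        sq_err += ((y[test] - means[test_labels]) ** 2).sum()
    return sq_err / len(X)


def select_k(X, y, candidates, **kwargs):
    # Score every candidate number of clusters and keep the cheapest one.
    costs = {k: cv_cluster_mean_cost(X, y, k, **kwargs) for k in candidates}
    best = min(costs, key=costs.get)
    return best, costs


# Synthetic example (hypothetical data): three groups with different outcomes.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
group = np.repeat(np.arange(3), 40)
X = centers[group] + rng.normal(size=(120, 2))
y = np.array([0.0, 5.0, 10.0])[group] + rng.normal(size=120)

best_k, costs = select_k(X, y, candidates=range(1, 7))
```

Here `costs` maps each candidate number of clusters to its stand-in cost and `best_k` is the candidate with the smallest value. Using held-out samples means that too many clusters are penalized through noisier cluster means, which is the kind of trade-off a prediction-based criterion is meant to capture.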

Source
http://dx.doi.org/10.1002/sim.9656

Similar Publications

Accuracy of a deep learning-based algorithm for the detection of thoracic aortic calcifications in chest computed tomography and cardiovascular surgery planning.

Eur J Cardiothorac Surg

June 2024

Department of Diagnostic and Interventional Radiology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.

Objectives: To assess the accuracy of a deep learning-based algorithm for fully automated detection of thoracic aortic calcifications in chest computed tomography (CT) with a focus on the aortic clamping zone.

Methods: We retrospectively included 100 chest CT scans from 91 patients who were examined on second- or third-generation dual-source scanners. Subsamples comprised 47 scans with an electrocardiogram-gated aortic angiography and 53 unenhanced scans.

The preliminary classification of biological data is of great importance in bioinformatics. Objects can be classified quickly by comparing their features with known traits. The k-nearest neighbor algorithm is easy to apply in this field, but its drawbacks mean that simply changing the distance model does little to improve the efficiency of the algorithm. This study therefore uses a local mean-based k-nearest neighbor classifier and compares the classification accuracy obtained with six different distance models.

Few-shot learning (FSL) aims to recognize novel classes from only a few examples. Pre-training-based methods tackle the problem effectively by pre-training a feature extractor and then fine-tuning it through nearest-centroid-based meta-learning. However, results show that the fine-tuning step yields only marginal improvements.

Epilepsy is marked by seizures stemming from abnormal electrical activity in the brain, causing involuntary movement or behavior. Many researchers have worked to identify the causes of epilepsy and to find ways to prevent and treat it. In machine learning, epilepsy diagnosis based on EEG signals has been an active research topic; many methods have been proposed, and considerable progress has been made.
