Mixed-membership unsupervised clustering is widely used to extract informative patterns from data in many application areas. For a shared data set, the stochasticity and unsupervised nature of clustering algorithms can cause difficulties in comparing clustering results produced by different algorithms, or even multiple runs of the same algorithm, as outcomes can differ owing to permutation of the cluster labels or genuine differences in clustering results. Here, with a focus on inference of individual genetic ancestry in population-genetic studies, we study the cost of misalignment of mixed-membership unsupervised clustering replicates under a theoretical model of cluster memberships. Using Dirichlet distributions to model membership coefficient vectors, we provide theoretical results quantifying the alignment cost as a function of the Dirichlet parameters and the Hamming permutation difference between replicates. For fixed Dirichlet parameters, the alignment cost is seen to increase with the Hamming distance between permutations. Data sets with low variance across individuals of membership coefficients for specific clusters generally produce high misalignment costs, so that a single optimal permutation has far lower cost than suboptimal permutations. Higher variability in data, as represented by greater variance of membership coefficients, generally results in alignment costs that are similar between the optimal permutation and suboptimal permutations. We demonstrate the application of the theoretical results to data simulated under the Dirichlet model, as well as to membership estimates from inference of human-genetic ancestry. The results can contribute to improving cluster alignment algorithms that seek to find optimal permutations of replicates.
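
The qualitative claims in the abstract can be explored with a small simulation. The sketch below is illustrative only and is not the authors' code: it assumes that two replicates are independent draws from the same Dirichlet distribution, and it assumes the alignment cost is the mean squared Euclidean distance between membership vectors after permuting the cluster labels of one replicate. The function names, the Dirichlet parameters, and the chosen permutations are all hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one membership vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def hamming_distance(perm):
    """Number of positions at which perm differs from the identity permutation."""
    return sum(1 for i, p in enumerate(perm) if i != p)

def alignment_cost(q1, q2, perm):
    """Assumed cost: mean squared Euclidean distance between rows of q1
    and rows of q2 after relabeling the clusters of q2 by perm."""
    total = 0.0
    for r1, r2 in zip(q1, q2):
        total += sum((r1[k] - r2[perm[k]]) ** 2 for k in range(len(perm)))
    return total / len(q1)

def costs_by_hamming(alpha, perms, n=2000, seed=0):
    """Simulate two replicates of n individuals and return {Hamming distance: cost}."""
    rng = random.Random(seed)
    q1 = [sample_dirichlet(alpha, rng) for _ in range(n)]
    q2 = [sample_dirichlet(alpha, rng) for _ in range(n)]
    return {hamming_distance(p): alignment_cost(q1, q2, p) for p in perms}

# Hypothetical permutations: identity, a transposition, a 3-cycle, a 4-cycle.
perms = [(0, 1, 2, 3), (1, 0, 2, 3), (1, 2, 0, 3), (1, 2, 3, 0)]

# Hypothetical Dirichlet parameters; scaling them by 10 keeps the mean
# memberships fixed while lowering the variance across individuals.
high_var_alpha = (8.0, 4.0, 2.0, 1.0)
low_var_alpha = tuple(10.0 * a for a in high_var_alpha)

high_var_costs = costs_by_hamming(high_var_alpha, perms)
low_var_costs = costs_by_hamming(low_var_alpha, perms)

# Ratio of a suboptimal (transposition) cost to the optimal (identity) cost.
high_var_ratio = high_var_costs[2] / high_var_costs[0]
low_var_ratio = low_var_costs[2] / low_var_costs[0]

for d in sorted(high_var_costs):
    print(d, round(high_var_costs[d], 4), round(low_var_costs[d], 4))
print("suboptimal/optimal ratio:", round(high_var_ratio, 2), round(low_var_ratio, 2))
```

Under these assumptions, the identity permutation has the lowest cost, and the gap between optimal and suboptimal permutations widens when the variance of the membership coefficients is lower. The ordering of costs for permutations at larger Hamming distances depends on which clusters are exchanged, so the particular permutations used here should not be read as a general result.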

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10656040
DOI: http://dx.doi.org/10.1080/10618600.2022.2127739
