Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy for dealing with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a way to detect when partitioning a specific dataset is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well-established hierarchical clustering algorithms: Ward's minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward's and Complete-linkage. We show, on both simulated and real datasets, that CC outperforms the other methods in identifying the correct number of clusters, identifying outliers, and recovering true cluster memberships. We applied CC to samples, to identify disease subtypes, and to gene profiles, to find groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be used successfully in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository.
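As a rough illustration of what combining two partitions into a partial clustering can look like, the sketch below cross-tabulates two label vectors and leaves unassigned any object on which they disagree. This is not the published CC procedure: the function names, the example labels (standing in for cuts of a Ward tree and a Complete-linkage tree), and the "dominant cell" rule are all assumptions made here for illustration.

```python
from collections import Counter


def contingency(labels_a, labels_b):
    """Count how many objects fall in each (cluster_a, cluster_b) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    return Counter(zip(labels_a, labels_b))


def consensus_partial(labels_a, labels_b):
    """Keep an object only if it lies in the dominant cell of its A-cluster.

    Hypothetical rule for illustration only (not the rule used by CC).
    Objects outside the dominant cell are returned as None (unassigned).
    """
    table = contingency(labels_a, labels_b)
    best = {}  # A-cluster -> B-cluster with the largest overlap
    for (a, b), n in table.items():
        if a not in best or n > table[(a, best[a])]:
            best[a] = b
    return [a if best[a] == b else None for a, b in zip(labels_a, labels_b)]


# Assumed example labels: one partition per hierarchical method.
ward_labels = [1, 1, 1, 1, 2, 2, 2, 2]
complete_labels = [1, 1, 1, 3, 2, 2, 2, 3]
result = consensus_partial(ward_labels, complete_labels)
```

Objects that the two partitions place inconsistently come back as `None`, which is one simple way to represent outliers in a partial clustering.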

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4807765 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152333 (PLOS)
