A General Iterative Clustering Algorithm.

Stat Anal Data Min

Department of Psychiatry, NYU Langone School of Medicine, New York, NY, USA.

Published: August 2022

The quality of a cluster analysis of unlabeled units depends on the quality of the between-unit dissimilarity measures. Data-dependent dissimilarity is more objective than data-independent geometric measures such as Euclidean distance. As suggested by Breiman, many data-driven approaches are based on decision tree ensembles, such as a random forest (RF), that produce a proximity matrix that can easily be transformed into a dissimilarity matrix. An RF can be obtained using labels that distinguish units with real data from units with synthetic data. The resulting dissimilarity matrix is input to a clustering program, and units are assigned labels corresponding to cluster membership. We introduce a General Iterative Cluster (GIC) algorithm that improves the proximity matrix and clusters of the base RF. The cluster labels are used to grow a new RF, yielding an updated proximity matrix, which is entered into the clustering program. The process is repeated until convergence. The same procedure can be used with many base procedures, such as the Extremely Randomized Tree ensemble. We evaluate the performance of the GIC algorithm using benchmark and simulated data sets. The properties measured by the Silhouette Score are substantially superior to those of the base clustering algorithm. The GIC package has been released in R: https://cran.r-project.org/web/packages/GIC/index.html.
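
The loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation in the GIC R package: it uses scikit-learn and SciPy as stand-ins, builds synthetic units by permuting each column independently, takes dissimilarity as one minus proximity, uses average-linkage hierarchical clustering as the "clustering program", and treats an unchanged partition as convergence. All function names (`rf_proximity`, `synthetic_copy`, `cluster`, `gic`) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def rf_proximity(forest, X):
    """Proximity = fraction of trees in which two units fall in the same leaf."""
    leaves = forest.apply(X)  # shape (n_units, n_trees): leaf index per tree
    # O(n^2 * n_trees) memory, so only suitable for small data sets.
    return (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)


def synthetic_copy(X, rng):
    """Assumed synthetic-data scheme: permute each column independently."""
    return np.column_stack([rng.permutation(col) for col in X.T])


def cluster(dissim, k):
    """Assumed clustering program: average linkage cut into at most k clusters."""
    tree = linkage(squareform(dissim, checks=False), method="average")
    return fcluster(tree, t=k, criterion="maxclust")


def same_partition(a, b):
    """True if two label vectors define the same grouping (up to relabeling)."""
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])


def gic(X, k, max_iter=10, n_trees=200, seed=0):
    rng = np.random.default_rng(seed)

    # Base step: an RF that separates real units (1) from synthetic units (0).
    X_all = np.vstack([X, synthetic_copy(X, rng)])
    y_all = np.r_[np.ones(len(X), dtype=int), np.zeros(len(X), dtype=int)]
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_all, y_all)
    dissim = 1.0 - rf_proximity(forest, X)  # assumed transformation
    labels = cluster(dissim, k)

    # Iterative step: regrow the RF on the current cluster labels,
    # update the proximity matrix, and recluster.
    for it in range(max_iter):
        forest = RandomForestClassifier(
            n_estimators=n_trees, random_state=seed + it + 1
        )
        forest.fit(X, labels)
        new_dissim = 1.0 - rf_proximity(forest, X)
        new_labels = cluster(new_dissim, k)
        converged = same_partition(labels, new_labels)
        dissim, labels = new_dissim, new_labels
        if converged:  # assumed stopping rule: partition no longer changes
            break
    return labels, dissim
```

Calling `gic(X, k=3)` on a numeric array `X` returns the final cluster labels and dissimilarity matrix. A Silhouette Score for the result can then be computed with `sklearn.metrics.silhouette_score(dissim, labels, metric="precomputed")`, provided at least two clusters were found. Swapping `RandomForestClassifier` for `ExtraTreesClassifier` would give a sketch of the Extremely Randomized Tree variant mentioned above.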


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9438941
DOI: http://dx.doi.org/10.1002/sam.11573


