Clustering with t-SNE, provably.

SIAM J Math Data Sci

Department of Mathematics, Yale University, New Haven, CT 06511, USA.

Published: May 2019

t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the 'early exaggeration' phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter and step size. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g. the swiss roll) improves. We also discuss a connection to spectral clustering methods.
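
To make the 'early exaggeration' phase concrete, the following is a minimal sketch of plain gradient descent on the t-SNE objective with the input affinities multiplied by an exaggeration factor. It is not the paper's implementation: the function names are hypothetical, a fixed Gaussian bandwidth stands in for the usual perplexity calibration, and the values alpha = n/10 and h = 1 used in the note below are illustrative assumptions rather than the rules derived in the paper.

```python
import numpy as np


def make_clusters(n_per=20, dim=5, seed=0):
    """Three well-separated Gaussian blobs (synthetic data, assumed for the demo)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(3), n_per)
    centers = 10.0 * np.eye(3, dim)
    X = centers[labels] + 0.5 * rng.standard_normal((3 * n_per, dim))
    return X, labels


def affinities(X, sigma=1.0):
    """Symmetrized Gaussian affinities p_ij that sum to 1.

    Simplification: one fixed bandwidth sigma for all points, instead of
    the per-point perplexity search used by standard t-SNE.
    """
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P = P / P.sum(axis=1, keepdims=True)  # conditional p_{j|i}
    return (P + P.T) / (2.0 * n)          # symmetrize


def early_exaggeration(P, alpha, h, n_iter=200, seed=0):
    """Gradient descent with exaggerated affinities alpha * p_ij.

    alpha: exaggeration parameter, h: step size (both chosen by the caller).
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = rng.uniform(-0.01, 0.01, size=(n, 2))  # small random initialization
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]           # y_i - y_j
        W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))   # Student-t kernel
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()                                # embedding affinities q_ij
        # Gradient of the exaggerated objective:
        # 4 * sum_j (alpha * p_ij - q_ij) * w_ij * (y_i - y_j)
        coeff = (alpha * P - Q) * W
        grad = 4.0 * np.sum(coeff[:, :, None] * diff, axis=1)
        Y = Y - h * grad
    return Y
```

Calling `early_exaggeration(affinities(X), alpha=X.shape[0] / 10.0, h=1.0)` on the output of `make_clusters()` runs the exaggerated phase on the synthetic blobs. In this setting the attractive term dominates inside each blob, so points from the same blob should contract together while the blobs drift apart.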


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7561036
DOI: http://dx.doi.org/10.1137/18m1216134


