Shape-aware stochastic neighbor embedding for robust data visualisations.

BMC Bioinformatics

Department of Mathematics, Stockholm University, Stockholm, Sweden.

Published: November 2022

Background: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fails to correctly represent hierarchical relationships between clusters and creates spurious patterns in the embedding. In this work we generalise t-SNE using shape-aware graph distances to mitigate some of its limitations. Although many methods have recently been proposed to circumvent the shortcomings of t-SNE, notably Uniform Manifold Approximation and Projection (UMAP) and Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE), we see a clear advantage in the proposed graph-based method.

Results: The superior performance of the proposed method is first demonstrated on simulated data, where quantitative validation indices show a significant improvement over t-SNE, UMAP and PHATE when visualising imbalanced, nonlinear, continuous and hierarchically structured data. The ability of the proposed method to create faithful low-dimensional embeddings, compared with the competing methods, is then shown on two real-world data sets: single-cell transcriptomics data and the MNIST image data. In addition, the only hyper-parameter of the method can be chosen automatically in a data-driven way, and the value chosen this way is consistently optimal across all test cases in this study.

Conclusions: In this work we show that the proposed shape-aware stochastic neighbor embedding method creates low-dimensional visualisations that robustly and accurately reveal key structures of high-dimensional data.
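
The abstract does not spell out how the shape-aware graph distances are constructed, so the following is only an illustrative sketch of the general idea of replacing straight-line distances with distances measured along a neighbourhood graph. It assumes a symmetric k-nearest-neighbour graph with Euclidean edge weights and Dijkstra shortest paths; the function names (knn_graph, graph_distances, graph_distance_matrix) and the half-circle toy data are hypothetical and are not taken from the paper.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by distance to point i and keep the k closest.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d  # symmetrise so the graph is undirected
    return graph


def graph_distances(graph, source):
    """Dijkstra shortest-path distances from `source` to every node."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def graph_distance_matrix(points, k):
    """All-pairs graph distances as a list of lists (assumption: small data)."""
    graph = knn_graph(points, k)
    n = len(points)
    rows = [graph_distances(graph, i) for i in range(n)]
    return [[rows[i][j] for j in range(n)] for i in range(n)]


# Toy data (assumption): 21 points evenly spaced on a half circle of radius 1.
points = [
    (math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)
]
D = graph_distance_matrix(points, k=2)

euclid_ends = math.dist(points[0], points[-1])  # straight line across the arc
graph_ends = D[0][-1]                           # path that follows the arc
print(f"Euclidean: {euclid_ends:.3f}, graph: {graph_ends:.3f}")
```

On this toy curve the two end points are close in a straight line but far apart along the curve, and the graph distance reflects the latter. A matrix like D could then be handed to any t-SNE implementation that accepts precomputed distances, for example scikit-learn's TSNE with metric="precomputed"; whether that matches the authors' construction is not stated in the abstract.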

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9660178
DOI: http://dx.doi.org/10.1186/s12859-022-05028-8

Publication Analysis

Top Keywords

stochastic neighbor: 12
neighbor embedding: 12
data: 9
shape-aware stochastic: 8
high-dimensional data: 8
single-cell transcriptomics: 8
transcriptomics data: 8
proposed method: 8
t-sne: 6
embedding: 5
