AI Article Synopsis

  • The study introduces a new statistical method combining Uniform Manifold Approximation and Projection (UMAP) and Louvain algorithms for better visualization and classification of aquatic biota.
  • Compared to traditional methods like PCA, the UMAP approach provides clearer and more meaningful groupings of communities, highlighting its advantages for analyzing high-dimensional ecological datasets.

Article Abstract

The analysis of community structure in freshwater ecology often requires dimensionality reduction to process multivariate data. A high number of dimensions (number of taxa/environmental parameters × number of samples), nonlinear relationships, outliers, and high variability usually hinder the visualization and interpretation of multivariate datasets. Here, we propose a new statistical design that uses Uniform Manifold Approximation and Projection (UMAP), with community partitioning by the Louvain algorithm, to ordinate and classify the structure of aquatic biota in two-dimensional space. We demonstrate this approach on five previously published datasets for diatoms, macrophytes, chironomids (larval and subfossil), and fish. Principal Component Analysis (PCA) and Ward's clustering were also used to compare the UMAP approach with traditional approaches to ordination and classification. The ordination of sampling sites in two-dimensional space showed a much denser, and easier to interpret, grouping with the UMAP approach than with PCA. For data with a high number of dimensions, the classification of community structure using the Louvain algorithm in UMAP ordination space showed a higher classification strength than the cluster patterns obtained with Ward's algorithm in PCA space. Environmental gradients, presented via heat maps, were overlaid on the ordination patterns of aquatic communities, confirming that the ordinations obtained by UMAP were ecologically meaningful. This is the first study to apply a UMAP approach with classification using the Louvain algorithm to ecological datasets. We show that its performance on local and global structures, together with the fact that the number of clusters is determined by the algorithm, makes this approach more powerful than traditional approaches.
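The classification step described in the abstract relies on the Louvain algorithm, which searches for a partition of a graph that maximizes modularity. The sketch below is not the authors' pipeline: it only illustrates the modularity score that such a search optimizes, on a nearest-neighbour graph built from toy two-dimensional coordinates standing in for an ordination. The function names (`knn_graph`, `modularity`), the coordinates, and the choice of k = 2 are all assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph (Euclidean distance).

    Returns a set of frozenset({i, j}) edges between point indices.
    """
    edges = set()
    for i, p in enumerate(points):
        # Rank all other points by distance to p and keep the k closest.
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        for j in nearest:
            edges.add(frozenset((i, j)))
    return edges


def modularity(edges, labels):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    Q = sum over communities c of (L_c / m) - (d_c / (2 m))^2, where L_c is
    the number of edges inside c, d_c the summed degree of its nodes, and m
    the total number of edges.
    """
    m = len(edges)
    inside = {}  # edges with both endpoints in the same community
    total = {}   # summed node degree per community
    for edge in edges:
        u, v = tuple(edge)
        total[labels[u]] = total.get(labels[u], 0) + 1
        total[labels[v]] = total.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum(inside.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)


# Toy "ordination": two well-separated groups of four sites each (made-up data).
sites = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11)]
edges = knn_graph(sites, k=2)

# A partition that follows the two groups, and one that cuts across them.
good_q = modularity(edges, [0, 0, 0, 0, 1, 1, 1, 1])
mixed_q = modularity(edges, [0, 1, 0, 1, 0, 1, 0, 1])
print(f"grouped partition Q = {good_q:.2f}, mixed partition Q = {mixed_q:.2f}")
```

The partition that follows the two groups scores higher than the one that cuts across them, which is the property a modularity-maximizing method exploits. Because the number of communities is an outcome of the search rather than an input, this also illustrates why the abstract describes the number of clusters as determined by the algorithm.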

Source
http://dx.doi.org/10.1016/j.scitotenv.2021.152365

Publication Analysis

Top Keywords

umap approach: 12
uniform manifold: 8
manifold approximation: 8
approximation projection: 8
projection umap: 8
ordination classification: 8
community structure: 8
data high: 8
high number: 8
number dimensions: 8

Similar Publications

The immune system has emerged as a major factor in the pathogenesis of Alzheimer's disease (AD). PANoptosis is a newly defined programmed cell death mechanism related to many inflammatory diseases. This study aimed to identify the differentially expressed (DE) PANoptosis-related genes with characteristics of immune dysregulation (PRGIDs) in AD using bioinformatics analysis of bulk RNA-seq and single-nuclei RNA sequencing (snRNA-seq) data.

Diabetic nephropathy (DN) is a major complication of diabetes and a leading cause of renal failure. While valsartan has been shown to alleviate DN clinically, its antifibrotic mechanisms require further investigation. This study used a transcriptomics-driven approach, integrating in vitro experiments, machine learning, molecular docking, dynamics simulations, and RT-qPCR to identify key antifibrotic targets.

Exploratory analysis of single-cell RNA sequencing (scRNA-seq) typically relies on hard clustering over two-dimensional projections like uniform manifold approximation and projection (UMAP). However, such methods can severely distort the data and have many arbitrary parameter choices. Methods that can model scRNA-seq data as non-discrete "gene expression programs" (GEPs) can better preserve the data's structure, but currently, they are often not scalable, not consistent across repeated runs, and lack an established method for choosing key parameters.

Background: Hepatocellular carcinoma (HCC) is a prevalent type of cancer with high incidence and mortality rates. It is the third most common cause of cancer-related deaths. CD8 T cell exhaustion (TEX) is a progressive decline in T cell function due to sustained T cell receptor stimulation from continuous antigen exposure.

Using UMAP for Partially Synthetic Healthcare Tabular Data Generation and Validation.

Sensors (Basel)

December 2024

Intelligent Data Science and Artificial Intelligence Research Center, Technical University of Catalonia, Nexus II Building, Jordi Girona 29, 08034 Barcelona, Spain.

In healthcare, vast amounts of data are increasingly collected through sensors for smart health applications and patient monitoring or diagnosis. However, such medical data often comprise sensitive patient information, posing challenges regarding data privacy, and are resource-intensive to acquire for significant research purposes. In addition, the common case of lack of information due to technical issues, transcript errors, or differences between descriptors considered in different health centers leads to the need for data imputation and partial data generation techniques.
