We develop a Vector Quantized Spectral Clustering (VQSC) algorithm that combines spectral clustering (SC) with vector quantization (VQ) sampling to group the genome sequences of plants. SC is used for its accuracy, and VQ keeps the algorithm computationally cheap, since the complexity of SC alone is cubic in the input size. Although the combination of SC and VQ is not new, the novelty of our work lies in constructing the crucial similarity matrix for SC and in using k-medoids in VQ, both adapted to plant genome data. For soybean, we compare our approach with the commonly used Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Neighbor Joining (NJ). Experimental results show that VQSC significantly outperforms both techniques in cluster quality (an average improvement of 21% over UPGMA and 24% over NJ) and in computation time (an order of magnitude faster than both UPGMA and NJ).
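The general SC-plus-VQ idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a precomputed pairwise distance matrix between sequences, substitutes a generic Gaussian kernel for the paper's genome-adapted similarity matrix, and reuses k-medoids (rather than the more usual k-means) to cluster the spectral embedding. The function names `k_medoids`, `spectral_labels`, and `vqsc`, the farthest-first initialization, and the median heuristic for `sigma` are all hypothetical choices made for this example.

```python
import numpy as np


def k_medoids(dist, m, n_iter=50):
    """Alternating k-medoids on a precomputed distance matrix.

    Uses a deterministic farthest-first initialization (an assumption of
    this sketch). Returns (medoid indices, cell label for every point).
    """
    # Start from the overall medoid, then repeatedly add the point that is
    # farthest from all medoids chosen so far.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < m:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(m):
            members = np.where(labels == j)[0]
            if members.size == 0:
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cell.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


def spectral_labels(sim, k):
    """Normalized spectral clustering on a similarity matrix."""
    deg = sim.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    norm = sim * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the top k vectors.
    _, vecs = np.linalg.eigh(norm)
    emb = vecs[:, -k:]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Cluster the embedded rows; k-medoids is reused here for brevity.
    emb_dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    _, labels = k_medoids(emb_dist, k)
    return labels


def vqsc(dist, n_codewords, n_clusters, sigma=None):
    """VQ step (k-medoids), SC on the codewords, then label propagation."""
    medoids, cell = k_medoids(dist, n_codewords)
    d_rep = dist[np.ix_(medoids, medoids)]
    if sigma is None:
        # Heuristic bandwidth: median positive distance between codewords.
        sigma = np.median(d_rep[d_rep > 0])
    # Gaussian kernel: a stand-in for the paper's similarity matrix.
    sim = np.exp(-(d_rep ** 2) / (2 * sigma ** 2))
    rep_labels = spectral_labels(sim, n_clusters)
    # Each point inherits the cluster of the codeword representing it.
    return rep_labels[cell]
```

Calling `vqsc(dist, n_codewords, n_clusters)` returns one cluster label per input row. The eigendecomposition, which is the cubic-cost step, runs only on the `n_codewords` representatives rather than on all sequences. That is the source of the savings the abstract attributes to VQ, although this sketch still assumes the full pairwise distance matrix is available.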

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435876
DOI: http://dx.doi.org/10.1177/1176934319836997
